Investigating Joint Attention Mechanisms through Spoken Human-Robot Interaction
Cognition, Volume 120, Number 2, ISSN 0010-0277
Referential gaze during situated language production and comprehension is tightly coupled with the unfolding speech stream (Griffin, 2001; Meyer, Sleiderink, & Levelt, 1998; Tanenhaus, Spivey-Knowlton, Eberhard, & Sedivy, 1995). In a shared environment, utterance comprehension may further be facilitated when the listener can exploit the speaker's focus of (visual) attention to anticipate, ground, and disambiguate spoken references. To investigate the dynamics of such gaze-following and its influence on utterance comprehension in a controlled manner, we use a human-robot interaction setting. Specifically, we hypothesize that referential gaze is interpreted as a cue to the speaker's referential intentions, and that this cue facilitates or disrupts reference resolution. Moreover, the use of a dynamic yet tightly controlled gaze cue enables us to shed light on the simultaneous and incremental integration of the unfolding speech and gaze movement. We report evidence from two eye-tracking experiments in which participants saw videos of a robot looking at and describing objects in a scene. The results reveal a quantified benefit-disruption spectrum of gaze on utterance comprehension and, further, show that gaze is used, even during the initial movement phase, to restrict the spatial domain of potential referents. These findings more broadly suggest that people treat artificial agents similarly to human agents and, thus, validate such a setting for further explorations of joint attention mechanisms. (Contains 11 figures and 7 tables.)
Staudte, M. & Crocker, M.W. (2011). Investigating Joint Attention Mechanisms through Spoken Human-Robot Interaction. Cognition, 120(2), 268-291.