Our Paper Accepted to CHI 2026

Our paper, How Humans Naturally Refer to Targets: Understanding Multimodal Instruction Patterns in Human–Robot Interaction, has been accepted to CHI 2026, a premier conference in human–computer interaction.

Title
How Humans Naturally Refer to Targets: Understanding Multimodal Instruction Patterns in Human–Robot Interaction
Authors
Lesong Jia, Makayla Chang, Na Du
Abstract
Current multimodal instruction-recognition algorithms in human–robot interaction, developed largely from a purely technical perspective, remain rigid and incomplete in their use of human communicative cues. Therefore, a full understanding of how humans naturally refer to targets in interaction is central to enabling robots to interpret and act on user instructions. To investigate this, we collected multimodal behavior data from 30 participants who naturally instructed a robot for household tasks while we systematically varied target distance, direction, and local referent complexity. Our results show that speech instructions were often vague and lacked explicit target-position information. To resolve this ambiguity, multimodal cues are essential: gaze direction provides an order-of-magnitude improvement in target-localization accuracy, while hand pointing, head turns, and speech onset offer reliable temporal anchors for identifying target-directed gaze. We also found that speech patterns varied with distance and local referent complexity, whereas multimodal behaviors shifted with target direction, underscoring the need for context-adaptive recognition and interface design.