As dialog systems become ubiquitous, we must learn to detect when a system is being spoken to, and avoid mistaking human-human speech for computer-directed input. In this talk I will discuss approaches to addressee detection in this human-human-machine dialog scenario, based on what is being said (lexical information), how it is being said (acoustic-prosodic properties), and non-speech multimodal and contextual information. I will present experimental results showing that a combination of these cues can be used effectively for human/computer address classification in several dialog scenarios.
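The abstract does not specify a model, but one common way to combine such cues is feature-level fusion into a single binary classifier. The sketch below, using scikit-learn with toy data, is purely illustrative: the utterances, prosodic measurements, and feature choices are hypothetical assumptions, not the speaker's actual system.

```python
# A minimal sketch (assumed, not the speaker's method) of addressee detection
# as binary classification: fuse lexical features from an utterance transcript
# with acoustic-prosodic features, then train one classifier on the result.
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy transcripts with hypothetical prosodic measurements per utterance:
# mean pitch (Hz), speaking rate (syllables/sec), mean energy (dB).
transcripts = [
    "computer play some jazz music",    # computer-directed
    "set a timer for ten minutes",      # computer-directed
    "what do you want for dinner",      # human-directed
    "did you see the game last night",  # human-directed
]
prosody = np.array([
    [210.0, 3.1, 62.0],
    [205.0, 3.0, 61.0],
    [180.0, 4.5, 55.0],
    [175.0, 4.8, 54.0],
])
labels = np.array([1, 1, 0, 0])  # 1 = computer-directed, 0 = human-directed

# Lexical cues: bag-of-words weights over the (assumed) ASR transcript.
vectorizer = TfidfVectorizer()
lexical = vectorizer.fit_transform(transcripts)

# Feature-level fusion: concatenate lexical and prosodic feature vectors.
features = hstack([lexical, csr_matrix(prosody)])
clf = LogisticRegression().fit(features, labels)

# Classify a new utterance using both cue types together.
new_text = vectorizer.transform(["computer turn off the lights"])
new_prosody = csr_matrix(np.array([[208.0, 3.2, 61.5]]))
print(clf.predict(hstack([new_text, new_prosody])))  # expected: [1]
```

Late fusion (training one classifier per cue type and combining their scores) is an equally plausible design; the single-classifier version is shown here only because it is the shortest self-contained example.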