A novel probabilistic framework is proposed for inferring gaze patterns and the structure of conversation in face-to-face multiparty communication, based on the head directions of participants and the presence or absence of their utterances. First, we define three classes of conversational regimes, which are characterized by the topology of the gaze pattern; we assume that they indicate the structure of the conversation, i.e., who is talking to whom. Next, we formulate the problem as the joint estimation of the regime state from gaze patterns and utterances, and of the gaze pattern from head directions. We then devise a dynamic Bayesian network, called the Markov-switching model. The regime changes over time according to Markov transitions and controls the dynamics of the gaze patterns and utterances. Furthermore, Bayesian estimation of the regime, gaze pattern, and model parameters is implemented using a Markov chain Monte Carlo method. Experiments on four-person conversations confirm accurate gaze estimation and the effectiveness of the framework for identifying conversation structures.
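As a rough illustration of the Markov-switching idea only, the following sketch simulates a hidden regime that evolves by Markov transitions and modulates an utterance variable. The regime labels, transition matrix, utterance probabilities, and the function `simulate` are all assumptions made for this example; they are not values or names from the paper, and the sketch covers neither the gaze and head-direction variables nor the MCMC inference.

```python
import random

# Placeholder labels for the three regime classes (hypothetical names).
REGIMES = ["regime_A", "regime_B", "regime_C"]

# Assumed row-stochastic matrix: TRANSITION[i][j] = P(next = j | current = i).
TRANSITION = [
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.05, 0.05, 0.90],
]

# Assumed probability that an utterance is present at a time step, per regime.
UTTERANCE_PROB = [0.7, 0.4, 0.1]


def simulate(num_steps, seed=0):
    """Sample (regime, utterance_present) pairs from the toy switching model."""
    rng = random.Random(seed)
    state = 0  # start in the first regime (arbitrary choice)
    trajectory = []
    for _ in range(num_steps):
        # The current regime controls the observation distribution.
        speaking = rng.random() < UTTERANCE_PROB[state]
        trajectory.append((REGIMES[state], speaking))
        # The regime itself evolves by a first-order Markov transition.
        state = rng.choices(range(len(REGIMES)), weights=TRANSITION[state])[0]
    return trajectory
```

This shows only the generative direction, from regime to observation. The framework described above works in the opposite direction, estimating the regimes, gaze patterns, and parameters from observed head directions and utterances.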