This work contributes to the development of turn-taking mechanisms for spoken dialogue systems by modelling turn-taking behavior in human-human conversations. The proposed models are tested on the Switchboard corpus, which contains conversations annotated at the utterance level. Several experiments analyze the salience of different features of the preceding utterances for predicting whether the speaker will change. We study the impact of n-gram sequential modelling on turn-taking and also apply machine learning techniques to the same prediction task. The results suggest that combining the preceding dialogue sequence with information about previous speaker changes, and duplicating the sequences with the speaker IDs replaced, plays an important role in modelling turn-taking. In the n-gram models, utterance sequences of length 3 gave higher predictability for this task. The experiments also suggest that a machine learning technique using 4-grams over a combination of all these features is effective for predicting speaker changes.
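As an illustration only, the n-gram idea described above can be sketched as a count-based predictor: the preceding utterances form a context, and the model records how often that context was followed by a speaker change. This is a minimal sketch under stated assumptions, not the models evaluated in this work. The utterance representation (a speaker ID paired with a dialogue-act label), the labels themselves, and the function names `swap_speakers`, `training_examples`, `train`, and `predict_change` are all hypothetical.

```python
from collections import Counter, defaultdict

def swap_speakers(dialogue):
    """Duplicate a dialogue with the two speaker IDs exchanged (A <-> B)."""
    flip = {"A": "B", "B": "A"}
    return [(flip[spk], act) for spk, act in dialogue]

def training_examples(dialogue, n):
    """Yield (context, changed) pairs for an n-gram model.

    context: the n-1 utterances preceding position i.
    changed: True if utterance i has a different speaker than utterance i-1.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    for i in range(n - 1, len(dialogue)):
        context = tuple(dialogue[i - n + 1:i])
        changed = dialogue[i][0] != dialogue[i - 1][0]
        yield context, changed

def train(dialogues, n=4, augment=True):
    """Count speaker-change outcomes per context (assumed count-based model)."""
    counts = defaultdict(Counter)
    for d in dialogues:
        variants = [d, swap_speakers(d)] if augment else [d]
        for v in variants:
            for context, changed in training_examples(v, n):
                counts[context][changed] += 1
    return counts

def predict_change(counts, context, default=True):
    """Predict a speaker change by majority vote; fall back on unseen contexts."""
    seen = counts.get(tuple(context))
    if not seen:
        return default
    return seen[True] >= seen[False]

# Toy dialogue with made-up dialogue-act labels ("q" = question, "a" = answer).
toy = [("A", "q"), ("B", "a"), ("A", "q"), ("B", "a"), ("A", "q"), ("B", "a")]
model = train([toy], n=3)
print(predict_change(model, [("B", "q"), ("A", "a")]))
```

In this sketch, the swapped copy lets the model recognize a context in which speaker B asks and speaker A answers, even though the toy data only contains the reverse arrangement. That is one plausible reading of why duplicating sequences with replaced speaker IDs could help.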