In spoken dialog systems for humanoid robots, smooth turn-taking is one of the most important factors in realizing natural interaction with users. Speech collisions often occur when a user and the dialog system speak simultaneously. This study presents a method for generating fillers at the beginning of system utterances to signal an intention to take or hold the turn, as in human conversation. To this end, we analyzed the relationship between dialog context and fillers observed in a human-robot interaction corpus in which a user talks with a humanoid robot remotely operated by a human. First, we annotated the corpus with dialog act tags and analyzed the typical types of sequential pairs of dialog acts, called DA pairs. We found that the typical filler forms and their occurrence patterns differ according to the DA pair. We then built a machine learning model that predicts the occurrence of a filler and its appropriate form from linguistic and prosodic features extracted from the preceding and following utterances. The experimental results show that the effective feature set also depends on the type of DA pair.