The durations of the utterances immediately before and after a pause have been considered the only factors affecting pause duration (the Preboundary and Postboundary Effects). Recently, using an “XY utterance phrase” composed of two words, we found that the ratio of the utterance durations before and after a pause also affects pause duration (the Pre-postboundary Effect). However, it is not obvious whether such effects are useful for speech processing applications. In this research, we developed a generative model of pause duration based on a multiple regression analysis of our experimental data (the Primal Model) and derived two additional models with different parameters. We then evaluated these models against a model whose pause duration is constant (the Constant Model). Subjects' impressions of the Primal Model, such as “natural,” “like,” and “familiar,” were more positive than their impressions of the Constant Model. Moreover, the Primal Model gave better results than the two additional pause duration models. Based on these results, we discuss the validity of the Primal Model and the relationship between the model parameters and the subjective evaluation.