An algorithm is presented which takes the F0 tracings of a connected-speech utterance as input and performs speaker-independent segmentation into prosodically defined information units. Two global declination lines are computed by linear regression; they approximate the trends over time of the F0 peaks (topline) and valleys (baseline) across the utterance. The computation is repeated every time the Pearson product-moment correlation coefficient for these declination lines drops below the preset level of acceptability. Segmentation is thus performed without prior knowledge of higher-level linguistic information: the end of a unit is determined by the general resetting of the intonation contour, wherever in the utterance it may occur. The structure of the algorithm is described, and its performance is evaluated on three medium-sized Swedish texts read by four native speakers of standard Swedish.
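The segmentation criterion can be illustrated with a minimal sketch. It is not the published algorithm, only one possible reading of the description above, applied to a single sequence of F0 turning points (for example the peaks of the topline). The function names `pearson_r` and `segment`, the threshold `r_min`, the minimum window size `min_points`, and the rule of starting the new unit at the point that breaks the fit are all assumptions made for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0  # degenerate case: no variation, treat as uncorrelated
    return sxy / sqrt(sxx * syy)


def segment(points, r_min=0.8, min_points=3):
    """Split a sequence of (time, f0) turning points into declination units.

    Assumption: a unit is acceptable while time and F0 are strongly
    negatively correlated (r <= -r_min), i.e. the points still follow
    one falling declination line. When adding a point breaks this, the
    point is taken as an F0 reset and opens a new unit.

    Returns the list of indices at which units start.
    """
    boundaries = [0]
    start = 0
    for end in range(len(points)):
        window = points[start:end + 1]
        if len(window) < min_points:
            continue  # too few points to judge the fit
        times = [t for t, _ in window]
        f0s = [f for _, f in window]
        if pearson_r(times, f0s) > -r_min:
            # Fit no longer acceptable: start a new unit at this point.
            boundaries.append(end)
            start = end
    return boundaries


# Hypothetical topline: two falling ramps with a reset at index 5.
peaks = [(0, 200), (1, 190), (2, 180), (3, 170), (4, 160),
         (5, 210), (6, 200), (7, 190), (8, 180), (9, 170)]
print(segment(peaks, r_min=0.9))
```

In this sketch the baseline would be handled the same way on the sequence of valleys; how the two lines are combined into a single decision is not specified in the abstract and is left open here.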