Journal: Circuits, Systems, and Signal Processing

A Pre-classification-Based Language Identification for Northeast Indian Languages Using Prosody and Spectral Features

Abstract

This paper develops a two-stage language identification (LID) system for Northeast (NE) Indian languages. In the first stage, languages are pre-classified into tonal and non-tonal categories; in the second stage, the individual language is identified from among the languages of the corresponding category. New parameters that model the prosodic characteristics of the speech signal are proposed for both pre-classification and individual language identification. The effectiveness of spectral features, namely Mel-frequency cepstral coefficients (MFCCs), and of their combination with prosodic features is studied for the pre-classification task. The usefulness of MFCCs with their delta and acceleration coefficients, in combination with prosodic features, is investigated for individual language identification. The performance of the system is analyzed for features extracted from different analysis units, such as syllable, disyllable, word, and utterance. Three classifiers are compared for both pre-classification and individual language identification: an artificial neural network (ANN), a Gaussian mixture model-universal background model (GMM-UBM), and an i-vector-based support vector machine (i-vector-based SVM). A new database, the NIT Silchar language database (NITS-LD), has been developed for seven NE Indian languages using All India Radio broadcast news. The experimental analysis suggests that the proposed prosodic parameters improve the performance of both stages; in the pre-classification stage, they improve on existing parameters by as much as 7.4%, 11.9%, and 9.1% for 30 s, 10 s, and 3 s test data, respectively. Of the baseline single-stage systems, GMM-UBM provides the highest accuracies: 80%, 76.8%, and 72% for 30 s, 10 s, and 3 s test data, respectively.
In the proposed system, the combination of the ANN model in the pre-classification stage and the GMM-UBM model in the individual language identification stage provides the highest accuracies, improving on the baseline system by 7.2%, 7%, and 4.9% for 30 s, 10 s, and 3 s test data, respectively. For the OGI-Multilingual (OGI-MLTS) database, improvements of 8.1%, 7.4%, and 5.7% for 30 s, 10 s, and 3 s test data, respectively, are observed over the baseline LID system.
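
The two-stage routing described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. A toy nearest-centroid classifier stands in for the ANN and GMM-UBM models, the two-dimensional feature vectors are invented stand-ins for prosodic and MFCC features, and the language labels (`tonal_A`, `nontonal_B`, and so on) are hypothetical placeholders, since the abstract does not name the seven languages. The class names `NearestCentroid` and `TwoStageLID` are likewise assumptions made for illustration.

```python
from math import dist


class NearestCentroid:
    """Toy stand-in classifier (assumption): picks the label with the closest centroid."""

    def __init__(self):
        self.centroids = {}

    def fit(self, features, labels):
        # Group feature vectors by label, then average each group column-wise.
        grouped = {}
        for x, y in zip(features, labels):
            grouped.setdefault(y, []).append(x)
        self.centroids = {
            y: [sum(col) / len(xs) for col in zip(*xs)]
            for y, xs in grouped.items()
        }
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda y: dist(x, self.centroids[y]))


class TwoStageLID:
    """Stage 1: tonal vs. non-tonal. Stage 2: language within the chosen category."""

    def __init__(self, category_of):
        self.category_of = category_of  # maps language label -> category label
        self.pre_classifier = NearestCentroid()
        self.language_classifiers = {}

    def fit(self, features, languages):
        categories = [self.category_of[lang] for lang in languages]
        self.pre_classifier.fit(features, categories)
        # Train one second-stage model per category, on that category's data only.
        for cat in set(categories):
            xs = [x for x, c in zip(features, categories) if c == cat]
            ys = [l for l, c in zip(languages, categories) if c == cat]
            self.language_classifiers[cat] = NearestCentroid().fit(xs, ys)
        return self

    def predict(self, x):
        category = self.pre_classifier.predict(x)
        return category, self.language_classifiers[category].predict(x)


# Hypothetical labels and invented 2-D features, for illustration only.
category_of = {
    "tonal_A": "tonal", "tonal_B": "tonal",
    "nontonal_A": "non-tonal", "nontonal_B": "non-tonal",
}
train_x = [(0.0, 0.0), (0.2, 0.1), (0.0, 2.0), (0.1, 2.2),
           (5.0, 0.0), (5.1, 0.2), (5.0, 2.0), (5.2, 2.1)]
train_y = ["tonal_A", "tonal_A", "tonal_B", "tonal_B",
           "nontonal_A", "nontonal_A", "nontonal_B", "nontonal_B"]

model = TwoStageLID(category_of).fit(train_x, train_y)
print(model.predict((0.1, 1.9)))
print(model.predict((4.9, 0.1)))
```

In this sketch a first-stage error cannot be recovered in the second stage, because each second-stage model only knows the languages of its own category. This is why the accuracy of the pre-classifier matters for the overall system.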