Conference: International Conference on Asian Language Processing

Isolated digit Filipino speech recognition through spectrogram image classification: Towards application in a disaster preparedness participatory toolkit

Abstract

In this paper, we present our work on isolated digit speech recognition by classifying spectrogram images, for use in a disaster preparedness participatory toolkit. To make the toolkit more inclusive, we added a voice component that extends coverage to respondents with low literacy and to people with visual impairments. Our approach treats speech recognition as spectrogram image classification, a departure from conventional approaches that operate on acoustic coefficients and features. As our initial test bed, we focused on Filipino, a member of the Malayo-Polynesian language family and the national language of the Philippines. Our data consist of 4,297 utterances of the Filipino digits 0 to 9 collected from 262 speakers, which we divided into three parts: 70% for training, 20% for testing, and 10% for validation. We applied the short-time Fourier transform (STFT) to the training data and used convolutional neural networks (CNNs) in MATLAB to classify the resulting spectrogram images. The lowest accuracy obtained in our tests was 93.02%. Analysis of the results shows that background noise caused the misclassified utterances, as discussed further in this paper. The results are promising, and the work can be extended to closely related languages.
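The abstract names the front end of the pipeline but not its parameters. The Python sketch below (the authors worked in MATLAB) illustrates two of the described steps under stated assumptions: a shuffled 70/20/10 split of the utterances, and the conversion of an audio signal into a log-magnitude STFT spectrogram image of the kind a CNN could classify. The function names, the 256-sample frame, the 128-sample hop, the Hann window, the 80 dB floor, and the per-utterance (rather than per-speaker) split are all hypothetical choices for illustration, not the authors' settings. The CNN itself is omitted because the abstract gives no architecture.

```python
import cmath
import math
import random


def split_dataset(items, train=0.7, test=0.2, seed=0):
    """Shuffle and split items into (train, test, validation) lists.

    Assumption: a per-utterance random split; the abstract gives only the
    70/20/10 proportions, not whether speakers were kept disjoint."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_test = round(len(shuffled) * test)
    # Validation receives whatever remains (about 10% with the defaults).
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_test],
            shuffled[n_train + n_test:])


def stft_magnitude(signal, frame_len=256, hop=128):
    """Magnitude STFT with a periodic Hann window (hypothetical parameters).

    Returns one row per frame with frame_len // 2 + 1 frequency bins.
    A naive O(N^2) DFT per frame keeps the sketch dependency-free."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = [signal[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(n_bins):
            acc = sum(segment[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            row.append(abs(acc))
        frames.append(row)
    return frames


def to_grayscale(frames, floor_db=-80.0):
    """Map STFT magnitudes to an 8-bit image (rows = frequency, cols = time).

    Magnitudes are converted to dB relative to the loudest bin and clipped
    at floor_db, so 255 is the peak and 0 is floor_db or quieter."""
    peak = max(max(row) for row in frames) or 1.0
    image = []
    for k in range(len(frames[0])):
        pixels = []
        for row in frames:
            db = max(20 * math.log10(max(row[k] / peak, 1e-12)), floor_db)
            pixels.append(round(255 * (db - floor_db) / -floor_db))
        image.append(pixels)
    image.reverse()  # put low frequencies at the bottom, as in a plot
    return image


if __name__ == "__main__":
    # 4,297 utterances, as reported in the abstract.
    train, test, val = split_dataset(range(4297))
    print(len(train), len(test), len(val))  # 3008 859 430

    sr = 8000  # assumed sample rate for a synthetic 1 kHz test tone
    tone = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(2048)]
    image = to_grayscale(stft_magnitude(tone))
    print(len(image), "x", len(image[0]))  # 129 x 15
```

Treating the spectrogram as a grayscale image is what lets a standard image classifier stand in for a feature-based recognizer. In practice each image would typically be resized to the network's fixed input size before training, but that step and the network itself depend on details the abstract does not give.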