Conference: IEEE Student Conference on Research and Development

On the Identification of FOSD-based Non-zero Onset Speech Dataset


Abstract

Recent trends in voicebot and chatbot application development have enabled the use of speech-to-text (STT) and text-to-speech (TTS) generation techniques. To develop such TTS or STT engines, the text and the corresponding recorded speech in each audio file used for training, validation, and testing must be aligned, so that the resulting engines achieve the desired conversion quality. Speech and text are aligned with an audio alignment tool. Such tools often use onset detection algorithms to label the start and end times of speech in the audio file, and this information is then stored together with the file's transcript. This work provides an open non-zero onset Vietnamese speech dataset. The dataset contains 348 audio files filtered from over 25,000 Vietnamese speech recordings (approximately 30 hours) released publicly by FPT Corporation, Vietnam, in 2018. This amount of labeled data is considered more than sufficient for typical research on onset detection algorithms.
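To make the labeling step concrete, the following is a minimal sketch of how speech start and end times might be estimated from an audio signal. It is not the method used by the authors or by any particular alignment tool: it swaps in a simple short-time energy threshold, and the names `detect_speech_bounds`, `has_nonzero_onset`, `synth`, `frame_ms`, and `threshold_ratio` are hypothetical. The abstract does not state the criterion used to select the 348 files, so the "non-zero onset" check below is only an assumed reading (speech that starts later than time zero).

```python
import math


def detect_speech_bounds(samples, sample_rate, frame_ms=10.0, threshold_ratio=0.1):
    """Estimate (start_s, end_s) of speech using a short-time RMS energy threshold.

    Assumption: a frame counts as "speech" when its RMS is at least
    threshold_ratio times the loudest frame's RMS. Returns None if the
    signal is too short or entirely silent.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000.0))
    rms = []
    # Split the signal into non-overlapping frames and compute RMS per frame.
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        rms.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    if not rms:
        return None
    peak = max(rms)
    if peak == 0.0:
        return None
    threshold = threshold_ratio * peak
    active = [k for k, r in enumerate(rms) if r >= threshold]
    # Onset is the start of the first active frame; the end time is the
    # end of the last active frame.
    start_s = active[0] * frame_len / sample_rate
    end_s = (active[-1] + 1) * frame_len / sample_rate
    return start_s, end_s


def has_nonzero_onset(bounds):
    """Assumed reading of "non-zero onset": speech starts after time zero."""
    return bounds is not None and bounds[0] > 0.0


def synth(sample_rate=8000):
    """Synthetic test signal: 0.5 s silence, 1.0 s of a 220 Hz tone, 0.5 s silence."""
    silence = [0.0] * (sample_rate // 2)
    tone = [0.5 * math.sin(2 * math.pi * 220 * n / sample_rate)
            for n in range(sample_rate)]
    return silence + tone + silence


if __name__ == "__main__":
    bounds = detect_speech_bounds(synth(), 8000)
    print(bounds, has_nonzero_onset(bounds))
```

In practice the resulting start and end times would be stored alongside the file's transcript, as the abstract describes. Real recordings contain background noise, so a fixed energy ratio is only a starting point and would likely need tuning or a more robust detector.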
