Conference: UKSim-AMSS European Modelling Symposium on Computer Modelling and Simulation

Reintroducing KAPD as a Dataset for Machine Learning and Data Mining Applications


Abstract

The KACST Arabic Phonetic Database (KAPD) has been used by researchers for around fifteen years since its initial release. Research in acoustics and phonetics has benefited from its phonetically rich content, and KAPD has the potential to serve the research community further. In this work, KAPD is enhanced so that it can serve as a dataset for machine learning and data mining applications. The work involves reviewing and refining the existing KAPD metadata and adding new material that is necessary for such applications. Updated phoneme statistics for the upgraded corpus are presented from several perspectives. The data format and time units are made compatible with those of HTK. The paper also discusses the potential of KAPD to serve as either a balanced or an imbalanced dataset.
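The abstract does not describe how the HTK-compatible time units were produced. As a rough illustration only: HTK label files express segment boundaries as integers in units of 100 ns, so a time in seconds is multiplied by 10,000,000. The sketch below assumes segments are available as (start, end, label) tuples in seconds; the function names and the example phoneme labels are hypothetical and are not taken from KAPD.

```python
# Minimal sketch: convert segment times in seconds to HTK-style label lines.
# Assumption: HTK label times are integers in 100 ns units (1 s = 10,000,000 units).
# The segment tuples and phoneme symbols below are hypothetical, not KAPD data.

HTK_UNITS_PER_SECOND = 10_000_000


def seconds_to_htk(t_seconds: float) -> int:
    """Convert a time in seconds to HTK's 100 ns integer units."""
    return round(t_seconds * HTK_UNITS_PER_SECOND)


def format_htk_label(segments):
    """Render (start_s, end_s, label) tuples as 'start end label' lines."""
    lines = []
    for start, end, label in segments:
        lines.append(f"{seconds_to_htk(start)} {seconds_to_htk(end)} {label}")
    return "\n".join(lines)


if __name__ == "__main__":
    example = [(0.0, 0.125, "sil"), (0.125, 0.31, "b")]
    print(format_htk_label(example))
```

Rounding to the nearest integer, rather than truncating, avoids off-by-one boundaries caused by floating-point representation of the input times.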
