Conference: LREC-2012

Development of Text And Speech Database For Hindi And Indian English Specific To Mobile Communication Environment


Abstract

This paper describes the method and experiences of text and speech data collection for mobile communication in Indian English and Hindi. The primary data were collected as a large number of messages exchanged in personal communication among native speakers of Hindi and Indian speakers of English. To capture the variety of mobile communication in Hindi and English, 12 domains were identified for collecting the text corpus from a speaking population spanning different age groups, sexes and dialects. The raw text, which contained slang and unconventional grammar, was cleaned using language grammar rules, then tagged and expanded to explain the context-specific meaning of the words. Texts from 1163 participants from Hindi-speaking regions and 1405 English users were used to create 13 prompt sheets containing 630 phonetically rich sentences generated with special software. Each prompt sheet was recorded by at least 7 users, simultaneously over three channels; in total, 100 speakers were recorded, and the recordings were annotated. The work is a step toward the development of standards for mobile text and speech data collection for Indian languages.
