Conference: International Conference on Speech and Computer

First Insight into the Processing of the Language Consulting Center Data

Abstract

In this paper, we describe the initial stages of the project "Access to a Linguistically Structured Database of Enquiries from the Language Consulting Center". The project aims to provide improved access to the large archives of mainly telephone conversations collected continuously by the Institute of the Czech Language. The main goal is to open up the unique Czech data acquired from queries to the Language Consulting Center and to build a semi-automatic system that will facilitate searching and categorizing these queries. For this purpose, an Automatic Speech Recognizer (ASR) and language processing methods are being designed. Unlike common speech, the vocabulary used in such queries contains many unusual words (e.g. linguistic terms). In order to train the ASR system, it is necessary to manually transcribe a large amount of speech data, identify the appropriate vocabulary, and obtain relevant text for language modeling purposes. This paper describes the proposed telephone system for recording new data and the baseline speech recognition on these data. It also describes the first experiments with topic detection on these data, aimed at discovering what can be found in them and how to preprocess them.