
Language identification for Instant Chat translation.


Abstract

If two users speaking different languages wish to communicate with each other using an internet chat program, a machine translation system must be present, and a means of identifying the languages of both users must be provided for this machine translation system. This thesis presents the Instant Chat Translator system, which fulfills these two requirements in a unified manner. The task is difficult because language identification in a chat environment has three issues not typical of language identification in general: the texts are very short, the channel is noisy, and nonnative character sets are used. The Instant Chat Translator system combines a novel high-quality language identification system with three existing software packages: the D-Bus interprocess communication system, the Pidgin chat system, and the Moses machine translation system. The overall system catches messages received as text input by the chat system, identifies the language of these messages, translates them if necessary, and presents the possibly translated messages to the user. It is presented as a proof-of-concept work to demonstrate the feasibility of providing an instant translator for a chat system. Testing demonstrated that very high levels of identification accuracy are obtained even when dealing with tiny amounts of often noisy input text. An average accuracy of 99.61% was obtained for identifying sentences 10 words in length across 7 languages. For the same 7 languages, the accuracy of identifying the language of individual words was 75%. Another goal of this research was to assess training using text solely from conventional corpora versus a combination of such texts with some from noisy channel environments. Experiments showed that the latter may lead to higher accuracy.
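The message flow described in the abstract (catch an incoming message, identify its language, translate it if necessary, present the result) can be sketched as follows. This is a minimal illustration and not the thesis's implementation: the abstract does not name the identification method, so a character-trigram naive Bayes classifier with add-one smoothing is assumed here as a common baseline for very short texts. The names `NgramLanguageIdentifier`, `handle_incoming`, and `fake_translate` are hypothetical, the training sentences are toy data, and the translation step is a stand-in callable rather than a real call into Moses, Pidgin, or D-Bus.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return overlapping character n-grams, padded with spaces at both ends."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramLanguageIdentifier:
    """Hypothetical character n-gram naive Bayes identifier (assumed method)."""

    def __init__(self, n=3):
        self.n = n
        self.counts = {}    # language -> Counter of n-gram frequencies
        self.totals = {}    # language -> total number of n-grams seen
        self.vocab = set()  # all n-grams seen in any language

    def train(self, lang, sentences):
        counter = self.counts.setdefault(lang, Counter())
        for sentence in sentences:
            counter.update(char_ngrams(sentence, self.n))
        self.totals[lang] = sum(counter.values())
        self.vocab.update(counter)

    def score(self, lang, text):
        # Log-likelihood of the text under the language, add-one smoothed.
        counter = self.counts[lang]
        denom = self.totals[lang] + len(self.vocab)
        return sum(math.log((counter[g] + 1) / denom)
                   for g in char_ngrams(text, self.n))

    def identify(self, text):
        return max(self.counts, key=lambda lang: self.score(lang, text))


def handle_incoming(message, identifier, user_lang, translate):
    """Identify the message language and translate only when it differs."""
    lang = identifier.identify(message)
    if lang == user_lang:
        return message
    return translate(message, lang, user_lang)


def fake_translate(message, src, dst):
    # Stand-in for a machine translation call; it only tags the message.
    return f"[{src}->{dst}] {message}"


# Toy training data, far smaller than any real corpus.
identifier = NgramLanguageIdentifier()
identifier.train("en", [
    "the quick brown fox jumps over the lazy dog",
    "where are you going today",
    "i will see you tomorrow",
])
identifier.train("fr", [
    "le renard brun saute par dessus le chien",
    "où allez vous aujourd'hui",
    "je te verrai demain",
])

print(handle_incoming("see you today", identifier, "en", fake_translate))
print(handle_incoming("je vous verrai", identifier, "en", fake_translate))
```

Character n-grams are used in this sketch because they still give several features for a one- or two-word message, which is the short-text condition the abstract highlights. A real system would train on much larger corpora, possibly mixed with noisy chat text as the abstract's final experiment suggests.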

Bibliographic record

  • Author

    Bailey, Robert Bruce.;

  • Author affiliation

    The University of Regina (Canada).;

  • Degree-granting institution: The University of Regina (Canada).;
  • Discipline: Artificial Intelligence.;Computer Science.
  • Degree: M.Sc.
  • Year: 2011
  • Pages: 87 p.
  • Total pages: 87
  • Original format: PDF
  • Text language: eng
