Lexicon+TX: rapid construction of a multilingual lexicon with under-resourced languages

Abstract

Most efforts at automatically creating multilingual lexicons require input lexical resources with rich content (e.g. semantic networks, domain codes, semantic categories) or large corpora. Such material is often unavailable and difficult to construct for under-resourced languages. In some cases, particularly for some ethnic languages, even unannotated corpora are still in the process of collection. We show how multilingual lexicons with under-resourced languages can be constructed using simple bilingual translation lists, which are more readily available. The prototype multilingual lexicon developed comprises six member languages: English, Malay, Chinese, French, Thai and Iban, the last of which is an under-resourced language in Borneo. Quick evaluations showed that 91.2% of 500 random multilingual entries in the generated lexicon require minimal or no human correction.

Bibliographic details

  • Source
    Language Resources and Evaluation | 2014, Issue 3 | pp. 479-492 | 14 pages
  • Author affiliations

    School of Engineering, Science and Technology, KDU College Penang, 32 Jalan Anson, 10400 Georgetown, Penang, Malaysia, Faculty of Computing and Informatics, Multimedia University, Persiaran Multimedia, 63100 Cyberjaya, Selangor, Malaysia;

    Faculty of Computing and Informatics, Multimedia University, Persiaran Multimedia, 63100 Cyberjaya, Selangor, Malaysia;

    Faculty of Computing and Informatics, Multimedia University, Persiaran Multimedia, 63100 Cyberjaya, Selangor, Malaysia;

    Linton University College, Persiaran UTL, Bandar Universiti Teknologi Legenda, Batu 12, 71700 Mantin, Negeri Sembilan, Malaysia;

    Faculty of Computer Science and Information Technology, Universiti Malaysia Sarawak, 94300 Kota Samarahan, Sarawak, Malaysia;

  • Original format: PDF
  • Text language: English
  • Keywords

    Multilingual lexicon; Under-resourced languages; Malay; Iban;

