Published in: Information Technology Journal

Unicode Aided Language Identification across Multiple Scripts and Heterogeneous Data

Abstract

With the explosive growth of multilingual data on the Internet and in other information and communication fields, the need for effective automated language identifiers has increased. As more information finds its way into computer systems and the web, categorizing it manually is becoming increasingly infeasible. In this study we discuss the improvements we have achieved over existing language identification methods. Two areas not explored before are the inclusion of non-Roman scripts and the active use of Unicode script information to enhance the language detection process.
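The abstract does not say how Unicode script information is used, so the following is only a minimal sketch of one plausible reading: detect the dominant script of a text and use it to narrow the set of candidate languages before any statistical classifier runs. Two assumptions are worth flagging. First, the `SCRIPT_CANDIDATES` table is hypothetical and deliberately tiny. Second, Python's `unicodedata` module does not expose the Unicode Script property, so the script is approximated from the first word of each letter's character name (for example, `CYRILLIC SMALL LETTER A` maps to `CYRILLIC`). This is not the authors' method.

```python
import unicodedata
from collections import Counter
from typing import List, Optional

# Hypothetical script-to-language table (ISO 639-1 codes), for illustration only.
SCRIPT_CANDIDATES = {
    "LATIN": ["en", "fr", "de", "es"],
    "CYRILLIC": ["ru", "uk", "bg"],
    "GREEK": ["el"],
    "ARABIC": ["ar", "fa", "ur"],
    "DEVANAGARI": ["hi", "mr", "ne"],
    "HIRAGANA": ["ja"],
    "KATAKANA": ["ja"],
    "HANGUL": ["ko"],
    "CJK": ["zh", "ja"],
}


def char_script(ch: str) -> Optional[str]:
    """Approximate a letter's script from the first word of its Unicode name.

    Heuristic only: a real system would read the Script property (Scripts.txt).
    Non-letters (digits, punctuation, spaces, combining marks) return None.
    """
    if not ch.isalpha():
        return None
    name = unicodedata.name(ch, "")
    return name.split(" ", 1)[0] if name else None


def dominant_script(text: str) -> Optional[str]:
    """Return the script that covers the most letters in the text, if any."""
    counts = Counter(s for s in map(char_script, text) if s is not None)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def candidate_languages(text: str) -> List[str]:
    """Narrow the language set to those written in the text's dominant script."""
    script = dominant_script(text)
    if script is None:
        return []
    return SCRIPT_CANDIDATES.get(script, [])


if __name__ == "__main__":
    for sample in ["Hello world", "Привет, мир", "नमस्ते दुनिया", "Ελληνικά"]:
        print(sample, "->", dominant_script(sample), candidate_languages(sample))
```

With this approach, a Cyrillic or Devanagari input never has to be scored against English or French models. A single-language script such as Greek or Hangul resolves immediately, and only scripts shared by several languages (Latin, Cyrillic, Arabic) still need a downstream classifier, which is not shown here. Taking only the dominant script is a simplification: mixed-script text such as Japanese (kanji plus kana) would need a finer rule.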
