Journal: ACM Transactions on Asian Language Information Processing

Toward a Professional Platform for Chinese Character Conversion



Abstract

Increasing communication between Chinese-speaking regions that use the traditional and the simplified Chinese character systems, respectively, has highlighted the subtle yet extensive differences between the two systems, differences that can unexpectedly hinder conversion from one system to the other. This article proposes a new priority-based multi-data-resource management model, together with a new algorithm, the Fused Conversion algorithm from Multi-Data resources (FCMD), which draws on reverse maximum matching, an n-gram-based statistical model, and pattern-based learning and matching to make conversions more context-sensitive, more controllable by humans, and thus more reliable. After parameter training on the Tagged Chinese Giga-word corpus, the model reaches a conversion precision of 91.5% on context-sensitive cases, the most difficult part of conversion, and an overall precision of 99.8%, a significant improvement over state-of-the-art models. The conversion platform built on the model adds features such as data-resource selection and n-gram self-learning, making it a more sophisticated tool suited especially to high-end professional use.
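To make the reverse-maximum-matching component concrete, here is a minimal sketch of simplified-to-traditional conversion over several data resources consulted in priority order. It is not the authors' FCMD algorithm. The function name `reverse_max_match_convert`, the toy dictionaries `word_dict` and `char_table`, the `max_len` window, and the rule "longest match wins, ties broken by resource priority" are all illustrative assumptions; the paper's actual priority scheme, n-gram model, and pattern matching are not reproduced here.

```python
from typing import Dict, List, Sequence

# Hypothetical toy resources (assumption, not from the paper).
# A resource maps a simplified string to its traditional form.
Resource = Dict[str, str]

word_dict: Resource = {"头发": "頭髮", "发展": "發展"}       # word-level entries
char_table: Resource = {"头": "頭", "发": "發", "们": "們", "长": "長"}  # one default per character


def reverse_max_match_convert(
    text: str, resources: Sequence[Resource], max_len: int = 4
) -> str:
    """Convert text right-to-left using reverse maximum matching.

    Assumed policy: at each position, try the longest candidate ending
    there first; for a given length, consult resources in priority order
    (index 0 = highest). Characters found in no resource pass through.
    """
    out: List[str] = []
    end = len(text)
    while end > 0:
        for length in range(min(max_len, end), 0, -1):
            piece = text[end - length:end]
            # First resource (by priority) that contains this piece, if any.
            match = next((r[piece] for r in resources if piece in r), None)
            if match is not None:
                out.append(match)
                end -= length
                break
        else:
            # No resource covers even the single character: copy it unchanged.
            out.append(text[end - 1])
            end -= 1
    # Segments were collected right-to-left, so restore reading order.
    return "".join(reversed(out))


print(reverse_max_match_convert("头发长", [word_dict, char_table]))
print(reverse_max_match_convert("头发", [char_table]))
```

With the word-level entry, 头发 ("hair") becomes 頭髮; with only the character table it comes out as 頭發, the "emit/develop" reading. One-to-many mappings like 发 → 發/髮 are the context-sensitive cases the abstract identifies as hardest; the paper additionally brings in an n-gram statistical model and pattern-based learning for them, which this sketch does not attempt.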
