Source: International Conference on Computer Processing of Oriental Languages

A Hybrid Approach to Chinese Abbreviation Expansion

Abstract

This paper presents a hybrid approach to Chinese abbreviation expansion. In this study, each short form in Chinese text is assumed to be created either by the method of reduction or by the method of elimination or generalization. Accordingly, a mapping table between short words and long words and a dictionary of non-reduced short-form/full-form pairs are applied to generate the respective expansion candidates. A hidden Markov model (HMM) based disambiguation step then ranks these candidates and selects a proper expansion for each ambiguous abbreviation. To improve expansion accuracy, linguistic knowledge such as discourse information and abbreviation patterns is further used to double-check the expanded results and revise any erroneous expansions. The proposed approach was evaluated on an abbreviation-expanded corpus built from the Peking University Corpus. The results show that an average recall of 83.8% and an average precision of 86.3% can be achieved across different types of Chinese abbreviations.
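The two-stage idea in the abstract (generate candidates from a mapping table and a dictionary, then rank them with an HMM) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the table contents, the probabilities, and the names `SHORT_TO_LONG`, `FULL_FORM_DICT`, `TRANS`, `EMIT`, `generate_candidates`, `hmm_log_score`, and `expand` are hypothetical and are not taken from the paper. The sketch scores each enumerated candidate by its HMM joint log-probability instead of running Viterbi decoding, and it omits the discourse and pattern-based double-check step.

```python
import math
from itertools import product

# Hypothetical toy resources; the paper's real tables are not given in the abstract.
# Mapping table for reduced forms: short word -> possible long words.
SHORT_TO_LONG = {"北": ["北京", "北方"], "大": ["大学", "大会"]}
# Dictionary of non-reduced short-form/full-form pairs.
FULL_FORM_DICT = {"清华": ["清华大学"]}

# Hypothetical HMM parameters: hidden states are long words,
# observations are the short forms they are abbreviated to.
TRANS = {  # P(long word | previous long word), "<s>" marks the start
    ("<s>", "北京"): 0.6, ("<s>", "北方"): 0.4,
    ("北京", "大学"): 0.7, ("北京", "大会"): 0.3,
    ("北方", "大学"): 0.2, ("北方", "大会"): 0.1,
}
EMIT = {  # P(short form | long word)
    ("北京", "北"): 0.9, ("北方", "北"): 0.8,
    ("大学", "大"): 0.9, ("大会", "大"): 0.5,
}
FLOOR = 1e-6  # smoothing value for unseen events (an assumption)


def generate_candidates(abbr):
    """Return candidates as lists of (short form, long word) pairs."""
    # Dictionary lookup covers non-reduced abbreviations as a whole.
    candidates = [[(abbr, full)] for full in FULL_FORM_DICT.get(abbr, [])]
    # Mapping-table lookup expands a reduced form one character at a time.
    options = [SHORT_TO_LONG.get(ch, []) for ch in abbr]
    if abbr and all(options):
        for combo in product(*options):
            candidates.append(list(zip(abbr, combo)))
    return candidates


def hmm_log_score(candidate, prev="<s>"):
    """Joint log-probability of a candidate under the toy HMM."""
    score = 0.0
    for short, long_word in candidate:
        score += math.log(TRANS.get((prev, long_word), FLOOR))
        score += math.log(EMIT.get((long_word, short), FLOOR))
        prev = long_word
    return score


def expand(abbr):
    """Pick the highest-scoring expansion, or None if no candidate exists."""
    candidates = generate_candidates(abbr)
    if not candidates:
        return None
    best = max(candidates, key=hmm_log_score)
    return "".join(long_word for _, long_word in best)


if __name__ == "__main__":
    print(expand("北大"))
    print(expand("清华"))
```

With these toy numbers, the four table-generated candidates for 北大 are ranked by the product of their transition and emission probabilities, while 清华 has a single dictionary candidate and needs no disambiguation. A real system would estimate the probabilities from a corpus and would also condition on the surrounding words.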
