Journal: International Journal of Computer Processing of Oriental Languages

Automatic Expansion of Abbreviations in Chinese News Text: A Hybrid Approach



Abstract

Chinese news texts often contain abbreviations whose full forms are never explicitly defined. Expanding abbreviations to their original full forms therefore plays an important role in improving the accuracy of information extraction and retrieval systems for Chinese. In this paper, we present a hybrid approach to the automatic expansion of abbreviations in Chinese news texts. Generally, Chinese abbreviations are produced from their original full forms via reduction, elimination or generalization. To ensure that every abbreviation can be expanded, each abbreviation under expansion is assumed, in turn, to have been created by each of these three methods. Based on this assumption, a mapping table between shortened words and their matrix words, together with a dictionary of short-form/full-form pairs, is used to generate all possible expansions of an abbreviation. For an ambiguous abbreviation with multiple expansion candidates, hidden Markov models are employed to rank the candidates and select the one with the maximum score. To further improve expansion performance, linguistic knowledge such as discourse information and abbreviation patterns is used to correct possible expansion errors. Evaluation on an abbreviation-expanded corpus built from the Peking University Corpus showed that our approach achieves, on average, 86.3% precision and 83.8% recall for various types of abbreviations in Chinese news texts.
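The generate-then-rank idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the tables, the probabilities, and the names `MAPPING_TABLE`, `PAIR_DICTIONARY`, `generate_candidates`, `hmm_log_score` and `expand` are all hypothetical, hand-made for illustration. The sketch assumes that each character of the abbreviation aligns with one full word, scores aligned candidates by the joint probability of a first-order hidden Markov model, and gives dictionary entries a flat placeholder score. The error-correction step based on discourse information and abbreviation patterns is omitted.

```python
import math
from itertools import product

# Toy, hand-made tables and probabilities (hypothetical; not taken from the paper).
MAPPING_TABLE = {"北": ["北京", "北方"], "大": ["大学", "大会"]}  # short char -> matrix words
PAIR_DICTIONARY = {"奥运": ["奥林匹克运动会"]}                    # short form -> full forms
START_PROB = {"北京": 0.6, "北方": 0.4}
TRANS_PROB = {("北京", "大学"): 0.7, ("北京", "大会"): 0.3,
              ("北方", "大学"): 0.5, ("北方", "大会"): 0.5}
EMIT_PROB = {("北京", "北"): 0.9, ("北方", "北"): 0.5,
             ("大学", "大"): 0.8, ("大会", "大"): 0.6}
FLOOR = 1e-6                     # assumed smoothing value for unseen events
DICT_LOG_SCORE = math.log(0.5)   # assumed flat score for dictionary entries


def generate_candidates(abbr):
    """Return (source, words) pairs from the dictionary and the mapping table."""
    candidates = [("dict", (full,)) for full in PAIR_DICTIONARY.get(abbr, [])]
    if abbr and all(ch in MAPPING_TABLE for ch in abbr):
        for words in product(*(MAPPING_TABLE[ch] for ch in abbr)):
            candidates.append(("map", words))
    return candidates


def hmm_log_score(abbr, source, words):
    """Joint log-probability of hidden words and observed abbreviation characters."""
    if source == "dict":
        return DICT_LOG_SCORE
    score = math.log(START_PROB.get(words[0], FLOOR))
    for i, (ch, word) in enumerate(zip(abbr, words)):
        if i > 0:
            score += math.log(TRANS_PROB.get((words[i - 1], word), FLOOR))
        score += math.log(EMIT_PROB.get((word, ch), FLOOR))
    return score


def expand(abbr):
    """Pick the highest-scoring expansion, or None if no candidate exists."""
    candidates = generate_candidates(abbr)
    if not candidates:
        return None
    source, words = max(candidates, key=lambda c: hmm_log_score(abbr, c[0], c[1]))
    return "".join(words)


print(expand("北大"))
print(expand("奥运"))
```

With these toy numbers, the ambiguous abbreviation 北大 yields four mapping-table candidates, which are ranked by the HMM score, while 奥运 is resolved directly from the dictionary. In a real system the probabilities would be estimated from a corpus rather than written by hand.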
