Conference: 42nd Annual Meeting of the Association for Computational Linguistics

A Preliminary Study on Probabilistic Models for Chinese Abbreviations

Abstract

Chinese abbreviations are widely used in the modern Chinese texts. They are a special form of unknown words, including many named entities. This results in difficulty for correct Chinese processing. In this study, the Chinese abbreviation problem is regarded as an error recovery problem in which the suspect root words are the "errors" to be recovered from a set of candidates. Such a problem is mapped to an HMM-based generation model for both abbreviation identification and root word recovery, and is integrated as part of a unified word segmentation model when the input extends to a complete sentence. Two major experiments are conducted to test the abbreviation models. In the first experiment, an attempt is made to guess the abbreviations of the root words. An accuracy rate of 72% is observed. In contrast, a second experiment is conducted to guess the root words from abbreviations. Some submodels could achieve as high as 51% accuracy with the simple HMM-based model. Some quantitative observations against heuristic abbreviation knowledge about Chinese are also observed.
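The abstract does not give the model's parameters or exact formulation, so the following is only a minimal sketch of how root-word recovery could be framed as decoding in a first-order HMM: the hidden states are candidate root words and the observations are the characters of the abbreviation. The function name `viterbi_roots`, the candidate lists, and every probability below are hypothetical values chosen for illustration. They are not taken from the paper.

```python
import math

def viterbi_roots(abbrev, candidates, emit, trans, start):
    """Return (best root-word sequence, log-probability) for an abbreviation.

    abbrev     : string of abbreviation characters (the observations)
    candidates : dict mapping each character to its candidate root words
    emit       : dict mapping (character, root word) to P(character | root word)
    trans      : dict mapping (previous root, root) to P(root | previous root)
    start      : dict mapping root word to P(root word at the first position)
    """
    # best[w] = (log-probability of the best path ending in root w, that path)
    first = abbrev[0]
    best = {
        w: (math.log(start[w]) + math.log(emit[(first, w)]), [w])
        for w in candidates[first]
    }
    for ch in abbrev[1:]:
        nxt = {}
        for w in candidates[ch]:
            # Unseen transitions get a small floor probability (an assumption).
            score, path = max(
                ((s + math.log(trans.get((prev, w), 1e-6)), p)
                 for prev, (s, p) in best.items()),
                key=lambda t: t[0],
            )
            nxt[w] = (score + math.log(emit[(ch, w)]), path + [w])
        best = nxt
    score, path = max(best.values(), key=lambda t: t[0])
    return path, score

# Hypothetical toy parameters for the two-character abbreviation "台大".
candidates = {"台": ["台湾", "台北"], "大": ["大学", "大会"]}
emit = {("台", "台湾"): 0.6, ("台", "台北"): 0.5,
        ("大", "大学"): 0.7, ("大", "大会"): 0.3}
start = {"台湾": 0.5, "台北": 0.5}
trans = {("台湾", "大学"): 0.6, ("台湾", "大会"): 0.1,
         ("台北", "大学"): 0.3, ("台北", "大会"): 0.2}

path, score = viterbi_roots("台大", candidates, emit, trans, start)
print(path, math.exp(score))
```

With these made-up numbers the decoder prefers the root sequence 台湾 + 大学. Working in log space avoids numeric underflow when abbreviations or sentences get longer, which matters if the same decoding is embedded in a sentence-level segmentation model.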