A Unified Model for Solving the OOV Problem of Chinese Word Segmentation

XIAOQING LI; CHENGQING ZONG; KEH-YIH SU

Abstract

This article proposes a unified, character-based, generative model that incorporates additional resources to solve the out-of-vocabulary (OOV) problem of Chinese word segmentation. Within this model, different types of additional information can be utilized independently in corresponding submodels. The article mainly addresses three types of OOV words: unseen dictionary words, named entities, and suffix-derived words, none of which are handled well by current approaches. The results show that our approach effectively improves performance on the first two types, with a positive interaction between them in F-score. We also analyze why suffix information is not helpful. After integrating the proposed generative model with the corresponding discriminative approach, we evaluated it on various corpora, including SIGHAN-2005, CIPS-SIGHAN-2010, and the Chinese Treebank (CTB). The evaluation shows that the integrated approach achieves the best performance reported in the literature on all test sets when additional information and resources are allowed.
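The abstract reports its gains as F-scores on test sets that contain many OOV words. To make those metrics concrete, the sketch below computes word-level precision (P), recall (R), F-score (F), and OOV recall (R_oov). R_oov is the share of gold words absent from the training vocabulary that the system recovers, which is the style of scoring commonly reported in the SIGHAN bakeoffs. This is a generic illustration and not the authors' code. The function names `word_spans` and `evaluate`, the toy sentence, and the training vocabulary are all hypothetical.

```python
def word_spans(words):
    """Map a segmented sentence to (start, end, word) character offsets."""
    spans, pos = [], 0
    for w in words:
        spans.append((pos, pos + len(w), w))
        pos += len(w)
    return spans


def evaluate(gold, pred, train_vocab):
    """Word-level P/R/F plus recall on OOV words (gold words absent from train_vocab)."""
    if "".join(gold) != "".join(pred):
        raise ValueError("gold and predicted segmentations cover different text")
    gold_spans = word_spans(gold)
    # A predicted word is correct only if both of its boundaries match a gold word.
    pred_boundaries = {(s, e) for s, e, _ in word_spans(pred)}
    correct = [t for t in gold_spans if (t[0], t[1]) in pred_boundaries]

    p = len(correct) / len(pred_boundaries) if pred_boundaries else 0.0
    r = len(correct) / len(gold_spans) if gold_spans else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0

    # OOV recall: of the gold words never seen in training, how many were recovered?
    oov_gold = [t for t in gold_spans if t[2] not in train_vocab]
    oov_hit = [t for t in oov_gold if (t[0], t[1]) in pred_boundaries]
    r_oov = len(oov_hit) / len(oov_gold) if oov_gold else 0.0
    return {"P": p, "R": r, "F": f, "R_oov": r_oov}


# Hypothetical toy example: gold segmentation vs. system output for one sentence.
gold = ["中国", "科学院", "自动化", "研究所"]
pred = ["中国", "科学院", "自动化", "研究", "所"]
train_vocab = {"中国", "自动化", "研究"}  # hypothetical training dictionary
print(evaluate(gold, pred, train_vocab))
```

In this toy case the system gets 3 of its 5 words right and recovers 3 of the 4 gold words. That gives P = 0.6, R = 0.75, and F ≈ 0.667. Of the two OOV gold words (科学院, 研究所), only 科学院 is recovered, so R_oov = 0.5. The sketch compares character offsets rather than word strings so that a repeated word cannot be credited at the wrong position.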


Bibliographic Details

Source: ACM Transactions on Asian Language Information Processing, 2015, No. 3, pp. 12.1-12.29 (29 pages)
Affiliations

Institute of Automation, Chinese Academy of Sciences, No. 95, Zhongguancun East Road, Haidian District, Beijing, 100190, China

Institute of Automation, Chinese Academy of Sciences, No. 95, Zhongguancun East Road, Haidian District, Beijing, 100190, China

Academia Sinica
Original format: PDF
Language: English
Keywords: Chinese word segmentation; out-of-vocabulary words; model integration; domain adaptation

