Chinese New Word Finding Using Character-Based Parsing Model



Abstract

New word finding is a difficult but indispensable task in Chinese word segmentation. Traditional methods use string statistics to identify new words in a large-scale corpus. However, such statistics are neither convenient nor powerful enough to describe the laws governing a word's internal and external structure, and they are even less effective when new words occur very rarely in the corpus. In this paper, we present a novel method that uses parsing information to find new words. A character-level PCFG model is trained on the People's Daily corpus and the Penn Chinese Treebank. Characters are fed into the character parsing system, and words are determined automatically from the parse tree. Our method describes word-building rules within full sentences and takes advantage of rich context to find new words. This is especially effective for identifying occasional or rarely used words, which usually have low frequency. Preliminary experiments indicate that our method can substantially improve the precision and recall of new word finding.
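The idea of reading words off a character-level parse tree can be illustrated with a minimal sketch. It is not the authors' system: the grammar, the probabilities, the labels `S`, `W`, and `C`, and the function names are all hypothetical, chosen only to show how a Viterbi CKY parse over characters yields a segmentation. Here the word 北京 is not listed as a whole in the lexicon; it is recovered through the word-building rule `W -> C C`.

```python
from collections import defaultdict

# Toy character-level PCFG in Chomsky normal form.
# All rules and probabilities are illustrative assumptions, not trained values.
LEXICAL = {  # (nonterminal, character) -> probability
    ("W", "我"): 0.2,  # single-character word
    ("W", "爱"): 0.2,  # single-character word
    ("C", "北"): 0.5,  # character that appears inside a longer word
    ("C", "京"): 0.5,
}
BINARY = {  # (parent, left child, right child) -> probability
    ("W", "C", "C"): 0.6,  # word-building rule: two characters form a word
    ("S", "W", "W"): 0.5,  # a sentence of exactly two words
    ("S", "W", "S"): 0.5,  # a word followed by the rest of the sentence
}


def viterbi_cky(chars):
    """Fill a CKY chart with the best probability for each (span, label)."""
    n = len(chars)
    best = defaultdict(dict)  # (i, j) -> {label: best probability}
    back = {}                 # (i, j, label) -> (split, left, right) or None
    for i, ch in enumerate(chars):
        for (label, c), p in LEXICAL.items():
            if c == ch:
                best[(i, i + 1)][label] = p
                back[(i, i + 1, label)] = None
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for (parent, left, right), p in BINARY.items():
                    pl = best[(i, k)].get(left)
                    pr = best[(k, j)].get(right)
                    if pl is None or pr is None:
                        continue
                    score = p * pl * pr
                    if score > best[(i, j)].get(parent, 0.0):
                        best[(i, j)][parent] = score
                        back[(i, j, parent)] = (k, left, right)
    return best, back


def words_from_tree(chars, back, i, j, label):
    """Walk the best tree top-down; every W node spans exactly one word."""
    if label == "W":
        return ["".join(chars[i:j])]
    k, left, right = back[(i, j, label)]
    return (words_from_tree(chars, back, i, k, left)
            + words_from_tree(chars, back, k, j, right))


def segment(sentence):
    """Return the word list from the best parse, or None if no parse exists."""
    chars = list(sentence)
    best, back = viterbi_cky(chars)
    if "S" not in best[(0, len(chars))]:
        return None
    return words_from_tree(chars, back, 0, len(chars), "S")


print(segment("我爱北京"))
```

Because word boundaries come from the highest-probability tree for the whole sentence, a character sequence can be grouped into a word even if that word was never counted in a corpus, which is the property the abstract attributes to low-frequency new words.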


Bibliographic record

Source: International Joint Conference on Natural Language Processing, 2005, 10 pages
Original format: PDF
Chinese Library Classification: computing technology, computer technology

