Parsing-based Chinese word segmentation integrating morphological and syntactic information

Abstract

Conventional sequence labeling methods for Chinese word segmentation do not fully exploit linguistic information, which limits further performance gains. Chinese morphology studies the construction and usage of Chinese words in depth and is therefore helpful for word segmentation. Furthermore, some segmentation ambiguities cannot be resolved by lexical information alone; they are finally resolved only during parsing. In this paper, we propose a parsing-based Chinese word segmentation model that makes full use of morphological and syntactic information. Experiments on the Penn Chinese Treebank (CTB) 5.0 show that the proposed model achieves performance competitive with a CRF-based model. To investigate the relationship between our parsing-based model and the CRF-based model, we employ a maximum-entropy-based framework for integrating different knowledge sources. The integrated model obtains an F-measure of 97.9, a 25% reduction in segmentation error rate relative to the CRF-based model, which indicates that the two models are complementary.
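To make the reported figures concrete, the following is a minimal sketch of how a word-level segmentation F-measure and a relative error reduction are commonly computed. It is not taken from the paper: the function names are hypothetical, the example sentence is illustrative, and the error rate is assumed to be 100 minus the F-measure, which the abstract does not state. Under that assumption, an F-measure of 97.9 with a 25% error reduction would imply a baseline of roughly 97.2; the abstract itself gives no baseline figure.

```python
def word_spans(words):
    """Convert a list of words into a set of (start, end) character spans."""
    spans, start = set(), 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def segmentation_f_measure(gold, predicted):
    """Word-level F-measure: a predicted word counts as correct only if
    both of its boundaries match a word in the gold segmentation."""
    gold_spans = word_spans(gold)
    pred_spans = word_spans(predicted)
    if not gold_spans or not pred_spans:
        return 0.0
    correct = len(gold_spans & pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


def relative_error_reduction(baseline_f, new_f):
    """Relative error reduction on a 0-100 scale.
    Assumption (not from the abstract): error rate = 100 - F-measure."""
    baseline_error = 100.0 - baseline_f
    new_error = 100.0 - new_f
    return (baseline_error - new_error) / baseline_error


def implied_baseline_f(new_f, reduction):
    """Invert relative_error_reduction under the same assumption."""
    return 100.0 - (100.0 - new_f) / (1.0 - reduction)


# Illustrative ambiguity: the same six characters segmented two ways.
gold = ["研究", "生命", "起源"]
predicted = ["研究生", "命", "起源"]
f_score = segmentation_f_measure(gold, predicted)

# Baseline implied by F = 97.9 and a 25% relative error reduction.
baseline = implied_baseline_f(97.9, 0.25)
```

In the example, only one of the three predicted words matches the gold segmentation, so precision, recall, and F-measure are all one third. This is the kind of ambiguity the abstract says lexical information alone cannot always resolve.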

Bibliographic details

Source: 7th International Conference on Natural Language Processing and Knowledge Engineering, 2011, pp. 114-121 (8 pages)
Conference location: Tokushima (JP)
Authors: Wu Xihong; Zhang Meng; Lin Xiaojun
Author affiliation: Speech and Hearing Research Center, Key Laboratory of Machine Perception (Ministry of Education), School of Electronics Engineering and Computer Science, Peking University, Beijing, China
Original format: PDF
Language: English
Chinese Library Classification: Information processing

