Chinese base phrases chunking based on latent semi-CRF model

Abstract

In Chinese natural language processing, recognizing simple, non-recursive base phrases is an important task for applications such as information processing and machine translation. Instead of a rule-based model, we adopt a statistical machine learning method, the newly proposed Latent semi-CRF model, to solve the Chinese base phrase chunking problem. Chinese base phrase chunking can be treated as a sequence labeling problem, which involves predicting a class label for each frame in an unsegmented sequence. Chinese base phrases have sub-structures that cannot be observed in the training data. We propose a latent discriminative model called Latent semi-CRF (Latent Semi Conditional Random Fields), which combines the advantages of LDCRF (Latent Dynamic Conditional Random Fields) and semi-CRF by modeling the sub-structure of a class sequence and learning the dynamics between class labels, and we apply it to detecting Chinese base phrases. Our results demonstrate that the latent dynamic discriminative model compares favorably to Support Vector Machines, the Maximum Entropy Model, and Conditional Random Fields (including LDCRF and semi-CRF) on Chinese base phrase chunking.
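
The sequence-labeling view of chunking can be illustrated with a small sketch. It assumes a BIO tag scheme (B-X opens a phrase of type X, I-X continues it, O is outside any phrase), which the abstract does not specify, and it uses hypothetical tokens and tags. It only shows how per-token class labels map to base phrase spans; it is not the Latent semi-CRF model itself.

```python
from typing import List, Sequence, Tuple

# A chunk is (phrase_label, start_index, end_index), with end exclusive.
Chunk = Tuple[str, int, int]


def bio_to_chunks(tags: Sequence[str]) -> List[Chunk]:
    """Convert per-token BIO tags into labeled, non-recursive spans.

    Assumption: tags look like "B-NP", "I-NP", or "O". An "I-X" tag that
    does not continue an open chunk of type X is treated as opening a new
    chunk (a lenient choice made for this sketch).
    """
    chunks: List[Chunk] = []
    start = None  # start index of the currently open chunk, if any
    label = None  # phrase label of the currently open chunk, if any

    # A trailing "O" sentinel closes any chunk still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            chunks.append((label, start, i))
            start, label = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return chunks


# Hypothetical example: tokens and tags are illustrative only.
tokens = ["我们", "喜欢", "自然", "语言"]
tags = ["B-NP", "B-VP", "B-NP", "I-NP"]
for phrase_label, begin, end in bio_to_chunks(tags):
    print(phrase_label, "".join(tokens[begin:end]))
```

A token-level model such as a CRF predicts the `tags` sequence and recovers spans in a step like this one, whereas a semi-CRF assigns labels to whole segments directly.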

Bibliographic details

Source
International Conference on Natural Language Processing and Knowledge Engineering | 2010 | pp. 1-7 | 7 pages
Authors
Sun Xiao; Nan Xiaoli
Original format: PDF
CLC classification: Information processing
Keywords
Chinese Base Phrases Chunking; Latent semi-CRF; Natural Language Processing

Similar literature

1. Detecting New Words from Chinese Text Using Latent Semi-CRF Models [J]. Xiao Sun, Degen Huang, Fuji Ren. IEICE Transactions on Information and Systems. 2010, No. 6.
2. A Latent Discriminative Variable Model for Automatic Identification of Chinese Base Phrases [J]. Xiao Sun, Xiaoli Nan. Journal of Information and Computational Science. 2010, No. 7.
3. Chinese Base Phrases Chunking Based on Latent semi-CRF Model [C]. Xiao Sun, Xiaoli Nan. Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering. 2010.
4. Parameterizing Phrase Based Statistical Machine Translation Models: An Analytic Study [D]. Cer, Daniel. 2011.
5. Phrase Based Topic Modeling for Semantic Information Processing in Biomedicine [O]. Zhiguo Yu, Todd R Johnson, Ramakanth Kavuluru.
6. A Chunk-Based Reordering Model for Phrase-Based SMT Systems [O]. Chen Yidong (陈毅东), Shi Xiaodong. 2008.
