Combining Machine Learning with Linguistic Heuristics for Chinese Word Segmentation

Abstract

This paper describes a hybrid model that combines machine learning with linguistic heuristics for integrating unknown word identification with Chinese word segmentation. The model consists of two components: a position-of-character (POC) tagging component that annotates each character in a sentence with a POC tag that indicates its position in a word, and a merging component that transforms a POC-tagged character sequence into a word-segmented sentence. The tagging component uses a support vector machine based tagger to produce an initial tagging of the text and a transformation-based tagger to improve the initial tagging. In addition to the POC tags assigned to the characters, the merging component incorporates a number of linguistic and statistical heuristics to detect words with regular internal structures, recognize long words, and filter non-words. Experiments show that, without resorting to a separate unknown word identification mechanism, the model achieves an F-score of 95.0% for word segmentation and a competitive recall of 74.8% for unknown word recognition.
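
The two-stage idea in the abstract (tag each character with its position in a word, then merge the tagged characters into words) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the four-tag set B/M/E/S, the function names, and the sample sentence are hypothetical, the SVM and transformation-based taggers are not modeled, and the merging step applies only the tags, with none of the linguistic or statistical heuristics the paper describes. The last function shows one common way to compute a word-segmentation F-score, by comparing character spans of gold and predicted words.

```python
from typing import List, Set, Tuple

# Hypothetical tag set (an assumption, not taken from the paper):
# B = word-initial, M = word-internal, E = word-final, S = single-character word.


def words_to_poc(words: List[str]) -> List[Tuple[str, str]]:
    """Turn a segmented sentence into (character, POC tag) pairs."""
    tagged: List[Tuple[str, str]] = []
    for word in words:
        if len(word) == 1:
            tagged.append((word, "S"))
        else:
            tagged.append((word[0], "B"))
            tagged.extend((ch, "M") for ch in word[1:-1])
            tagged.append((word[-1], "E"))
    return tagged


def merge_poc(tagged: List[Tuple[str, str]]) -> List[str]:
    """Merge a POC-tagged character sequence back into words.

    Inconsistent tag sequences (e.g. B followed by B) are tolerated by
    closing the open word whenever a new word clearly starts.
    """
    words: List[str] = []
    current = ""
    for ch, tag in tagged:
        if tag in ("B", "S") and current:
            words.append(current)  # previous word was never closed
            current = ""
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:
        words.append(current)  # flush a word left open at sentence end
    return words


def word_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Character-offset spans (start, end) of each word in the sentence."""
    spans: Set[Tuple[int, int]] = set()
    start = 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def segmentation_f_score(gold: List[str], predicted: List[str]) -> float:
    """Harmonic mean of word-level precision and recall."""
    gold_spans, pred_spans = word_spans(gold), word_spans(predicted)
    correct = len(gold_spans & pred_spans)
    if correct == 0:
        return 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    sentence = ["我们", "喜欢", "自然语言", "。"]  # hypothetical example
    tagged = words_to_poc(sentence)
    print(tagged)
    print(merge_poc(tagged))
    print(segmentation_f_score(sentence, ["我们", "喜欢", "自然", "语言", "。"]))
```

Tagging and merging are inverses on well-formed input, so a perfect tagger would reproduce the gold segmentation; in the model described above, the heuristics in the merging component exist to handle the cases where the tags alone are unreliable.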

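Bibliographic details

Source: International Florida Artificial Intelligence Research Society Conference (FLAIRS 2007), 2007, 6 pages
Conference location: Key West, FL (US)
Author: Xiaofei Lu
Original format: PDF
Chinese Library Classification: Artificial intelligence theory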

Similar literature

1. Recognizing handwritten Chinese day and month words by combining a holistic method and a segmentation-based method [J]. Chongyang Zhang, Wei Li. Neural Computing and Applications, 2013, no. 6.
2. Combining Machine Learning with Linguistic Heuristics for Chinese Word Segmentation [C]. Xiaofei Lu. International Florida Artificial Intelligence Research Society Conference (FLAIRS 2007), May 7-9, 2007, Key West, FL (US).
3. Towards high-performance word sense disambiguation by combining rich linguistic knowledge and machine learning approaches [D]. Jinying Chen. 2006.
4. A combined machine-learning and graph-based framework for the segmentation of retinal surfaces in SD-OCT volumes [O]. Bhavna J. Antony, Michael D. Abràmoff, Matthew M. Harper. 2013.
5. Combining Linguistic and Machine Learning Techniques for Word Alignment Improvement [O]. Ayan Necip Fazil. 2005.
