A Chinese word segmentation algorithm based on maximum entropy

Abstract

Automatic word segmentation is an important component of modern Chinese information processing and a key technology for Chinese full-text retrieval. This paper presents a Chinese word segmentation algorithm based on maximum entropy. Using the part-of-speech tags and word-frequency annotations of a corpus, it builds a maximum entropy model based on mutual information, which serves as the language model for segmentation. Finally, the binary model was used to test whether expanding the training corpus affects segmentation accuracy, and curves relating training-corpus expansion to segmentation accuracy were obtained.
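The abstract does not give the model's features or equations. As a rough illustration of the mutual-information cue it mentions, the sketch below computes pointwise mutual information (PMI) between adjacent characters from a toy corpus and places a word boundary wherever PMI falls below a threshold. This is a common MI-based baseline, not the authors' maximum entropy model; the function names (`char_stats`, `pmi`, `segment`), the threshold of 0 and the toy corpus are all hypothetical.

```python
import math
from collections import Counter


def char_stats(sentences):
    """Count single characters and adjacent character pairs in unsegmented text."""
    unigrams = Counter()
    bigrams = Counter()
    for s in sentences:
        unigrams.update(s)
        bigrams.update(s[i:i + 2] for i in range(len(s) - 1))
    return unigrams, bigrams


def pmi(a, b, unigrams, bigrams):
    """Pointwise mutual information log2(P(ab) / (P(a) * P(b))).

    Returns -inf when the pair (or either character) was never observed.
    """
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    joint = bigrams[a + b]  # Counter returns 0 for unseen keys
    if joint == 0 or unigrams[a] == 0 or unigrams[b] == 0:
        return float("-inf")
    p_ab = joint / n_bi
    p_a = unigrams[a] / n_uni
    p_b = unigrams[b] / n_uni
    return math.log2(p_ab / (p_a * p_b))


def segment(sentence, unigrams, bigrams, threshold=0.0):
    """Hypothetical MI-only segmenter: join adjacent characters whose PMI
    reaches the threshold, otherwise start a new word."""
    if not sentence:
        return []
    words = [sentence[0]]
    for prev, cur in zip(sentence, sentence[1:]):
        if pmi(prev, cur, unigrams, bigrams) >= threshold:
            words[-1] += cur
        else:
            words.append(cur)
    return words
```

With the toy corpus `["中文分词", "中文信息", "分词算法", "信息检索"]`, `segment("中文检索", *char_stats(corpus))` returns `["中文", "检索"]`: the pairs 中文 and 检索 have positive PMI, while 文检 never occurs and gets a boundary. In a maximum entropy segmenter, a score like this would typically be one feature among others (the abstract mentions part-of-speech and word-frequency information), with weights learned from the training corpus; how the paper actually combines them is not stated here.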

Bibliographic details

Source: International Conference on Machine Learning and Cybernetics, 2010, 4 pages
Authors: Zhang Li-Yan; Qin Min; Zhang Xue-Mei; Ma Hong-Xia
Original format: PDF
Chinese Library Classification: TP181-53
Keywords: Chinese full text retrieval; Maximum entropy; Word segmentation algorithm

Similar literature

1. Applying rough sets in word segmentation disambiguation based on maximum entropy model [J]. JIANG Wei, WANG Xiao-long, GUAN Yi. Journal of Harbin Institute of Technology, 2006, No. 1
2. A Chinese word segmentation algorithm based on maximum entropy [C]. Zhang Li-Yan, Qin Min, Zhang Xue-Mei. Proceedings of the Ninth International Conference on Machine Learning and Cybernetics, 2010
3. Statistical machine translation: Maximum entropy based translation models and search algorithms [D]. Garcia Varea, Ismael. 2003
4. Maximum Entropy Word-Frequency Chinese Characters and Multiple Meanings [O]. Xiaoyong Yan, Petter Minnhagen
5. Chinese word segmentation with a maximum entropy approach [O]. LOW JIN KIAT. 2006