A Chinese word segmentation algorithm based on maximum entropy

Abstract

Automatic word segmentation is an important component of modern Chinese information processing and a key technology for Chinese full-text retrieval. This paper presents a Chinese word segmentation algorithm based on maximum entropy. It uses the part-of-speech tags and word-frequency tags of a corpus to build a maximum entropy model based on mutual information, which serves as the language model for segmentation. Finally, a bigram (binary) model is used to test whether expanding the training corpus affects segmentation accuracy, and curves relating training-corpus expansion to segmentation accuracy are obtained.
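
The abstract does not give the model's features or formulas, so the following is only a minimal sketch of one ingredient it names: mutual information between adjacent characters, used here as a cohesion score. The function names `pmi_table` and `segment`, the toy corpus, and the cut threshold are all illustrative assumptions, and the thresholding step stands in for the maximum entropy model, which is not reproduced here.

```python
import math
from collections import Counter


def pmi_table(sentences):
    """Pointwise mutual information (log base 2) for each adjacent character pair."""
    unigrams = Counter()
    bigrams = Counter()
    for s in sentences:
        unigrams.update(s)                # count single characters
        bigrams.update(zip(s, s[1:]))     # count adjacent character pairs
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    table = {}
    for (a, b), count in bigrams.items():
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        table[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return table


def segment(sentence, table, threshold):
    """Join adjacent characters whose PMI reaches the threshold; cut otherwise.

    Pairs never seen in the corpus are always cut. This greedy rule is an
    assumption for illustration, not the paper's decision procedure.
    """
    if not sentence:
        return []
    words = [sentence[0]]
    for a, b in zip(sentence, sentence[1:]):
        if table.get((a, b), float("-inf")) >= threshold:
            words[-1] += b
        else:
            words.append(b)
    return words


# Hypothetical toy corpus; a real system would use a large tagged corpus.
corpus = ["中文分词", "中文检索", "分词算法"]
table = pmi_table(corpus)
print(segment("中文分词", table, threshold=2.5))
```

In a maximum entropy segmenter, a score like this would more plausibly enter as one feature among several (alongside the part-of-speech and word-frequency tags the abstract mentions) than as a hard threshold.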

Bibliographic details

Source: Proceedings of the Ninth International Conference on Machine Learning and Cybernetics, 2010, pp. 1264-1267 (4 pages)
Authors: Zhang Li-Yan; Qin Min; Zhang Xue-Mei; Ma Hong-Xia
Original format: PDF
Chinese Library Classification: Automated reasoning, machine learning
Keywords: Chinese full-text retrieval; maximum entropy; word segmentation algorithm
