A Chinese Word Segmentation Based on Machine Learning

Abstract

Unlike English, Chinese text has no delimiters between words. Segmenting Chinese text into words is the first step in every kind of Chinese information processing, so Chinese word segmentation is a fundamental and difficult problem in the field. Traditional word segmentation systems require a manually built dictionary, and unknown words that are missing from the dictionary must be added by hand. This paper proposes a new Chinese word segmentation model that uses machine learning to build a dictionary automatically and then gradually update and refine it. The four modules of the machine learning model for the Chinese word segmentation system are described in detail, and the algorithms in some modules are improved to raise system performance. Tests on a closed corpus and an open corpus show that the method reduces the workload of building and maintaining the dictionary and also resolves the issues of ambiguity processing and unknown word recognition.
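
The abstract does not describe the four modules or their algorithms, so the paper's model cannot be reproduced from this record. As background only, the following minimal sketch shows forward maximum matching, a common dictionary-lookup technique of the traditional kind the abstract contrasts itself with. The function name, the `max_len` parameter, and the toy dictionary are all illustrative assumptions, not taken from the paper.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy left-to-right segmentation (illustrative baseline, not the paper's model).

    At each position, take the longest substring (up to max_len characters)
    found in the dictionary; fall back to a single character if nothing matches.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until there is a hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy dictionary, assumed for this example only.
toy_dictionary = {"中文", "分词", "机器", "学习", "机器学习"}

print(forward_max_match("机器学习中文分词", toy_dictionary))
print(forward_max_match("基于机器学习", toy_dictionary))
```

In the second call, "基于" is not in the toy dictionary, so it falls apart into single characters. This is the unknown-word problem that, in a purely dictionary-driven system, has to be fixed by adding entries manually.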

Bibliographic details

Source
International Workshop on Education Technology and Computer Science (ETCS 2009) | 2009 | pp. 610-613 | 4 pages
Authors
Wang Hongsheng; Cui Mingming
Original format: PDF
Keywords
learning (artificial intelligence); natural language processing; word processing; Chinese information processing; Chinese word segmentation; English; dictionary; machine learning; ambiguity processing; artificial dictionary; unknown words recognition;

Similar literature

1. Learning Chinese Word Segmentation Based on Bidirectional GRU-CRF and CNN Network Model [J]. Chenghai Yu, Shupei Wang, Jiajun Guo. International Journal of Technology and Human Interaction. 2019, No. 3
2. Automatic Extraction of New Words Based on Google News Corpora for Supporting Lexicon-based Chinese Word Segmentation Systems [J]. Chin-Ming Hong, Chih-Ming Chen, Chao-Yang Chiu. Expert Systems with Applications. 2009, No. 2, Part 2
3. A Chinese word segmentation based on language situation in processing ambiguous words [J]. Zhang MY, Lu ZD, Zou CY. Information Sciences: An International Journal. 2004, No. 3-4
4. A Chinese Word Segmentation Based on Machine Learning [C]. Wang Hongsheng, Cui Mingming. Education Technology and Computer Science, 2009. ETCS '09. 2009
5. Experimental comparison of discriminative learning approaches for Chinese word segmentation [D]. Song, Dong. 2008
6. A Character Level Based and Word Level Based Approach for Chinese-Vietnamese Machine Translation [O]. Phuoc Tran, Dien Dinh, Hien T. Nguyen. 2016
7. Combination of machine learning methods for optimum Chinese word segmentation [O]. Masayuki Asahara, Chooi-ling Goh, Kenta Fukuoka, 2005
