Impact of Word Segmentation Errors on Automatic Chinese Text Classification


Abstract

In this paper, several sets of experiments were carried out to study the impact of word segmentation errors on automatic Chinese text classification. A comparison of four word-based approaches was carried out first. The results show that performance dropped significantly when automatic word segmentation was used instead of manual word segmentation, which means that the errors introduced by automatic segmentation have a clear impact on classification performance. We then ran the experiment with a character-based approach (N-gram). Although the N-gram approach produces a large number of ambiguous words, the results show that it performed better than automatic word segmentation.
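As an illustration of the character-based idea mentioned in the abstract, the following is a minimal sketch of overlapping character N-gram extraction. The abstract does not state the value of N, the feature weighting, or the classifier used, so the function names `char_ngrams` and `ngram_features`, the choice of bigrams, and the raw-count features are all assumptions made for this example, not the author's implementation.

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Return overlapping character n-grams of `text`, ignoring whitespace.

    Hypothetical helper: no word segmenter is involved, so every run of
    n adjacent characters becomes a candidate "word".
    """
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def ngram_features(text, n=2):
    """Count n-gram occurrences; raw counts are an assumed feature weighting."""
    return Counter(char_ngrams(text, n))


if __name__ == "__main__":
    sample = "中文文本分类"  # "Chinese text classification"
    print(char_ngrams(sample, 2))
    # ['中文', '文文', '文本', '本分', '分类']
```

In this sample, bigrams such as 文文 and 本分 straddle true word boundaries, which is one way to read the "ambiguous words" the abstract refers to: the feature set contains both real words and spurious character pairs.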

Bibliographic details

Source: Document Analysis Systems (DAS), 2012 10th IAPR International Workshop on | 2012 | pp. 271-275 | 5 pages
Conference location: Gold Coast (AU)
Author(s): Xi Luo
Author affiliation: Grad. Sch. of Eng., Mie Univ., Tsu, Japan
Original format: PDF
Language of text: English
CLC classification: Information processing

Similar literature

1. A Study on Improving Word-Segmentation Accuracy in Automatic Chinese Text Processing [J]. LI, Li. Journal of Shanghai University (English Edition). 2001, No. 0z1
2. An automatic classification of text documents based on correlative association of words [J]. Agnihotri Deepak, Verma Kesari, Tripathi Priyanka. Journal of Intelligent Information Systems. 2018, No. 3
3. Automatic Extraction Of New Words Based On Google News Corpora For Supporting Lexicon-based Chinese Word Segmentation Systems [J]. Chin-Ming Hong, Chih-Ming Chen, Chao-Yang Chiu. Expert Systems with Applications. 2009, No. 2p2
4. Chinese Text Classification without Automatic Word Segmentation [C]. Wei Liu, Ben Allison, David Guthrie, International Conference on Advanced Language Processing and Web Information Technology. 2007
5. Defining and automatically identifying words in Chinese. [D]. Xue, Nianwen. 2002
6. A fine-grained Chinese word segmentation and part-of-speech tagging corpus for clinical text [O]. Ying Xiong, Zhongmin Wang, Dehuan Jiang, 2019
7. Automatic detecting/correcting errors in Chinese text by an approximate word-matching algorithm [O]. Lei Zhang, Changning Huang, Ming Zhou, 2000
