A logistic regression-based smoothing method for Chinese text categorization

Show-Jane Yen; Yue-Shi Lee; Jia-Ching Ying; Yu-Chieh Wu

Abstract

Automatic Chinese text classification is an important and well-known technology in the field of machine learning. The first step in solving Chinese text categorization problems is to tokenize Chinese words from a sequence of unsegmented sentences. Previous work often employs a Chinese word tokenizer trained on different sources and then applies conventional text classification approaches. However, these tokenizers are not perfect and often provide incorrect word boundary information. In this paper, we propose an N-gram-based language model that takes word relations into account for Chinese text categorization without a Chinese word tokenizer. To handle the out-of-vocabulary problem, we also propose a novel smoothing approach based on logistic regression to improve accuracy. The experimental results show that our approach outperforms traditional methods by at least 11% on micro-average F-measure.
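The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of categorizing Chinese text with character N-grams and no word tokenizer. It is not the authors' method: plain additive (add-alpha) smoothing stands in for the logistic regression-based smoothing the paper proposes, and every name here (`char_ngrams`, `NgramCategorizer`, `alpha`, the toy training data) is hypothetical.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return overlapping character n-grams; no word segmentation is needed."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


class NgramCategorizer:
    """One character n-gram model per category (illustrative only).

    Additive smoothing is an assumption used here as a stand-in; the paper's
    logistic regression-based smoothing is not described in the abstract.
    """

    def __init__(self, n=2, alpha=1.0):
        self.n = n
        self.alpha = alpha                  # additive smoothing constant
        self.counts = defaultdict(Counter)  # category -> n-gram counts
        self.totals = Counter()             # category -> total n-grams seen
        self.docs = Counter()               # category -> number of documents
        self.vocab = set()                  # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.counts[label].update(grams)
            self.totals[label] += len(grams)
            self.docs[label] += 1
            self.vocab.update(grams)
        return self

    def log_score(self, text, label):
        # Log prior plus the sum of smoothed n-gram log-probabilities.
        v = len(self.vocab) + 1  # +1 reserves mass for unseen n-grams
        score = math.log(self.docs[label] / sum(self.docs.values()))
        for gram in char_ngrams(text, self.n):
            num = self.counts[label][gram] + self.alpha
            den = self.totals[label] + self.alpha * v
            score += math.log(num / den)
        return score

    def predict(self, text):
        return max(self.docs, key=lambda label: self.log_score(text, label))


# Toy data, invented for illustration.
train_texts = ["股市今日大幅上涨", "球队赢得了比赛冠军",
               "央行宣布下调利率", "球员在比赛中受伤"]
train_labels = ["finance", "sports", "finance", "sports"]
model = NgramCategorizer(n=2).fit(train_texts, train_labels)
print(model.predict("球队比赛"))
```

Because unseen N-grams receive a small nonzero probability, a test document containing out-of-vocabulary character sequences can still be scored under every category, which is the role smoothing plays in such a model.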

Bibliographic details

Source
Expert Systems with Applications | 2011, Issue 9 | pp. 11581-11590 | 10 pages
Authors
Show-Jane Yen; Yue-Shi Lee; Jia-Ching Ying; Yu-Chieh Wu
Author affiliations

Department of Computer Science and Information Engineering, Ming Chuan University, 5 De-Ming Rd., Gweishan District, Taoyuan 333, Taiwan;

Department of Computer Science and Information Engineering, Ming Chuan University, 5 De-Ming Rd., Gweishan District, Taoyuan 333, Taiwan;

Department of Computer Science and Information Engineering, National Cheng-Kung University, 1 University Road, Tainan City 701, Taiwan;

Department of Electronic Commerce, Kai-Nan University, 1 Kainan Road, Luzhu Shiang, Taoyuan 33857, Taiwan

Original format: PDF
Language: English
Keywords
text classification; n-gram-based classification; feature selection; word segmentation; logistic regression

Similar literature

1. Arabic Text Categorization Using Logistic Regression [J]. Mayy M. Al-Tahrawi. International Journal of Intelligent Systems and Applications. 2015, No. 6
2. A sparse version of the ridge logistic regression for large-scale text categorization [J]. Sujeevan Aseervatham, Anestis Antoniadis, Eric Gaussier. Pattern Recognition Letters. 2011, No. 2
3. Large-Scale Bayesian Logistic Regression for Text Categorization [J]. Alexander Genkin, David D. Lewis, David Madigan. Technometrics. 2007, No. 3
4. Chinese Text Categorization Based on the Binary Weighting Model with Non-binary Smoothing [C]. Xue Dejun, Sun Maosong. Advances in Information Retrieval. 2003
5. A new feature selection method based on support vector machines for text categorization [D]. Xu, Yaquan. 2006
6. Evaluation of Forensic Data Using Logistic Regression-Based Classification Methods and an R Shiny Implementation [O]. Giulia Biosa, Diana Giurghita, Eugenio Alladio. 2020
