A multi-class SVM classification system based on learning methods from indistinguishable Chinese official documents

JuiHsi Fu; SingLing Lee


Abstract

Support Vector Machines (SVMs) are applied to Chinese official document classification under a One-against-All (OAA) multi-class scheme. Preprocessing uses several data retrieval techniques, including sentence segmentation, term weighting, and feature extraction. We observe that most documents whose contents are indistinguishable are classified poorly. The traditional solution is to add misclassified documents to the training set in order to adjust the classification rules. In this paper, indistinguishable documents are found to be informative for strengthening prediction performance, since the current model predicts their labels with low confidence. A general approach is proposed that uses SVM decision values to identify indistinguishable documents. Based on verified classification results and the distinguishability of documents, four learning strategies that select certain documents for the training set are proposed to improve classification performance. Experiments show that indistinguishable documents can be identified with high probability and are informative for the learning strategies. Furthermore, LMID, which adds both misclassified and indistinguishable documents to the training set, is the most effective learning strategy for SVM classification of a large set of Chinese official documents in terms of computing efficiency and classification accuracy.
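
As a rough illustration of the idea in the abstract, the sketch below flags a document as indistinguishable when its one-against-all decision values leave little separation between the two best-scoring classes, and then selects both misclassified and indistinguishable documents for retraining, in the spirit of LMID. The abstract does not state the actual criterion, so the gap rule, the `threshold` value, and all function and class names here are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# One document: (document id, per-class OAA decision values, verified label).
Doc = Tuple[str, Dict[str, float], str]


def predict_oaa(decision_values: Dict[str, float]) -> Tuple[str, float]:
    """Return (predicted label, gap between the two largest decision values).

    Using the top-two gap as a confidence measure is an assumption,
    not the criterion defined in the paper.
    """
    if len(decision_values) < 2:
        raise ValueError("need decision values for at least two classes")
    ranked = sorted(decision_values.items(), key=lambda kv: kv[1], reverse=True)
    best_label, best_value = ranked[0]
    runner_up_value = ranked[1][1]
    return best_label, best_value - runner_up_value


def is_indistinguishable(decision_values: Dict[str, float],
                         threshold: float = 0.5) -> bool:
    """Flag a low-confidence prediction (hypothetical threshold)."""
    _, gap = predict_oaa(decision_values)
    return gap < threshold


def select_for_retraining(docs: List[Doc], threshold: float = 0.5) -> List[str]:
    """Select ids of documents that are misclassified or indistinguishable."""
    selected = []
    for doc_id, values, verified_label in docs:
        predicted, gap = predict_oaa(values)
        if predicted != verified_label or gap < threshold:
            selected.append(doc_id)
    return selected


if __name__ == "__main__":
    # Made-up decision values for three hypothetical classes.
    docs = [
        ("d1", {"finance": 1.8, "personnel": -1.2, "legal": -0.9}, "finance"),
        ("d2", {"finance": 0.2, "personnel": 0.1, "legal": -1.0}, "finance"),
        ("d3", {"finance": -0.5, "personnel": 1.4, "legal": -1.1}, "legal"),
    ]
    print(select_for_retraining(docs))
```

Here `d1` is left out because it is predicted correctly with a wide gap, `d2` is selected because its prediction is correct but low-confidence, and `d3` is selected because it is misclassified. In practice the decision values would come from the trained OAA SVM models rather than being typed in by hand.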


Bibliographic details

Source
Expert Systems with Applications | 2012, No. 3 | pp. 3127-3134 | 8 pages
Authors
JuiHsi Fu; SingLing Lee
Author affiliations

Department of Computer Science and Information Engineering, National Chung Cheng University, 168 University Road, Minhsiung Township, 62162 Chiayi, Taiwan, ROC;

Department of Computer Science and Information Engineering, National Chung Cheng University, 168 University Road, Minhsiung Township, 62162 Chiayi, Taiwan, ROC;

Indexing information
Original format: PDF
Language: English
Chinese Library Classification
Keywords
support vector machines (SVM); multi-class classification; Chinese official document classification; indistinguishability identification; incremental learning


