Conference: International Conference on Universal Digital Library

Research on Example-based Text Categorization

Abstract

The goal of text categorization is the automatic classification of documents into predefined categories. Text categorization usually trains a classifier on a labeled text corpus using machine learning, then compares the features of an unlabeled document with those of the classifier's classes and assigns the document to the most similar category. Algorithms that support this approach include Nearest Neighbor, Naive Bayes, and Support Vector Machines. The approach has some disadvantages, such as complicated algorithms, a small number of classes, and shallow class levels. This paper proposes a method that approaches text categorization from a different angle. It uses manual indexing experience, drawn from several large bibliographic databases, to construct an example base for automatic classification. Each record in the base is an indexing record that includes a cross-concordance between class numbers and strings. Text categorization is then performed by computing the similarity between the feature strings of an unlabeled document and each indexing example. Empirical results indicate that this method has several advantages, such as simpler computation, a larger number of classes, and deeper class levels. This paper describes in detail the algorithm, the construction of the example base for classification, and the performance of the system.
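
The abstract does not give the similarity measure or the record layout, so the following is only a minimal Python sketch of the general idea: feature strings from an unlabeled document are compared against every record in an example base, and matching records vote for their class numbers. The Jaccard word-overlap measure, the `threshold` parameter, the function names, and the class numbers `A.1`, `B.2`, and `C.3` are all assumptions made for illustration, not details taken from the paper.

```python
from collections import defaultdict


def tokens(text):
    """Lowercase a string and split it into a set of words."""
    return set(text.lower().split())


def similarity(a, b):
    """Jaccard word overlap (an assumed measure, not the paper's)."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# Hypothetical example base: each record pairs a class number with an
# indexing string. The class numbers are invented placeholders.
EXAMPLE_BASE = [
    ("A.1", "text categorization"),
    ("A.1", "automatic classification of documents"),
    ("B.2", "machine learning"),
    ("C.3", "manual indexing"),
]


def classify(feature_strings, example_base, threshold=0.3):
    """Score each class by summing the similarities of matching examples.

    Returns (class_number, score) pairs, best first. Examples whose
    similarity falls below `threshold` (an assumed cutoff) are ignored.
    """
    scores = defaultdict(float)
    for feature in feature_strings:
        for class_number, example in example_base:
            s = similarity(feature, example)
            if s >= threshold:
                scores[class_number] += s
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    doc_features = ["text categorization", "automatic document classification"]
    print(classify(doc_features, EXAMPLE_BASE))
```

Because classification in this sketch reduces to a lookup against stored records, adding a class only requires adding indexing records for it, with no retraining step. That is one plausible reading of the abstract's claim about simpler computation and a larger number of classes.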