IEEE Transactions on Knowledge and Data Engineering

A New Text Categorization Technique Using Distributional Clustering and Learning Logic



Abstract

Text categorization continues to be one of the most researched NLP problems because of the ever-increasing number of electronic documents and digital libraries. In this paper, we present a new text categorization method that combines distributional clustering of words with a learning logic technique, called Lsquare, for constructing text classifiers. The high dimensionality of document text is an obstacle to categorization, and feature clustering has been shown to be a good alternative to feature selection for reducing dimensionality. We therefore use a distributional clustering method, the Information Bottleneck (IB), to generate an efficient representation of documents, and apply Lsquare to train text classifiers. The method was extensively tested and evaluated on three benchmark data sets: WebKB, 20Newsgroup, and Reuters-21578. Under identical experimental settings with a small number of training documents, the proposed method achieves higher or comparable classification accuracy and F1 scores compared with SVM. These results show that the method is a good choice for applications with a limited amount of labeled training data. We also demonstrate how changing the training set size affects the classification performance of the learners.
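
The distributional clustering step can be illustrated with a minimal sketch of agglomerative Information Bottleneck word clustering. This is an assumption about the variant: the abstract does not say whether the agglomerative or the sequential IB algorithm was used, and the function names, toy word-class counts, and cluster count `k` below are hypothetical. Each word is described by its class distribution p(c|w) and prior p(w). At every step the two clusters whose merge loses the least mutual information I(Z;C) are merged; that loss equals (p(z_i) + p(z_j)) times the weighted Jensen-Shannon divergence of their class distributions. Documents are then represented as counts over the resulting word clusters. The Lsquare training step is not sketched here.

```python
import math
from typing import Dict, List, Sequence, Tuple

# A cluster is (words, prior p(z), class distribution p(c|z)).
Cluster = Tuple[List[str], float, List[float]]


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL divergence D(p || q); terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def merge_cost(pz_a: float, pc_a: Sequence[float],
               pz_b: float, pc_b: Sequence[float]) -> float:
    """Loss in I(Z;C) from merging two clusters (weighted JS divergence)."""
    pz = pz_a + pz_b
    wa, wb = pz_a / pz, pz_b / pz
    mix = [wa * x + wb * y for x, y in zip(pc_a, pc_b)]
    return pz * (wa * kl(pc_a, mix) + wb * kl(pc_b, mix))


def agglomerative_ib(counts: Dict[str, Sequence[float]], k: int) -> List[List[str]]:
    """Hypothetical helper: greedily merge words into k clusters.

    counts[w][c] = occurrences of word w in training documents of class c.
    """
    total = sum(sum(row) for row in counts.values())
    clusters: List[Cluster] = []
    for word, row in counts.items():
        n = sum(row)
        if n > 0:  # words never seen in training carry no class information
            clusters.append(([word], n / total, [x / n for x in row]))

    while len(clusters) > k:
        # Exhaustive search for the cheapest pair to merge.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = merge_cost(clusters[i][1], clusters[i][2],
                                  clusters[j][1], clusters[j][2])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        words_i, p_i, c_i = clusters[i]
        words_j, p_j, c_j = clusters[j]
        pz = p_i + p_j
        merged = (words_i + words_j, pz,
                  [(p_i * a + p_j * b) / pz for a, b in zip(c_i, c_j)])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)

    return [sorted(c[0]) for c in clusters]


def doc_vector(tokens: Sequence[str], word_to_cluster: Dict[str, int], k: int) -> List[int]:
    """Represent a document as token counts per word cluster."""
    vec = [0] * k
    for tok in tokens:
        idx = word_to_cluster.get(tok)
        if idx is not None:  # out-of-vocabulary tokens are ignored
            vec[idx] += 1
    return vec


# Toy counts over two hypothetical classes: [sport, politics].
counts = {
    "goal": [9, 1], "match": [8, 2],
    "vote": [1, 9], "senate": [0, 10],
    "the": [50, 50],
}
clusters = sorted(agglomerative_ib(counts, k=3))
index = {w: i for i, ws in enumerate(clusters) for w in ws}
print(clusters)  # [['goal', 'match'], ['senate', 'vote'], ['the']]
print(doc_vector(["goal", "the", "senate", "goal", "offside"], index, len(clusters)))  # [2, 1, 1]
```

The exhaustive pair search makes each merge quadratic in the number of clusters, which is acceptable for a toy vocabulary; a practical implementation would cache pairwise costs and update only those involving the newly merged cluster. The resulting cluster-count vectors are what a classifier (Lsquare in the paper, SVM in the comparison) would be trained on.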
