Published in: International Conference on Asian Digital Libraries
A Linear Text Classification Algorithm Based on Category Relevance Factors

Abstract

In this paper, we present CRF, a linear text classification algorithm. CRF uses category relevance factors to compute the feature vectors of the training documents in each category and, from these feature vectors, induces a profile vector for each category. For new unlabelled documents, CRF uses a modified cosine measure to compute the similarity between each document and each category, and assigns documents to the categories with the highest similarity scores. Only the category profile vectors, not the vectors of all training documents, take part in computing these similarities. We evaluated the algorithm on a subset of the Reuters-21578 and 20_newsgroups text collections and compared it against k-NN and SVM. Experimental results show that CRF outperforms k-NN and is competitive with SVM.
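The pipeline described in the abstract (weight terms per category, sum the weighted document vectors into one profile per category, then score new documents against the profiles only) can be illustrated with a minimal Python sketch. The abstract gives neither the formula for the category relevance factor nor the modification to the cosine measure, so both are assumptions here: `relevance_factor` is a hypothetical smoothed log-ratio of document frequencies, and `cosine` is the plain, unmodified cosine. All function names are illustrative, not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def relevance_factor(df_in, n_in, df_out, n_out):
    # ASSUMPTION: stand-in weight, not the paper's formula. Smoothed log-ratio
    # of the share of in-category documents containing the term to the share
    # of out-of-category documents containing it.
    p_in = (df_in + 1) / (n_in + 2)
    p_out = (df_out + 1) / (n_out + 2)
    return math.log(p_in / p_out)


def train_profiles(docs, labels):
    """Return {category: {term: weight}}, one profile vector per category."""
    n_docs = Counter(labels)          # documents per category
    total = len(docs)
    df = defaultdict(Counter)         # df[cat][term]: docs in cat containing term
    df_all = Counter()                # docs in the whole collection containing term
    for tokens, cat in zip(docs, labels):
        for term in set(tokens):
            df[cat][term] += 1
            df_all[term] += 1

    profiles = defaultdict(dict)
    for tokens, cat in zip(docs, labels):
        # Feature vector of one training document: term frequency times the
        # relevance factor of the term for the document's own category.
        for term, tf in Counter(tokens).items():
            d_in = df[cat][term]
            weight = relevance_factor(
                d_in, n_docs[cat], df_all[term] - d_in, total - n_docs[cat]
            )
            # The profile is the sum of the feature vectors in the category.
            profiles[cat][term] = profiles[cat].get(term, 0.0) + tf * weight
    return dict(profiles)


def cosine(a, b):
    # Plain cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(tokens, profiles):
    """Assign a document to the category whose profile is most similar."""
    doc_vec = dict(Counter(tokens))
    return max(profiles, key=lambda cat: cosine(doc_vec, profiles[cat]))


# Toy data, invented for illustration only.
train_docs = [
    ["oil", "price", "rise"],
    ["oil", "barrel", "export"],
    ["match", "goal", "team"],
    ["team", "win", "goal"],
]
train_labels = ["energy", "energy", "sport", "sport"]

toy_profiles = train_profiles(train_docs, train_labels)
print(classify(["oil", "export", "price"], toy_profiles))  # prints "energy"
```

Because classification compares a document with one vector per category, the cost per document grows with the number of categories rather than with the number of training documents, which is the contrast with k-NN that the abstract points to.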
