Journal of Computer and Systems Sciences International

Categorization of Text Documents Taking into Account Some Structural Features


Abstract

This paper examines whether the conventional "bag-of-words" model can be extended to reflect the structural features of text documents, so that machine learning methods can take those features into account during categorization. The structural features are used to characterize the relationships within a set of tokens, and the names of these relationships serve as features alongside the names of the tokens. The proposed models therefore differ from the traditional approach, which reflects only unary relations. The effectiveness of the extended machine learning methods is tested in computer experiments on the classes of the Reuters-21578 set using eight common classifiers. The experiments demonstrate the relevance of applying this extended approach to the categorization of text documents with simple classifiers.
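The idea of using relationship names as features next to token names can be illustrated with a minimal sketch. The abstract does not say which relationships the authors use, so the binary relation below ("token a precedes token b within a small window") and the names `unary_features`, `relation_features`, `extended_features`, and `precedes(...)` are assumptions made purely for illustration, not the paper's method.

```python
from collections import Counter


def unary_features(tokens):
    """Conventional bag-of-words: one feature per token name."""
    return Counter(tokens)


def relation_features(tokens, window=1):
    """Hypothetical binary relation (an assumption, not from the paper):
    token a precedes token b within `window` positions."""
    feats = Counter()
    for i, a in enumerate(tokens):
        for b in tokens[i + 1 : i + 1 + window]:
            feats[f"precedes({a},{b})"] += 1
    return feats


def extended_features(text, window=1):
    """Token names plus relation names in a single feature bag."""
    tokens = text.lower().split()
    feats = unary_features(tokens)
    feats.update(relation_features(tokens, window))  # adds the relation counts
    return feats


if __name__ == "__main__":
    for name, count in sorted(extended_features("oil prices rise").items()):
        print(name, count)
```

A feature bag built this way can be passed to any classifier that accepts sparse count features, which is what would allow simple classifiers to use structural information without changes to the learning algorithm itself.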
