APPARATUS, METHOD AND COMPUTER PROGRAM FOR DOCUMENT CLASSIFICATION USING TERM ASSOCIATION ANALYSIS

Abstract

A document classification device includes: an input unit configured to receive multiple original documents, each containing multiple words; an importance analysis unit that identifies the relatively important words among those in the original documents to determine a first word set; a correlation analysis unit that uses the correlations between the words in the first word set to determine a second word set; and a document classification unit that uses the second word set to classify the original documents. By extracting the classification feature set through word correlation analysis, the device removes low-importance noise terms and reflects the domain characteristics of the original documents in the data analysis, improving both the accuracy and the processing speed of document classification relative to conventional approaches.
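The four units described in the abstract can be read as a small pipeline, sketched below. The abstract does not name the importance measure, the correlation measure, or the classifier, so everything concrete here is an assumption made for illustration: summed TF-IDF stands in for importance, Jaccard overlap of document co-occurrence stands in for correlation, and a nearest-centroid classifier over labeled examples stands in for the classification unit. All function names, parameters (`top_k`, `min_jaccard`, `min_support`), and the toy documents are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def first_word_set(docs, top_k):
    """Importance analysis (assumed measure): top_k words by summed TF-IDF."""
    n_docs = len(docs)
    doc_freq = Counter(w for doc in docs for w in set(doc))
    score = Counter()
    for doc in docs:
        for word, count in Counter(doc).items():
            score[word] += (count / len(doc)) * math.log(n_docs / doc_freq[word])
    return {word for word, _ in score.most_common(top_k)}


def second_word_set(docs, words, min_jaccard, min_support):
    """Correlation analysis (assumed measure): keep words that co-occur with
    another candidate in at least min_support documents and whose document
    sets overlap with Jaccard similarity >= min_jaccard."""
    postings = {w: {i for i, doc in enumerate(docs) if w in doc} for w in words}
    kept = set()
    for a, b in combinations(sorted(words), 2):
        both = postings[a] & postings[b]
        either = postings[a] | postings[b]
        if either and len(both) >= min_support and len(both) / len(either) >= min_jaccard:
            kept.update((a, b))
    return kept


def vectorize(doc, vocab):
    """Count vector of a tokenized document over the selected vocabulary."""
    counts = Counter(doc)
    return [counts[w] for w in vocab]


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def train_centroids(docs, labels, vocab):
    """Mean feature vector per label (nearest-centroid is an assumed classifier)."""
    grouped = defaultdict(list)
    for doc, label in zip(docs, labels):
        grouped[label].append(vectorize(doc, vocab))
    return {label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in grouped.items()}


def classify(doc, centroids, vocab):
    """Assign the label whose centroid is most similar to the document."""
    vec = vectorize(doc, vocab)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Toy corpus (hypothetical): "the" appears everywhere, the last word of each
# document is a one-off noise term, and the middle words carry the domain.
docs = [s.split() for s in [
    "the stock market price rises",
    "the stock market price falls",
    "the stock market price holds",
    "the team match goal wins",
    "the team match goal late",
    "the team match goal home",
]]
labels = ["finance"] * 3 + ["sport"] * 3

first = first_word_set(docs, top_k=8)
second = second_word_set(docs, first, min_jaccard=0.5, min_support=2)
vocab = sorted(second)
centroids = train_centroids(docs, labels, vocab)
predictions = [classify(doc, centroids, vocab) for doc in docs]
print(vocab, predictions)
```

In this toy run the importance step drops the ubiquitous word "the", and the correlation step then discards any one-off terms that slipped into the first word set, because they never co-occur with another candidate in two or more documents. The classifier therefore works on a much smaller vocabulary than the raw text, which is the source of the speed benefit the abstract claims.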