
Design and Implementation of Document Classification using Keyword Frequency and TFIDF



Abstract

This study introduces an algorithm that classifies documents by means of a keyword extractor. The system consists of a web document collector, an indexer, and a document classifier. Classification requires conceptual knowledge of each target category. The web document collector gathers web documents from the web directories of Internet portal sites, then extracts the title, hyperlinks, and text from each document and saves them to files. The indexer constructs the conceptual knowledge by applying a method that combines keyword term frequency with the TFIDF algorithm. Finally, the document classifier applies the classification algorithm and the conceptual knowledge to the documents to be classified.
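
The abstract does not say how term frequency and TFIDF are combined, or which classification rule is used. The following is a minimal sketch under stated assumptions, not the authors' implementation: the weighted sum controlled by `alpha`, the smoothed IDF computed across categories, the sum-of-weights scoring rule, and the names `tokenize`, `build_knowledge`, and `classify` are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_knowledge(docs_by_category, alpha=0.5):
    """Build per-category keyword weights (the "conceptual knowledge").

    Assumption: each term's weight is alpha * TF + (1 - alpha) * TF * IDF,
    where TF is the term's relative frequency within the category and IDF
    is computed over categories. The source does not specify this formula.
    """
    counts = {
        cat: Counter(tok for doc in docs for tok in tokenize(doc))
        for cat, docs in docs_by_category.items()
    }
    n_cats = len(counts)
    # Number of categories in which each term appears at least once.
    df = Counter(term for c in counts.values() for term in c)

    knowledge = {}
    for cat, c in counts.items():
        total = sum(c.values())
        weights = {}
        for term, freq in c.items():
            tf = freq / total
            # Smoothed IDF; always positive, larger for rarer terms.
            idf = math.log((1 + n_cats) / (1 + df[term])) + 1
            weights[term] = alpha * tf + (1 - alpha) * tf * idf
        knowledge[cat] = weights
    return knowledge


def classify(text, knowledge):
    """Return the category whose keyword weights best match the text.

    Assumption: the score is the sum of the category weights of the
    tokens in the text; unknown tokens contribute zero.
    """
    tokens = tokenize(text)
    scores = {
        cat: sum(weights.get(tok, 0.0) for tok in tokens)
        for cat, weights in knowledge.items()
    }
    return max(scores, key=scores.get)
```

In this sketch, `build_knowledge` plays the role of the indexer and `classify` the role of the document classifier. The document collector is omitted, so `docs_by_category` is assumed to be a plain dictionary that maps each category name to a list of already extracted text strings.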
