Published in: 《淮阴工学院学报》 (Journal of Huaiyin Institute of Technology)

A Classification and Extraction Algorithm for Web Science and Technology News

         

Abstract

Websites carry a large volume of news that is unrelated to science and technology. To improve the practical value of news gathered from the Web, TF-IDF term weighting was selected, a multilevel binary classification model for science and technology news was designed, and a TF-IDF-based system for classifying and extracting such news was implemented. News from government news websites, Phoenix (凤凰网), and similar sources served as the research corpus. In experiments on 200,000 news documents spanning more than 4,000 categories, the system achieved a recognition accuracy of 85.3% for science and technology news and a recognition rate of 82.9% for non-science news. The results suggest that the method offers a practically useful reference model for classifying and extracting science and technology news from the Web.
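The abstract names TF-IDF weighting and a binary science versus non-science decision but gives no implementation details. The following is only a minimal sketch of that general idea, not the authors' system: the toy corpus, the labels `"science"` and `"other"`, the nearest-centroid rule, and all function names are assumptions made for illustration. A multilevel model would presumably chain several such binary decisions, which is not shown here.

```python
import math
from collections import Counter, defaultdict


def inverse_document_frequency(docs):
    """Map each term to log(N / df), where df is the number of docs containing it."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def vectorize(doc, idf):
    """TF-IDF weights for one tokenized document; unseen terms get weight 0."""
    if not doc:
        return {}
    tf = Counter(doc)
    return {t: (c / len(doc)) * idf.get(t, 0.0) for t, c in tf.items()}


def centroid(vectors):
    """Average a list of sparse vectors term by term."""
    total = defaultdict(float)
    for vec in vectors:
        for term, weight in vec.items():
            total[term] += weight
    return {t: w / len(vectors) for t, w in total.items()}


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical, hand-made training data (already tokenized).
science_docs = [["quantum", "chip", "research"], ["satellite", "launch", "research"]]
other_docs = [["football", "match", "score"], ["celebrity", "film", "premiere"]]

idf = inverse_document_frequency(science_docs + other_docs)
science_centroid = centroid([vectorize(d, idf) for d in science_docs])
other_centroid = centroid([vectorize(d, idf) for d in other_docs])


def classify(doc):
    """Assumed decision rule: pick the class whose centroid is more similar."""
    vec = vectorize(doc, idf)
    if cosine(vec, science_centroid) > cosine(vec, other_centroid):
        return "science"
    return "other"  # ties, including documents with no known terms, fall here


print(classify(["chip", "research", "lab"]))
print(classify(["football", "score"]))
```

Real Chinese news text would first need word segmentation, and a corpus of the size reported in the abstract would call for a proper sparse-matrix library rather than plain dictionaries.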
