Websites carry a large volume of news unrelated to science and technology. To improve the practical value of news information obtained from the Web, a multilevel binary (dichotomous) classification model for science and technology news was designed using TF-IDF term weighting, and a TF-IDF-based text classification and extraction system for such news was implemented. News from government news websites and Phoenix (凤凰网) served as the research data. In a test covering 200,000 news documents and more than 4,000 categories, the system achieved 85.3% recognition accuracy for science and technology news and an 82.9% recognition rate for non-science news. The results suggest that the proposed method offers a practically useful reference model for classifying and extracting science and technology news from the Web.
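To make the idea of TF-IDF weighting feeding a binary decision concrete, here is a minimal Python sketch. It is not the system described in the abstract: the smoothed IDF formula, the nearest-centroid rule with cosine similarity, the function names, and the toy token lists are all assumptions chosen for illustration, and it shows only a single binary level rather than a multilevel cascade.

```python
import math
from collections import Counter

def fit_idf(docs):
    """Compute a smoothed IDF for every term in a list of token lists."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    # Assumed smoothing: log((1 + n) / (1 + df)) + 1 keeps weights positive.
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(tokens, idf):
    """Weight one document; terms unseen during fitting are ignored."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Average a list of sparse vectors."""
    acc = {}
    for vec in vectors:
        for t, w in vec.items():
            acc[t] = acc.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in acc.items()}

# Hypothetical, already-tokenized training data (not from the paper).
science_docs = [["quantum", "chip", "research"],
                ["research", "satellite", "launch"]]
other_docs = [["football", "match", "score"],
              ["movie", "star", "award"]]

idf = fit_idf(science_docs + other_docs)
science_center = centroid([tfidf(d, idf) for d in science_docs])
other_center = centroid([tfidf(d, idf) for d in other_docs])

def classify(tokens):
    """One binary decision: science news versus everything else."""
    vec = tfidf(tokens, idf)
    if cosine(vec, science_center) > cosine(vec, other_center):
        return "science"
    return "other"

print(classify(["new", "chip", "research"]))
print(classify(["football", "score"]))
```

A multilevel variant could chain several such decisions, passing documents labeled "science" on to a further binary split at the next level. Chinese news text would also need word segmentation before this step, which the sketch leaves out.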