Document classification program, server and method based on textual and external features of document information

Abstract

PROBLEM TO BE SOLVED: To provide a document classification program that improves the accuracy of determining whether Web document information belongs to a specific category (e.g., illegal or harmful content).

SOLUTION: Document information is described with sentence information and a markup language. The document classification program causes a computer to function as: document information separation means that separates the object document information to be analyzed into sentence information and markup language information; feature amount generation means that counts, separately for the sentence information and the markup language information, the number of times each pre-registered character string appears, and generates a feature amount in the form of a multidimensional vector whose elements are the appearance counts of the individual character strings; feature amount determination means that determines whether the object feature amount of the object document information falls within a specific range of the learning feature amounts obtained from a large amount of learning document information belonging to the specific category; and category classification means that classifies object document information for which the feature amount determination means returns true as information belonging to the specific category.
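The four means described above can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the function names (`separate`, `feature_vector`, `learn_range`, `in_category`), the registered term lists, and the sample documents are all hypothetical. The abstract also does not say how the "specific range" is defined, so the sketch assumes a simple stand-in: the Euclidean distance to the centroid of the learning vectors, bounded by the largest distance seen during learning.

```python
from html.parser import HTMLParser
import math


class _Splitter(HTMLParser):
    """Collects sentence text and markup (tag names and attributes) separately."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.markup_parts = []

    def handle_starttag(self, tag, attrs):
        self.markup_parts.append(tag)
        for name, value in attrs:
            self.markup_parts.append(f"{name}={value or ''}")

    def handle_data(self, data):
        self.text_parts.append(data)


def separate(document):
    """Document information separation: return (sentence_text, markup_text)."""
    splitter = _Splitter()
    splitter.feed(document)
    splitter.close()
    return " ".join(splitter.text_parts), " ".join(splitter.markup_parts)


def feature_vector(document, text_terms, markup_terms):
    """Feature amount generation: appearance counts of the registered strings,
    counted separately in the sentence text and in the markup."""
    text, markup = separate(document)
    return [text.count(t) for t in text_terms] + [markup.count(t) for t in markup_terms]


def learn_range(vectors):
    """Assumed range model: centroid of the learning vectors plus the largest
    distance from that centroid to any learning vector."""
    n = len(vectors)
    centroid = [sum(column) / n for column in zip(*vectors)]
    radius = max(math.dist(v, centroid) for v in vectors)
    return centroid, radius


def in_category(vector, centroid, radius):
    """Feature amount determination: True if the vector lies within the learned range."""
    return math.dist(vector, centroid) <= radius


# Hypothetical registered strings and learning documents for one category.
TEXT_TERMS = ["casino", "bet"]
MARKUP_TERMS = ["iframe", "popup"]
LEARNING_DOCS = [
    '<html><body><iframe src="popup.html"></iframe><p>casino bet bet</p></body></html>',
    '<iframe src="a.html"></iframe><p>casino casino bet</p>',
]

learning_vectors = [feature_vector(d, TEXT_TERMS, MARKUP_TERMS) for d in LEARNING_DOCS]
CENTROID, RADIUS = learn_range(learning_vectors)


def classify(document):
    """Category classification: True means the document is assigned to the category."""
    vector = feature_vector(document, TEXT_TERMS, MARKUP_TERMS)
    return in_category(vector, CENTROID, RADIUS)
```

Calling `classify(...)` on a new page returns `True` only when its count vector falls inside the learned range. Counting the markup separately from the sentence text lets structural cues such as embedded frames contribute to the decision even when the visible wording is innocuous. A real system would likely replace the centroid-and-radius rule with a trained classifier.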