DOCUMENT CLASSIFICATION PROGRAM, SERVER AND METHOD BASED ON SENTENCE FEATURES AND PHYSICAL FEATURES OF DOCUMENT INFORMATION
Abstract
PROBLEM TO BE SOLVED: To provide a document classification program capable of improving the accuracy of determining whether Web document information belongs to a specific category (e.g., illegal or harmful content).
SOLUTION: The document information is described with sentence information and a markup language. The document classification program causes a computer to function as: document information separation means that separates the object document information to be analyzed into sentence information and markup language information; feature amount generation means that counts, for each of the sentence information and the markup language information, the number of times each character string registered in advance appears, and generates a feature amount in the form of a multidimensional vector whose elements are the appearance counts of the individual character strings; feature amount determination means that determines whether or not the object feature amount of the object document information falls within a specific range of learning feature amounts obtained from a large amount of learning document information included in a specific category; and category classification means that classifies object document information for which the feature amount determination means returns true as information included in the specific category.
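The four means described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the regex-based tag splitting, the function names, the registered strings in the usage note, and above all the per-dimension minimum/maximum box used as the "specific range" are assumptions chosen for brevity. The abstract does not say how the range is derived from the learning feature amounts.

```python
import re

# Assumption: markup is HTML-like, so anything between "<" and ">" counts as markup.
TAG_RE = re.compile(r"<[^>]+>")


def separate(document):
    """Document information separation: return (sentence text, markup text)."""
    markup = " ".join(TAG_RE.findall(document))
    sentences = TAG_RE.sub(" ", document)
    return sentences, markup


def feature_vector(document, text_terms, markup_terms):
    """Feature amount generation: one count per registered string,
    sentence-side counts first, markup-side counts second."""
    sentences, markup = separate(document)
    text_counts = [sentences.count(term) for term in text_terms]
    markup_counts = [markup.count(term) for term in markup_terms]
    return text_counts + markup_counts


def learn_range(learning_vectors):
    """Assumed range model: per-dimension min and max over the learning
    vectors of the specific category."""
    columns = list(zip(*learning_vectors))
    return [min(c) for c in columns], [max(c) for c in columns]


def in_range(vector, bounds):
    """Feature amount determination: true if every element lies in its range."""
    lows, highs = bounds
    return all(lo <= v <= hi for v, lo, hi in zip(vector, lows, highs))


def classify(document, text_terms, markup_terms, bounds):
    """Category classification: true means 'included in the specific category'."""
    return in_range(feature_vector(document, text_terms, markup_terms), bounds)
```

With hypothetical registered strings such as `["free", "win"]` for the sentence side and `["<iframe", "<a "]` for the markup side, `learn_range` is first run over the feature vectors of the learning documents, and `classify` then checks whether a new document falls inside the resulting bounds. A trained classifier or a distance threshold around the learning vectors could replace the min/max box without changing the rest of the pipeline.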