Automatic Text Categorization Method Based on Unsupervised Learning, Using Keywords of Each Category and Measurement of the Similarity between Sentences
PURPOSE: A method for automatically categorizing documents collected from the Internet is provided. It takes only the keywords of each category as input and automatically creates and learns from its own training data. CONSTITUTION: A preprocessing step (10) normalizes the format of the collected documents, divides them into sentences, and extracts the content words of each sentence through linguistic analysis. A training sentence set creation step (20) automatically builds the training sentence set by extracting representative sentences from the preprocessed sentences and measuring the similarity between sentences. A feature extraction and categorization step (30) extracts and learns features from the created training sentence set and classifies input documents. The preprocessing step includes a document normalization procedure (110), a sentence division procedure (120), a morphological analysis and tagging procedure (130), and a content word extraction procedure (140).
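The three steps (10), (20), and (30) can be illustrated with a minimal Python sketch. It is not the patented implementation, and everything specific in it is an assumption: a toy stopword filter stands in for the linguistic analysis, a sentence counts as representative when it contains a keyword of exactly one category, similarity is cosine similarity over bag-of-words counts, the threshold of 0.2 is arbitrary, and classification uses per-category word-count centroids. All function names are hypothetical.

```python
import math
import re
from collections import Counter

# Assumption: a toy stopword list replaces real linguistic analysis and tagging.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "on", "for", "its"}


def preprocess(document):
    """Step (10): normalize whitespace, split into sentences, keep content words."""
    text = re.sub(r"\s+", " ", document).strip()
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    bags = []
    for sentence in sentences:
        words = [w for w in re.findall(r"[a-z]+", sentence.lower())
                 if w not in STOPWORDS]
        if words:
            bags.append(Counter(words))
    return bags


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def build_training_set(bags, keywords, threshold=0.2):
    """Step (20): seed each category with keyword-bearing sentences, then add
    unlabeled sentences that are similar enough to a seed (assumed threshold)."""
    seeds = {category: [] for category in keywords}
    unlabeled = []
    for bag in bags:
        hits = [c for c, kws in keywords.items() if any(k in bag for k in kws)]
        if len(hits) == 1:
            seeds[hits[0]].append(bag)
        else:
            unlabeled.append(bag)
    training = {category: list(reps) for category, reps in seeds.items()}
    for bag in unlabeled:
        best_category, best_sim = None, threshold
        for category, reps in seeds.items():
            for rep in reps:
                sim = cosine(bag, rep)
                if sim > best_sim:
                    best_category, best_sim = category, sim
        if best_category is not None:
            training[best_category].append(bag)
    return training


def train_centroids(training):
    """Step (30), learning: sum the word counts of each category's sentences."""
    centroids = {}
    for category, bags in training.items():
        total = Counter()
        for bag in bags:
            total.update(bag)
        centroids[category] = total
    return centroids


def classify(document, centroids):
    """Step (30), categorization: pick the category with the closest centroid."""
    doc_bag = Counter()
    for bag in preprocess(document):
        doc_bag.update(bag)
    return max(centroids, key=lambda c: cosine(doc_bag, centroids[c]))


# Usage with made-up data: only one keyword per category is supplied by hand.
corpus = ("The team won the football match. "
          "The striker scored a late goal in the match. "
          "The bank raised its interest rate. "
          "Investors expect the interest rate to fall.")
keywords = {"sports": {"football"}, "finance": {"bank"}}
training = build_training_set(preprocess(corpus), keywords)
centroids = train_centroids(training)
label = classify("The striker scored a goal.", centroids)
```

In this sketch the only manual input is the keyword dictionary; sentences without a keyword join a category solely through their similarity to a seed sentence, which mirrors the unsupervised training-set construction described above.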