System and Method for Mapping Categories among Heterogeneous Sources by Text Analysis
Abstract
The present invention provides a system and a method for mapping categories among heterogeneous media while preserving each existing medium's own category system. Individual documents are reclassified from the perspective of each medium, and the results are stored with the document as a two-dimensional label. This yields a logical device that can search documents belonging to different media by the same category, as if they all belonged to one medium. The system comprises: a topic modeling unit, which collects documents from the different media, calculates the correspondence between each document and each topic, integrates all collected documents, performs topic modeling, and structures each document; a primary learning and classification unit, which uses semi-supervised document classification to classify unclassified documents with a classification algorithm learned from the previously classified documents among those structured by the topic modeling unit, and generates primary learning data by integrating the newly classified documents with the previously classified ones; and a secondary learning and classification unit, which assigns a category to each final target unclassified document using the primary learning data reinforced by the primary learning and classification unit, thereby generating secondary learning data.
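The two-stage flow described in the abstract can be illustrated with a minimal sketch. It is not the patented implementation. Several things here are assumptions made for illustration: plain term-frequency vectors stand in for the topic-model representation, self-training with a nearest-centroid classifier stands in for the unspecified semi-supervised algorithm, and the function names, the `threshold` value, the sample documents, and the dictionary used for the per-medium ("two-dimensional") label are all hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Term-frequency vector (assumed stand-in for a topic distribution)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroids(labeled):
    """Sum the vectors of each category into one centroid per category."""
    cents = {}
    for vec, cat in labeled:
        cents.setdefault(cat, Counter()).update(vec)
    return cents


def predict(vec, cents):
    """Return (best_category, similarity) for one document vector."""
    return max(((cat, cosine(vec, c)) for cat, c in cents.items()),
               key=lambda pair: pair[1])


def self_train(labeled, unlabeled, threshold=0.3, max_rounds=5):
    """Primary stage: label confident unclassified documents and fold them
    back into the training data (simple self-training)."""
    labeled, pending = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(labeled)
        still_pending = []
        for vec in pending:
            cat, score = predict(vec, cents)
            if score >= threshold:
                labeled.append((vec, cat))
            else:
                still_pending.append(vec)
        if len(still_pending) == len(pending):
            break  # no progress in this round
        pending = still_pending
    return labeled


# Medium A (hypothetical news site): a few classified and unclassified documents.
news_labeled = [
    ("stock market shares investors profit", "Economy"),
    ("football match goal league team", "Sports"),
]
news_unlabeled = [
    "investors sell shares as market falls",
    "team wins league after late goal",
]

# Medium B (hypothetical blog platform): target documents keep their own category.
blog_docs = [
    {"text": "my notes on the league final and the winning goal",
     "labels": {"blog": "Diary"}},
]

# Primary learning data: original labels plus confidently self-labeled documents.
primary = self_train(
    [(vectorize(text), cat) for text, cat in news_labeled],
    [vectorize(text) for text in news_unlabeled],
)

# Secondary stage: assign medium A's category to medium B's documents,
# stored alongside the original label rather than replacing it.
news_centroids = centroids(primary)
for doc in blog_docs:
    category, _ = predict(vectorize(doc["text"]), news_centroids)
    doc["labels"]["news"] = category
```

In this sketch each document carries one label per medium, so a search on the news category "Sports" could also return the blog post, while the blog's own "Diary" category is left untouched.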