System and Method for Mapping Categories among Heterogeneous Sources by Text Analysis

Abstract

The present invention provides a system and method for mapping categories across different kinds of media while preserving each existing medium's own category system. Individual documents are reclassified according to the category systems of the various media, and the results are stored in each document as a two-dimensional label. This yields a logical layer through which documents belonging to different kinds of media can be searched by the same category, as if they all belonged to a single medium.

The system comprises three units. A topic modeling unit collects documents from the different kinds of media, calculates the correspondence between each document and the topics, integrates all collected documents, performs topic modeling, and thereby structures each document. A primary learning and classification unit performs document classification based on semi-supervised learning: it learns a classification algorithm from the previously classified documents among the structured documents, uses that algorithm to classify the unclassified documents, and merges the results with the existing previously classified documents to generate primary learning data. A secondary learning and classification unit then uses the primary learning data, reinforced in this way, to assign a category to each final target unclassified document, thereby generating secondary learning data.
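The two-stage, semi-supervised classification described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that the abstract does not state:

- Each document's topic proportions are taken as already computed. The toy vectors below are hypothetical stand-ins for the output of a topic model such as LDA.
- A nearest-centroid classifier using cosine similarity stands in for the unspecified classification algorithm.
- The confidence `threshold` of 0.9, used to admit pseudo-labeled documents into the primary learning data, is arbitrary.
- The "two-dimensional label" is interpreted as a per-document table that maps each medium to a category in that medium's own scheme.

All media names, categories, and functions (`map_categories`, `primary_stage`, `secondary_stage`, `search`) are hypothetical. This is not the patented implementation.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Example = Tuple[Vector, str]  # (topic vector, category)


@dataclass(frozen=True)
class Doc:
    doc_id: str
    medium: str           # hypothetical medium name, e.g. "news" or "blog"
    native_category: str  # category in the medium's own, unchanged category system
    topics: Vector        # topic proportions, ASSUMED precomputed by a topic model


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def train_centroids(examples: List[Example]) -> Dict[str, List[float]]:
    # Stand-in "learning" step (nearest centroid): average each category's vectors.
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for vec, cat in examples:
        acc = sums.setdefault(cat, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[cat] = counts.get(cat, 0) + 1
    return {cat: [x / counts[cat] for x in acc] for cat, acc in sums.items()}


def classify(centroids: Dict[str, List[float]], vec: Vector) -> Tuple[str, float]:
    # Return the most similar category and its cosine score.
    best = max(centroids, key=lambda cat: cosine(centroids[cat], vec))
    return best, cosine(centroids[best], vec)


def primary_stage(labeled: List[Example], unlabeled: List[Vector], threshold: float) -> List[Example]:
    # Learn from previously classified documents, classify the unclassified ones, and
    # merge the confident predictions with the labeled set (primary learning data).
    centroids = train_centroids(labeled)
    primary = list(labeled)
    for vec in unlabeled:
        cat, score = classify(centroids, vec)
        if score >= threshold:  # hypothetical confidence cut-off
            primary.append((vec, cat))
    return primary


def secondary_stage(primary: List[Example], targets: List[Vector]) -> List[str]:
    # Relearn from the reinforced primary data and assign a category to every target.
    centroids = train_centroids(primary)
    return [classify(centroids, vec)[0] for vec in targets]


def map_categories(docs: List[Doc], threshold: float = 0.9) -> Dict[str, Dict[str, str]]:
    # ASSUMED two-dimensional label: doc_id -> {medium: category in that medium's scheme}.
    labels = {d.doc_id: {d.medium: d.native_category} for d in docs}
    for medium in sorted({d.medium for d in docs}):
        labeled = [(d.topics, d.native_category) for d in docs if d.medium == medium]
        others = [d for d in docs if d.medium != medium]
        vecs = [d.topics for d in others]
        primary = primary_stage(labeled, vecs, threshold)
        for d, cat in zip(others, secondary_stage(primary, vecs)):
            labels[d.doc_id][medium] = cat
    return labels


def search(labels: Dict[str, Dict[str, str]], medium: str, category: str) -> List[str]:
    # Documents from any medium filed under `category` of `medium`'s category system.
    return sorted(doc_id for doc_id, lab in labels.items() if lab.get(medium) == category)


# Hypothetical toy corpus: two media, three-topic vectors.
docs = [
    Doc("n1", "news", "sports", (0.8, 0.1, 0.1)),
    Doc("n2", "news", "economy", (0.1, 0.8, 0.1)),
    Doc("b1", "blog", "athletics", (0.7, 0.2, 0.1)),
    Doc("b2", "blog", "finance", (0.2, 0.7, 0.1)),
    Doc("b3", "blog", "finance", (0.1, 0.6, 0.3)),
    Doc("b4", "blog", "athletics", (0.5, 0.35, 0.15)),
]
labels = map_categories(docs)
print(search(labels, "news", "sports"))  # ['b1', 'b4', 'n1']
```

In this toy data, blog document `b4` scores below the threshold in the primary stage for the news scheme, so it is not added to the primary learning data. The secondary stage still assigns it `sports`, using centroids retrained on the reinforced data. As a result, a search for the news category `sports` also returns blog documents, while each blog document keeps its native blog category.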
