Database and Expert Systems Applications; Lecture Notes in Computer Science; 4080

Topic Structure Mining for Document Sets Using Graph-Based Analysis


Abstract

This paper proposes a novel text mining method for document sets based on graph-based analysis. The analysis first identifies the similarity links in the document set and then determines the core documents, those with the highest centrality. Each core document represents a different topic. Next, the centrality scores are used together with the graph structure to identify the documents associated with each core document. This process yields a predetermined number of topics. For each topic, the user is presented with a set of documents in a three-layer structure: the core document, supplemental documents (those strongly associated with the core document), and subtopic documents (those only slightly associated with the core document and supplemental documents). The user can select any of the topics and browse the documents related to it. Furthermore, the user can select documents by layer; for example, subtopic documents are assumed to contain information that differs from the indicated topic and so might be interesting. In analyses of a set of newspaper articles, we evaluate the "accuracy of topic identification" and the "accuracy of document collecting related to the topics". Furthermore, we show an example of document set visualization based on graph structure and centrality score; the results indicate the method's usefulness for browsing and analyzing document sets.
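The pipeline described in the abstract (similarity links, centrality, core documents, then a three-layer grouping) can be illustrated with a minimal Python sketch. The abstract does not specify the similarity measure, the centrality definition, or how association strength is decided, so everything concrete here is an assumption made for illustration: bag-of-words cosine similarity, weighted degree centrality, greedy selection of mutually unlinked cores, and the hypothetical parameters `link_threshold` and `strong_threshold`. In particular, this sketch assigns layers by similarity thresholds, a stand-in for the paper's use of centrality scores together with graph structure. It is not the authors' method.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors (assumed measure)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def topic_structure(docs, num_topics, link_threshold=0.2, strong_threshold=0.5):
    """Return (topics, centrality) for a list of document strings.

    link_threshold and strong_threshold are hypothetical parameters,
    not values taken from the paper.
    """
    n = len(docs)
    vecs = [Counter(d.lower().split()) for d in docs]
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]

    # Step 1: similarity links between sufficiently similar documents.
    links = {i: {j for j in range(n) if sim[i][j] >= link_threshold}
             for i in range(n)}

    # Step 2: centrality, here the sum of similarities over a node's links.
    centrality = [sum(sim[i][j] for j in links[i]) for i in range(n)]

    # Step 3: pick cores greedily by centrality, skipping documents that are
    # linked to an already chosen core so each core stands for its own topic.
    cores = []
    for i in sorted(range(n), key=lambda k: -centrality[k]):
        if len(cores) == num_topics:
            break
        if all(i not in links[c] for c in cores):
            cores.append(i)

    # Step 4: three-layer structure around each core.
    topics = []
    for c in cores:
        supplemental = sorted(j for j in links[c]
                              if sim[c][j] >= strong_threshold and j not in cores)
        inner = {c, *supplemental}
        subtopic = sorted(j for j in range(n)
                          if j not in inner and j not in cores
                          and any(j in links[k] for k in inner))
        topics.append({"core": c, "supplemental": supplemental,
                       "subtopic": subtopic})
    return topics, centrality


if __name__ == "__main__":
    sample = [
        "election vote president campaign",
        "election vote president debate",
        "election vote poll",
        "soccer match goal league",
        "soccer match goal coach",
        "soccer league transfer",
    ]
    topics, _ = topic_structure(sample, num_topics=2, strong_threshold=0.6)
    for t in topics:
        print(t)
```

Fixing `num_topics` in advance mirrors the abstract's "predetermined number of topics". A subtopic document may appear under more than one topic in this sketch, since it only needs a weak link to any document in a topic's inner layers.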
