Semantically-Guided Clustering of Text Documents via Frequent Subgraphs Discovery

Abstract

In this paper we introduce and analyze two improvements to GDClust [1], a system for document clustering based on the co-occurrence of frequent subgraphs. GDClust (Graph-Based Document Clustering) works with frequent senses derived from the constraints provided by the natural language rather than working with the co-occurrences of frequent keywords commonly used in the vector space model (VSM) of document clustering. Text documents are transformed to hierarchical document-graphs, and an efficient graph-mining technique is used to find frequent subgraphs. Discovered frequent subgraphs are then utilized to generate accurate sense-based document clusters. In this paper, we introduce two novel mechanisms called the Subgraph-Extension Generator (SEG) and the Maximum Subgraph-Extension Generator (MaxSEG) which directly utilize constraints from the natural language to reduce the number of candidates and the overhead imposed by our first implementation of GDClust.
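The pipeline summarized in the abstract (documents become hierarchical document-graphs, frequent subgraphs are mined, and documents are grouped by the frequent subgraphs they share) can be illustrated with a deliberately simplified sketch. This is not the GDClust, SEG, or MaxSEG implementation: the taxonomy `HYPERNYM`, the helper names, and the Jaccard similarity are assumptions made for illustration, and only single-edge subgraphs are counted rather than larger ones.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy taxonomy: child concept -> parent concept (hypernym).
# A real system would take this from a lexical ontology; these entries are made up.
HYPERNYM = {
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
    "sparrow": "bird",
    "bird": "animal",
    "animal": "entity",
    "car": "vehicle",
    "truck": "vehicle",
    "vehicle": "artifact",
    "artifact": "entity",
}


def document_graph(keywords):
    """Return the set of (child, parent) edges reached by walking each
    keyword up the taxonomy until a concept with no parent is hit."""
    edges = set()
    for word in keywords:
        node = word
        while node in HYPERNYM:
            parent = HYPERNYM[node]
            edges.add((node, parent))
            node = parent
    return edges


def frequent_edges(graphs, min_support):
    """Edges (1-edge subgraphs) that occur in at least min_support graphs."""
    counts = Counter(edge for g in graphs for edge in g)
    return {edge for edge, n in counts.items() if n >= min_support}


def similarity(g1, g2, frequent):
    """Jaccard overlap of the frequent edges contained in two document-graphs
    (an assumed measure, chosen only to keep the sketch short)."""
    a, b = g1 & frequent, g2 & frequent
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Made-up documents, each reduced to a list of keywords.
docs = {
    "d1": ["dog", "cat"],
    "d2": ["cat", "sparrow"],
    "d3": ["car", "truck"],
    "d4": ["truck", "dog"],
}
graphs = {name: document_graph(words) for name, words in docs.items()}
frequent = frequent_edges(graphs.values(), min_support=2)

# Pairwise similarities could feed any standard clustering routine.
for x, y in combinations(sorted(graphs), 2):
    print(x, y, round(similarity(graphs[x], graphs[y], frequent), 2))
```

In this toy setup, documents about animals share abstract edges such as `("mammal", "animal")` even when their keywords differ, which is the sense-based overlap that keyword co-occurrence alone would miss.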

Bibliographic details

Source
International Symposium on Methodologies for Intelligent Systems | 2011 | 11 pages
Authors
Rafal A. Angryk; M. Shahriar Hossain; Brandon Norick
Original format: PDF
Chinese Library Classification: TP18-53
Keywords
Graph-based data mining; Text clustering; Clustering with semantic constraints

Similar literature

1. Text Document Retrieval through Clustering using Meaningful Frequent Ordered Word Patterns [J]. Pushpalatha K. P., G. Raju. International Journal of Applied Engineering Research. 2018, No. 7aPta2
2. A SOM-Based Document Clustering Using Frequent Max Substrings for Non-Segmented Texts [J]. Todsanai Chumwatana, Kok Wai Wong, Hong Xie. Journal of Intelligent Learning Systems and Applications. 2010, No. 3
3. Text document clustering based on frequent word meaning sequences [J]. Yanjun Li, Soon M. Chung, John D. Holt. Data & Knowledge Engineering. 2008, No. 1
4. Semantically-Guided Clustering of Text Documents via Frequent Subgraphs Discovery [C]. Rafal A. Angryk, M. Shahriar Hossain, Brandon Norick. Foundations of Intelligent Systems. 2011
5. RiboFSM: Frequent Subgraph Mining for the Discovery of RNA Structures and Interactions [D]. Gawronski, Alexander. 2013
6. RiboFSM: Frequent subgraph mining for the discovery of RNA structures and interactions [O]. Alex R Gawronski, Marcel Turcotte. 2014
7. Semantically-Guided Clustering of Text Documents via Frequent Subgraphs Discovery [O]. Rafal A. Angryk, M. Shahriar Hossain, Brandon Norick. 2011
8. GREW - A Scalable Frequent Subgraph Discovery Algorithm [R]. Kuramochi, M., Karypis, G. 2004
