Text Clustering Algorithm Based on the Graph Structures of Semantic Word Co-occurrence

Abstract

The theme of a text is the key to text clustering, and co-occurring words express that theme strongly within a document. Building on an in-depth study of text theme mining and word co-occurrence, this paper proposes a text clustering algorithm based on semantic text representation and the graph structure of word co-occurrence. First, the algorithm constructs a graph structure for each text from the co-occurrence of its feature words; in other words, every text is represented as a graph. It then uses the maximum common subgraph of two texts to compute their similarity and combines this measure with the K-means clustering algorithm to cluster the documents. Experiments comparing the method with a hierarchical clustering algorithm show that K-means clustering based on word co-occurrence graph structures greatly reduces the dimensionality of the text vectors and the algorithmic complexity, significantly improves the efficiency and accuracy of text clustering, and produces good-quality clusters.
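The pipeline outlined in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the co-occurrence window, the similarity normalisation, or how cluster centres are updated. Here each graph is stored as a set of undirected word-pair edges, the function names (`cooccurrence_graph`, `mcs_similarity`, `cluster`) and the `window` parameter are hypothetical, similarity is the shared edge count divided by the larger edge count, and a medoid update stands in for the K-means centroid because the mean of several graphs is not defined in the abstract.

```python
def cooccurrence_graph(tokens, window=2):
    """Edges link distinct words that occur within `window` following tokens.
    The window size is an assumption; the abstract does not state one."""
    edges = set()
    for i, w in enumerate(tokens):
        for v in tokens[i + 1:i + 1 + window]:
            if v != w:
                edges.add(frozenset((w, v)))
    return edges


def mcs_similarity(g1, g2):
    """When every node carries a unique word label, the maximum common
    subgraph reduces to the shared edges. Normalising by the larger graph
    is an assumed choice."""
    denom = max(len(g1), len(g2))
    return len(g1 & g2) / denom if denom else 0.0


def cluster(graphs, k, iters=10):
    """K-means-style loop with medoids in place of centroids (assumption)."""
    medoids = list(range(k))  # assumption: the first k texts seed the clusters
    labels = []
    for _ in range(iters):
        # Assignment step: each text joins its most similar medoid.
        labels = [
            max(range(k), key=lambda c: mcs_similarity(g, graphs[medoids[c]]))
            for g in graphs
        ]
        # Update step: the new medoid is the member most similar to its cluster.
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                new_medoids.append(medoids[c])
                continue
            new_medoids.append(max(
                members,
                key=lambda i: sum(mcs_similarity(graphs[i], graphs[j])
                                  for j in members),
            ))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels


# Toy data, invented for illustration only.
docs = [
    "text clustering groups documents".split(),
    "stock market prices fall".split(),
    "text clustering groups articles".split(),
    "stock market prices rise".split(),
]
graphs = [cooccurrence_graph(d) for d in docs]
print(cluster(graphs, k=2))  # [0, 1, 0, 1]
```

Because each text is compared through shared edges instead of a vocabulary-sized vector, the cost of one comparison depends only on the sizes of the two graphs, which is one plausible reading of the dimensionality reduction the abstract reports.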

Bibliographic record

Source
2016 International Conference on Information Systems and Artificial Intelligence | 2016 | pp. 497-502 | 6 pages
Conference location: Hong Kong (CN)
Authors
Chun-Xia Jin; Qiu-Chan Bai;
Author affiliations

Fac. of Comput. Software Eng., Huaiyin Inst. of Technol., Huaian, China;

Fac. of Autom., Huaiyin Inst. of Technol., Huaian, China;

Original format: PDF
Text language: English
Keywords
Semantics; Clustering algorithms; Feature extraction; Algorithm design and analysis; Vocabulary; Software algorithms; Complexity theory;

Similar literature

1. Sentiment word co-occurrence and knowledge pair feature extraction based LDA short text clustering algorithm [J]. Wu Di, Yang Ruixin, Shen Chao. Journal of Intelligent Information Systems. 2021, No. 1
2. Arabic Text Clustering Based on K-Means Algorithm with Semantic Word Embedding [J]. Hasnaa R. H. Soliman, Mohamed Grida, Mohamed Hassan. Journal of Theoretical and Applied Information Technology. 2019, No. 21
3. News Keyword Extraction Algorithm Based on Semantic Clustering and Word Graph Model [J]. Ao Xiong, Derong Liu, Hongkang Tian. Tsinghua Science and Technology. 2021, No. 6
4. Text Clustering Algorithm Based on the Graph Structures of Semantic Word Co-occurrence [C]. Chun-Xia Jin, Qiu-Chan Bai. International Conference on Information Systems and Artificial Intelligence. 2016
5. Graph-based data structures for refinement and derefinement algorithms based on the longest edge bisection. Applications (Spanish text). [D]. Suarez Rivero, Jose Pablo. 2001
6. Predicting Lexical Norms: A Comparison between a Word Association Model and Text-Based Word Co-occurrence Models [O]. Hendrik Vankrunkelsven, Steven Verheyen, Gert Storms. 2018
7. Figure 1: (A) Example of a text-based forma mentis network. A TFMN can be represented either as an edge-coloured graph or as a multiplex network. Positive (negative) words are highlighted in cyan (red). Neutral words are in black. Syntactic links between positive (negative) words are highlighted in cyan (red) too. Syntactic links between positive and negative concepts are in purple. All semantic links of meaning overlap are highlighted in green. (B) Infographics about how a TFMN is assembled. Individuals organise their knowledge and emotional perception of the real world in their mental lexicon (comic clouds). [O]
8. Semantics-Based Reference Resolution in Technical Text Processing: An Exploration of Using the WordNet Database in the Computerized Comprehensibility System. [R]. Kieras, D. E. 1992
