An Exploratory Study of Enhancing Text Clustering with Auto-Generated Semantic Tags

Abstract

With the exponentially growing volume of digital documents and internet content, it has become very challenging to locate the right information when it is needed. We rely heavily on search engines, but most existing search tools are keyword-based and often return results with low precision and recall. Emerging semantic tagging technology provides an automatic way to generate semantic tags from text, and it has drawn increasing interest from the text mining research community. It is critical to study how semantic tags can be used to improve text mining tasks such as clustering, which helps users search and browse documents. Unfortunately, most previous work on text clustering is based only on content information. A few recent studies take user-generated tags into account; however, user-generated tags are often noisy, inconsistent, and redundant, and they lack semantic information and hierarchical structure. In this work, we propose a Semantic Text Mining (STeM) framework that generates semantic tags for given documents and then uses those tags to improve text clustering. Unlike previous work, we represent a document by a combination of domains and high-quality noun phrases. We investigate the performance of our methods on two different datasets and evaluate the results by normalized mutual information. Experimental results demonstrate that our proposed method substantially outperforms traditional clustering based on Term Frequency-Inverse Document Frequency (TF-IDF) term vectors. We find that incorporating semantic information into the document representation is critical to improving the performance of text clustering.
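
The abstract names normalized mutual information (NMI) as the evaluation measure but does not say which normalization is used. The following is a minimal sketch of how NMI between reference labels and cluster assignments could be computed, assuming normalization by the geometric mean of the two entropies; the function names and the choice of normalization are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(true_labels, cluster_labels):
    """NMI between reference labels and cluster assignments.

    Assumption: the score is normalized by sqrt(H(true) * H(cluster));
    other normalizations (e.g. the arithmetic mean) are also in common use.
    """
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label lists must have the same length")
    n = len(true_labels)
    if n == 0:
        raise ValueError("label lists must not be empty")

    joint = Counter(zip(true_labels, cluster_labels))
    true_counts = Counter(true_labels)
    cluster_counts = Counter(cluster_labels)

    # Mutual information: sum over co-occurring (class, cluster) pairs.
    mi = 0.0
    for (t, c), n_tc in joint.items():
        mi += (n_tc / n) * math.log(n_tc * n / (true_counts[t] * cluster_counts[c]))

    h_true = entropy(true_labels)
    h_cluster = entropy(cluster_labels)
    if h_true == 0.0 or h_cluster == 0.0:
        # Convention chosen for this sketch: two trivial partitions agree fully,
        # a trivial partition against a non-trivial one scores 0.
        return 1.0 if h_true == h_cluster else 0.0
    return mi / math.sqrt(h_true * h_cluster)
```

Because NMI depends only on how documents are grouped, not on the cluster identifiers themselves, the same function can score a TF-IDF baseline and a tag-based representation against the same reference labels.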

Bibliographic details

Source: International Conference on Semantics, Knowledge and Grids, 2012, 8 pages
Authors: Tang Xuning; Dang Jiangbo
Original format: PDF
Chinese Library Classification (CLC): TP311.1-53

Similar literature

1. Enhanced cross-domain document clustering with a semantically enhanced text stemmer (SETS) [J]. Ivan Stankov, Diman Todorov, Rossitza Setchi. International journal of knowledge-based and intelligent engineering systems. 2013, No. 2
2. An Exploratory Study on the Policy for Facilitating of Health Behaviors Related to Particulate Matter: Using Topic and Semantic Network Analysis of Media Text [J]. Hye Min Byun, You Jin Park, Eun Kyoung Yun. Journal of Korean Academy of Nursing. 2021, No. 1
3. Extract the Semantic Meaning of Prepositions at Arabic Texts: An Exploratory Study [J]. Mohammad Khaled A. Al-Maghasbeh, Mohd Pouzi Bin Hamzah. International Journal of Computer Trends and Technology. 2015, No. 3
4. An Exploratory Study of Enhancing Text Clustering with Auto-Generated Semantic Tags [C]. Tang Xuning, Dang Jiangbo. 2012 Eighth International Conference on Semantics, Knowledge and Grids. 2012
5. Semantic preserving text representation and its applications in text clustering [D]. Howard, Michael. 2012
6. 'MATRI-SUMAN' a capacity building and text messaging intervention to enhance maternal and child health service utilization among pregnant women from rural Nepal: study protocol for a cluster randomised controlled trial [O]. Jitendra Kumar Singh, Rajendra Kadel, Dilaram Acharya, 2018
7. Arabic Text Summarization Based on Latent Semantic Analysis to Enhance Arabic Documents Clustering [O]. Hanane Froud, Abdelmonaime Lachkar, Said Alaoui Ouatik. 2013
