Scalable text semantic clustering around topics

Brena Ramon; Ramirez Eduardo

Abstract

Detection of topics in natural language text collections is an important step towards flexible automated text handling for tasks such as text translation and summarization. In the currently dominant paradigm for topic modeling, topics are represented as probability distributions over terms. Although such models are theoretically sound, their high computational complexity makes them difficult to use on very large-scale collections. In this work we propose an alternative topic modeling paradigm based on a simpler representation of topics as overlapping clusters of semantically similar documents, which can take advantage of highly scalable clustering algorithms. Our Query-based Topic Modeling framework (QTM) is an information-theoretic method that assumes the existence of a "golden" set of queries that can capture most of the semantic information of the collection and produce models with maximum "semantic coherence". QTM was designed with scalability in mind and was executed in parallel using a MapReduce implementation; further, we present complexity measures that support our scalability claims. Our experiments show that QTM can produce models of comparable or even superior quality to those produced by state-of-the-art probabilistic methods.
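The abstract does not give QTM's algorithm. What follows is only a minimal Python sketch, under our own assumptions, of the representational idea the abstract describes. Each query defines a topic as the set of documents it retrieves, and here a query is assumed to be a conjunctive keyword query. Because a document can match several queries, topics can overlap. The inverted index behind the retrieval is built with a map step and a reduce step in the MapReduce style.

The sketch does not implement QTM's information-theoretic selection of the "golden" queries or its semantic-coherence measure. The function names `map_phase`, `reduce_phase` and `query_clusters`, and the sample documents, are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple


def map_phase(doc_id: str, text: str) -> Iterator[Tuple[str, str]]:
    """Map step (hypothetical): emit one (term, doc_id) pair per distinct lowercased token."""
    for term in set(text.lower().split()):
        yield term, doc_id


def reduce_phase(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Reduce step (hypothetical): group pairs by term into an inverted index term -> doc ids."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for term, doc_id in pairs:
        index[term].add(doc_id)
    return dict(index)


def query_clusters(
    docs: Dict[str, str], queries: List[List[str]]
) -> Dict[Tuple[str, ...], Set[str]]:
    """Return one document cluster per query.

    Assumption: a query is a conjunctive keyword query, so it selects the
    documents containing all of its terms. A document may match several
    queries, so the resulting clusters can overlap.
    """
    # Map over every document, then reduce over all emitted pairs.
    # Both steps run serially in one process here; a MapReduce framework
    # would distribute them across workers.
    pairs = (
        pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text)
    )
    index = reduce_phase(pairs)

    clusters: Dict[Tuple[str, ...], Set[str]] = {}
    for query in queries:
        postings = [index.get(term.lower(), set()) for term in query]
        # Intersect the posting lists (returns a new set); an empty query yields an empty cluster.
        clusters[tuple(query)] = set.intersection(*postings) if postings else set()
    return clusters


docs = {
    "d1": "neural networks for text clustering",
    "d2": "text summarization with neural models",
    "d3": "mapreduce clustering at scale",
}
clusters = query_clusters(docs, [["neural"], ["clustering"]])
# d1 belongs to both clusters, so the two topics overlap.
print({q: sorted(ids) for q, ids in clusters.items()})
# prints {('neural',): ['d1', 'd2'], ('clustering',): ['d1', 'd3']}
```

In this sketch, indexing splits into independent per-document map calls and a per-term grouping step. That split is what makes it amenable to parallel execution. The serial generator stands in for a distributed runtime.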

Bibliographic details

Source
Journal of intelligent & fuzzy systems: Applications in Engineering and Technology | 2019, Issue 5 | 13 pages
Authors
Brena Ramon; Ramirez Eduardo;
Author affiliations

Tecnol Monterrey, Av E Garza Sada 2501, Monterrey, Mexico (both authors)

Indexing information
Original format: PDF
Language: English
Chinese Library Classification: Automation systems
Keywords
Topics; NLP; clustering; queries

Similar documents

1. Scalable text semantic clustering around topics [J]. Brena Ramon, Ramirez Eduardo. Journal of intelligent & fuzzy systems: Applications in Engineering and Technology. 2019, Issue 5

2. Improving Semantic Coherence of Gujarati Text Topic Model Using Inflectional Forms Reduction and Single-letter Words Removal [J]. Chauhan Uttam, Shah Apurva. ACM transactions on Asian and low-resource language information processing. 2021, Issue 1

3. Text summarization using topic-based vector space model and semantic measure [J]. Ramesh Chandra Belwal, Sawan Rai, Atul Gupta. Information Processing & Management. 2021, Issue 3

4. Topic Modeling of Russian-Language Texts Using the Parts-of-Speech Composition of Topics (on the Example of Volunteer Movement Semantics in Social Media) [C]. Anna Maltseva, Natalia Shilkina, Evgeniy Evseev. Conference of Open Innovations Association. 2021

5. Semantic preserving text representation and its applications in text clustering [D]. Howard, Michael. 2012

6. Towards Semantically Sensitive Text Clustering: A Feature Space Modeling Technology Based on Dimension Extension [O]. Yuanchao Liu, Ming Liu, Xin Wang

7. Text mining with semantic annotation: using enriched text representation for entity-oriented retrieval, semantic relation identification and text clustering [O]. Hou Jun. 2014

8. Text Clustering for Topic Detection [R]. K. Sycara, Y. Seo. 2004

