Interesting-Phrase Mining for Ad-Hoc Text Analytics


Abstract

Large text corpora with news, customer mail and reports, or Web 2.0 contributions offer a great potential for enhancing business-intelligence applications. We propose a framework for performing text analytics on such data in a versatile, efficient, and scalable manner. While much of the prior literature has emphasized mining keywords or tags in blogs or social-tagging communities, we emphasize the analysis of interesting phrases. These include named entities, important quotations, market slogans, and other multi-word phrases that are prominent in a dynamically derived ad-hoc subset of the corpus, e.g., being frequent in the subset but relatively infrequent in the overall corpus. We develop preprocessing and indexing methods for phrases, paired with new search techniques for the top-k most interesting phrases in ad-hoc subsets of the corpus. Our framework is evaluated using a large-scale real-world corpus of New York Times news articles.
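
The notion of interestingness in the abstract (frequent in the ad-hoc subset, relatively infrequent in the overall corpus) can be illustrated with a brute-force sketch. This is not the paper's indexing or top-k search technique. It assumes a score equal to the subset document frequency divided by the corpus document frequency, plain whitespace tokenization, and word n-grams of length 2 to 4 as candidate phrases. The names `ngrams`, `top_k_interesting`, and `min_subset_df` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def ngrams(tokens, min_len=2, max_len=4):
    """Return the distinct word n-grams of one document (as tuples)."""
    return {
        tuple(tokens[i:i + n])
        for n in range(min_len, max_len + 1)
        for i in range(len(tokens) - n + 1)
    }


def top_k_interesting(corpus, subset_ids, k=3, min_subset_df=2):
    """Rank phrases by (document frequency in subset) / (document frequency in corpus).

    corpus        : dict mapping document id -> text
    subset_ids    : set of document ids forming the ad-hoc subset
    min_subset_df : assumed support threshold, so that a phrase seen in a
                    single document does not trivially score 1.0
    """
    corpus_df = Counter()
    subset_df = Counter()
    for doc_id, text in corpus.items():
        phrases = ngrams(text.lower().split())
        corpus_df.update(phrases)          # each phrase counted once per document
        if doc_id in subset_ids:
            subset_df.update(phrases)

    scored = [
        (" ".join(phrase), subset_df[phrase] / corpus_df[phrase])
        for phrase in subset_df
        if subset_df[phrase] >= min_subset_df
    ]
    # Highest score first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


if __name__ == "__main__":
    docs = {
        1: "the central bank raised interest rates",
        2: "the central bank cut interest rates",
        3: "fans ignored interest rates and watched the final game",
        4: "the team lost the final game",
    }
    for phrase, score in top_k_interesting(docs, {1, 2}, k=10):
        print(f"{score:.2f}  {phrase}")
```

A full scan like this touches every document for every query, which is presumably the cost that the preprocessing and indexing methods mentioned in the abstract are meant to avoid.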


Bibliographic record

Source: International Conference on Very Large Data Bases (VLDB 2010) | 2011 | pp. 1348-1357 | 10 pages
Authors: Srikanta Bedathur; Klaus Berberich; Jens Dittrich; Nikos Mamoulis; Gerhard Weikum
Original format: PDF
Chinese Library Classification: TP311.13

Similar literature

1. Text Mining and Big Data Analytics for Retrospective Analysis of Clinical Texts from Outpatient Care [J]. Dimitar Tcharaktchiev, Galia Angelova, Svetla Boytcheva. Cybernetics and information technologies: CIT. 2015, issue 4
2. Leveraging text analytics in patent analysis to empower business decisions - A competitive differentiation of kinase assay technology platforms by I2E text mining software [J]. Yun Yun Yang, Thomas Klose, Jonathan Lippy. World Patent Information. 2014, issue deca
3. Discovering latent themes in traffic fatal crash narratives using text mining analytics and network topology [J]. Kwayu Keneth Morgan, Kwigizile Valerian, Lee Kevin. Accident Analysis and Prevention. 2021, issue Feba
4. Interesting-Phrase Mining for Ad-Hoc Text Analytics [C]. Srikanta Bedathur, Klaus Berberich, Jens Dittrich. International conference on very large data bases. 2010
5. Chemical Process Data Analytics via Text Mining and Machine Learning [D]. Zhang, Tong. 2019
6. Identifying the Uncertainty in Physician Practice Location through Spatial Analytics and Text Mining [O]. Xuan Shi, Bowei Xue, Imam M. Xierali. 2016
7. Interesting-Phrase Mining for Ad-Hoc Text Analytics [O]. Srikanta Bedathur, Klaus Berberich, Jens Dittrich. 2010
