Graph-based Algorithms for Keyphrase Extraction in Social Text.

Abstract

The sheer volume of text on the web mandates automated approaches for identifying keyphrases that distinguish documents and help recognize important topics within them. Automatic keyphrase extraction is accomplished by designing algorithms capable of quantifying saliency in text. Saliency scores of textual units have traditionally been measured with bag-of-words (BOW) approaches, which rank units without considering their context. Such an approach has several limitations, for example with polysemy and synonymy, which are natural characteristics of text that are hard to detect without understanding the context. In contrast, graph-based approaches model the relations between textual units, which alleviates these problems. In this dissertation, we introduce a collection of novel approaches for graph-based keyphrase ranking.

First, we propose NE-Rank, a novel random walk extension for graph-based ranking that can leverage weights on both vertices and edges. The ranking algorithm combines additional ranking methods that enhance existing graph-based approaches. Specifically, we combine a discriminative ranking approach, such as tf-idf, with co-occurrence ranking in graphs. Moreover, the ranking model uses social tags, such as Twitter's hashtags, and explores boosting their weights for the task of keyphrase extraction in social microposts. Additionally, we propose a lexical graph expansion through social tags for keyphrase extraction. After modeling the textual content of microposts in a lexical graph, we expand the graph by finding more similar content linked by tags. We present a number of different approaches to lexical graph expansion through Twitter hashtags and show a significant improvement over using the textual content alone.

Second, we propose a new approach for measuring saliency in short documents. We model the textual units in a hypergraph, with words as vertices and short documents as hyperedges, and we study a high-order co-occurrence relation that goes beyond the pairwise relation in graphs. To this end, we propose a novel probabilistic random walk over hypergraphs that captures weights on both vertices and hyperedges to rank vertices. We compare our proposed random walk with different random walk approaches for hypergraphs and show the validity of the approach. Finally, we propose a complete ranking framework for extracting keyphrases from short documents using the proposed hypergraph random walk. The ranking takes into account temporal and social attributes that are important for a dynamic genre such as Twitter.
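The abstract does not state the update rules of either random walk, so the Python sketch below is only an illustration of the two ideas under stated assumptions, not the dissertation's NE-Rank or hypergraph formulation. It assumes a damped walk in which edge (or hyperedge) weights drive the transitions and normalized vertex weights drive the teleport step. The function names, the damping value 0.85, and the toy documents and weights are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(docs):
    """Undirected word graph; edge weight = number of documents in which two words co-occur."""
    edges = defaultdict(lambda: defaultdict(float))
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            edges[a][b] += 1.0
            edges[b][a] += 1.0
    return edges


def weighted_walk_rank(edges, node_weight, damping=0.85, iters=100):
    """Damped random walk: edge weights shape transitions, vertex weights shape the teleport.

    ASSUMPTION: this is a generic vertex- and edge-weighted walk, not the published NE-Rank rule.
    """
    nodes = list(edges)
    total = sum(node_weight[v] for v in nodes)
    prior = {v: node_weight[v] / total for v in nodes}
    score = dict(prior)
    for _ in range(iters):
        nxt = {v: (1.0 - damping) * prior[v] for v in nodes}
        for u in nodes:
            out = sum(edges[u].values())  # total weight leaving u
            for v, w in edges[u].items():
                nxt[v] += damping * score[u] * w / out
        score = nxt
    return score


def hypergraph_walk_rank(docs, node_weight, edge_weight=None, damping=0.85, iters=100):
    """Words are vertices and each document is a hyperedge.

    ASSUMPTION: from vertex u the walker picks a hyperedge containing u in proportion
    to its weight, then a vertex inside that hyperedge in proportion to vertex weight.
    """
    hyperedges = [set(doc) for doc in docs]
    if edge_weight is None:
        edge_weight = [1.0] * len(hyperedges)
    nodes = sorted(set().union(*hyperedges))
    member = defaultdict(list)  # vertex -> indices of the hyperedges that contain it
    for i, e in enumerate(hyperedges):
        for v in e:
            member[v].append(i)
    total = sum(node_weight[v] for v in nodes)
    prior = {v: node_weight[v] / total for v in nodes}
    score = dict(prior)
    for _ in range(iters):
        nxt = {v: (1.0 - damping) * prior[v] for v in nodes}
        for u in nodes:
            deg = sum(edge_weight[i] for i in member[u])
            for i in member[u]:
                e = hyperedges[i]
                mass = sum(node_weight[v] for v in e)
                for v in e:
                    step = (edge_weight[i] / deg) * (node_weight[v] / mass)
                    nxt[v] += damping * score[u] * step
        score = nxt
    return score


# Toy microposts (hypothetical); the hashtag-like token gets a boosted vertex weight.
docs = [["graph", "ranking", "keyphrase"], ["graph", "hashtag"], ["graph", "ranking"]]
node_weight = {"graph": 1.0, "ranking": 1.0, "keyphrase": 1.0, "hashtag": 2.0}

graph_scores = weighted_walk_rank(cooccurrence_graph(docs), node_weight)
hyper_scores = hypergraph_walk_rank(docs, node_weight)
print(sorted(graph_scores, key=graph_scores.get, reverse=True))
print(sorted(hyper_scores, key=hyper_scores.get, reverse=True))
```

In this sketch the vertex weights could come from tf-idf or a hashtag boost, and the hyperedge weights could encode temporal or social attributes of each micropost; how the dissertation actually sets them is not specified in the abstract.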

Bibliographic record

Author: Al-Dhelaan, Mohammed
Author affiliation: The George Washington University
Degree-granting institution: The George Washington University
Subject: Computer science
Degree: Ph.D.
Year: 2014
Pages: 154 p.
Original format: PDF
Language: eng
Date indexed: 2022-08-17 11:53:28

