From Text Classification to Keyphrase Extraction for Short Text

Abstract

Existing keyphrase extraction approaches often struggle with short text (e.g., headlines, queries, and tweets) because of its sparsity and brevity. In this paper, we propose a novel keyphrase extraction method for short text based on recurrent neural networks. The main idea is to classify a short text into a relevant class or category and then extract keyphrases from the words that are important for that class or category. Previous supervised approaches require annotated keyphrases. Our approach instead requires only a text classification dataset (i.e., DBpedia), which is easier to use and requires less human effort. We first feed the short text into an attention-based neural network for text classification. We then compute the attention weight of each word in the input short text. Next, we detect keyphrase candidates by chunking phrases and summing the attention weights of the words that make up each chunked phrase. Experimental results on real-world datasets of headlines, queries, and tweets demonstrate the efficacy of our approach. On the NYT and Question datasets, the proposed method outperforms the keyphrase extraction services of Microsoft Cognitive Services and IBM Watson Natural Language Understanding in terms of F1-score and acceptable percentage. Furthermore, on the Tweet dataset, the proposed method performs comparably to supervised keyphrase extraction methods for short text.
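
The candidate-scoring step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The raw per-word attention scores would come from a trained attention-based RNN classifier; here they are hard-coded hypothetical values. The chunk spans are assumed to come from an external phrase chunker, which is not shown. The helper names `attention_weights`, `score_candidates`, and `extract_keyphrases`, and the `top_k` cutoff, are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # half-open token span [start, end) from a chunker (assumed)


def attention_weights(scores: Sequence[float]) -> List[float]:
    """Softmax-normalize raw per-word attention scores so the weights sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def score_candidates(
    tokens: Sequence[str], weights: Sequence[float], chunks: Sequence[Span]
) -> List[Tuple[str, float]]:
    """Score each chunked phrase by summing the attention weights of its words."""
    scored = [
        (" ".join(tokens[start:end]), sum(weights[start:end]))
        for start, end in chunks
    ]
    # Highest-scoring candidates first.
    return sorted(scored, key=lambda item: item[1], reverse=True)


def extract_keyphrases(
    tokens: Sequence[str],
    scores: Sequence[float],
    chunks: Sequence[Span],
    top_k: int = 2,  # hypothetical cutoff; the abstract does not specify one
) -> List[str]:
    """Return the top_k chunked phrases ranked by summed attention weight."""
    weights = attention_weights(scores)
    ranked = score_candidates(tokens, weights, chunks)
    return [phrase for phrase, _ in ranked[:top_k]]


if __name__ == "__main__":
    # Hypothetical headline, attention scores, and noun-phrase chunks.
    tokens = ["apple", "unveils", "new", "iphone", "at", "annual", "event"]
    scores = [2.0, 0.5, 0.3, 2.5, -1.0, 0.2, 0.8]
    chunks = [(0, 1), (2, 4), (5, 7)]  # "apple", "new iphone", "annual event"
    print(extract_keyphrases(tokens, scores, chunks))  # ['new iphone', 'apple']
```

A candidate's score is the sum of its words' weights. This sketch therefore tends to favor multi-word phrases whose words all receive attention. The abstract does not say whether the original method normalizes for phrase length.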

Bibliographic Details

Source
IEEE International Conference on Big Data | 2019 | pp. 1137-1142 | 6 pages
Authors
Song-Eun Lee; Kang-Min Kim; Woo-Jong Ryu; Jemin Park; SangKeun Lee
Original format: PDF
Keywords
Task analysis; Tools; Mathematical model; Recurrent neural networks; Tagging; Twitter

Similar Literature

1. TOP-Rank: A TopicalPostionRank for Extraction and Classification of Keyphrases in Text [J]. Mubashar Nazar Awan, Mirza Omer Beg. Computer Speech and Language. 2021, Issue Jana
2. Short text keyphrase extraction with hypergraphs [J]. Abdelghani Bellaachia, Mohammed Al-Dhelaan. Progress in Artificial Intelligence. 2015, Issue 2
3. Deep Text Mining for Automatic Keyphrase Extraction from Text Documents [J]. Muhammad Abulaish, Jahiruddin, Lipika Dey. Journal of Intelligent Systems. 2011, Issue 4
4. From Text Classification to Keyphrase Extraction for Short Text [C]. Song-Eun Lee, Kang-Min Kim, Woo-Jong Ryu. IEEE International Conference on Big Data. 2019
5. Graph-based Algorithms for Keyphrase Extraction in Social Text [D]. Mohammed Al-Dhelaan. 2014
6. Automated Extraction and Classification of Cancer Stage Mentions from Unstructured Text Fields in a Central Cancer Registry [O]. Abdulrahman K. AAlAbdulsalam, Jennifer H. Garvin, Andrew Redd. 2018
7. Arabic Language Processing for Text Classification. Contributions to Arabic Root Extraction Techniques, Building an Arabic Corpus, and to Arabic Text Classification Techniques [O]. May Yacoub Adib Al-Nashashibi. 2012