What Does This Acronym Mean? Introducing a New Dataset for Acronym Identification and Disambiguation

Abstract

Acronyms are short forms of phrases that make it easier to convey lengthy sentences in documents, and they are one of the mainstays of writing. Because of their importance, two tasks are crucial for text understanding: identifying acronyms and their corresponding long-form phrases (acronym identification, AI), and finding the correct meaning of each acronym (acronym disambiguation, AD). Despite recent progress on these tasks, existing datasets have limitations that hinder further improvement. Specifically, manually annotated AI datasets are small, and automatically created AI datasets are noisy. Both problems obstruct the design of advanced, high-performing acronym identification models. Moreover, existing datasets are mostly limited to the medical domain and ignore other domains. To address these two limitations, we first create a large, manually annotated AI dataset for the scientific domain. This dataset contains 17,506 sentences, making it substantially larger than previous scientific AI datasets. Next, we prepare an AD dataset for the scientific domain with 62,441 samples, which is significantly larger than the previous scientific AD dataset. Our experiments show that existing state-of-the-art models fall far behind human-level performance on both of the proposed datasets. In addition, we propose a new deep learning model that uses the syntactic structure of a sentence to expand an ambiguous acronym in it. The proposed model outperforms the state-of-the-art models on the new AD dataset, providing a strong baseline for future research on this dataset.
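
The following minimal Python sketch shows the input and output of the two tasks defined above using naive rule-based baselines. It is not the authors' syntax-aware deep learning model, and it does not reproduce the annotation format of their datasets. The function names `identify_acronyms` and `disambiguate` and the toy sense dictionary `SENSES` are hypothetical. For identification, the sketch accepts a parenthesized short form only when its letters match the initials of the words just before it. For disambiguation, it picks the candidate expansion whose words overlap most with the sentence (a Lesk-style heuristic).

```python
import re
from typing import Dict, List, Optional, Tuple


def identify_acronyms(sentence: str) -> List[Tuple[str, str]]:
    """Return (short form, long form) pairs written as 'long form (SF)'.

    Naive heuristic (an assumption, not the paper's method): the long form is
    the len(SF) words right before the parenthesis, accepted only if their
    initials spell SF.
    """
    pairs = []
    # Parenthesized token of 2+ letters that starts and ends with a capital.
    for match in re.finditer(r"\(([A-Z][A-Za-z]*[A-Z])\)", sentence):
        short = match.group(1)
        preceding = re.findall(r"[A-Za-z]+", sentence[: match.start()])
        candidate = preceding[-len(short):]
        if len(candidate) == len(short) and all(
            word[0].lower() == letter.lower()
            for word, letter in zip(candidate, short)
        ):
            pairs.append((short, " ".join(candidate)))
    return pairs


# Hypothetical sense inventory: acronym -> candidate long forms.
SENSES: Dict[str, List[str]] = {
    "CNN": ["convolutional neural network", "Cable News Network"],
}


def disambiguate(acronym: str, sentence: str,
                 senses: Dict[str, List[str]]) -> Optional[str]:
    """Pick the candidate long form sharing the most words with the sentence.

    Lesk-style word-overlap heuristic; ties go to the earlier candidate.
    Returns None when the acronym has no known expansions.
    """
    candidates = senses.get(acronym, [])
    if not candidates:
        return None
    context = set(re.findall(r"[a-z]+", sentence.lower()))
    return max(candidates,
               key=lambda long_form: len(context & set(long_form.lower().split())))


print(identify_acronyms(
    "We study acronym identification (AI) and acronym disambiguation (AD)."))
# [('AI', 'acronym identification'), ('AD', 'acronym disambiguation')]
print(disambiguate("CNN", "The CNN uses convolutional layers in a neural model.", SENSES))
# convolutional neural network
```

Heuristics like these fail in many common cases. Examples include acronyms that are not defined in the same sentence, long forms whose initials do not line up one-to-one with the short form, and contexts that share no words with the correct expansion. Such cases are one reason learned models and larger annotated datasets are needed.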

机译：首字母缩略词是简短的短语形式，便于在文件中传送冗长的句子并作为写作的主要句子。由于它们的重要性，识别缩略语和相应的短语（即，缩写识别（AI））并找到每个首字母缩略词的正确含义（即，首字母缩略词歧义（AD））对于文本了解至关重要。尽管最近对此任务进行了进展，但现有数据集有一些限制，这些数据集会妨碍进一步改进。更具体地说，在自动创建的缩写识别数据集中，在自动创建的缩写识别数据集中有限的手动注释的AI数据集或噪声妨碍了设计先进的高性能缩写识别模型。此外，现有数据集主要限于医疗领域并忽略其他域。为了解决这两个限制，我们首先为科学域创建一个手动注释的大型AI数据集。此数据集包含17,506个句子，它比以前的科学AI数据集大得多。接下来，我们为科学域准备一个带有62,441个样本的广告数据集，该样本明显大于以前的科学广告数据集。我们的实验表明，现有的最先进模型远远落后于这项工作提出的两个数据集的人力级别表现。此外，我们提出了一种新的深度学习模式，它利用句子的句法结构来扩展句子中的含糊不清的缩写。拟议的模型在新广告数据集上占据了最先进的模型，为此数据集进行了未来的研究，为未来的研究提供了强大的基准。

Bibliographic Details

Source: International Conference on Computational Linguistics, 2020, pp. 3285-3301 (17 pages)
Authors: Amir Pouran Ben Veyseh; Franck Dernoncourt; Quan Hung Tran; Thien Huu Nguyen
Original format: PDF

Similar Literature


1. Acronyms: identification, expansion and disambiguation [J]. Jacobs Kayla, Itai Alon, Wintner Shuly. Annals of Mathematics and Artificial Intelligence, 2020, no. 5a6.
2. Heuristics for identification of acronym-definition patterns within text: towards an automated construction of comprehensive acronym-definition dictionaries [J]. Wren JD, Garner HR. Methods of Information in Medicine, 2002, no. 5.
3. Acronym-Expansion Disambiguation for Intelligent Processing of Enterprise Information [J]. Hwang, Myunggwon, Jeong. Computing and Informatics, 2015, no. 3.
4. A method for automatic detection of acronyms in texts and building a dataset for acronym disambiguation [C]. Sasan Azimi, Hadi Veisi, Reyhaneh Amouie. Iranian Conference on Signal Processing and Intelligent Systems, 2019.
5. Automatic word sense disambiguation of acronyms and abbreviations in clinical texts [D]. Moon, Sungrim. 2012.
6. Challenges and Practical Approaches with Word Sense Disambiguation of Acronyms and Abbreviations in the Clinical Domain [O]. Sungrim Moon, Bridget McInnes, Genevieve B. Melton. 2015.
7. Guess Me if You Can: Acronym Disambiguation for Enterprises [O]. Yang Li, Bo Zhao, Ariel Fuxman. 2018.
