International Conference on Intelligent Text Processing and Computational Linguistics

A Language Modeling Approach for Acronym Expansion Disambiguation



Abstract

Nonstandard words such as proper nouns, abbreviations, and acronyms are a major obstacle in natural language text processing and information retrieval. Acronyms in particular are difficult to read and process because they are often domain-specific and highly polysemous. In this paper, we propose a language modeling approach for the automatic disambiguation of acronym senses using context information. First, a dictionary of all possible expansions of acronyms is generated automatically; it is then used to look up the candidate expansions, or senses, for a given acronym. The extracted dictionary consists of about 17 thousand acronym-expansion pairs covering 1,829 acronyms from different fields, with an average of 9.47 expansions per acronym. Training data is collected automatically from documents downloaded from the results of search engine queries. The collected data is used to build a unigram language model that models the context of each candidate expansion. At the in-context expansion prediction phase, the relevance of each candidate expansion is calculated from the similarity between the context of the specific acronym occurrence and the candidate expansion's language model. Unlike other work in the literature, our approach has the option to decline to expand an acronym when it is not confident in the disambiguation. We have evaluated the performance of our language modeling approach and compared it with a tf-idf discriminative approach.
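The pipeline described in the abstract — a per-expansion unigram language model built from automatically collected documents, context-based scoring, and the option to decline a low-confidence expansion — can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the Laplace smoothing and the margin-based rejection criterion are assumptions, and all function names are hypothetical.

```python
import math
from collections import Counter

def build_unigram_lm(docs):
    """Build a Laplace-smoothed unigram language model from training documents."""
    counts = Counter(tok for doc in docs for tok in doc.lower().split())
    total = sum(counts.values())
    vocab = len(counts)
    def prob(tok):
        # Add-one smoothing so unseen context words get nonzero probability.
        return (counts[tok] + 1) / (total + vocab + 1)
    return prob

def score_expansion(context_tokens, lm):
    """Average log-likelihood of the context tokens under an expansion's LM."""
    return sum(math.log(lm(t)) for t in context_tokens) / max(len(context_tokens), 1)

def disambiguate(context, models, reject_margin=0.3):
    """Pick the candidate expansion whose LM best explains the context.

    Returns None (declines to expand) when the top score does not beat
    the runner-up by at least `reject_margin` — a stand-in for the
    paper's confidence-based rejection option.
    """
    tokens = context.lower().split()
    scored = sorted(((score_expansion(tokens, lm), exp)
                     for exp, lm in models.items()), reverse=True)
    if len(scored) > 1 and scored[0][0] - scored[1][0] < reject_margin:
        return None  # not confident enough in the disambiguation
    return scored[0][1]
```

For example, with one model trained on banking documents ("Automated Teller Machine") and one on networking documents ("Asynchronous Transfer Mode"), a context such as "withdraw cash from the bank" scores higher under the banking model, while a context sharing no vocabulary with either corpus falls inside the rejection margin and is left unexpanded.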
