
A Language Modeling Approach for Acronym Expansion Disambiguation


Abstract

Nonstandard words such as proper nouns, abbreviations, and acronyms are a major obstacle in natural language text processing and information retrieval. Acronyms, in particular, are difficult to read and process because they are often domain-specific with a high degree of polysemy. In this paper, we propose a language modeling approach for the automatic disambiguation of acronym senses using context information. First, a dictionary of all possible expansions of acronyms is generated automatically. The dictionary is used to search for all possible expansions, or senses, with which a given acronym may be expanded. The extracted dictionary consists of about 17 thousand acronym-expansion pairs defining 1,829 expansions from different fields, where the average number of expansions per acronym is 9.47. Training data is collected automatically from documents downloaded from the results of search engine queries. The collected data is used to build a unigram language model that models the context of each candidate expansion. In the in-context expansion prediction phase, the relevance of each acronym expansion candidate is calculated from the similarity between the context of the specific acronym occurrence and the language model of that candidate expansion. Unlike other work in the literature, our approach has the option to decline to expand an acronym when it is not confident in the disambiguation. We have evaluated the performance of our language modeling approach and compared it with a tf-idf discriminative approach.
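The scoring step described above can be sketched in a few lines. The fragment below is a minimal illustration, not the authors' implementation: it builds an add-one-smoothed unigram model per candidate expansion, scores an occurrence context by log-likelihood under each model, and declines to expand when the best candidate does not beat the runner-up by a margin (the paper's exact similarity measure and rejection criterion are not specified in the abstract; the `reject_margin` threshold and all names here are hypothetical).

```python
from collections import Counter
import math

def train_unigram_lm(docs):
    """Build an add-one-smoothed unigram model from training documents.

    Returns (word counts, total token count, vocabulary size); unseen
    words fall back to the smoothed probability 1 / (total + vocab).
    """
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts, sum(counts.values()), len(counts)

def log_likelihood(context_words, model):
    """Log-probability of the context under a smoothed unigram model."""
    counts, total, vocab = model
    return sum(math.log((counts[w] + 1) / (total + vocab))
               for w in context_words)

def disambiguate(context, candidate_models, reject_margin=1.0):
    """Pick the candidate expansion whose model best explains the context.

    Returns None (decline to expand) when the best score does not beat
    the runner-up by reject_margin nats -- a hypothetical stand-in for
    the confidence-based rejection described in the abstract.
    """
    words = context.lower().split()
    scored = sorted(
        ((log_likelihood(words, m), exp)
         for exp, m in candidate_models.items()),
        reverse=True,
    )
    best = scored[0]
    runner_up = scored[1] if len(scored) > 1 else (float("-inf"), None)
    if best[0] - runner_up[0] < reject_margin:
        return None  # not confident: decline to expand
    return best[1]
```

For example, with one model trained on medical text and another on NLP text, the occurrence context "conference papers linguistics" would resolve "ACL" to the computational-linguistics expansion, while a context sharing no vocabulary with either model would be rejected.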
