Machine learning with naturally labeled data for identifying abbreviation definitions

Lana Yeganova; Donald C Comeau; W John Wilbur

Abstract

Background: The rapid growth of biomedical literature requires accurate text analysis and text processing tools. Detecting abbreviations and identifying their definitions is an important component of such tools. Most existing approaches to the abbreviation definition identification task employ rule-based methods. While achieving high precision, rule-based methods are limited to the rules defined and fail to capture many uncommon definition patterns. Supervised learning techniques, which offer more flexibility in detecting abbreviation definitions, have also been applied to the problem. However, they require manually labeled training data.

Methods: In this work, we develop a machine learning algorithm for abbreviation definition identification in text which makes use of what we term naturally labeled data. Positive training examples are naturally occurring potential abbreviation-definition pairs in text. Negative training examples are generated by randomly mixing potential abbreviations with unrelated potential definitions. The machine learner is trained to distinguish between these two sets of examples. Then, the learned feature weights are used to identify the abbreviation full form. This approach does not require manually labeled training data.

Results: We evaluate the performance of our algorithm on the Ab3P, BIOADI and Medstract corpora. Our system demonstrated results that compare favourably to the existing Ab3P and BIOADI systems. We achieve an F-measure of 91.36% on the Ab3P corpus and an F-measure of 87.13% on the BIOADI corpus, both superior to the results reported by the Ab3P and BIOADI systems. Moreover, we outperform these systems in terms of recall, which is one of our goals.
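The construction of naturally labeled training data described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the parenthesis pattern `PAREN`, the five-word window `max_words`, and the function names `candidate_pairs` and `mix_negatives` are all hypothetical choices made for the example. The abstract does not specify how potential pairs are extracted or which features the learner uses, so the sketch stops at producing the two example sets.

```python
import random
import re

# Assumed pattern: a short parenthesized token, e.g. "(HMM)".
PAREN = re.compile(r"\(([A-Za-z0-9][A-Za-z0-9\-]{1,9})\)")


def candidate_pairs(sentences, max_words=5):
    """Collect (potential abbreviation, potential definition) pairs.

    The potential definition is taken to be the window of up to
    `max_words` words preceding the parenthesized token. The window
    size is an assumption made for this sketch.
    """
    pairs = []
    for sentence in sentences:
        for match in PAREN.finditer(sentence):
            words = sentence[:match.start()].split()
            if words:
                pairs.append((match.group(1), " ".join(words[-max_words:])))
    return pairs


def mix_negatives(pairs, seed=0):
    """Pair each potential abbreviation with a definition drawn at
    random from a different pair, giving unrelated (negative) examples."""
    rng = random.Random(seed)
    negatives = []
    if len(pairs) < 2:
        return negatives
    for i, (abbr, _) in enumerate(pairs):
        j = rng.randrange(len(pairs) - 1)
        if j >= i:
            j += 1  # skip the abbreviation's own definition
        negatives.append((abbr, pairs[j][1]))
    return negatives


sentences = [
    "The hidden Markov model (HMM) was trained.",
    "We used a support vector machine (SVM) classifier.",
    "Magnetic resonance imaging (MRI) is common.",
]
positives = candidate_pairs(sentences)
negatives = mix_negatives(positives)
```

Neither set needs manual annotation: the positives come straight from the text and the negatives from shuffling. A classifier trained to separate the two sets would then supply the feature weights that the abstract says are used to pick out the full form.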

Bibliographic details

Source: BMC Bioinformatics, 2011, Issue 3
Authors: Lana Yeganova; Donald C Comeau; W John Wilbur
Original format: PDF
CLC classification: Biological sciences

