Journal: ACM Transactions on Asian Language Information Processing

Two-Phase Learning for Biological Event Extraction and Verification



Abstract

Many previous biological event-extraction systems were based on hand-crafted rules tuned to a particular biological application domain. However, manually constructing and tuning such rules is time-consuming and makes the systems less portable. Supervised machine-learning methods were therefore developed to generate the extraction rules automatically, but the trade-off between precision and recall (high recall comes with low precision, and vice versa) remains a barrier to improving performance. To make matters worse, text in the biological domain is more complex: a sentence often contains more than two biological events, and an event inside a noun chunk can serve as an entity of another event. As a result, no system based on supervised machine learning yet performs well at extracting events in biological domains. To overcome the limitations of previous systems and the complexity of biological texts, we present the following new ideas. First, we adopted a supervised machine-learning method to reduce the human effort needed to write extraction rules and thereby obtain a highly domain-portable system. Second, we overcame the classical trade-off between precision and recall by using an event component verification method. Thus, machine learning occurs in two phases in our architecture. In the first phase, the system focuses on improving recall when extracting events between biological entities with supervised machine learning. After the biological events have been extracted with the automatically learned rules, the second phase removes incorrect events by verifying the extracted event components with a maximum entropy (ME) classification method. In other words, the system targets high recall in the first phase and tries to achieve high precision with a classifier in the second phase. Finally, we improved a supervised machine-learning algorithm so that, for nested biological events, it can separately learn rules at two different levels: a rule within a noun chunk and a rule that extends across the whole sentence.
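
The two-phase idea in the abstract can be illustrated with a minimal sketch. This is not the authors' system. The trigger lexicon, the positional features, the toy sentences, and every function name below are assumptions made for illustration. Phase 1 is reduced to a deliberately permissive candidate generator standing in for the learned extraction rules. Phase 2 verifies whole candidates rather than individual event components, using a binary maximum entropy model, which is equivalent to logistic regression, trained by plain stochastic gradient ascent.

```python
import math

# Hypothetical trigger lexicon; stands in for the automatically learned rules.
TRIGGERS = {"activates", "inhibits", "binds"}

def extract_candidates(tokens, entity_idx):
    """Phase 1 (recall-oriented): pair each trigger with every ordered entity pair."""
    return [(a, t, b)
            for t, tok in enumerate(tokens) if tok in TRIGGERS
            for a in entity_idx for b in entity_idx if a != b]

def features(cand):
    """Toy positional features for an (agent, trigger, theme) candidate."""
    a, t, b = cand
    return {"bias": 1.0,
            "agent_before_trigger": float(a < t),
            "theme_after_trigger": float(b > t)}

def prob(weights, feats):
    """P(candidate is correct) under a binary maximum entropy model."""
    z = sum(weights.get(k, 0.0) * v for k, v in feats.items())
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))

def train_maxent(examples, epochs=200, lr=0.5):
    """Fit the weights by stochastic gradient ascent on the log-likelihood."""
    weights = {}
    for _ in range(epochs):
        for feats, label in examples:
            err = label - prob(weights, feats)
            for k, v in feats.items():
                weights[k] = weights.get(k, 0.0) + lr * err * v
    return weights

def verify(weights, candidates, threshold=0.5):
    """Phase 2 (precision-oriented): keep candidates the classifier accepts."""
    return [c for c in candidates if prob(weights, features(c)) >= threshold]

# Invented toy training data: (tokens, entity positions, gold events).
train_sentences = [
    (["IL2", "activates", "STAT5"], [0, 2], {(0, 1, 2)}),
    (["MDM2", "inhibits", "p53"], [0, 2], {(0, 1, 2)}),
]
examples = [(features(c), 1.0 if c in gold else 0.0)
            for tokens, ents, gold in train_sentences
            for c in extract_candidates(tokens, ents)]
weights = train_maxent(examples)

tokens, ents = ["RAF", "activates", "MEK"], [0, 2]
candidates = extract_candidates(tokens, ents)  # over-generates on purpose
verified = verify(weights, candidates)         # classifier prunes false positives
print(candidates)
print(verified)
```

On the toy test sentence, phase 1 proposes both argument orders for the trigger, and phase 2 keeps only the candidate whose agent precedes the trigger and whose theme follows it. A real verifier would need far richer lexical and syntactic features than these two positional flags.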
