BMC Bioinformatics

PASBio: predicate-argument structures for event extraction in molecular biology

Abstract

Background

The exploitation of information extraction (IE), a technology that aims to provide instances of structured representations from free-form text, has been growing rapidly within the molecular biology (MB) research community as a way to keep track of the latest results reported in the literature. IE systems have traditionally used shallow syntactic patterns to match facts in sentences, but such approaches appear inadequate for achieving high accuracy in MB event extraction because of complex sentence structure. A consensus is emerging in the IE community on the need to exploit deeper knowledge structures, such as the relations between a verb and its arguments captured by predicate-argument structure (PAS). PAS is of interest because its structures typically correspond to the events of interest and their participating entities. For this to be realized within IE, a key knowledge component is the definition of PAS frames. PAS frames for non-technical domains such as newswire are already being constructed in projects such as PropBank, VerbNet, and FrameNet. Knowledge from PAS should enable more accurate applications in areas that require sentence understanding, such as machine translation and text summarization. In this article, we explore the need to adapt PAS to the MB domain, specify PAS frames to support IE, and outline the major issues that must be considered in their construction.

Results

We introduce PASBio by extending a model based on PropBank to the MB domain. The hypothesis we explore is that PAS holds the key to understanding the relationships that describe the roles of genes and gene products in mediating their biological functions. We chose predicates describing gene expression, molecular interactions, and signal transduction events, with the aim of covering a number of research areas in MB. Analysis was performed on sentences from MEDLINE and full-text journals containing a set of verbal predicates. The results confirm the need to analyze PAS specifically for the MB domain.

Conclusions

At present, PASBio contains the analyzed PAS of over 30 verbs and is publicly available on the Internet for use in advanced applications. In the future, we aim to expand the knowledge base to cover more verbs and the nominal form of each predicate.
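To make the idea of a PAS frame more concrete, the following is a minimal Python sketch of a PropBank-style frame (a "roleset" of numbered arguments for one predicate sense) and one annotated predicate instance flattened into a structured event record. It is an illustration under stated assumptions, not PASBio's format: the `Roleset` and `PredicateInstance` classes, the roleset ID `express.01`, its role descriptions, and the example sentence are all hypothetical. Only the general split between frame-specific numbered arguments (`Arg1`, `Arg2`, ...) and general-purpose `ArgM-*` modifiers follows the PropBank convention.

```python
from dataclasses import dataclass, field


@dataclass
class Roleset:
    """A PropBank-style frame: one sense of a verbal predicate and its numbered arguments."""
    lemma: str
    roleset_id: str
    roles: dict[str, str]  # argument label (e.g. "Arg1") -> role description


@dataclass
class PredicateInstance:
    """One annotated occurrence of a predicate in a sentence."""
    roleset: Roleset
    sentence: str
    arguments: dict[str, str] = field(default_factory=dict)  # label -> text span

    def undefined_labels(self) -> list[str]:
        # Numbered arguments the frame does not define; ArgM-* adjuncts are
        # general-purpose modifiers in PropBank and are always accepted.
        return sorted(
            label for label in self.arguments
            if label not in self.roleset.roles and not label.startswith("ArgM")
        )

    def missing_spans(self) -> list[str]:
        # Labels whose text span does not literally occur in the sentence.
        return sorted(
            label for label, span in self.arguments.items()
            if span not in self.sentence
        )

    def to_event(self) -> dict[str, str]:
        # Flatten into a structured record: frame-defined roles are keyed by
        # their description, adjuncts keep their raw label.
        event = {"predicate": self.roleset.roleset_id}
        for label, span in self.arguments.items():
            event[self.roleset.roles.get(label, label)] = span
        return event


# Hypothetical frame for "express"; NOT PASBio's published definition.
EXPRESS = Roleset(
    lemma="express",
    roleset_id="express.01",
    roles={
        "Arg1": "gene or gene product expressed",
        "Arg2": "cell type or tissue of expression",
    },
)

# Hypothetical annotated sentence, made up for illustration.
instance = PredicateInstance(
    roleset=EXPRESS,
    sentence="IL-2 is expressed in activated T cells after stimulation.",
    arguments={
        "Arg1": "IL-2",
        "Arg2": "in activated T cells",
        "ArgM-TMP": "after stimulation",
    },
)

print(instance.undefined_labels())  # []
print(instance.to_event())
```

In this sketch, adapting frames to a new domain only means supplying a different `roles` mapping per predicate; the validation and event-flattening logic stay the same.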