...
Journal of Integrative Bioinformatics

Putting Encyclopaedia Knowledge into Structural Form: Finite State Transducers Approach

Abstract

In biology, and in functional genomics in particular, understanding the dependence and interplay between the genomic and ecological characteristics of organisms is a very challenging problem. Some public databases combine this kind of information, but much more information about microbes and other organisms still resides in unstructured and semi-structured documents, such as encyclopaedias. In this paper we present a method, based on finite state transducers, for extracting information from semi-structured resources such as encyclopaedias. The method consists of two clearly distinguished phases. The first phase relies strongly on analysis of the document structure and is used to locate data records in the text. The second phase uses finite state transducers built for data extraction to extract particular characteristics from the text. These transducers can be modified to achieve the desired efficiency. We show how the two-phase method is applied to the text of the encyclopaedia “Systematic Bacteriology”. A fully structured database of the genotype and phenotype characteristics of organisms has been created from the encyclopaedia's unstructured descriptions.
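The abstract describes the two phases only at a high level. The sketch below is a rough illustration, not the authors' implementation. For phase 1, it assumes a hypothetical record layout in which each entry begins with a numbered header line such as "1. Escherichia coli". For phase 2, it uses small hand-written deterministic transducers over word tokens to pull out two example characteristics: Gram stain and optimum temperature. The header pattern, the field names (`gram_stain`, `optimum_temperature`), the transducer states and the sample text are all invented for illustration. The sample is not quoted from “Systematic Bacteriology”.

```python
import re

COPY = object()  # hypothetical output marker: emit the matched input token unchanged

def symbol_class(token):
    """Map a raw token onto the transducer's input alphabet (numbers become <NUM>)."""
    return "<NUM>" if re.fullmatch(r"\d+(?:\.\d+)?", token) else token

class FST:
    """Deterministic finite state transducer over word tokens.
    transitions: (state, input_symbol) -> (next_state, output); "" = epsilon, or COPY."""
    def __init__(self, start, finals, transitions):
        self.start = start
        self.finals = set(finals)
        self.transitions = transitions

    def match_at(self, tokens, i):
        """Return (end, outputs) for the longest accepting path from tokens[i], or None."""
        state, out, best, j = self.start, [], None, i
        while j < len(tokens):
            step = self.transitions.get((state, symbol_class(tokens[j])))
            if step is None:
                break
            state, emit = step
            if emit is COPY:
                out.append(tokens[j])
            elif emit:
                out.append(emit)
            j += 1
            if state in self.finals:
                best = (j, list(out))
        return best

    def extract(self, tokens):
        """Scan left to right and collect the output of every non-overlapping match."""
        results, i = [], 0
        while i < len(tokens):
            match = self.match_at(tokens, i)
            if match is None:
                i += 1
            else:
                i, out = match
                results.append(" ".join(out))
        return results

# Phase 1 (assumed layout): each record starts with a line like "1. Escherichia coli".
HEADER = re.compile(r"^\d+\.\s+([A-Z][a-z]+ [a-z]+)[ \t]*$", re.MULTILINE)

def locate_records(text):
    """Split the text into (organism, body) records using the header lines."""
    heads = list(HEADER.finditer(text))
    records = []
    for k, head in enumerate(heads):
        end = heads[k + 1].start() if k + 1 < len(heads) else len(text)
        records.append((head.group(1), text[head.end():end]))
    return records

def tokenize(body):
    """Lower-cased words (hyphenated words kept whole) and numbers."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*|\d+(?:\.\d+)?", body.lower())

# Phase 2: one small transducer per characteristic (field names are illustrative).
EXTRACTORS = {
    "gram_stain": FST(0, {1}, {
        (0, "gram-negative"): (1, "negative"),
        (0, "gram-positive"): (1, "positive"),
    }),
    "optimum_temperature": FST(0, {4}, {
        (0, "optimum"): (1, ""),
        (1, "growth"): (5, ""),        # optional word: "optimum growth temperature"
        (5, "temperature"): (2, ""),
        (1, "temperature"): (2, ""),
        (2, "<NUM>"): (3, COPY),       # copy the number itself to the output
        (3, "c"): (4, "°C"),           # "°C" tokenizes to "c"
    }),
}

def build_table(text, extractors=EXTRACTORS):
    """Run phase 1, then apply every extractor to each record's tokens."""
    table = {}
    for organism, body in locate_records(text):
        tokens = tokenize(body)
        table[organism] = {name: fst.extract(tokens) for name, fst in extractors.items()}
    return table

SAMPLE = """1. Escherichia coli
Straight rods, Gram-negative. Facultatively anaerobic. Optimum temperature 37 °C.
2. Bacillus subtilis
Rods, Gram-positive, motile. Optimum growth temperature 30 °C.
"""

print(build_table(SAMPLE))
```

On the invented sample, the sketch produces one row per organism, with the Gram stain and optimum temperature filled in. To change what a field captures, you edit only that field's transition table. For example, the optional "growth" arc widens what `optimum_temperature` accepts. This is one plausible reading of the abstract's remark that the transducers can be modified to reach the desired efficiency.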
