Conference: Implementation and application of automata

Information Extraction from Semi-structured Resources: A Two-Phase Finite State Transducers Approach


Abstract

The paper presents a new method, based on finite state transducers, for extracting information from semi-structured resources. The method has two clearly distinguished phases. The first, a pre-processing phase, relies strongly on analysis of the document structure and is used to locate records of data in the text. The second phase uses finite state transducers created for extracting the information. The transducers can be modified to achieve the preferred efficiency, and they can be reused to extract information from other pre-processed documents. We conclude that even untagged text can be treated as semi-structured, provided its structure can be successfully pre-processed. As a result, we extracted data from free-form encyclopedia text and created a fully structured database of the genotype and phenotype characteristics of the organisms.
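The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record layout (blank-line-separated entries whose first line is the organism name), the field names `gram_stain` and `motile`, and the helper names `locate_records`, `run_transducer`, and `extract` are all assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Tuple

# Phase 1 (pre-processing): locate records using the document structure.
# Assumption: records are separated by blank lines and the first line of
# each record is the organism name. A real encyclopedia would need other cues.
def locate_records(text: str) -> List[Tuple[str, List[str]]]:
    records = []
    for block in text.split("\n\n"):
        lines = block.strip().splitlines()
        if not lines:
            continue
        name = lines[0].strip()
        body = " ".join(lines[1:])
        tokens = body.replace(".", " ").replace(",", " ").lower().split()
        records.append((name, tokens))
    return records

# Phase 2: a tiny finite state transducer over tokens.
# Transition table: (state, input token) -> (next state, emitted output).
# The states and vocabulary are hypothetical examples.
Output = Optional[Tuple[str, str]]
TRANSITIONS: Dict[Tuple[str, str], Tuple[str, Output]] = {
    ("start", "gram"): ("gram", None),
    ("gram", "negative"): ("start", ("gram_stain", "negative")),
    ("gram", "positive"): ("start", ("gram_stain", "positive")),
    ("start", "not"): ("neg", None),
    ("neg", "motile"): ("start", ("motile", "no")),
    ("start", "motile"): ("start", ("motile", "yes")),
}

def run_transducer(tokens: List[str]) -> Dict[str, str]:
    state = "start"
    fields: Dict[str, str] = {}
    for tok in tokens:
        # Try the current state first; otherwise retry the token from "start".
        step = TRANSITIONS.get((state, tok)) or TRANSITIONS.get(("start", tok))
        if step is None:
            state = "start"  # irrelevant token: skip it and reset
            continue
        state, emitted = step
        if emitted is not None:
            field, value = emitted
            fields[field] = value
    return fields

def extract(text: str) -> List[Dict[str, str]]:
    # Run the same transducer over every pre-processed record.
    rows = []
    for name, tokens in locate_records(text):
        row = {"name": name}
        row.update(run_transducer(tokens))
        rows.append(row)
    return rows

if __name__ == "__main__":
    sample = (
        "Escherichia coli\nGram negative rods. Motile.\n\n"
        "Staphylococcus aureus\nGram positive cocci. Not motile."
    )
    for row in extract(sample):
        print(row)
```

Because the transducer only sees token lists produced by the first phase, the same transition table could in principle be applied to any other document that has been pre-processed into the same record form, which is the kind of reuse the abstract mentions.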
