Journal of Computer Science & Technology

L-Tree Match: A New Data Extraction Model and Algorithm for Huge Text Stream with Noises



Abstract

This paper presents a new method, named L-tree match, for extracting data from complex data sources. First, based on the data extraction logic presented in this work, a new data extraction model is constructed in which the model components are structurally correlated via a generalized template. Second, a database-populating mechanism is built, along with some object-manipulating operations needed for flexible database design, to support data extraction from huge text streams. Third, top-down and bottom-up strategies are combined to design a new extraction algorithm that can extract data from sources with optional, unordered, nested, and/or noisy components. Finally, the method is applied to extract accurate data from biological documents totaling 100 GB for the first online integrated biological data warehouse of China.
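As a rough illustration of what template-driven extraction from a text stream with optional, unordered, and noisy components can look like, here is a minimal Python sketch. It is not the paper's L-tree match algorithm, which the abstract does not specify in detail. The `FieldSpec` and `RecordTemplate` classes, the `BEGIN`/`END` markers, the field patterns, and the sample lines are all hypothetical, and nesting is left out to keep the example short.

```python
import re
from dataclasses import dataclass, field


@dataclass
class FieldSpec:
    """One component of a record (hypothetical name, for illustration only)."""
    name: str
    pattern: str            # regex with exactly one capture group
    optional: bool = False  # optional components may be absent from a record


@dataclass
class RecordTemplate:
    """A flat template: start/end markers plus an unordered set of fields."""
    start: str
    end: str
    fields: list = field(default_factory=list)


def extract(lines, template):
    """Scan a line stream once and return the records that satisfy the template."""
    records = []
    current = None  # None means "outside any record"
    for line in lines:
        if current is None:
            # Everything outside a record is treated as noise.
            if re.match(template.start, line):
                current = {}
            continue
        if re.match(template.end, line):
            # Keep the record only if every required field was seen.
            required = [f.name for f in template.fields if not f.optional]
            if all(name in current for name in required):
                records.append(current)
            current = None
            continue
        # Fields may appear in any order; lines matching no field are noise.
        for spec in template.fields:
            m = re.match(spec.pattern, line)
            if m:
                current[spec.name] = m.group(1)
                break
    return records


template = RecordTemplate(
    start=r"BEGIN$",
    end=r"END$",
    fields=[
        FieldSpec("id", r"ID\s+(\S+)"),
        FieldSpec("name", r"NAME\s+(\S+)"),
        FieldSpec("desc", r"DESC\s+(.+)", optional=True),
    ],
)

lines = [
    "junk before any record",
    "BEGIN", "ID  P1", "### noise inside a record", "NAME  foo", "END",
    "BEGIN", "NAME  bar", "ID  P2", "DESC  hello world", "END",
    "BEGIN", "NAME  baz", "END",  # required ID is missing, so this one is dropped
]

records = extract(lines, template)
```

The scan is single-pass and keeps only one partial record in memory, so it suits a stream too large to load at once. A real system would also need to handle nested components and to populate a database rather than returning a list.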
