L-Tree Match: A New Data Extraction Model and Algorithm for Huge Text Stream with Noises

Xu-Bin Deng; Yang-Yong Zhu

Abstract

In this paper, a new method named L-tree match is presented for extracting data from complex data sources. First, based on the data extraction logic presented in this work, a new data extraction model is constructed in which the model components are structurally correlated via a generalized template. Second, a database-populating mechanism is built, together with a set of object-manipulation operations needed for flexible database design, to support data extraction from huge text streams. Third, top-down and bottom-up strategies are combined in a new extraction algorithm that can extract data from sources with optional, unordered, nested, and/or noisy components. Finally, the method is applied to extract accurate data from 100 GB of biological documents for China's first online integrated biological data warehouse.
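
The abstract gives no implementation details. What follows is therefore only a minimal sketch of the general idea of template-driven extraction from a record-oriented text stream whose components may be optional, unordered, nested, or mixed with noise. It is not the authors' L-tree match algorithm. Everything in it is a hypothetical assumption made for illustration: the `Node` template class, `split_records`, `match_record`, `TEMPLATE`, `SAMPLE`, the `//` record terminator, and the two-letter line codes, which are loosely modeled on flat-file biological databases such as Swiss-Prot. The sketch reads "combining bottom-up and top-down strategies" in one simple way. A bottom-up pass assigns each line to a template leaf regardless of order. A top-down pass then rebuilds the nested structure and checks the optional and repeatable constraints. The database-populating mechanism is not modeled.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Iterable, Iterator, Optional


@dataclass
class Node:
    """Hypothetical template node for this sketch; NOT the paper's L-tree structure."""
    name: str                        # leaf names must be unique within a template
    pattern: Optional[str] = None    # leaf regex; group(1) captures the value
    optional: bool = False           # element may be absent from a record
    repeatable: bool = False         # leaf may occur more than once
    children: list[Node] = field(default_factory=list)

    def leaves(self) -> Iterator[Node]:
        if self.pattern is not None:
            yield self
        for child in self.children:
            yield from child.leaves()


class _Reject(Exception):
    """Raised when a record violates the template."""


def split_records(lines: Iterable[str], terminator: str = "//") -> Iterator[list[str]]:
    """Lazily cut a line stream into records, so huge inputs need not fit in memory."""
    record: list[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line == terminator:
            if record:
                yield record
            record = []
        elif line.strip():
            record.append(line)
    if record:
        yield record


def match_record(template: Node, record: list[str]) -> Optional[dict]:
    """Return the nested values for one record, or None if it does not fit the template."""
    compiled = [(leaf, re.compile(leaf.pattern)) for leaf in template.leaves()]

    # Bottom-up pass: attach each line to the first leaf whose pattern matches,
    # in any order. Lines that match no leaf are skipped as noise.
    found: dict[str, list[str]] = {}
    for line in record:
        for leaf, rx in compiled:
            m = rx.match(line)
            if m:
                found.setdefault(leaf.name, []).append(m.group(1).strip())
                break

    # Top-down pass: rebuild the nesting and enforce optional/repeatable rules.
    def build(node: Node):
        if node.pattern is not None:
            values = found.get(node.name, [])
            if not values:
                if node.optional:
                    return None
                raise _Reject(node.name)
            if node.repeatable:
                return values
            if len(values) > 1:
                raise _Reject(node.name)
            return values[0]
        try:
            return {child.name: build(child) for child in node.children}
        except _Reject:
            if node.optional:
                return None          # a broken optional group is dropped as a whole
            raise

    try:
        return build(template)
    except _Reject:
        return None


TEMPLATE = Node("entry", children=[
    Node("id", r"ID\s+(\S+)"),
    Node("accession", r"AC\s+(.+)", repeatable=True),
    Node("description", r"DE\s+(.+)", optional=True),
    Node("organism", optional=True, children=[
        Node("species", r"OS\s+(.+)"),
        Node("taxonomy", r"OC\s+(.+)", repeatable=True),
    ]),
])

SAMPLE = [
    "AC   P12345;", "ID   TEST_HUMAN",     # fields arrive out of template order
    "XX   scanner noise",                  # matches no leaf: skipped as noise
    "OS   Homo sapiens", "OC   Eukaryota;", "OC   Metazoa;", "//",
    "ID   BAD_ENTRY", "//",                # required AC line missing: rejected
]

if __name__ == "__main__":
    for rec in split_records(SAMPLE):
        print(match_record(TEMPLATE, rec))
```

Run as a script, the sketch behaves as follows. The first record yields a nested dictionary, with `description` set to `None` because that field is optional and absent. The second record yields `None` because its required `AC` line is missing. Silently skipping unmatched lines as noise is a deliberate simplification. A production extractor would also need to handle multi-line values and to report rejected records rather than drop them.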

Bibliographic information

Source
Journal of Computer Science & Technology, 2005, No. 6, pp. 763-773 (11 pages)
Authors
Xu-Bin Deng; Yang-Yong Zhu
Affiliation
Department of Computing and Information Technology, Fudan University, Shanghai 200433, P.R. China
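Indexing information
Original format: PDF
Full-text language: English
Chinese Library Classification: Computing technology; computer technology
Keywords
data extraction; data model; extraction algorithm; regular expression; wrapper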

Date added to database: 2022-08-17 23:45:27

Similar literature

1. L-Tree Match: A New Data Extraction Model and Algorithm for Huge Text Stream with Noises [J]. Xu-Bin Deng, Yang-Yong Zhu. Journal of Computer Science and Technology (English edition), 2005, No. 6

2. Uncovering Research Streams in the Data Economy Using Text Mining Algorithms [J]. Can Azkan, Markus Spiekermann, Henry Goecke. Technology Innovation Management Review, 2019, No. 11

3. Validation of the TOtal Visual acuity extraction Algorithm (TOVA) for automated extraction of visual acuity and intraocular pressure data from free text clinical records [J]. Baughman Doug, Lee Cecilia, Lee Aaron Y. Investigative Ophthalmology & Visual Science, 2017, No. 8

4. An Ensemble Classification Algorithm for Text Data Stream based on Feature Selection and Topic Model [C]. Zhongxin Wang, Jianqiao Liu, Gang Sun. IEEE International Conference on Artificial Intelligence and Computer Applications, 2020

5. Algorithms for mobile robot localization and mapping, incorporating detailed noise modeling and multi-scale feature extraction [D]. Pfister, Samuel T., 2006

6. Validation of the Total Visual Acuity Extraction Algorithm (TOVA) for Automated Extraction of Visual Acuity Data From Free Text Unstructured Clinical Records [O]. Douglas M. Baughman, Grace L. Su, Irena Tsui

7. Information retrieval from huge texts based on data compression and its application to association mining: Data compression for capturing the characteristics in texts [O]. Shirou Maruyama, Hiroshi Sakamoto, 2010

8. Distributed Computing for Signal Processing: Modeling of Asynchronous Parallel Computation. Appendix D. Analysis of MIMD (Multiple Instruction Streams, Multiple Data Streams) Algorithms: Features, Measurements, and Results [R]. Smith, K. D., 1984

