A Bottom-up Approach of Web Data Extraction based on Entity Recognition and Integration


Abstract

Most popular methods for web data extraction (WDE) today are top-down and depend on page structure. These techniques do not scale well to complex pages. We therefore put forward a bottom-up approach to WDE based on entity recognition and integration, which avoids overdependence on the structure of web pages. The proposed approach first labels primary text sequences and then takes their repetitive patterns into account. We propose a Two-Level extraction model for entity recognition and a repetitive pattern extraction algorithm for entity integration. Our approach can effectively reduce attribute labeling mistakes. We support it with experimental results, which show that it performs better than traditional extraction techniques, especially on complex web pages.
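The abstract does not describe the Two-Level model or the repetitive pattern extraction algorithm in detail, so the following is only a minimal illustrative sketch of the general bottom-up idea: label individual text segments first, then use a repeating label pattern to group them into records. The regex recognizers, the label names, and the functions `label_segment`, `find_repeating_unit`, and `integrate` are all hypothetical and are not taken from the paper.

```python
import re

# Hypothetical recognizers (an assumption, not the paper's model):
# each one assigns an attribute label to a single text segment.
RECOGNIZERS = [
    ("PRICE", re.compile(r"^\$\d+(\.\d{2})?$")),
    ("YEAR", re.compile(r"^(19|20)\d{2}$")),
]


def label_segment(text):
    """Label one text segment; fall back to the generic label TEXT."""
    for label, pattern in RECOGNIZERS:
        if pattern.match(text):
            return label
    return "TEXT"


def find_repeating_unit(labels):
    """Return the shortest label pattern that tiles the whole sequence
    at least twice, or None if no such pattern exists."""
    n = len(labels)
    for size in range(1, n // 2 + 1):
        if n % size == 0 and labels == labels[:size] * (n // size):
            return labels[:size]
    return None


def integrate(segments):
    """Group labeled segments into records using the repeating pattern."""
    labels = [label_segment(s) for s in segments]
    unit = find_repeating_unit(labels)
    if unit is None:
        return []
    size = len(unit)
    return [
        list(zip(unit, segments[i:i + size]))
        for i in range(0, len(segments), size)
    ]


if __name__ == "__main__":
    page_text = ["Data Mining", "2009", "$35.00", "Web Search", "2011", "$42.50"]
    for record in integrate(page_text):
        print(record)
```

This toy version needs the label sequence to repeat exactly. A realistic extractor would have to tolerate missing or mislabeled attributes, which is presumably where the paper's integration step does its work.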


Bibliographic record

Source
2011 Eighth Web Information Systems and Applications Conference | 2011 | pp. 150-155 | 6 pages
Conference location: Chongqing (CN)
Authors
Tong Liu; Derong Shen; Jing Shan; Tiezheng Nie; Yue Kou;
Author affiliation

College of Information Science and Engineering, Northeastern University, Shenyang, China (listed for all five authors)

Original format: PDF
Language: English
Chinese Library Classification: Computer networks
Keywords
web data extraction; entity recognition; entity integration; bottom-up;


Similar literature

1. River food webs: an integrative approach to bottom-up flow webs, top-down impact webs, and trophic position [J]. Arthur C. Benke. Ecology: A Publication of the Ecological Society of America. 2018, No. 6.
2. A Bottom-up, Knowledge-Aware Approach to Integrating and Querying Web Data Services [J]. Silvia Quarteroni, Marco Brambilla, Stefano Ceri. ACM Transactions on the Web. 2013, No. 4.
3. The Web-based Database Integration Approach: The Experiment of the Composite Approach to Integrate E-commerce Databases Using the Extensible Markup Language (XML) [J]. Sang Hyun Kim, Hee Jung Jung. Asian Journal of Information Technology. 2005, No. 12.
4. A Bottom-up Approach of Web Data Extraction based on Entity Recognition and Integration [C]. Liu Tong, Shen Derong, Shan Jing. 2011 Eighth Web Information Systems and Applications Conference. 2011.
5. Using a named entity tagger and a syntactic parser to improve Web-based answer extraction [D]. Kamel, Yasser. 2004.
6. Correction: A linear classifier based on entity recognition tools and a statistical approach to method extraction in the protein-protein interaction literature [O]. Anália Lourenço, Michael Conover, Andrew Wong. 2012.
7. A Bottom-Up Term Extraction Approach for Web-Based Translation in Chinese-English IR Systems [O]. Lu Chengye, Xu Yue, Geva Shlomo. 2007.
8. Web-Scale Search-Based Data Extraction and Integration [R]. Chang, K. C., Shuck, T., Kabra, G. 2011.
