British National Conference on Databases (BNCOD 23)

The Lixto Project: Exploring New Frontiers of Web Data Extraction

Abstract

The Lixto project is an ongoing research effort in the area of Web data extraction. Whereas the project originally started out with the idea to develop a logic-based extraction language and a tool to visually define extraction programs from sample Web pages, the scope of the project has been extended over time. Today, new issues such as employing learning algorithms for the definition of extraction programs, automatically extracting data from Web pages featuring a table-centric visual appearance, and extracting from alternative document formats such as PDF are being investigated.