International Conference on Practical Applications of Agents and Multiagent Systems

Kizomba: An Unsupervised Heuristic-Based Web Information Extractor

Abstract

The Web is an ever-growing repository of valuable information. That information lacks semantics, since it is buried in web documents that are represented using HTML. Information extractors are software components that help software engineers extract structured information from web documents. The problem we face is how to devise information extractors that can extract information from current web sites with high precision and recall. Our proposal is unsupervised and heuristic-based, which makes it appropriate for the Web.
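The abstract does not spell out which heuristics Kizomba uses, so the following is only a generic illustration of what an unsupervised, heuristic-based extractor can look like, not the Kizomba algorithm. It assumes one common heuristic: data records on a page tend to appear as many sibling elements sharing the same tag (for example, the `<li>` items of a result list), so the parent with the largest group of same-tag children is taken as the record container. The names `extract_records` and `min_records` are hypothetical, and only the Python standard library (`html.parser`, `collections.Counter`) is used.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Node:
    """A minimal DOM node: a tag plus an ordered list of child nodes and text."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []  # mix of Node objects and text strings

    def full_text(self):
        """Concatenate all text below this node, in document order."""
        parts = [c if isinstance(c, str) else c.full_text() for c in self.children]
        return " ".join(p for p in parts if p)


class TreeBuilder(HTMLParser):
    """Builds a Node tree, tolerating unclosed or stray end tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore unmatched end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(data.strip())


def iter_nodes(node):
    """Pre-order traversal over element nodes."""
    yield node
    for child in node.children:
        if isinstance(child, Node):
            yield from iter_nodes(child)


def extract_records(html, min_records=2):
    """Hypothetical heuristic: records are the largest group of same-tag siblings."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    best = []
    for node in iter_nodes(builder.root):
        elems = [c for c in node.children if isinstance(c, Node)]
        counts = Counter(c.tag for c in elems)
        if not counts:
            continue
        tag, n = counts.most_common(1)[0]
        if n >= min_records and n > len(best):
            best = [c for c in elems if c.tag == tag]
    return [record.full_text() for record in best]


page = """
<html><body>
  <div class="nav"><a href="/">Home</a> <a href="/about">About</a></div>
  <ul>
    <li><b>Book A</b> <i>$10</i></li>
    <li><b>Book B</b> <i>$12</i></li>
    <li><b>Book C</b> <i>$15</i></li>
  </ul>
</body></html>
"""
print(extract_records(page))  # ['Book A $10', 'Book B $12', 'Book C $15']
```

No labelled training pages are needed, which is what makes this style of extractor unsupervised. The price is that a single structural heuristic can pick the wrong region (here the two navigation links also form a same-tag group, and only lose because the list is larger), which is why precision and recall are the natural yardsticks for such extractors.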