European Semantic Web Symposium

Learning to Harvest Information for the Semantic Web

Abstract

In this paper we describe a methodology for harvesting information from large distributed repositories (e.g. large Web sites) with minimum user intervention. The methodology is based on a combination of information extraction, information integration and machine learning techniques. Learning is seeded by extracting information from structured sources (e.g. databases and digital libraries) or a user-defined lexicon. Retrieved information is then used to partially annotate documents. Annotated documents are used to bootstrap learning for simple Information Extraction (IE) methodologies, which in turn produce more annotations; these annotate more documents, which are used to train more complex IE engines, and so on. In this paper we describe the methodology and its implementation in the Armadillo system, compare it with the current state of the art, and describe the details of an implemented application. Finally we draw some conclusions and highlight some challenges and future work.
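
The bootstrapping loop summarised in the abstract can be illustrated with a small sketch. This is not the Armadillo implementation, which the abstract does not detail. It is a toy approximation under stated assumptions: the "simple IE method" is reduced to learning the word that precedes already-annotated terms, and all function names (`annotate`, `learn_patterns`, `extract`, `harvest`) are hypothetical.

```python
import re
from collections import Counter


def annotate(docs, lexicon):
    """Return (doc_index, start, end, term) for each occurrence of a known term."""
    spans = []
    for i, text in enumerate(docs):
        for term in lexicon:
            for m in re.finditer(re.escape(term), text):
                spans.append((i, m.start(), m.end(), term))
    return spans


def learn_patterns(docs, spans, min_support=2):
    """Toy 'simple IE' learner (assumption): keep the word immediately
    preceding an annotated term if it is seen at least min_support times."""
    counts = Counter()
    for i, start, _end, _term in spans:
        left_words = docs[i][:start].split()
        if left_words:
            counts[left_words[-1]] += 1
    return {cue for cue, n in counts.items() if n >= min_support}


def extract(docs, cues):
    """Extract capitalised tokens that directly follow a learned cue word."""
    found = set()
    for text in docs:
        for cue in cues:
            pattern = re.escape(cue) + r"\s+([A-Z][a-z]+)"
            for m in re.finditer(pattern, text):
                found.add(m.group(1))
    return found


def harvest(docs, seed_lexicon, rounds=3):
    """Seed -> annotate -> learn -> extract, repeated until nothing new is found."""
    lexicon = set(seed_lexicon)
    for _ in range(rounds):
        spans = annotate(docs, lexicon)
        cues = learn_patterns(docs, spans)
        new_terms = extract(docs, cues) - lexicon
        if not new_terms:
            break
        lexicon |= new_terms
    return lexicon


if __name__ == "__main__":
    docs = [
        "Prof. Smith teaches logic.",
        "Prof. Jones teaches IE.",
        "Prof. Brown teaches NLP.",
    ]
    print(sorted(harvest(docs, {"Smith", "Jones"})))
```

Here the seed lexicon plays the role of the structured source or user-defined lexicon. Each round uses the current annotations to learn cues, and the cues yield new terms that feed the next round. A real system would replace the single-word cue learner with progressively stronger extractors and would integrate evidence across sources before accepting a new term.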