Conference proceedings: Rough Sets and Knowledge Technology

Cross Language Information Extraction Knowledge Adaptation

Abstract

We propose a framework for adapting a previously learned wrapper from a source Web site to unseen sites written in different languages. The idea is to reuse two things from the source Web site: the previously learned information extraction knowledge and the previously extracted or collected items. This knowledge and data are automatically translated into the language of the unseen site via online Web resources such as online Web dictionaries or maps. Multiple text mining methods are then employed to automatically discover machine-labeled training examples in the unseen site. Both content-oriented features and site-dependent features of these machine-labeled training examples are used to learn a new wrapper for the unseen site with our language-independent wrapper induction component. We conducted experiments on real-world Web sites in different languages to demonstrate the effectiveness of our framework.
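
The translate-then-label step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy dictionary, the function names (`translate_item`, `machine_label`), the character-level similarity measure, and the 0.8 threshold are all assumptions made for illustration. Items extracted from the source site are translated word by word, and text fragments from the unseen site that closely match a translated item are kept as machine-labeled training examples.

```python
from difflib import SequenceMatcher

# Hypothetical toy dictionary standing in for an online Web dictionary.
TOY_DICTIONARY = {
    "hotel": "酒店",
    "airport": "机场",
    "garden": "花园",
}


def translate_item(item, dictionary):
    """Translate an item word by word; unknown words are kept unchanged.

    Words are joined without spaces, which is an assumption that suits
    a target language such as Chinese.
    """
    return "".join(dictionary.get(word.lower(), word) for word in item.split())


def similarity(a, b):
    """Character-level similarity in [0, 1] from the standard library."""
    return SequenceMatcher(None, a, b).ratio()


def machine_label(source_items, target_fragments, dictionary, threshold=0.8):
    """Return (fragment, score) pairs for fragments that match a translated item.

    The threshold is an illustrative value, not one taken from the paper.
    """
    translated = [translate_item(item, dictionary) for item in source_items]
    labeled = []
    for fragment in target_fragments:
        best = max((similarity(fragment, t) for t in translated), default=0.0)
        if best >= threshold:
            labeled.append((fragment, best))
    return labeled


if __name__ == "__main__":
    source_items = ["Garden Hotel", "Airport Hotel"]
    target_fragments = ["花园酒店", "机场酒店", "联系我们", "首页"]
    for fragment, score in machine_label(source_items, target_fragments, TOY_DICTIONARY):
        print(fragment, round(score, 2))
```

In a fuller pipeline, the fragments labeled this way would supply the content-oriented and site-dependent features used to induce the new wrapper. That step is omitted here because the abstract does not specify it.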

Bibliographic details

  • Conference location: Gold Coast (AU)
  • Author affiliations

    Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong

    Department of Computer Science, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong

    Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, Hong Kong

  • Original format: PDF
  • Text language: English
  • Chinese Library Classification: Programming, software engineering
