ACM CIKM Workshop on Web Information and Data Management 2009

Post Processing Wrapper Generated Tables for Labeling Anonymous Datasets

Abstract

Many wrappers generate tables without column names for human consumption, because the meaning of the columns is apparent from the context and easy for humans to understand. In emerging applications, however, labels are needed for autonomous assignment and schema mapping, where machines try to understand the tables. Autonomous label assignment is critical in large-volume data processing where ad hoc mediation, extraction, and querying are involved. We propose an algorithm, Lads (Labeling Anonymous Datasets), which can holistically label tabular web documents. The algorithm has been tested on anonymous datasets from a number of sites (e.g., music, movie, political, demographic, and athletic) obtained through different search engines such as Google, Yahoo, and MSN. We present the comparative probabilities of attributes being candidate labels; the results appear very promising, reaching a probability as high as 93% of assigning a good label to an anonymous attribute. To the best of our knowledge, this is the first approach of its kind for label assignment based on the recommendations of multiple search engines.
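
The abstract does not say how Lads combines the recommendations of several search engines, so the following Python sketch only illustrates the general idea of comparing candidate-label probabilities across engines. Everything in it is an assumption made for illustration: the function names, the engine names `engine_a` to `engine_c`, the made-up evidence counts, and the choice to normalize counts per engine and then average. It should not be read as the authors' algorithm.

```python
from collections import defaultdict


def candidate_label_probabilities(evidence):
    """Turn per-engine evidence counts into one probability per candidate label.

    `evidence` maps a (hypothetical) engine name to {candidate_label: count}.
    Counts are normalized within each engine, then averaged across engines.
    This averaging rule is an illustrative assumption, not taken from the paper.
    """
    summed_shares = defaultdict(float)
    engines_used = 0
    for counts in evidence.values():
        total = sum(counts.values())
        if total == 0:
            continue  # this engine offered no evidence; leave it out
        engines_used += 1
        for label, count in counts.items():
            summed_shares[label] += count / total
    if engines_used == 0:
        return {}
    return {label: share / engines_used for label, share in summed_shares.items()}


def best_label(evidence):
    """Return (label, probability) for the highest-scoring candidate, or None."""
    probabilities = candidate_label_probabilities(evidence)
    if not probabilities:
        return None
    return max(probabilities.items(), key=lambda item: item[1])


# Made-up counts for one anonymous column of a music table (illustration only).
example_evidence = {
    "engine_a": {"artist": 80, "title": 15, "year": 5},
    "engine_b": {"artist": 60, "title": 30, "year": 10},
    "engine_c": {"artist": 70, "title": 20, "year": 10},
}

if __name__ == "__main__":
    print(candidate_label_probabilities(example_evidence))
    print(best_label(example_evidence))
```

With these made-up counts, the candidate `artist` receives the largest averaged share and would be proposed as the column label. Normalizing within each engine before averaging keeps an engine that returns many more results from dominating the vote; other combination rules are equally plausible.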
