International Conference on Very Large Data Bases

On Repairing Structural Problems In Semi-structured Data

Abstract

Semi-structured data such as XML are popular for data interchange and storage. However, many XML documents are improperly nested, with unmatched open and close tags. Since some semi-structured data (e.g., LaTeX) have a flexible grammar and since many XML documents lack an accompanying DTD or XSD, we focus on computing a syntactic repair via the edit distance. To solve this problem, we propose a dynamic programming algorithm that runs in cubic time. Although this algorithm does not scale, well-formed substrings of the data can be pruned to enable faster computation. Unfortunately, there are still cases where the dynamic program could be very expensive; hence, we give branch-and-bound algorithms based on various combinations of two heuristics, called MinCost and MaxBenefit, that trade off between accuracy and efficiency. Finally, we experimentally demonstrate the performance of these algorithms on real data.
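
To make two of the ideas in the abstract concrete (pruning well-formed substrings, then running a cubic-time dynamic program), here is a minimal Python sketch. It is not the paper's algorithm: it assumes a simplified cost model in which the only edits are inserting or deleting a single tag at unit cost, and the names `Tag`, `prune_well_formed`, and `repair_cost` are hypothetical. The MinCost and MaxBenefit heuristics are not shown, since the abstract does not describe them.

```python
from typing import List, Tuple

# Hypothetical token type: ("open" | "close", tag name)
Tag = Tuple[str, str]


def prune_well_formed(tags: List[Tag]) -> List[Tag]:
    """Remove every well-formed substring with one stack pass.

    A close tag that directly matches the open tag on top of the stack
    cancels it; everything left over is the part that needs repair.
    """
    stack: List[Tag] = []
    for kind, name in tags:
        if kind == "close" and stack and stack[-1] == ("open", name):
            stack.pop()
        else:
            stack.append((kind, name))
    return stack


def repair_cost(tags: List[Tag]) -> int:
    """Fewest unit-cost tag insertions/deletions that make `tags` well-formed.

    Interval dynamic program over O(n^2) substrings with O(n) split
    points each, i.e. cubic time (assumed cost model, see lead-in).
    """
    n = len(tags)
    # cost[i][j] holds the answer for the slice tags[i:j]; empty slices cost 0.
    cost = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            # Option 1: spend one edit on tags[i] alone
            # (delete it, or insert its missing partner next to it).
            best = 1 + cost[i + 1][j]
            # Option 2: pair an open tag with a later matching close tag.
            if tags[i][0] == "open":
                for k in range(i + 1, j):
                    if tags[k] == ("close", tags[i][1]):
                        best = min(best, cost[i + 1][k] + cost[k + 1][j])
            cost[i][j] = best
    return cost[0][n]


# <a><c></c><b></a>: the inner <c></c> is well-formed and can be pruned.
doc = [("open", "a"), ("open", "c"), ("close", "c"),
       ("open", "b"), ("close", "a")]
residue = prune_well_formed(doc)
print(residue)               # [('open', 'a'), ('open', 'b'), ('close', 'a')]
print(repair_cost(residue))  # 1
```

In this example the pruned sequence has three tags instead of five, and a single edit (for instance, inserting `</b>` after `<b>`) repairs it, so the cubic step only has to run on the shorter residue.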
