...
首页> 外文期刊>SIGMOD record >Constant-Delay Enumeration for Nondeterministic Document Spanners
【24h】

Constant-Delay Enumeration for Nondeterministic Document Spanners

机译:非术语文档扳手的恒定延迟枚举

获取原文
获取原文并翻译 | 示例
           

摘要

One of the classical tasks in information extraction is to extract subparts of texts through regular expressions. In the database theory literature, this approach has been generalized and formalized as document spanners. In this model, extraction is performed by evaluating a particular kind of automata, called a sequential variable-set automaton (VA). The efficiency of this task is then measured in the context of enumeration algorithms: we first run a preprocessing phase computing a compact representation of the answers, and second we produce the results one after the other with a short time between consecutive answers, called the delay of the enumeration. Our goal is to have an algorithm that is tractable in combined complexity, i.e., in the sizes of the input document and the VA, while ensuring the best possible data complexity bounds in the input document size, i.e., a constant delay that does not depend on the document. We present such an algorithm for a variant of VAs called extended sequential VAs and give an experimental evaluation of this algorithm.This article is a shortened version of the conference article [4] published at ICDT'19, incorporating experimental results from the journal version [6] currently under review.
机译:信息提取中的一个经典任务是通过正则表达式提取文本的子部分。在数据库理论文献中,这种方法已被广泛化并形式化为文档跨度。在该模型中,通过评估特定种类的自动机,称为连续可变自动机(VA)来执行提取。然后在枚举算法的上下文中测量此任务的效率:我们首先运行一个预处理的阶段计算答案的紧凑型表示,并且第二我们在连续答案之间的短时间内产生一个接一个的结果,称为延迟。枚举。我们的目标是拥有一种算法,该算法在组合复杂性中,即在输入文档和VA的大小中,同时确保输入文档大小的最佳数据复杂性界限,即不依赖的恒定延迟在文件上。我们为称为扩展顺序VA的VAS变体提供了这种算法,并对这一算法进行了实验评估。这篇文章是会议文章缩短版本[4]在ICDT'19发布,纳入了日志版本的实验结果[4] 6]目前正在审查。

著录项

  • 来源
    《SIGMOD record》 |2020年第1期|25-32|共8页
  • 作者单位

    Inst Polytech Paris LTCI Telecom Paris Paris France;

    CNRS CRIStAL UMR 9189 Paris France|Inria Lille Lille France;

    CNRS CRIL Paris France|Univ Artois Arras France;

    Univ Bayreuth Bayreuth Germany;

  • 收录信息
  • 原文格式 PDF
  • 正文语种 eng
  • 中图分类
  • 关键词

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号