
Structuring Domain-Specific Text Archives by Deriving a Probabilistic XML DTD



Abstract

Domain-specific documents often share an inherent but undocumented structure. This structure should be made explicit to facilitate efficient, structure-based search in archives as well as information integration. A promising way to reach these objectives is to infer a semantically structured XML DTD for an archive and then transform its texts into XML documents. Based on the KDD-driven DIAsDEM framework, we propose a new method for deriving an archive-specific, structured XML document type definition (DTD). Our approach uses association rule discovery and sequence mining to structure a previously derived flat (i.e., unstructured) DTD. We introduce the notion of a probabilistic DTD, which is derived by discovering associations among XML tags and frequent sequences of XML tags.
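The abstract does not spell out how associations and sequences are turned into a probabilistic DTD, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes each document has already been reduced to the ordered list of tags from a flat DTD. It mines pairwise tag association rules (support/confidence), counts frequent ordered tag pairs as a simple stand-in for full sequence mining, and then emits a DTD fragment annotated with each tag's occurrence probability. The function names, thresholds, tag names, the "net precedence" ordering heuristic and the comment-based probability annotation are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input format: each document is the ordered list of XML tags
# (drawn from a previously derived flat DTD) assigned to its text units.


def tag_support(docs):
    """Share of documents that contain each tag at least once."""
    counts = Counter(tag for doc in docs for tag in set(doc))
    return {tag: c / len(docs) for tag, c in counts.items()}


def association_rules(docs, min_support=0.3, min_confidence=0.7):
    """Rules (A, B, support, confidence): documents with tag A also contain B."""
    n = len(docs)
    single = tag_support(docs)
    pair_counts = Counter()
    for doc in docs:
        # Each unordered tag pair is counted at most once per document.
        pair_counts.update(combinations(sorted(set(doc)), 2))
    rules = []
    for (a, b), c in pair_counts.items():
        support = c / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = support / single[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


def frequent_tag_sequences(docs, min_support=0.3):
    """Ordered pairs (A, B) where A occurs before B in enough documents."""
    n = len(docs)
    counts = Counter()
    for doc in docs:
        # Count each ordered pair at most once per document.
        counts.update({(a, b) for i, a in enumerate(doc) for b in doc[i + 1:] if a != b})
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


def probabilistic_dtd(docs, root="Document", min_support=0.3):
    """Order frequent tags by net precedence and annotate each with P(tag)."""
    support = tag_support(docs)
    frequent = {t for t, s in support.items() if s >= min_support}
    score = Counter()
    for a, b in frequent_tag_sequences(docs, min_support):
        if a in frequent and b in frequent:
            score[a] += 1  # a frequently precedes another frequent tag
            score[b] -= 1  # b frequently follows another frequent tag
    order = sorted(frequent, key=lambda t: (-score[t], t))
    # Tags present in every document are mandatory; the rest are optional.
    model = ", ".join(t if support[t] >= 1.0 else f"{t}?" for t in order)
    lines = [f"<!ELEMENT {root} ({model})>"]
    lines += [f"<!ELEMENT {t} (#PCDATA)> <!-- p={support[t]:.2f} -->" for t in order]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical toy archive of four tagged documents.
    docs = [
        ["Title", "Date", "Body", "Signature"],
        ["Title", "Date", "Body"],
        ["Title", "Body", "Signature"],
        ["Date", "Title", "Body"],
    ]
    for rule in association_rules(docs):
        print(rule)
    print(probabilistic_dtd(docs))
```

On this toy archive the root declaration comes out as `<!ELEMENT Document (Title, Date?, Body, Signature?)>`, with `Date` annotated `p=0.75` and `Signature` `p=0.50`. Pairwise counting keeps the sketch short; a real implementation would more likely use a dedicated frequent-itemset and sequential-pattern miner and could express probabilities in a richer form than DTD comments.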
