International Conference on Data Engineering and Internet Technology (DEIT 2011)

XBeGene: Scalable XML Documents Generator by Example Based on Real Data

Abstract

XML datasets of various sizes and properties are needed to evaluate the correctness and efficiency of XML-based algorithms and applications. While several downloadable datasets can be found online, they are predefined by system experts and might not be suitable for evaluating every algorithm. Tools for generating synthetic XML documents offer an alternative solution, providing flexibility and adaptability in generating synthetic document collections. Nonetheless, the usefulness of existing XML generators remains rather limited because of the restricted level of expressiveness they allow users. In this paper, we develop a novel XML By-Example Generator (XBeGene) for producing synthetic XML data that closely reflect the user's requirements. Inspired by the query-by-example paradigm in information retrieval, our generator system i) allows the user to provide her own sample XML documents as input, ii) analyzes the structure, occurrence frequencies, and content distributions of each XML element in the user's input documents, and iii) produces synthetic XML documents that closely match the user's input data in both structural and content features. The user also specifies the size of each synthetic document as well as that of the entire document collection. Clustering experiments demonstrate high correlation between the specified user requirements and the characteristics of the generated XML data, while timing results confirm our approach's scalability to large-scale document collections.
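The three-step pipeline described above (accept sample documents, profile each element, sample new documents) can be illustrated with a minimal Python sketch. This is not XBeGene's implementation, which the abstract does not describe; the names `XmlProfile`, `generate`, `generate_collection`, and the `max_nodes` size cap are hypothetical. As an assumption, structure and occurrence frequencies are captured as the ordered child-tag sequences observed at each root-to-element path, and content distributions as the empirical list of text values at that path.

```python
import random
import xml.etree.ElementTree as ET
from collections import defaultdict


class XmlProfile:
    """Hypothetical per-path profile of sample XML documents (not the authors' code)."""

    def __init__(self):
        self.roots = []                      # root tag of each sample document
        self.child_seqs = defaultdict(list)  # path -> ordered child tags, one list per instance
        self.texts = defaultdict(list)       # path -> observed non-empty text values

    def add_document(self, xml_text):
        root = ET.fromstring(xml_text)
        self.roots.append(root.tag)
        self._visit(root, (root.tag,))

    def _visit(self, elem, path):
        # Structure and occurrence frequencies: record this instance's child tags in order.
        self.child_seqs[path].append([child.tag for child in elem])
        # Content distribution: keep the element's text so it can be resampled later.
        if elem.text and elem.text.strip():
            self.texts[path].append(elem.text.strip())
        for child in elem:
            self._visit(child, path + (child.tag,))


def generate(profile, max_nodes, rng):
    """Build one synthetic document by sampling the profile, capped at max_nodes elements."""
    root_tag = rng.choice(profile.roots)
    root = ET.Element(root_tag)
    remaining = max_nodes - 1  # elements still allowed after the root

    def fill(elem, path):
        nonlocal remaining
        if profile.texts[path]:
            elem.text = rng.choice(profile.texts[path])
        # Pick one observed child sequence for this path; every tag in it was seen
        # under this path, so its own path is guaranteed to be in the profile.
        for tag in rng.choice(profile.child_seqs[path]):
            if remaining <= 0:
                return
            remaining -= 1
            fill(ET.SubElement(elem, tag), path + (tag,))

    fill(root, (root_tag,))
    return root


def generate_collection(samples, n_docs, max_nodes, seed=0):
    """Profile the sample documents, then emit n_docs synthetic documents as strings."""
    profile = XmlProfile()
    for text in samples:
        profile.add_document(text)
    rng = random.Random(seed)
    return [ET.tostring(generate(profile, max_nodes, rng), encoding="unicode")
            for _ in range(n_docs)]
```

Because each path resamples its child sequence independently, the output mixes instances rather than copying any single sample, while staying within the tag vocabulary, nesting, and text values the user supplied. Document size is only approximated here by a per-document element cap; a fuller generator would likely fit parametric distributions and support documents larger than the samples.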