Incremental Sorting for Large Dynamic Data Sets

Abstract

In today's world of pervasive computing, it is straightforward for organizations to generate large amounts of data in support of a variety of business needs. For this reason, it is important to build tools that allow analysts to manage and investigate these data sets quickly and efficiently. One feature needed by these tools is the ability to sort large amounts of data along a number of dimensions to facilitate the search for useful information. In this paper, we describe a new method for incrementally sorting large, multi-dimensional, dynamic data sets. Our particular use case involves sorting large Twitter data sets but our technique can be applied more generally across a variety of data types. Our approach is evaluated with respect to its scalability and by comparing it to several alternatives. It is currently able to efficiently sort data sets consisting of tens of millions of tweets along a variety of dimensions even when the data set is under active collection and new tweets are being added each day. The approach incrementally integrates the new tweets and provides sorted views of all tweets along various dimensions without having to re-sort the previously sorted tweets. The paper presents the benefits of the technique, discusses its limitations, and describes its software engineering contributions.
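
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of incremental sorting, not the authors' method. It assumes each dimension keeps its own already-sorted view, and that a new batch is sorted on its own and then merged into each view, so previously sorted records are never re-sorted. The class name `IncrementalSortedViews` and the field names `retweets` and `created_at` are hypothetical, chosen for illustration.

```python
from heapq import merge
from operator import itemgetter


class IncrementalSortedViews:
    """Keeps one sorted view per dimension and merges new batches into each."""

    def __init__(self, dimensions):
        # One already-sorted list of records per dimension (hypothetical names).
        self.views = {dim: [] for dim in dimensions}

    def add_batch(self, records):
        """Sort only the new records, then merge them into each existing view."""
        for dim, view in self.views.items():
            key = itemgetter(dim)
            # Only the k new records are sorted: O(k log k).
            new_run = sorted(records, key=key)
            # The existing view is already sorted, so a linear merge suffices.
            self.views[dim] = list(merge(view, new_run, key=key))

    def sorted_by(self, dim):
        """Return the current sorted view for one dimension."""
        return self.views[dim]


# Usage: two collection days, each arriving as a separate batch.
views = IncrementalSortedViews(["retweets", "created_at"])
views.add_batch([
    {"id": 1, "retweets": 5, "created_at": 100},
    {"id": 2, "retweets": 2, "created_at": 101},
])
views.add_batch([
    {"id": 3, "retweets": 3, "created_at": 102},
])
ids_by_retweets = [t["id"] for t in views.sorted_by("retweets")]
ids_by_time = [t["id"] for t in views.sorted_by("created_at")]
```

An in-memory merge like this only illustrates the principle. At the scale the abstract mentions (tens of millions of tweets), the sorted views would presumably have to live on disk or in a database index, which this sketch does not attempt.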