Stream-Dashboard: A big data stream clustering framework with applications to social media streams.


Abstract

Data mining is concerned with detecting patterns in raw datasets, which are then used to unearth knowledge that might not have been discovered using conventional querying or statistical methods. This discovered knowledge has been used to empower decision makers in countless applications spanning many disciplines, including business, education, astronomy, security, and information retrieval. Many applications generate massive amounts of data continuously and at an increasing rate. This is the case for user activity on social networks such as Facebook and Twitter. This flow of data has been termed, appropriately, a data stream, and it introduces a set of new challenges for discovering its evolving patterns using data mining techniques. Data stream clustering is concerned with detecting evolving patterns in a data stream using only the similarities between the data points as they arrive, without the use of any external information (i.e., unsupervised learning).

In this dissertation, we propose a complete and generic framework, Stream-Dashboard, to simultaneously mine, track, and validate clusters in a big data stream. The proposed framework consists of three main components: an online data stream clustering algorithm, a component for tracking and validating pattern behavior using regression analysis, and a component that uses the behavioral information about the detected patterns to improve the quality of the clustering algorithm. As the first component, we propose RINO-Streams, an online clustering algorithm that incrementally updates the clustering model using robust statistics and incremental optimization. The second component is a methodology that we call TRACER, which continuously performs a set of statistical tests using regression analysis to track the evolution of the detected clusters, their characteristics, and their quality metrics. For the last component, we propose a method to build behavioral profiles of the clustering model over time. These profiles can be used to improve the performance of the online clustering algorithm, for example by adapting the initial values of its input parameters.

The performance and effectiveness of the proposed framework were validated through extensive experiments, and its use was demonstrated on a challenging real-world application: unsupervised, one-pass mining of evolving cluster stories from Twitter social media streams.
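
To make the first two ideas in the abstract concrete, the following is a minimal sketch of one-pass clustering combined with regression-based trend tracking. It is not RINO-Streams or TRACER, whose details the abstract does not give. It swaps in a simple leader-style clustering rule with an incremental mean, and an ordinary least-squares slope over a per-cluster metric. The names `OnlineClusters`, `radius`, and `trend_slope` are hypothetical and chosen only for illustration.

```python
import math


class OnlineClusters:
    """One-pass leader-style clustering (an illustrative stand-in, not RINO-Streams)."""

    def __init__(self, radius):
        self.radius = radius   # hypothetical distance threshold for joining a cluster
        self.centroids = []    # running mean of each cluster
        self.counts = []       # number of points absorbed by each cluster
        self.history = []      # per cluster: distance of each absorbed point to the centroid

    def update(self, point):
        """Assign `point` to the nearest cluster or open a new one; return its index."""
        best, best_dist = None, float("inf")
        for i, c in enumerate(self.centroids):
            d = math.sqrt(sum((x - y) ** 2 for x, y in zip(point, c)))
            if d < best_dist:
                best, best_dist = i, d
        if best is None or best_dist > self.radius:
            # No cluster is close enough: the point seeds a new cluster.
            self.centroids.append(list(point))
            self.counts.append(1)
            self.history.append([0.0])
            return len(self.centroids) - 1
        self.counts[best] += 1
        n = self.counts[best]
        # Incremental mean: c_new = c_old + (x - c_old) / n, so no past points are stored.
        self.centroids[best] = [c + (x - c) / n
                                for c, x in zip(self.centroids[best], point)]
        self.history[best].append(best_dist)
        return best


def trend_slope(values):
    """Ordinary least-squares slope of `values` against their index 0..n-1."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    den = sum((t - mean_t) ** 2 for t in range(n))
    return num / den
```

Under these assumptions, a positive `trend_slope(oc.history[k])` would suggest that cluster `k` is spreading out over time, which is the kind of behavioral signal a tracking component could feed back into the clustering parameters.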

Bibliographic record

Author: Hawwash, Basheer
Affiliation: University of Louisville
Degree-granting institution: University of Louisville
Subject: Computer Science; Information Technology; Web Studies; Artificial Intelligence
Degree: Ph.D.
Year: 2013
Pages: 229 p.
Total pages: 229
Format: PDF
Language: English

