Feature-based analysis of open source using big data analytics.

Abstract

The open source code base has grown enormously, and understanding the functionality of individual projects has become extremely difficult. Existing feature discovery approaches that aim to identify functionality are typically semi-automatic and often require human intervention. In this thesis, an innovative framework is proposed that uses machine learning to automatically and dynamically discover the features, and the components that implement them, for any open source project. The overall goal of the approach is to create an automated and scalable model that produces accurate results.

The initial step is to extract the metadata and perform pre-processing. The next step is to dynamically discover topics using Latent Dirichlet Allocation and to form components optimally using K-Means. The final step is to discover the features implemented in the components using the Term Frequency - Inverse Document Frequency (TF-IDF) algorithm. The framework is implemented in Spark, a fast, parallel processing engine for big data analytics. The ArchStudio tool is used to visualize the feature-to-class mapping. As a case study, Apache Solr and Apache Hadoop HDFS are used to illustrate the automatic discovery of components and features. We demonstrated the scalability and the accuracy of the proposed model, using a manual evaluation by software architecture experts as the baseline. The accuracy is 85% when compared with the manual evaluation of Apache Solr. In addition, many new features were discovered for both case studies through the automated framework.
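The final step named in the abstract, ranking terms per component by TF-IDF, can be illustrated with a minimal sketch. This is not the thesis's Spark implementation: it uses only the Python standard library, and the function name `top_terms_by_tfidf`, the component names, and the token lists are hypothetical values chosen for illustration. It assumes each component has already been reduced to a list of pre-processed tokens.

```python
import math
from collections import Counter


def top_terms_by_tfidf(components, k=3):
    """Return the k highest-scoring TF-IDF terms for each component.

    components: dict mapping a component name to its list of tokens
    (hypothetical input; the thesis derives components with LDA and K-Means).
    """
    n = len(components)

    # Document frequency: in how many components does each term appear?
    df = Counter()
    for tokens in components.values():
        df.update(set(tokens))

    result = {}
    for name, tokens in components.items():
        tf = Counter(tokens)
        total = len(tokens)
        # TF-IDF = (relative term frequency) * log(N / document frequency)
        scores = {
            term: (count / total) * math.log(n / df[term])
            for term, count in tf.items()
        }
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result[name] = ranked[:k]
    return result


# Toy data: three made-up components sharing the generic term "util".
components = {
    "search": ["query", "index", "parse", "query", "score", "util"],
    "storage": ["block", "replica", "block", "node", "util"],
    "admin": ["config", "node", "metrics", "config", "util"],
}
features = top_terms_by_tfidf(components, k=2)
for name, terms in features.items():
    print(name, terms)
```

A term such as "util" that occurs in every component gets an inverse document frequency of log(1) = 0, so it never ranks as a distinguishing feature; terms concentrated in one component rise to the top.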

Bibliographic record

Author: Krishnan, Malathy
Author affiliation: University of Missouri - Kansas City
Degree-granting institution: University of Missouri - Kansas City
Discipline: Computer science
Degree: M.S.
Year: 2015
Pages: 88
Original format: PDF
Language of text: English
