International Conference on Information, Intelligence, Systems and Applications
Data collection and analysis of GitHub repositories and users

Abstract

In this paper, we present the collection and mining of GitHub data, aiming to understand GitHub user behavior and project success factors. We collected information about approximately 100K projects and 10K GitHub users (the owners of these projects) via the GitHub API. We then analyzed the data statistically, discretized the feature values with the k-means algorithm, and applied the Apriori algorithm in Weka to discover association rules. Assuming that project success can be measured by the number of downloads, we kept only the rules whose right-hand side is a download count above a threshold of 1000 downloads. The results provide interesting insight into the GitHub ecosystem and yield seven success rules for GitHub projects.
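
The pipeline described in the abstract (k-means discretization, Apriori rule mining, then keeping only rules whose right-hand side is a high download count) could be sketched roughly as follows. This is a minimal pure-Python illustration and not the authors' Weka setup: the toy star and download figures, the single `stars` feature, the item labels, and the support and confidence thresholds are all hypothetical assumptions made for the example. Only the 1000-download threshold comes from the abstract.

```python
from itertools import combinations

def kmeans_1d(values, k, iters=50):
    """Lloyd's algorithm on scalars; returns the sorted centroids."""
    srt = sorted(values)
    # Deterministic init: spread the starting centroids across the sorted data.
    centroids = [srt[(2 * i + 1) * len(srt) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)

def discretize(value, centroids, labels):
    """Map a numeric value to the label of its nearest centroid."""
    nearest = min(range(len(centroids)), key=lambda j: abs(value - centroids[j]))
    return labels[nearest]

def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search; returns {itemset: support}."""
    n = len(transactions)
    level = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    while level:
        current = {s: sum(s <= t for t in transactions) / n for s in level}
        current = {s: sup for s, sup in current.items() if sup >= min_support}
        frequent.update(current)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        level = {a | b for a, b in combinations(list(current), 2)
                 if len(a | b) == len(a) + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        level = {c for c in level
                 if all(frozenset(s) in current
                        for s in combinations(c, len(c) - 1))}
    return frequent

def success_rules(transactions, target, min_support, min_confidence):
    """Keep only rules of the form antecedent -> target."""
    frequent = frequent_itemsets(transactions, min_support)
    rules = []
    for itemset, sup in frequent.items():
        if target in itemset and len(itemset) > 1:
            antecedent = itemset - {target}
            confidence = sup / frequent[antecedent]
            if confidence >= min_confidence:
                rules.append((sorted(antecedent), target, sup, confidence))
    return rules

# Hypothetical toy data: (stars, downloads) per repository.
repos = [(2, 10), (5, 50), (8, 1500), (900, 2000), (1100, 5000), (1300, 8000)]
centroids = kmeans_1d([stars for stars, _ in repos], k=2)
transactions = [
    frozenset({
        "stars=" + discretize(stars, centroids, ["low", "high"]),
        "downloads>1000" if downloads > 1000 else "downloads<=1000",
    })
    for stars, downloads in repos
]
# Support and confidence thresholds here are assumptions for illustration.
rules = success_rules(transactions, "downloads>1000",
                      min_support=0.3, min_confidence=0.8)
for antecedent, consequent, sup, conf in rules:
    print(antecedent, "->", consequent, f"support={sup:.2f} confidence={conf:.2f}")
```

On this toy data the only rule that survives the filter is `stars=high -> downloads>1000`. A real run over the collected data would use many more features, and Weka's Apriori implementation exposes its own parameters that are not reproduced here.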