Published in: IEEE International Conference on Software Maintenance

Mining Software Profile across Multiple Repositories for Hierarchical Categorization

Abstract

The large number of software repositories on the Internet is fundamentally changing the traditional paradigms of software maintenance. Efficiently categorizing the massive number of projects in these repositories, so that relevant software can be retrieved, is vital for Internet-based maintenance tasks such as solution searching and best-practice learning. Much previous work has categorized software by mining source code or byte code, but these approaches have been verified only on relatively small collections of projects with coarse-grained categories or clusters. Internet-based software maintenance, however, requires finer-grained, more scalable, and language-independent categorization approaches. In this paper, we propose a novel approach that hierarchically categorizes software projects based on their online profiles across multiple repositories. We design an SVM-based categorization framework to classify the massive number of software projects hierarchically. To improve categorization performance, we aggregate different types of profile attributes from multiple repositories and design a weighted combination strategy that assigns greater weights to more important attributes. Extensive experiments are carried out on more than 18,000 projects across three repositories. The results show that our approach achieves significant improvements by using the weighted combination, and that the overall precision, recall, and F-measure can reach 71.41%, 65.60%, and 68.38%, respectively, in appropriate settings. Compared with previous work, our approach achieves competitive results with 123 finer-grained, multi-layered categories. In contrast to approaches that use source code or byte code, our approach is more effective for large-scale, language-independent software categorization.
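The weighted combination strategy and the reported metrics can be illustrated with a minimal Python sketch. The abstract does not specify the attribute types, the weight values, or how the weights are applied, so everything here is an assumption made for illustration: the attribute names (`tags`, `description`), the weights, and the function `weighted_profile` are hypothetical, and attribute weights are simply multiplied into term counts. The helper `f_measure` uses the standard harmonic-mean definition, which is consistent with the precision, recall, and F-measure figures quoted in the abstract.

```python
from collections import Counter


def weighted_profile(attributes, weights):
    """Merge per-attribute term counts into one weighted feature vector.

    attributes: dict mapping attribute name -> list of tokens
    weights:    dict mapping attribute name -> non-negative weight
                (hypothetical values; the paper's weights are not given here)
    """
    combined = Counter()
    for name, tokens in attributes.items():
        weight = weights.get(name, 1.0)  # unweighted attributes default to 1.0
        for term, count in Counter(tokens).items():
            combined[term] += weight * count
    return dict(combined)


def f_measure(precision, recall):
    """Standard F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical profile of one project, aggregated from several repositories.
profile = {
    "tags": ["editor", "text"],
    "description": ["a", "text", "editor", "for", "code"],
}
# Assumed weights: tags are treated as more informative than free text.
weights = {"tags": 3.0, "description": 1.0}

vector = weighted_profile(profile, weights)
f_percent = round(f_measure(0.7141, 0.6560) * 100, 2)

print(vector["editor"])  # 4.0 (3.0 from tags + 1.0 from description)
print(f_percent)         # 68.38, matching the F-measure in the abstract
```

A vector built this way could then be fed to any classifier, such as an SVM trained per level of the category hierarchy. That step is omitted because the abstract gives no detail on it.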
