首页> 外文期刊>IEICE transactions on information and systems >Hierarchical Categorization of Open Source Software by Online Profiles
【24h】

Hierarchical Categorization of Open Source Software by Online Profiles

机译:Hierarchical Categorization of Open Source Software by Online Profiles

获取原文
获取原文并翻译 | 示例
           

摘要

The large amounts of freely available open source software over the Internet are fundamentally changing the traditional paradigms of software development. Efficient categorization of the massive projects for retrieving relevant software is of vital importance for Internet-based software development such as solution searching, best practices learning and so on. Many previous works have been conducted on software categorization by mining source code or byte code, but were verified on only relatively small collections of projects with coarse-grained categories or clusters. However. Internet-based software development requires finer-grained, more scalable and language-independent categorization approaches. In this paper, we propose a novel approach to hierarchically categorize software projects based on their online profiles. We design a SVM-based categorization framework and adopt a weighted combination strategy to aggregate different types of profile attributes from multiple repositories. Different basic classification algorithms and feature selection techniques are employed and compared. Extensive experiments are carried out on more than 21,000 projects across five repositories. The results show that our approach achieves significant improvements by using weighted combination. Compared to the previous work, our approach presents competitive results with more finer-grained and multi-layered category hierarchy with more than 120 categories. Unlike approaches that use source code or byte code, our approach is more effective for large-scale and language-independent software categorization. In addition, experiments suggest that hierarchical categorization combined with general keyword-based searching improves the retrieval efficiency and accuracy.

著录项

获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号