Conference: IEEE International Conference on Computational Intelligence and Computing Research

Text Mining to Concept Mining: Leads Feature Location in Software System


Abstract

In agile application development, automatically identifying the components of a large, complex software system that are relevant to a maintenance task remains an open research problem, and the proliferation of software applications makes it more pressing. Earlier, concept mining with formal concept analysis was one of the commonly applied techniques for legacy software systems of small to medium size. More recently, text mining has been widely used to locate features or concerns in large, complex software systems. The literature nevertheless indicates that combining text mining with other techniques yields better accuracy in locating features. Although formal concept analysis is effective, applying it to large systems is limited by the exponential time complexity of constructing concept lattices. In this work, a model is devised that combines text mining and concept mining for large systems. Latent Dirichlet Allocation (LDA), an unsupervised machine learning technique also known as topic modeling, is used to reduce the feature space. K-Means clustering is then applied on the reduced space to group related documents, and formal concept analysis is carried out on each cluster individually. Three open source software systems, namely JEdit, ArgoUML and JabRef, are considered for the experimental study. The empirical evaluation of feature location with the proposed model shows a significant improvement in accuracy, scalability, flexibility and efficiency over contemporary methods in the literature.
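The last stage of the pipeline described in the abstract, formal concept analysis run separately on each cluster, can be illustrated with a minimal sketch. It assumes the LDA and K-Means steps have already produced clusters of documents, each described by a set of topic terms. The document names (`DocA` to `DocE`), the terms, and the helper functions `formal_concepts` and `locate` are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def formal_concepts(context):
    """Enumerate all formal concepts (extent, intent) of a binary context.

    `context` maps each object (a document) to its set of attributes
    (topic terms). This brute-force version visits every subset of
    objects, so its cost grows exponentially with the number of objects.
    """
    objects = sorted(context)
    all_attrs = set().union(*context.values())
    concepts = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            # Intent: attributes shared by every object in the subset.
            intent = set(all_attrs)
            for obj in subset:
                intent &= context[obj]
            # Extent: every object that carries the whole intent.
            extent = frozenset(o for o in objects if intent <= context[o])
            concepts.add((extent, frozenset(intent)))
    return concepts


def locate(concepts, term):
    """Return extents of concepts whose intent contains `term`, largest first."""
    hits = [extent for extent, intent in concepts if term in intent]
    return sorted(hits, key=lambda e: (-len(e), sorted(e)))


# Hypothetical clusters, standing in for the output of LDA + K-Means.
clusters = {
    0: {
        "DocA": {"text", "caret", "select"},
        "DocB": {"text", "undo"},
        "DocC": {"undo"},
    },
    1: {
        "DocD": {"plugin", "load"},
        "DocE": {"plugin"},
    },
}

# One small lattice per cluster instead of one lattice for the whole system.
lattices = {cid: formal_concepts(ctx) for cid, ctx in clusters.items()}

for cid, concepts in lattices.items():
    print(cid, len(concepts), "concepts")
print(locate(lattices[0], "undo"))
```

Because the enumeration is exponential in the number of objects, keeping each context small is what makes the lattice construction tractable in this sketch. A production implementation would use a dedicated lattice-construction algorithm rather than brute force.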
