Journal: Empirical Software Engineering

A survey on the use of topic models when mining software repositories

Abstract

Researchers in software engineering have attempted to improve software development by mining and analyzing software repositories. Since the majority of software engineering data is unstructured, researchers have applied Information Retrieval (IR) techniques to help software development. Recent advances in IR, especially statistical topic models, have further helped make sense of unstructured data in software repositories. However, although hundreds of studies apply topic models to software repositories, no study shows how the models are used in the software engineering research community or which software engineering tasks are being supported through topic models. Moreover, since the performance of these topic models is directly related to the model parameters and usage, knowing how researchers use the topic models may also help future studies make optimal use of such models. Thus, we surveyed 167 articles from the software engineering literature that make use of topic models. We find that i) most studies centre around a limited number of software engineering tasks; ii) most studies use only basic topic models; and iii) researchers usually treat topic models as black boxes without fully exploring their underlying assumptions and parameter values. Our paper provides a starting point for new researchers who are interested in using topic models, and may help new researchers and practitioners determine how best to apply topic models to a particular software engineering task.