International Conference on Machine Learning

Subtle Topic Models and Discovering Subtly Manifested Software Concerns Automatically

Abstract

In a recent pioneering approach, LDA was used to discover cross-cutting concerns (CCCs) automatically from software codebases. LDA, though successful in detecting prominent concerns, fails to detect many useful CCCs, including ones that may be heavily executed but elude discovery because they do not have a strong prevalence in source code. We pose this problem as that of discovering topics that rarely occur in individual documents, which we refer to as subtle topics. Recently, an interesting approach, focused topic models (FTM), was proposed by Williamson et al. (2010) for detecting rare topics. FTM, though successful in detecting topics that occur prominently in very few documents, is unable to detect subtle topics. Discovering subtle topics thus remains an important open problem. To address this issue, we propose subtle topic models (STM). STM uses a generalized stick-breaking process (GSBP) as a prior for defining multiple distributions over topics. This hierarchical structure on topics allows STM to discover rare topics beyond the capabilities of FTM. The associated inference is non-standard and is solved by exploiting the relationship between the GSBP and the generalized Dirichlet distribution. Empirical results show that STM is able to discover subtle CCCs in two benchmark codebases, a feat beyond the scope of existing topic models, thus demonstrating the potential of the model in automated concern discovery, a known difficult problem in software engineering. Furthermore, even in general text corpora, STM outperforms the state of the art in discovering subtle topics.
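
The abstract names the GSBP prior but does not spell out its construction. The following is a minimal sketch, assuming the common truncated form in which each break fraction is drawn from its own Beta(a_k, b_k) distribution; with a finite number of breaks, the resulting weight vector follows a generalized Dirichlet distribution, which is the relationship the abstract alludes to. The function name, parameter names, and example values are hypothetical, and the sketch illustrates only the prior, not the authors' STM model or its inference.

```python
import random


def generalized_stick_breaking(a, b, rng):
    """Draw topic weights from a truncated generalized stick-breaking prior.

    a, b: sequences of positive Beta parameters, one pair per break
          (hypothetical names, not taken from the paper).
    Returns len(a) + 1 non-negative weights that sum to 1.
    """
    weights = []
    remaining = 1.0  # length of stick not yet assigned to any topic
    for a_k, b_k in zip(a, b):
        v_k = rng.betavariate(a_k, b_k)  # fraction broken off the remaining stick
        weights.append(v_k * remaining)
        remaining *= 1.0 - v_k
    weights.append(remaining)  # leftover stick goes to the final topic
    return weights


rng = random.Random(0)
# Assumed example parameters: small a_k relative to b_k gives small early breaks.
weights = generalized_stick_breaking([1.0, 1.0, 1.0], [5.0, 5.0, 5.0], rng)
print(weights)
```

Setting every a_k to 1 and every b_k to a shared concentration value recovers the standard stick-breaking construction; allowing each break its own (a_k, b_k) pair is what makes the process "generalized" in this sketch.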