Subtle Topic Models and Discovering Subtly Manifested Software Concerns Automatically

Abstract

In a recent pioneering approach, LDA was used to discover cross-cutting concerns (CCCs) automatically from software codebases. Although LDA succeeds in detecting prominent concerns, it fails to detect many useful CCCs, including ones that may be heavily executed but elude discovery because they do not have a strong prevalence in the source code. We pose this problem as that of discovering topics that rarely occur in individual documents, which we refer to as subtle topics. Recently, an interesting approach, namely focused topic models (FTM), was proposed in (Williamson et al., 2010) for detecting rare topics. FTM, though successful in detecting topics that occur prominently in very few documents, is unable to detect subtle topics. Discovering subtle topics thus remains an important open problem. To address this issue, we propose subtle topic models (STM). STM uses a generalized stick-breaking process (GSBP) as a prior for defining multiple distributions over topics. This hierarchical structure on topics allows STM to discover rare topics beyond the capabilities of FTM. The associated inference is non-standard and is solved by exploiting the relationship between the GSBP and the generalized Dirichlet distribution. Empirical results show that STM is able to discover subtle CCCs in two benchmark codebases, a feat that is beyond the scope of existing topic models, thus demonstrating the potential of the model in automated concern discovery, a known difficult problem in software engineering. Furthermore, it is observed that even in general text corpora STM outperforms the state of the art in discovering subtle topics.
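
The abstract mentions a generalized stick-breaking process and its relationship to the generalized Dirichlet distribution. As a rough illustration of that building block only (not the STM model or its inference, which the abstract does not spell out), the sketch below draws a weight vector by breaking a unit-length stick with independent Beta(a_k, b_k) fractions; truncating after a fixed number of breaks in this way is the standard construction of a generalized Dirichlet draw. The function name, the parameter names `a` and `b`, and the example values are assumptions made for this illustration.

```python
import random


def generalized_stick_breaking(a, b, seed=None):
    """Draw weights from a truncated generalized stick-breaking construction.

    a, b: equal-length lists of positive Beta parameters, one pair per break
          (hypothetical names chosen for this sketch).
    Returns len(a) + 1 non-negative weights that sum to 1.
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    rng = random.Random(seed)
    remaining = 1.0  # length of stick not yet assigned
    weights = []
    for a_k, b_k in zip(a, b):
        v_k = rng.betavariate(a_k, b_k)  # fraction of the remaining stick to break off
        weights.append(remaining * v_k)
        remaining *= 1.0 - v_k
    weights.append(remaining)  # the final weight takes whatever is left
    return weights


# Example: three breaks give four weights (illustrative parameter values).
weights = generalized_stick_breaking([1.0, 1.0, 1.0], [5.0, 2.0, 1.0], seed=0)
print(weights)
```

Setting every `a_k` to 1 and every `b_k` to a single shared concentration value recovers the ordinary stick-breaking construction; letting the pairs differ per break is what makes the process "generalized" in this sketch.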

Bibliographic details

Source: International Conference on Machine Learning, 2013, 9 pages
Authors: Mrinal Kanti Das; Suparna Bhattacharya; Chiranjib Bhattacharyya; K. Gopinath
Original format: PDF
Chinese Library Classification: TP181-53
Date indexed: 2022-08-20 20:00:11

Similar documents

1. Identifying Rare and Subtle Behaviors: A Weakly Supervised Joint Topic Model [J]. Hospedales Timothy M., Li Jian, Gong Shaogang. Pattern Analysis and Machine Intelligence, IEEE Transactions on. 2011, no. 12
2. Discovering regulatory concerns on bridge management: An author-topic model based approach [J]. Wen Qi, Qiang Maoshan, Xia Bingqing. Transport policy. 2019, no. MARa
3. Automatically Labelled Software Topic Model [J]. International journal of open source software & processes. 2020, no. 1
4. Subtle Topic Models and Discovering Subtly Manifested Software Concerns Automatically [C]. Mrinal Kanti Das, Suparna Bhattacharya, Chiranjib Bhattacharyya. International Conference on Machine Learning. 2013
5. Applying Andean Shamanism to Healing Faustian Soul Loss: Re-Discovering the Subtle Realities of the Mundus Imaginalis. [D]. Wolff, Danita Gay H. 2014
6. Improved Detection of Subtle Mesial Temporal Sclerosis: Validation of a Commercially Available Software for Automated Segmentation of Hippocampal Volume [O]. J.M. Mettenburg, B.F. Branstetter, C.A. Wiley. 2019
7. Identifying Rare and Subtle Behaviors: A Weakly Supervised Joint Topic Model [O]. Hospedales, TM, Li, J, Gong, SG. 2011
