Automatically recovering the functional architecture of a software system can help developers understand how the system works. In legacy systems, the original source code is often the only available source of information about the system, and understanding source code is very time-consuming. Current architecture recovery techniques either require heavy human intervention or fail to recover quality components. To alleviate these shortcomings, we propose the use of machine learning techniques that draw on structural, runtime behavioral, domain, textual, and contextual (e.g., code authorship, line co-change) features. These techniques will allow us to experiment with a large number of features of the software artifacts without first having to decide a priori which of them are important. We believe this is a promising approach that may finally start to produce usable solutions to this elusive problem.