Todd J. H. King, “Finding refactorings using lightweight code analysis,” Master’s thesis, Santa Clara University, Department of Computer Engineering, December 2005.

Abstract

Poorly structured code is hard to read and maintain. Refactoring can improve code structure, making the code easier to maintain and its underlying design easier to discern. According to Martin Fowler's book Refactoring, a refactoring is a change made to the internal structure of software to make it easier to understand and cheaper to modify without changing its observable behavior. Refactoring is a difficult and time-consuming process, which makes it an unattractive activity for many developers. A tool that could automatically identify refactorable code sections and suggest fixes would make refactoring much easier and faster. In my research, I developed the tool Look# for just this purpose.

The Look# tool uses lightweight detection algorithms based on code metrics to identify refactorable code sections. Each refactoring technique that Look# can detect has been distilled into a lightweight algorithm based on thresholds and syntactic keywords. Some refactoring techniques are detected by calculating code metrics and then checking whether the resulting values exceed a certain threshold. Other refactoring techniques are detected by looking for certain syntactic indicators, such as a public access modifier on a field.
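
The two detection styles described above can be illustrated with a minimal Python sketch. It is not Look#'s implementation: the regular expressions, the function names, the `MAX_PARAMETERS` threshold, and the choice of parameter count as the metric are all assumptions made for illustration, and the matching is deliberately naive, line-based scanning of C#-like source text.

```python
import re

# Hypothetical threshold; the abstract does not state the values Look# uses.
MAX_PARAMETERS = 4

# Assumed pattern for a public field: "public <type> <name>;" with an optional initializer.
PUBLIC_FIELD = re.compile(
    r"^\s*public\s+(?:static\s+|readonly\s+)*[\w<>\[\],]+\s+(\w+)\s*(?:=[^;]*)?;"
)

# Assumed pattern for a method signature: access modifier, return type, name, parameters.
METHOD_SIGNATURE = re.compile(
    r"^\s*(?:public|private|protected|internal)\s+(?:static\s+)?[\w<>\[\]]+\s+(\w+)\s*\(([^)]*)\)"
)


def find_public_fields(source):
    """Syntactic indicator: return the names of fields declared public."""
    names = []
    for line in source.splitlines():
        if "(" in line:
            continue  # skip lines with parentheses so methods are not taken for fields
        match = PUBLIC_FIELD.match(line)
        if match:
            names.append(match.group(1))
    return names


def find_long_parameter_lists(source, threshold=MAX_PARAMETERS):
    """Metric threshold: return (method, count) pairs whose parameter count exceeds threshold."""
    hits = []
    for line in source.splitlines():
        match = METHOD_SIGNATURE.match(line)
        if not match:
            continue
        params = [p for p in match.group(2).split(",") if p.strip()]
        if len(params) > threshold:
            hits.append((match.group(1), len(params)))
    return hits


# Illustrative input, not taken from the thesis.
SAMPLE = """
class Account {
    public int balance;
    private string owner;
    public static readonly int Limit = 100;
    public void Transfer(int a, int b, int c, int d, int e) { }
    public int Get() { return balance; }
}
"""

if __name__ == "__main__":
    print(find_public_fields(SAMPLE))
    print(find_long_parameter_lists(SAMPLE))
```

Each hit is only a candidate. A flagged public field or long parameter list still needs a developer to decide whether a refactoring is actually appropriate.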

Many of the guidelines Martin Fowler provides for determining when to apply certain refactoring methods are based on human intuition and abstract perceptions. This makes it extremely difficult to automatically identify where many of the refactoring methods should be applied. Therefore, the goal of these refactoring algorithms is not to definitively find all the refactorings in the user's code, but to use metrics that look for common tell-tale signs of refactorings to help point informed human intuition in the right direction.

The primary barrier to automatic refactoring detection is that it requires a good understanding of the code. In the absence of a good human understanding of the code, the best a detection algorithm can do is look for common tell-tale signs that most likely indicate the presence of refactorable code. This thesis explores using code metrics to look for these tell-tale signs and how successful this approach is. In most cases, the refactoring detection algorithms discussed in this thesis had a high success rate. Some of the less successful algorithms need to be reworked and made smarter.
