Darren C. Atkinson and William G. Griswold, “Effective pattern matching of source code using abstract syntax patterns,” Software — Practice and Experience, vol. 36, pp. 413–447, April 2006.

Abstract

Program understanding can be assisted by tools that match patterns in the program source. Lexical pattern matchers provide excellent performance and ease of use, but have a limited vocabulary. Syntactic matchers provide more precision, but may sacrifice performance, robustness, or power. To achieve more of the benefits of both models, we extend the pattern syntax of AWK to support matching of abstract syntax trees, as demonstrated in a tool called TAWK. Its pattern syntax is language-independent, based on abstract tree patterns. As in AWK, patterns can have associated actions, which in TAWK are written in C for generality, familiarity, and performance. The use of C is simplified by high-level libraries and dynamic linking. To allow processing of program files containing non-syntactic constructs such as textual macros, mechanisms have been designed that allow matching of ‘language-like’ macros in a syntactic fashion. We survey and apply prototypical approaches to concretely demonstrate the tradeoffs in program processing. Our results indicate that TAWK can be used to quickly and easily perform a variety of common software engineering tasks, and the extensions to accommodate non-syntactic features significantly extend the generality of syntactic matchers.