William G. Griswold, Darren C. Atkinson, and Collin McCurdy, “Fast, flexible syntactic pattern matching and processing,” in Proceedings of the 4th Workshop on Program Comprehension, pp. 144–153, March 1996.

Abstract

Program understanding can be assisted by tools that match patterns in the program source. Lexical pattern matchers provide excellent performance and ease of use, but have a limited vocabulary. Syntactic matchers provide more precision, but may sacrifice performance, retargetability, ease of use, or generality.

To achieve more of the benefits of both models, we extend the pattern syntax of AWK to support matching of abstract syntax trees, as demonstrated in a tool called TAWK. Its pattern syntax is language-independent, based on abstract tree patterns. As in awk, patterns can have associated actions, which in tawk are written in C for generality, familiarity, and performance. The use of C is simplified by high-level libraries and dynamic linking. To allow processing of program files containing non-syntactic constructs, mechanisms have been designed that allow transparent matching in a syntactic fashion. So far, TAWK has been retargeted to the MUMPS and C programming languages.

We survey and apply prototypical approaches to concretely demonstrate the tradeoffs. Our results indicate that TAWK can be used to quickly and easily perform a variety of common software engineering tasks, and the extensions to accommodate non-syntactic features significantly extend the generality of syntactic matchers.