Code Search Patterns


Introduction

The code searcher allows much more complex and precise searches than simple string or pattern searches. You can enter a Smalltalk code fragment, optionally containing meta-patterns, and search for matches at the parse-tree level (i.e. the search is aware of Smalltalk syntax).

For example, you can search for assignments of the value 1234 to a variable named "myVar" with the following pattern:

    myVar := 1234
This will find the assignment even if whitespace, comments, or both appear between its tokens. It will also ignore the above character sequence inside comments or string literals (i.e. it performs a real "code search", as opposed to a "string search"). The above pattern will even find assignments of a hexadecimal literal with the same value (i.e. code like "myVar := 16r4D2").
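
As a small illustration of these rules, the statements below show what the pattern above is expected to find and what it should skip. The variable names other and text are made up for this sketch; 16r4D2 is 1234 written in base 16.

    myVar := 1234.                    "found"
    myVar   :=   "the answer" 1234.   "found: whitespace and comments are ignored"
    myVar := 16r4D2.                  "found: same integer value, different notation"
    other := 1234.                    "not found: different variable"
    text := 'myVar := 1234'.          "not found: only text inside a string literal"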

Likewise, you can search for all creations of three-element Arrays, using a constant array size, with the pattern:

    Array new:3
or, you can search for a particular message being sent to the Transcript with:
    Transcript showCR:'oops'
Try the code searcher on some of your own code (after placing comments or extra whitespace into it), and compare the results against those of simple text search operations.

Enhancements to the Original RefactoryBrowser Search Patterns

Two enhancements were added to the code pattern matcher, as compared to the original RefactoryBrowser code. These are described below.

Meta Characters/Variables

The above (simple) examples search for exact code matches. Although this may be useful in itself, we often need the search to be more flexible: we want to search for "any message" or "any literal value" or "any something". For example, we might want to search for all Array instance creations with a constant size, but want to allow for arbitrary constants (i.e. match "Array new: N", where N is any integer constant), or search for a particular message send with "any expression" as argument.

To allow for those to be specified in the search-code pattern, the search-syntax supports pattern variables (also called "meta variables").

Each metaPattern-variable must begin with a ` (backquote) character. This character was chosen because it does not occur in normal Smalltalk code. It is a bit hard to type on non-US keyboards, though.
Immediately following the `-character, other characters can be entered to specify what type of node this metaPattern-variable should match.
After all the special characters have been entered, you must then enter a valid variable name. (The matched code is bound to this name inside the searcher and can later be used to match other parts against the same pattern, or by the code-rewriter to replace it with some other code.)
The special characters are listed in the following table:

 (Variable)
If no special character but only an identifier follows the backquote, the metaVariable matches only variables, pseudo variables (such as "self") and selectors (if followed by a colon).
For example:
    `foo
matches any variable.
If the same variable name appears again in the match-pattern, a piece of code only matches if the same variable also appears at each of those positions.
For example:
    `foo := `foo + 1.
will match any increment operation of a variable such as "a := a + 1", but will NOT match something like "a := b + 1". The latter is found by:
    `var1 := `var2 + 1.
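
To make the binding rule concrete, here is a hedged sketch of how the two patterns above are expected to treat a few statements. The variable names count and total are illustrative only.

    count := count + 1.     "found by `foo := `foo + 1"
    total := count + 1.     "not found by `foo := `foo + 1, but found by `var1 := `var2 + 1"
    count := count + 2.     "found by neither: the literal differs"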

Uppercase (Global) Variables:

If the meta variable is upper-case, only globals and class variables are matched; thus:
    `v
will match any variable (both local and global), whereas:
    `V
will match only globals and classvars.
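
For instance, assuming count is a local variable and Transcript is the usual global, the variable references in the following statements would be matched like this:

    count := 0.                  "count: matched by `v, not by `V"
    Transcript showCR:'done'.    "Transcript: matched by both `v and `V"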

# (Literal)
matches any literal object (constant)

For example:

    `#lit
matches any literal, such as: #(...), #[...], #foo, '...', 1, nil, true and false etc.

Notice that "#lit" (i.e. without the backquote) is not a meta pattern, and will match the exact symbol "#lit" only. See the section on "Pattern Blocks" below for how to search for literals of a particular kind (e.g. String or Symbol) or of a particular value.

@ (List - or "any number of")
When applied to a variable node, this will match a literal, variable, or a sequence of messages sent to a literal or variable

When applied to a keyword in a message, it will match a list of keyword messages (i.e., any message send)

When applied with a statement character (see below), it will match a list of statements

For example:

    | `@Temps |
matches a (possibly empty) list of temps

    `@.Statements
matches a (possibly empty) list of statements

    `@object
matches any message node, literal node or block node

    foo `@message: `@args
matches any message sent to foo.

Notice that this can also be used for a partial selector. Therefore:

    `@receiver `@keyword: `@arg1
matches any message.
In contrast,
    `@receiver `keyword1: `@arg1 `keyword2: `@arg2
matches any 2-argument message, and
    `@receiver at: `@arg1 `@keyword: `@arg2
matches any message with at least 2 arguments, which starts with at:.
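
The following sketch shows how these three patterns would classify two ordinary message sends, going by the descriptions above. The receiver dict and the arguments are placeholders.

    dict at:#key put:42.    "found by all three patterns"
    3 between:1 and:5.      "found by the first two only: the selector does not start with at:"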

. (Statement)
matches a statement in a sequence node

For example:

    `.Statement
matches a single statement

And:

    `.@Statements
matches a (possibly empty) list of statements.

` (backquote) (Recurse Into)
Whenever a match is found, look inside this matched node for more matches.
For example:
    `@object foo
matches a foo message sent to any expression on the outer level. However, the code "self foo foo" would match only once (the outer message send expression).

In contrast,

    ``@object foo
also matches foo sent to any object, plus for each match found, it will look for more matches in the ``@object part. Thus, this will match twice for the "self foo foo" example.

{ .. }  (Pattern Block)
Allows a Smalltalk block to be used as the match condition on a node. The parse node (an instance of a subclass of RBProgramNode) is passed as argument to the block.
Useful query messages to it include isVariable, isLiteral, isBlock, name and value, among others.

For example:

    `{:node | node isVariable and: ['RB*' match: node name]}
matches any variable whose name starts with 'RB'.

This allows for almost unlimited flexibility in the match:

    `{:node | node isVariable and: ['co*' match: node name]} add: `@arg
matches any add expression sent to any variable which starts with 'co'.

To match empty array constants use:

    `{:node | node isLiteralArray and:[ node value size == 0 ]}

To match empty string constants use:

    `{:node | node isLiteralString and:[ node value = '' ]}

To match a non-block expression:

    `{:node | node isBlock not }
or to match a block with 1 argument:
    `{:node | node isBlock and:[node numArgs == 1] }
However, this can also be accomplished with:
    [:`arg | `expr ]

The block may also be specified as a 2-argument block. In this case, the matching dictionary is passed as second argument, allowing the block to refer to previous match results.

    `someVariable
	at: `#someLiteral
	put: `{:node :matchDict |
		       node isLiteral
		       and:[ node value isString
		       and:[ node value = (matchDict at:#someLiteral) ]] }

'...' (Pattern String Literal)
Matches a string literal, where the string's contents matches the regular expression given in the pattern.
For example:
    `'.*'
matches any string literal, whereas
    `'[aA]..'
matches any string literal of size 3 which begins with an "a" or an "A".
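
As an illustration, and assuming (as the size-3 remark implies) that the regular expression must cover the whole string, the two patterns above would treat these literals as follows:

    'abc'     "found by both patterns"
    'Axe'     "found by both patterns"
    'abcd'    "found by `'.*' only: four characters"
    'xyz'     "found by `'.*' only: does not begin with a or A"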

String patterns are useful to narrow the search result; for example, the following searches for a string concatenation (","-message) whose argument is a string literal starting with a space followed by the word 'and':

    `@e  , `' and.*'

If preceded by the Recurse Into meta character ( ` ), the search recurses into Array literals, i.e. Array literals containing string elements are also searched for a matching string.
For example:

    `'Di.*'
matches any string literal which starts with the characters 'Di'. It would not match an array-literal containing such a string element, though. In contrast,
    ``'Di.*'
also matches string elements of array literals.

Method- vs. Expression Search

The searcher can either search inside a method's body for an expression or statement or include the method's message pattern (the selector definition) in the search.
For example, to search for getter messages (which return a variable which is named the same as the method's name), use the following search pattern:
    `sel
	^ `sel
In the search dialog box, do not forget to set the "Method"-CheckBox; otherwise, the search will be for a matching expression, which will probably not be found.
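
For example, a method of the following shape would be found by this pattern when the "Method" checkbox is set. The instance variable name used here is only an assumption for this sketch:

    name
	^ name

whereas a method that returns anything other than the plain variable, such as the following, would not:

    name
	^ name ifNil:['']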

Another example is the following pattern, which searches for all 2-argument methods that return their second argument:

    `selPart1: `arg1  `selPart2: `arg2
	^ `arg2
Similarly, the following pattern searches for methods which simply return the result of another self-send (typically, these are aliasing methods for compatibility with other Smalltalk libraries):
    `op: `arg
	^ self `op2: `arg
And finally, the following pattern searches for methods which simply return the result of a super-send of the same message (such methods can actually be removed):
    `op: `arg
	^ super `op: `arg
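
A hedged sketch of what the last two patterns are meant to find; the selectors addElement: and add: are invented for this illustration. A method such as

    addElement: anObject
	^ self add: anObject

would be reported by the self-send pattern (it is an alias for add:), and

    add: anObject
	^ super add: anObject

by the super-send pattern (it adds nothing to the inherited behavior).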

Examples

In most situations, your search will be very simple, such as a particular message send combination or a specific message being sent to a global.

For example, to search for senders of "do:" to a constructed interval, use a pattern like:

    (`@e1 to: `@e2) do: `@block
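
For instance (the variable names are placeholders), this pattern would be expected to treat the following statements like this:

    (1 to: 10) do:[:i | sum := sum + i].          "found"
    (start to: stop) do:[:i | sum := sum + i].    "found"
    1 to: 10 do:[:i | sum := sum + i].            "not found: this sends to:do:, not do: to an interval"
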
The following examples demonstrate more specific searches.

Search for a Variable, Literals etc.:

Simple variables: `foo - matches any variable or pseudo-variable.
Simple literals: `#foo - matches any literal (incl. nil, true and false).

Search for a Variable by Name

To search for variables whose name includes a substring:
    `{:node | node isVariable and: [node name includesString:'Array' caseSensitive:false]}
In a similar fashion, search for name prefixes, suffixes, etc. E.g.
    `{:node | node isVariable and: [node name endsWith:'Array' caseSensitive:true]}

Search for a Numeric Constant:

The code search utility is much more precise than the classic text search when searching for a particular numeric constant. For example, if you search for the numeric constant 16, the code search will also detect constants like "16r10" or "2r10000", but not "160". All of this is much harder (if not impossible) with a simple text search.

Search for a Numeric Constant within a Range:

The following pattern-block will search for a numeric literal with a value between 1 and 5.
    `{:node | node isLiteral
	      and:[ node value isNumber
	      and:[ node value between:1 and:5 ]] }

Search for a Particular Message being Sent with a Particular Argument:

Search for Array instance creations with a constant size of 3 (same as above):
    Array new:3
Search for Array instance creations with a constant size larger than 100:
    Array new:
	`{:node | node isLiteral
	      and:[ node value isNumber
	      and:[ node value > 100 ]] }
Search for Array instance creations with any constant size:
    Array new: `#n
Search for Array instance creations where a variable specifies the size:
    Array new: `v
Search for all Array instance creations (any expression as size):
    Array new: `@e
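
Put side by side, the patterns in this section would be expected to classify some typical instance creations as follows (size is an assumed local variable):

    Array new:3.             "found by Array new:3, Array new:`#n and Array new:`@e"
    Array new:200.           "found by the larger-than-100 pattern block, Array new:`#n and Array new:`@e"
    Array new:size.          "found by Array new:`v and Array new:`@e"
    Array new:(size * 2).    "found by Array new:`@e only"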

Search a Message with a nil Argument:

The following uses a pattern-block to search for message sends with a particular argument value:
    `@e add: `{:n | n isLiteral and:[ n value isNil] }
Of course, this particular search can be written more simply as:
    `@e add: nil
However, as already noted above, the pattern block allows for very finely tuned searches (a particular integer argument range, string patterns in arguments, etc).

Search a Message with non-Symbol Argument:

The following uses a pattern-block to search for all "breakPoint:"-message sends with a non-symbol argument:
    `@e breakPoint:
	`{:n |
	    n isLiteral not
	    or:[n value isSymbol not ] }

Search for a Message Send with a String Argument:

The following will search for a send of the breakPoint: message with a string argument:
    `@e breakPoint:`{:n |
		    n isLiteral
		    and:[n value isSymbol not
		    and:[n value isString]]}
Notice the need for the extra symbol check, because in ST/X isString returns true for symbols.
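
The overlap that makes the extra check necessary can be seen by evaluating a few expressions in a workspace. This is only a sketch; the first comment restates the remark above rather than a separately verified result.

    #foo isString.      "true in ST/X, as noted above"
    #foo isSymbol.      "true"
    'foo' isString.     "true"
    'foo' isSymbol.     "false"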

Search for a particular Message being Sent to a Global:

To search for "at:put:" being sent to the Smalltalk global with variables as arguments, use:
    Smalltalk at:`key put:`val
The above does not match literal values or expressions as argument(s).
To match those, use:
    Smalltalk at:`@key put:`@val
To also look into the arguments and search for sends there, use:
    Smalltalk at:``@key put:``@val
Or, if you want all sends to ANY global, replace the name by a global-variable match pattern (upper case):
    `V at:`@key put:`@val

Search for a Message Send to a Variable:

To search for a particular message:
    nameOfVariable selector: `@expr
    nameOfVariable keyw1: `@expr1 keyw2: `@expr2 ...
or to search for any message:
    nameOfVariable `@msg: ``@args
or, to search for any unary message, use:
    nameOfVariable `msg
Use `v as receiver to search for messages to any variable, and `V (uppercase) to search for messages to any global (typically classes), e.g.
    `v size
    `V new: `@expr1

Search for any Message Send:

To search for any message send:
    ``@rec `@msg: ``@args
or, to search for any unary message, use:
    ``@rec `msg
Notice the extra backquotes, which are required to recurse into already matched expressions (otherwise, "rec foo foo" would only be matched once, for the outer foo-message).

This pattern can be used to find repeated sends of the same message, as in:

    `e sort: [:`a :`b | `a `sel < `b `sel ]
which will match typical sort operations.
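
For instance, with an assumed collection variable people whose elements answer name and age, the pattern distinguishes these two sends:

    people sort:[:a :b | a name < b name].    "found: `sel is bound to name on both sides"
    people sort:[:a :b | a name < b age].     "not found: the two selectors differ"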

To search for messages with a particular argument count and argument patterns, but with arbitrary selector, use "`m" (without the @ which indicates repetition).
For example:

    `@e `m:(`@e2 at:1) `m2:(`@e2 at:2)
will match any 2-arg message, which passes the first two elements of a collection as arguments.

And:

    `@e `m:`e2 ifAbsent:`e2
matches any "xxx:ifAbsent:" message.

Search for Super Sends:

For arbitrary super sends, use:
    super `@msg: `@args
To search for unary messages, use:
    super `msg

Search for Exception Handlers:

    ``@rec on: ``@arg1 do: ``@arg2
and:
    ``@rec handle: ``@arg1 do: ``@arg2
or, for a particular exception class:
    StreamError handle: ``@arg1 do: ``@arg2

Search for Empty Exception Handlers:

    Error handle: [ :``@args | ] do: ``@blk
and:
    ``@blk on: Error do: [ :``@args | ]

Search for ifTrue:ifFalse: with non-Block Argument:


    `@e
	ifTrue: `{:node | node isBlock not }
	ifFalse: `{:node | node isBlock not }

Search for Bad (non Intention Revealing) Code:

Some code constructs are hard to read or understand, because they are overly complicated or because they use a construct in an unintended, wrong way. For example:
    ``@object not ifTrue: ``@block
and:
    ``@object not ifFalse: ``@block
are more easily written by using the opposite test message (ifFalse: instead of "not ifTrue:", and vice versa).
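
A short illustration with a made-up receiver and action: the first statement is the kind of code the first pattern reports, the second is the clearer equivalent.

    stream atEnd not ifTrue:[ self readNext ].    "reported by ``@object not ifTrue: ``@block"
    stream atEnd ifFalse:[ self readNext ].       "same behavior without the extra not"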

The following code-pieces check if some element is in a collection:

    ( ``@expr1 detect:[:`v | ``@expr2 ] ifNone:nil ) notNil
    ( ``@expr1 detect:[:`v | ``@expr2 ] ifNone:[] ) notNil
and should be written as:
    ( ``@expr1 contains:[:`v | ``@expr2 ] )
which is much easier to read and understand without having to decrypt what the original programmer's intentions were.

More unclean uses of the collection protocol are:

    `@coll do:[:`el|
	`@condition ifTrue:[
	    ^ true
	]
    ]
which can be replaced by:
    ( `@coll contains:[:`el| `@condition ] ) ifTrue:[ ^ true ]
And this pattern searches for "beginner's code", which can be replaced by a simpler and more descriptive "detect:ifNone:" message:
    `@coll do:[:`el|
	`@condition ifTrue:[
	    ^ `el
	]
    ]
More examples are:
    (`@e1 contains:[:`v | `@e2 not])
and:
    `@e1 do:[:`v| `@e2 ifFalse:[^ false] ].
Both of these might be replaced by a #conform: message.

Here are a few more patterns to search for:

    `@e1 reject:[:`v1 | `@e2 not]
    `@e1 select:[:`v1 | `@e2 not]

Search for Duplicate Statements:

Remember that
    `.duplicate
matches any statement.
Thus, the following:
    `.duplicate.
    `.duplicate
should match two identical consecutive statements that are the whole body of a sequence node. However, as soon as we get beyond a within-statement expression, we are matching sequence nodes. Matching two statements within a sequence node therefore requires:
    `.@beforeStatements. "<- notice the period at the end here"
    `.duplicate.
    `.duplicate.
    `.@afterStatements
Because the . makes the tool build a sequence node, you must provide the "zero or more statements before and after"-code, unlike the expression case, where it could match an expression within a longer expression.
If the sequence node has temporaries, the above will not match. You must use:
    | `@temps |
    `.@beforeStatements. "<- notice the period at the end here"
    `.duplicate.
    `.duplicate.
    `.@afterStatements
which will match two duplicate statements within any sequence of statements.
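
As a sketch, a method body like the following (names invented) is the kind of code this pattern is meant to find, with `.duplicate bound to the statement "self update":

    | count |
    count := 0.
    self update.
    self update.
    ^ count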

It can be tricky to match sequence nodes. Even Don (one of the original authors of the refactory code) admitted that it usually took him two or three attempts to get such an expression right.

One problem with the above is that the whole match is presented as the search result, although you are usually only interested in the duplicate statement(s).

Search for a Sequence of Statements:

As the previous section shows, matching a sequence of statements requires a before and an after context.

For example, matching an instance creation followed by a message sent to the newly created object requires more than just the two statement patterns.

The following pattern searches for the creation of an OrderedCollection followed by an "add:" message to this collection:

    | `@temps |
    `.@beforeStatements.
    `v := OrderedCollection new.
    `v add: `@expr.
    `.@afterStatements
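
For illustration, a method body of the following shape (variable name invented) is what this search is intended to find: a collection is created, assigned to a variable, and immediately sent add:.

    | items |
    items := OrderedCollection new.
    items add: 42.
    ^ items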

Copyright © eXept Software AG, all rights reserved


Doc $Revision: 1.55 $ $Date: 2024/04/08 10:26:14 $