Cobra Reference - token expressions

	Cobra	Interactive Query Language	regular token expressions


NAME

re - match a pattern specified in a regular token expression

SYNTAX

	re [qualifier] token-expression

DESCRIPTION

Token expressions can be used as a command-line argument with command line option -e, or interactively during a Cobra session with an re command. The syntax is the same in both cases, but when used interactively the expression should not be quoted.

Token expressions can be used to find ranges of tokens that can cross multiple lines of text in the source. An example of interactive use is:

	: re for \( x:@ident .* \) { .* :x .* }

Everything following the command name re is considered to be part of the pattern to be found. The above expression matches for-loops where the control variable is used inside the body of the loop. In this case any match starts at keyword for and ends at the closing curly brace of the body of the loop.
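To make the shape of such a match concrete, the following Python sketch hand-codes what the pattern above looks for. It is only an illustration, not Cobra's implementation: the regex tokenizer, the partial keyword list, and the helper names (closing, loops_using_control_variable) are assumptions made for this example, and it assumes that the closing curly brace is the one that balances the opening brace of the loop body.

```python
import re

TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")      # toy tokenizer, not Cobra's lexer
IDENT = re.compile(r"[A-Za-z_]\w*")
KEYWORDS = {"for", "if", "else", "while", "int", "char", "return"}  # partial list

def closing(toks, start, opener, closer):
    """Index of the token that balances toks[start] (an opener), or None."""
    depth = 0
    for j in range(start, len(toks)):
        if toks[j] == opener:
            depth += 1
        elif toks[j] == closer:
            depth -= 1
            if depth == 0:
                return j
    return None

def loops_using_control_variable(source):
    """Return (tokens, [(start, end), ...]) with inclusive token index ranges."""
    toks = TOKEN.findall(source)
    found = []
    for i in range(len(toks) - 2):
        var = toks[i + 2]
        if toks[i] != "for" or toks[i + 1] != "(":
            continue
        if not IDENT.fullmatch(var) or var in KEYWORDS:
            continue                      # x:@ident must bind an identifier
        rparen = closing(toks, i + 1, "(", ")")
        if rparen is None or toks[rparen + 1:rparen + 2] != ["{"]:
            continue
        rbrace = closing(toks, rparen + 1, "{", "}")
        if rbrace is not None and var in toks[rparen + 2:rbrace]:
            found.append((i, rbrace))     # match spans 'for' .. closing '}'
    return toks, found

source = """
for (i = 0; i < n; i++) { sum += a[i]; }
for (j = 0; j < n; j++) { count++; }
"""
toks, found = loops_using_control_variable(source)
for start, end in found:
    print(" ".join(toks[start:end + 1]))
```

Only the first loop is reported, because the second loop never uses its control variable j inside the body. The reported range starts at the for token and ends at the closing brace, which mirrors the extent of a match described above.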

By giving multiple re and/or pe commands consecutively, the set of matches can be extended, or modified if the qualifiers & or no are added. As usual, any qualifiers must follow the command name.

The matches found by a token expression are handled slightly differently from matches that are created with other interactive commands, which typically match only individual tokens, or small ranges of tokens. Once a non-empty set of matches is created with an re command, all other, earlier matches (e.g., those obtained with earlier mark commands) are temporarily hidden, and replaced with the matching patterns.

The best way to display the result of a regular expression pattern match is with the json command. Alternate methods include the standard display, list, and pre commands.

The format of a token-expression uses the usual meta-symbols from regular expressions:

Any alpha-numeric string is understood to be the text of a single token. For instance the string for matches any token with that name.
A range of possible token-matches can be specified in square brackets [..]. For instance:
```
	[ for switch if else ]
```
matches any of these four keywords.
The plus symbol (+) indicates one or more repetitions of the preceding token, or range of tokens.
The star symbol (*) indicates zero or more repetitions of the preceding token, or range of tokens.
The dot symbol (.) matches any single token.
The hat symbol (^) indicates a non-match of the immediately following token, or range of tokens.
The at-sign symbol (@) preceding a name indicates a match of a token type of that name. For instance, @ident indicates a match of any token that is classified as an identifier.
Any meta-symbol can be escaped by preceding it with a backslash (\) if the symbol itself is to be matched as a token.
Name binding: The notation name:name indicates that the text of a token that is matched by the string on the right-hand side of the colon is bound to the variable name (an arbitrary string) that appears on the left-hand side of the colon.
Similarly, a bound name can be used as a match by preceding the variable name by a single colon, as in :name.
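
The meaning of the meta-symbols listed above can be explored with a small toy matcher. The Python sketch below is not how Cobra works internally (Cobra compiles the expression into an NDFA). It uses simple backtracking, tries the shortest repetition first, does not support grouping or the choice operator, and does not balance braces. The tokenizer, the keyword list, the token type names other than ident, and the function names (tokenize, parse, accepts, match_at, search) are all assumptions made for this example.

```python
import re

KEYWORDS = {"for", "if", "else", "while", "switch", "return", "int"}  # partial
C_TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")
PAT_TOKEN = re.compile(r"\\.|\w+:|:\w+|@\w+|\w+|\S")

def tokenize(source):
    """Split C-like text into (text, type) pairs using a toy classification."""
    out = []
    for text in C_TOKEN.findall(source):
        if text in KEYWORDS:
            kind = "key"
        elif text[0].isalpha() or text[0] == "_":
            kind = "ident"
        else:
            kind = "other"
        out.append((text, kind))
    return out

def unesc(t):
    return t[1:] if t.startswith("\\") else t

def parse(pattern):
    """Turn a pattern string into a list of (atom, negate, bind, quant) tuples."""
    lex, elems, i = PAT_TOKEN.findall(pattern), [], 0
    while i < len(lex):
        bind, negate, quant = None, False, ""
        if re.fullmatch(r"\w+:", lex[i]):            # name: binds the next atom
            bind, i = lex[i][:-1], i + 1
        if lex[i] == "^":                            # ^ negates the next atom
            negate, i = True, i + 1
        if lex[i] == "[":                            # [ a b c ] is a token set
            j = lex.index("]", i)
            atom, i = ("set", {unesc(t) for t in lex[i + 1:j]}), j + 1
        elif lex[i] == ".":
            atom, i = ("any", None), i + 1
        elif lex[i].startswith("@"):
            atom, i = ("type", lex[i][1:]), i + 1
        elif re.fullmatch(r":\w+", lex[i]):          # :name refers to a bound name
            atom, i = ("ref", lex[i][1:]), i + 1
        else:                                        # literal, possibly escaped
            atom, i = ("lit", unesc(lex[i])), i + 1
        if i < len(lex) and lex[i] in ("*", "+"):
            quant, i = lex[i], i + 1
        elems.append((atom, negate, bind, quant))
    return elems

def accepts(elem, tok, env):
    """Does a single pattern element accept a single token?"""
    (kind, arg), negate = elem[0], elem[1]
    text, ttype = tok
    if kind == "any":
        hit = True
    elif kind == "lit":
        hit = text == arg
    elif kind == "type":
        hit = ttype == arg
    elif kind == "set":
        hit = text in arg
    else:                                            # "ref"
        hit = env.get(arg) == text
    return hit != negate

def match_at(elems, toks, pos, env, k=0):
    """End index (exclusive) of a match of elems[k:] starting at pos, or None."""
    if k == len(elems):
        return pos
    elem = elems[k]
    bind, quant = elem[2], elem[3]
    if quant == "":
        if pos < len(toks) and accepts(elem, toks[pos], env):
            new_env = {**env, bind: toks[pos][0]} if bind else env
            return match_at(elems, toks, pos + 1, new_env, k + 1)
        return None
    count = 0                                        # shortest repetition first
    while True:
        if count >= (1 if quant == "+" else 0):
            end = match_at(elems, toks, pos + count, env, k + 1)
            if end is not None:
                return end
        if pos + count >= len(toks) or not accepts(elem, toks[pos + count], env):
            return None
        count += 1

def search(pattern, source):
    """Token texts of the first match of pattern in source, or None."""
    elems, toks = parse(pattern), tokenize(source)
    for start in range(len(toks)):
        end = match_at(elems, toks, start, {})
        if end is not None:
            return [t for t, _ in toks[start:end]]
    return None

print(search(r"else ^{", "if (a) b(); else c();"))
print(search(r"[ for switch if else ] \(", "while (x) if (y) z;"))
print(search(r"for \( x:@ident .* \) { .* :x = .* }",
             "for (i = 0; i < n; i++) { i = 5; }"))
```

The three calls exercise negation, a token set with an escaped parenthesis, and name binding with a later reference to the bound name. Because the sketch ignores brace nesting, it can report matches on larger inputs that a real tool would reject, so it is best used on short fragments.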

EXAMPLES

        : a ( b | c )* d		# a followed by any repetition of b or c and then a d
        : { ( return | ; )* }		# a block containing only return statements and semi-colons
        : else ^[{ if]			# an else not followed by either { or if
        : else ^{			# an else not followed by a {
        : { .* malloc ^free* }		# a block with malloc, but no free
        : switch \( .* \) { ^default* }	# a switch without a default
        : if \( .* \) ^{		# an if without a { after the condition
        : if \( .* \) ;			# an empty if statement
        : \( .* \| .* & .* \)		# a condition with binary operators | and &
        : \( .* [ \| & ] .* & \)	# a condition with either | or &
        : for \( x:@ident.* \) {.* :x = .* } # assignment to control variable of for-loop in body

NOTES

It is faster to check a range of tokens using square brackets, e.g.

	[ \| & ]

than to do the same check with a choice operator, e.g.

	( \| | & )

To view the NDFA that is generated from a token expression as a dot-graph, use the command line option -view. For instance:

        $ cobra -view -re 'a ( b | c )* d' file.c
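
As a rough idea of what such an automaton and its dot-graph can look like, the Python sketch below hand-builds a small automaton for the same expression, a ( b | c )* d, runs it over token sequences, and prints it in Graphviz dot syntax. The state numbering, the edge table, and the function names are assumptions made for this example. The graph that Cobra generates with -view may differ in its states, labels, and layout.

```python
# Hand-built automaton for: a ( b | c )* d
# Each edge is (source state, token text, destination state).
EDGES = [
    (0, "a", 1),   # leading a
    (1, "b", 1),   # zero or more b ...
    (1, "c", 1),   # ... or c
    (1, "d", 2),   # trailing d
]
START = 0
ACCEPT = {2}

def accepts(tokens):
    """Simulate the automaton by tracking the set of reachable states."""
    states = {START}
    for tok in tokens:
        states = {dst for src, label, dst in EDGES
                  if src in states and label == tok}
        if not states:
            return False
    return bool(states & ACCEPT)

def to_dot():
    """Render the automaton as Graphviz dot text."""
    lines = ["digraph nfa {", "  rankdir=LR;"]
    for s in sorted(ACCEPT):
        lines.append(f"  {s} [shape=doublecircle];")
    for src, label, dst in EDGES:
        lines.append(f'  {src} -> {dst} [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)

print(accepts("a b c b d".split()))
print(accepts("a d d".split()))
print(to_dot())
```

The printed dot text can be passed to any Graphviz renderer to draw the graph.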


SEE ALSO

	pe pattern expressions, json output format, patterns, qualifiers

	(Last Updated: 19 January 2021)