Cobra Interactive Query Language: inline programs

NAME
	inline programs
SYNTAX
	%{ ... %}
DESCRIPTION
Cobra inline programs are enclosed in the two-character delimiters:
	%{
		...
	%}
An inline program can be part of a script.
Like any other sequence of commands, it can also be
stored in a file and called from the command line, e.g.:
	$ cobra -f file *.[ch]
If called in this way, Cobra executes the commands listed in the file, returns any results, and stops. If not defined in a script (i.e., between def ... end markers), the inline program is executed immediately when the closing delimiter is seen.
Inline programs can be used to write more complex queries than interactive commands allow. This includes queries that require conditional selection, iteration, and commands that scan larger parts of the code to identify patterns of interest. By default an inline program executes once for each token in the input stream, but once the program is in control it can arbitrarily change that rule, for instance by moving the current token location, reversing the direction of the search, or aborting the search altogether at any user-defined point.
Note that the standard syntax for interactive query commands cannot be used within inline Cobra programs: Cobra programs have their own syntax, which allows for the definition of more powerful queries.
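As a small illustration of a program taking control of the default once-per-token execution, the following sketch (a hypothetical example, not taken from the Cobra distribution) prints the first ten identifiers in the input and then ends the run with Stop, the same keyword used in the NOTES section below. It assumes that a variable such as cnt, which is incremented before it is ever assigned, starts out as zero, in the same way that a missing associative array element evaluates to zero.

```cobra
%{
	# print the first ten identifiers in the input, then stop
	# (assumes the unset variable cnt starts out as zero)
	if (@ident)
	{	cnt++;
		print cnt ": " .fnm ":" .lnr ": " .txt "\n";
		if (cnt >= 10)
		{	Stop;	# abort the scan at a user-defined point
		}
	}
%}
```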
 The following table gives an overview of all language features, with links to individual manual pages providing the details. 
 
 
 
Grammar
As a simple example of an inline program, consider this definition:
	def prog1
	%{
		print .fnm ":" .lnr ": " .txt "\n";
	%}
	end
	prog1
This program contains a single print statement (see also fct_index)
that is executed once for each token in the data structure that was
built for the program being analyzed.
The print command takes any number of arguments, and will print
output that corresponds with the type of each argument.
Numbers are printed as numbers, and strings as strings,
with the special characters \n and \t interpreted as newline
and tab respectively. (No other special characters are recognized though.)
A sample line of output from the above program could be:
	cobra_lib.c:1237: if
A single line of source code will typically hold multiple tokens, so the filename:linenumber combination will in most cases not uniquely identify all tokens. To identify individual tokens better, you can also print each token's sequence number. A program that prints the number and text of each token could be:
	%{
		print .seq " " .txt "\n";
	%}
(see also tokens).
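To see how Cobra classifies each token, the type field .typ (the field tested by the @ prefix operator, as in @ident) can be printed alongside the sequence number and text. This is a minimal variation on the program above; the type names it prints, such as ident for identifiers, are determined by Cobra's tokenizer (see tokens).

```cobra
%{
	# sequence number, token type, and token text, one token per line
	print .seq " " .typ " " .txt "\n";
%}
```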
A Cobra program is a sequence of statements. Basic statements
must always be terminated by a semicolon; compound statements,
which end with a closing curly brace, are not.
 
	prog:
		stmnt+
		fct_def
	stmnt:
		basic_stmnt ;
		compound_stmnt
	basic_stmnt:
		var = expr		-- assignment
		var ++			-- post-increment
		var --			-- post-decrement
		print params		-- print statement
		fct ( )			-- function call
		fct ( params )		-- function call
		goto label		-- unconditional
		label: stmnt		-- labeled statement
		return			-- in function definitions
		break			-- in while or for loops
		continue		-- in while and for loops
	compound_stmnt:
		while ( expr ) { prog }
		for ( var in array ) { prog }
		foreach ( var in name ) { prog }
		if ( expr ) { prog }
		if ( expr ) { prog } else { prog }
		if ( expr ) { prog } elif ( expr ) { prog }
		function name ( params ) { prog }
		function name ( ) { prog }
where:
	params:
		expr
		expr , params		-- one or more
	expr:
		( expr )		-- parentheses
		expr bin_op expr	-- binary operators
		prefix expr		-- eg, !@ident, see below
		number			-- integers only
		true			-- 1
		false			-- 0
		token_ref
		string
		variable
		function_call
	token_ref:
		.			-- the current token
		name			-- a token variable name
		Begin			-- first token for this core
		End			-- last token for this core
		first_t			-- first token of complete input sequence
		last_t			-- last token of complete input sequence
	string:
		"..."			-- any user-defined text string
	variable:
		. name			-- reference to a token field
		name . name		-- reference to a token field
		name			-- variable
		name [ string ]		-- associative array
	function_call:
		name ( params )		-- predefined or user-defined functions
The binary operators are:
	bin_op:
		+, -, *, /, %			-- arithmetic
		>, >=, <, <=, ==, !=, ||, &&	-- boolean
The + operator can also be used for string concatenation, for example:
	print "foo" + .txt + "goo" "\n";
Similarly, the boolean equals and unequals operators can also be used on strings:
	if (.fnm == "cobra_prep.c") { ... }
The unary prefix operators are:
	prefix:
		!	-- logical negation
		-	-- unary minus
		~	-- true if .txt contains pattern, eg ~yy
		^	-- true if .txt starts with pattern, eg ^yy
		#	-- true if .txt equals pattern, eg #yy
		@	-- true if .typ matches type, eg @ident
Note that the # symbol among the unary prefix operators requires some caution, because it also doubles as the Cobra comment delimiter. The rule is that if the # symbol is followed by a space or by another # symbol, it is interpreted as the start of a comment. If it is immediately followed by text, it is interpreted as the prefix operator. For more detail, see expressions.
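The following sketch combines several of the rules above: a # prefix operator next to # comments, a token reference variable, a while loop, and the post-increment operator. For each return keyword it counts the tokens from return up to, but not including, the terminating semicolon, and then restores the current token so that the default scan continues normally. It is a hypothetical illustration, not part of the Cobra distribution, and it assumes that every return statement is terminated by a semicolon, as in valid C.

```cobra
%{
	if (#return)		# '#return' is the prefix operator: .txt equals "return"
	{	q = .;		# remember the starting token
		n = 0;
		while (.txt != ";")
		{	n++;	# counts 'return' itself, but not the ';'
			. = .nxt;
		}
		print q.fnm ":" q.lnr ": " n " tokens before ';'\n";
		. = q;		# restore, so the default scan resumes after 'return'
	}
%}
```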
Assignments
	lhs = expr;
The left-hand side (lhs) can be a reference to a token (e.g., .), a token field (e.g., .mark), a variable, or an element of an associative array. Some examples of each use are:
	.mark = 5;		# mark is the only integer token field that can be assigned to
	.mark--;		# post decrement and increment are defined
	.mark++;		# as you suspected, this is a comment
	.txt = "Foo";		# .txt is one of two text fields that can be assigned to
	.typ = q.txt + .typ;	# .typ is the other; use + for string concatenation
	. = .nxt;
	. = .prv;
	. = .jmp;
	q = .;
	q = .nxt;
	. = q;
	. = q.jmp;
	A[.txt] = .len;
	val = .lnr;
Variables do not need to be declared before they are used. The type of a variable or associative array element is inferred from context to be a value, a string, or a token reference.
An associative array is identified by a basename and an index in square brackets. Associative arrays can store any type of result (a value, a string, or a token reference), and they can be indexed with multiple indices separated by commas. Some examples:
	basename[index] = value;
	basename[.txt , .len] = .;
	X[.txt] = .mark;
	X[.txt]++;
	Y[.mark] = .fnm;
	Z[.fnm] = .;
	Z[.fnm , 0, "foo"] = 42;
A new value (and type) may overwrite an old one at any time.
Associative array elements that store a token reference cannot be indexed directly with a token field. To do so, the element must first be assigned to a regular token reference variable, for instance as follows:
	Z[.fnm] = .;
	Z[.fnm].mark;		# gives a syntax error
	q = Z[.fnm];		# is okay
	print q.mark ":" q.txt "\n";
The number of elements in an associative array can be determined with the size function:
	v = size(Z);
Normally the elements of an associative array are retrieved simply by reference, e.g., as in:
	x = Z["foo"];
If the array element evaluated does not exist, the result will be zero (or, depending on context, the empty string). The only way to iterate over the elements of an associative array is with a for statement, as follows:
	for (i in Z)
	{	print i.mark ": " i.txt " = " Z[i.txt] "\n";
	}
The loop variable i is assigned as a token reference, which allows us
to refer to different parts of the array elements that are returned.
The index of the associative array is converted to a string and is available
in the text field of the loop variable: i.txt.
The .mark field gives a numeric index
of the array element. For technical reasons, the
number in the .mark field is one higher
than the actual index value, which starts at zero.
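Because array references are preserved across runs (see NOTES below), one inline program can collect data into an associative array and a second one can report on it with a for loop. The sketch below counts how often each identifier occurs, then lists the identifiers that occur more than 100 times, followed by the number of distinct identifiers. The array name Freq and the threshold of 100 are arbitrary choices made for this illustration.

```cobra
%{
	# first run: count the occurrences of each identifier
	if (@ident)
	{	Freq[.txt]++;	# a missing element starts at zero
	}
%}
%{
	# second run: report the frequently used identifiers
	for (i in Freq)
	{	if (Freq[i.txt] > 100)
		{	print i.txt ": " Freq[i.txt] "\n";
		}
	}
	print size(Freq) " distinct identifiers\n";
	Stop;	# the report only needs to be printed once
%}
```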
An array element can also be retrieved directly by numeric
index with the predefined function retrieve. For instance,
	v = retrieve(Z, 0);
retrieves the first element of Z. The ordering of the elements in an associative array depends on internal implementation details, and is not related to the order in which the elements were added to the array.
We can also iterate over the elements of a pattern set, or over the tokens of a given pattern, with the foreach statement. For example:
	: pe S: while ( .* )
	: %{
		cnt = 1;
		foreach (p in S)
		{	print cnt "\n===\n";
			cnt++;
			foreach (t in p)
			{	print t.txt " ";
			}
			print "\n";
		}
		Stop;	# the patterns only need to be listed once
	  %}
This reproduces all matched patterns, each preceded by a count.
The names S and p have to be either the name of an existing
pattern set or the name of the starting token of a pattern
from such a set, as in the example shown. If there is not enough information at compile time to determine whether the target S is a set or a token reference, Cobra will assume the latter by default, unless the qualifier 'pattern' is added directly following the 'foreach' keyword, as in:
	foreach pattern (p in S) { ... }
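The foreach pattern form can also be used to collect simple statistics on matched patterns. The sketch below uses the same pattern expression as the earlier example and prints, for each matched pattern, the location of its first token and the number of tokens it contains. It assumes, as described above, that p refers to the starting token of each pattern, so that its .fnm and .lnr fields can be used.

```cobra
	: pe S: while ( .* )
	: %{
		foreach pattern (p in S)	# S is declared to be a pattern set
		{	n = 0;
			foreach (t in p)	# visit each token of this pattern
			{	n++;
			}
			print p.fnm ":" p.lnr ": " n " tokens\n";
		}
		Stop;	# report once, not once per token
	  %}
```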
EXAMPLES
Example 1
The following example shows how we can match on a text string that is specified as a command-line argument to cobra itself.
	$ cat play/igrep.cobra
	def xmustbeinascript
	%{
		if (@ident && .txt == "x")
		{	print .fnm ":" .lnr ": " .txt "\n";
		}
	%}
	end
	xmustbeinascript
	$ 
	$ cobra -f play/igrep -var x=j *.c
	cobra_lib.c:1824: j 
	cobra_lib.c:1830: j 
	cobra_lib.c:1832: j 
	cobra_lib.c:1835: j 
	cobra_lib.c:1838: j 
	cobra_lib.c:2024: j 
	cobra_lib.c:2041: j 
	cobra_lib.c:2041: j 
	cobra_lib.c:2041: j
	$
Note that there can be more than one match of the token text
per line of code. Line cobra_lib.c:2041 above, for instance,
has three matches of a token named j:
	cobra_lib.c:2041: for (j = 0; x && j < span; j++)
Example 2
The following example illustrates the use of a while loop and of token reference variables. The program counts the number of cases in a C switch statement, taking into account that switch statements may be nested.
	$ num stats/nr_cases.cobra
	    1	def nr_cases
	    2	%{
	    3		if (.curly > 0 && #switch)
	    4		{      	# introduce a token variable q:
	    5			q = .;
	    6			. = .nxt;
	    7			if (.txt != "(" )
	    8			{       . = q;
	    9				Next;
	   10			}
	   11			. = .jmp;
	   12			. = .nxt;
	   13			if (.txt != "{")
	   14			{       . = q;
	   15				Next;
	   16			}
	   17	
	   18			q.mark = 0;
	   19			while (.curly >= q.curly)
	   20			{	if (.curly == q.curly + 1
	   21				&&  (#case || #default))
	   22				{	q.mark++;
	   23				}
	   24				. = .nxt;
	   25			}
	   26			print q.mark " " .fnm ":" q.lnr "\n";
	   27			. = q;
	   28		}
	   29	%}
	   30	end
	   31	nr_cases
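Running it produces output like this, reporting the number
of cases in all switch statements, including the default cases:
	$ cobra -f stats/nr_cases cobra_lib.c | sort -n
	3 cobra_lib.c:1129
	3 cobra_lib.c:160
	3 cobra_lib.c:500
	4 cobra_lib.c:2142
	5 cobra_lib.c:993
	6 cobra_lib.c:2109
	10 cobra_lib.c:963
	22 cobra_lib.c:920
A line by line explanation of this program is as follows.
Line 3 selects switch keywords that appear inside a curly-brace block (nesting level greater than zero).
Lines 5 and 6 remember the switch token in the token variable q and move to the next token. If that token is not an opening parenthesis (lines 7-10), the current token is restored and the program proceeds to the next token with Next.
Line 11 jumps to the matching closing parenthesis, and lines 12-16 check that it is followed by the opening curly brace of the switch body.
Line 18 uses the .mark field of q as the counter.
Lines 19-25 scan forward while the nesting level is at least that of the switch keyword, counting the case and default labels that appear exactly one level deeper. Labels of nested switch statements are deeper still, and are not counted.
Line 26 prints the count together with the location of the switch keyword, and line 27 restores the current token to q, so that the default scan resumes right after the switch keyword and nested switch statements are also found.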
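As written, the while loop in lines 19-25 continues as long as the nesting level is not below that of the switch keyword, which can take the scan past the closing brace of the switch body, up to the end of the enclosing block. The following variant is a sketch, not part of the Cobra distribution, that bounds the scan at the closing brace of the switch body instead. It assumes that .jmp applied to an opening curly brace refers to the matching closing brace, as it does for the parenthesis in line 11, and it compares the .seq sequence numbers to decide when that brace has been reached. The name nr_cases2 is hypothetical.

```cobra
def nr_cases2
%{
	if (.curly > 0 && #switch)
	{	q = .;
		. = .nxt;
		if (.txt != "(")
		{	. = q;
			Next;
		}
		. = .jmp;	# to the matching ')'
		. = .nxt;
		if (.txt != "{")
		{	. = q;
			Next;
		}
		e = .jmp;	# assumed: the matching '}' of the switch body
		q.mark = 0;
		while (.seq < e.seq)	# stop at the end of the switch body
		{	if (.curly == q.curly + 1
			&&  (#case || #default))
			{	q.mark++;
			}
			. = .nxt;
		}
		print q.mark " " .fnm ":" q.lnr "\n";
		. = q;
	}
%}
end
nr_cases2
```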
 Associative arrays, introduced above, can be used to associate a value, string, or token reference with a text string or a value in a named array. The following, somewhat naive, example illustrates how they can be used in a complete program: 
	%{
		if (#float)
		{	. = .nxt;
			if (@ident)
			{	X[.txt] = 1;
				print .fnm ":" .lnr ": declaration of '" .txt "'\n";
			}
			Next;
		}
		if (@ident && X[.txt] > 0)
		{	print .fnm ":" .lnr ": use of float '" .txt "'\n";
		}
	%}
This example uses an associative array named X
to remember that we have seen the string .txt.
In this case the array associates the element with a non-zero integer value.
Although the right-hand side of the assignment is a value here, it could also
be a string or a token reference.
The value stored in X is retrieved in the condition of the second if-statement.
If no value is stored for the string specified,
the value returned will be zero.
The second if-statement checks for every identifier whether the corresponding text string from .txt was recorded before. If so, we know that this identifier first appeared following the C keyword float, and must therefore be a floating point variable. For simplicity, this version ignores that variable declarations can include multiple names separated by commas, as well as initializers.
A second version of the program also reports where each float variable was declared, by storing a token reference instead of a value:
	%{
		if (#float)
		{	. = .nxt;
			if (@ident)
			{	Store[.txt] = .;	# store the current location
				print .fnm ":" .lnr ": declaration of '" .txt "'\n";
			}
			Next;
		}
		if (@ident)
		{	q = Store[.txt];
			if (q.lnr != 0)
			{	print .fnm ":" .lnr ": use of float '" .txt "' ";
				print "declared at " q.fnm ":" q.lnr "\n";
		}	}
	%}
In this version we store and retrieve a token reference, but then need to check
that the retrieved value corresponds to an actual location that was set earlier.
We do so by checking the line number field, which is never zero for an actual token.
NOTES
Token variable and array references are preserved across runs of inline Cobra programs, which is what makes the following example work:
	%{
		# check the identifier length for all tokens
		# and remember the longest in q
		if (@ident && .len > q.len)
		{	q = .;
		}
	%}
	%{
		print "longest identifier: " q.txt " = " q.len " chars\n";
		Stop;	# stops the second run after the line is printed
	%}
SEE ALSO
pe, ps, tokens, variables, expressions, arrays, predefined values, predefined functions, user-defined functions, keywords, save
(Last Updated: 18 June 2024)