| Cobra | Inline Programs | expressions | ||
|---|---|---|---|---|
| NAME
	expressions

DESCRIPTION

Grammar
The BNF grammar for expressions is as follows:

	expr:		( expr )		-- parentheses
			expr bin_op expr	-- binary operators
			prefix expr		-- eg, !@ident, see below
			number			-- integers only
			true			-- same as 1
			false			-- same as 0
			token_ref
			string
			variable
			function_call

	token_ref:	.			-- the current token
			name			-- variable holding a token reference
			Begin			-- first token for this core
			End			-- last token for this core
			first_t			-- first token of complete input sequence
			last_t			-- last token of complete input sequence

	string:		"..."			-- any user-defined text string
			name			-- a variable holding a string value

	variable:	. name			-- reference to a token field
			name . name		-- reference to a token field
			name			-- variable
			name [ string ]		-- associative array

	function_call:	name ( params )		-- predefined or user-defined functions

A number is any sequence of the digits 0 to 9, representing integer values, or the name of a variable holding an integer value. There is no representation for real or floating point numbers in inline programs.

A string is any sequence of alphanumeric characters or underscores, starting with a letter. A literal string must be enclosed in double quotes. Variables can also hold strings, as in:

	varname = "hello world";
	print varname "\n";

Binary Operators
The binary operators are the following:

	bin_op:	+, -, *, /, %				-- standard arithmetic
		>, >=, <, <=, ==, !=, ||, &&		-- boolean operators

As usual, dividing by zero is unwise and will result in a syntax error when the program is executed. There is no predefined protection against value truncation due to integer overflow or underflow errors, though.

Unary Operators
The unary prefix operators are:

	prefix:	!	-- logical negation
		-	-- unary minus
		~	-- true if .txt contains pattern, eg ~yy
		^	-- true if .txt starts with pattern, eg ^yy
		#	-- true if .txt equals pattern, eg #yy
		@	-- true if .typ matches type, eg @ident

Note that the last four unary operators are defined only for references to the current token. To perform similar operations on token references in variables, the predefined string functions discussed shortly (match, strlen, substr) can be used.

The prefix operator # requires caution since the # symbol is also used as a comment separator. The rule is that if the # symbol is followed by a space or another # symbol then it is interpreted as a comment, and if it is followed immediately by text it is interpreted as the unary operator. An example showing both uses is as follows:
	%{
		if (#malloc)	# check for tokens named "malloc"
		{	print .fnm ":" .lnr ": call to malloc\n";
		}
	%}
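The comment-versus-operator rule for # can be modeled in a few lines. The following is only a Python sketch of the rule as worded above, not Cobra's actual scanner; the function name classify_hash is hypothetical, and cases the text does not cover (a # at the end of a line or followed by a tab) are reported as "unspecified" rather than guessed. The sketch also ignores # characters that appear inside string literals.

```python
def classify_hash(line: str, pos: int) -> str:
    """Classify the '#' at line[pos] using the rule stated in the text.

    Hypothetical helper for illustration only; not part of Cobra.
    """
    if line[pos] != "#":
        raise ValueError("line[pos] must be '#'")
    nxt = line[pos + 1 : pos + 2]  # empty string if '#' ends the line
    if nxt == " " or nxt == "#":
        return "comment"       # followed by a space or another '#'
    if nxt and not nxt.isspace():
        return "operator"      # followed immediately by text
    return "unspecified"       # case not covered by the stated rule


# The line from the example above contains both uses.
line = 'if (#malloc)\t# check for tokens named "malloc"'
first = line.index("#")
second = line.index("#", first + 1)
print(classify_hash(line, first))   # operator
print(classify_hash(line, second))  # comment
```

In practice this means that leaving a space after # is the safe way to start a comment.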
Function calls and function declarations are described separately.

String Functions and Regular Expressions
There are three predefined functions that operate on strings. They are:

	match(s1, s2)	-- true if string s1 matches s2, where s2 can be a regular expression
	strlen(s)	-- returns the length of string s
	substr(s, n, m)	-- returns the m-character substring of s starting at n

The second argument of the match function can, but need not, be a regular expression. If the first character of the argument is a forward slash, then the argument is interpreted as a regular expression. If this is not wanted, the slash can be escaped with a backslash character. Some examples are as follows:
	%{
		if (match(.txt, "/[Yy][Yy]"))	# regex
		{	# matches if .txt contains YY yy Yy or yY
			print .fnm ":" .lnr " " .txt "\n";
		}
		if (match(.fnm, "//usr"))	# regex
		{	# matches if the filename contains /usr
			.mark++;
		}
		if (match(.fnm, "\/usr"))	# not a regex
		{	# matches if the filename equals /usr
			cnt++;
		}
		if (.txt == "/usr")		# not a regex
		{	# matches if the token text equals /usr
			first_t.mark++;
		}
	%}
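The way match chooses between a regular expression and a plain comparison can be summarized with a small model. This is a Python sketch under stated assumptions, not Cobra's implementation: the name match_sketch is hypothetical, Python's re module stands in for whatever regular expression engine Cobra uses (the two may differ in syntax details), and a non-regex second argument is assumed to require equality, as the comments in the examples above suggest.

```python
import re


def match_sketch(s1: str, s2: str) -> bool:
    """Model of match(s1, s2) as described in the text (illustration only)."""
    if s2.startswith("/"):
        # Leading slash: the rest is a regular expression that may
        # match anywhere in s1.
        return re.search(s2[1:], s1) is not None
    if s2.startswith("\\/"):
        # Escaped slash: drop the backslash, keep a literal '/'.
        s2 = s2[1:]
    # Assumption: without a leading slash the strings must be equal.
    return s1 == s2


print(match_sketch("yyparse", "/[Yy][Yy]"))             # True
print(match_sketch("/usr/include/stdio.h", "//usr"))    # True
print(match_sketch("/usr/include/stdio.h", r"\/usr"))   # False
print(match_sketch("/usr", r"\/usr"))                   # True
```

Note that in "//usr" only the first slash selects regular expression mode; the second slash is part of the pattern itself.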
The regular expression can match anywhere in the token text; it need not start at
the beginning or end at the end of that text. A match can be forced to the start
or end of the token text with meta-symbols ^ and $, as in standard Unix-style
regular expressions, for instance:
	%{ if (match(.txt, "/^m")) { print .fnm ":" .lnr ": " .txt "\n"; } %}
to print all tokens with a text starting with the letter m.
Similarly, to print all token texts starting with m and ending with the letter y,
we can use:
	%{ if (match(.txt, "/^m.*y$")) { print .fnm ":" .lnr ": " .txt "\n"; } %}
Here we also make use of the meta-character . to match any single character, and * to match
zero or more repetitions of the preceding character.
The meta symbols can be escaped with a backslash to remove their special meaning. | ||||
| Inline Programs Manual Tutorial | (Last Updated: 11 May 2019) | |||