Cobra Reference - inline programs: expressions

Cobra

Inline Programs

expressions

NAME

expressions

DESCRIPTION

Grammar

The BNF grammar for expressions is as follows:

	expr:
		( expr )		-- parentheses
		expr bin_op expr	-- binary operators
		prefix expr		-- eg, !@ident, see below
		number			-- integers only
		true			-- same as 1
		false			-- same as 0
		token_ref
		string
		variable
		function_call

	token_ref:
		.			-- the current token
		name			-- variable holding a token reference
		Begin			-- first token for this core
		End			-- last token for this core
		first_t			-- first token of complete input sequence
		last_t			-- last token of complete input sequence

	string:
		"..."			-- any user-defined text string
		name			-- a variable holding a string value

	variable:
		. name			-- reference to a token field
		name . name		-- reference to a token field
		name			-- variable
		name [ string ]		-- associative array

	function_call:
		name ( params )		-- predefined or user-defined functions

A number is any sequence of the digits 0 to 9, representing an integer value, or the name of a variable holding an integer value. There is no representation for real or floating-point numbers in inline programs.

A string is any sequence of alphanumeric characters or underscores, starting with a letter. A literal string must be enclosed in double quotes. Variables can of course also hold strings, as in:

	varname = "hello world";
	print varname "\n";

Binary Operators

The binary operators are the following:

	bin_op:
		+, -, *, /, %			-- standard arithmetic
		>, >=, <, <=, ==, !=, ||, &&	-- boolean operators

As usual, dividing by zero is unwise: it results in a syntax error when the program is executed. There is no built-in protection against integer overflow or underflow, so out-of-range values may be truncated.

Unary Operators

The unary prefix operators are:

	prefix:
		!			-- logical negation
		-			-- unary minus
		~			-- true if .txt contains pattern, eg ~yy
		^			-- true if .txt starts with pattern, eg ^yy
		#			-- true if .txt equals pattern, eg #yy
		@			-- true if .typ matches type, eg @ident

Note that the last four unary operators are defined only for references to the current token. To perform similar operations on token references in variables, the predefined string functions discussed shortly (match, strlen, substr) can be used.
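
To make the intended meaning of the four token-matching prefix operators concrete, the sketch below models them in Python rather than in Cobra's own language. The Token class and the helper names are hypothetical. The sketch assumes that each pattern is compared as plain text and that @ compares the type name for equality; the actual implementation may differ.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Hypothetical stand-in for a Cobra token (not Cobra's real data structure)."""
    txt: str  # token text, as in the .txt field
    typ: str  # token type, as in the .typ field


def contains(tok: Token, pattern: str) -> bool:
    """Models ~pattern: true if the token text contains pattern (assumed plain text)."""
    return pattern in tok.txt


def starts_with(tok: Token, pattern: str) -> bool:
    """Models ^pattern: true if the token text starts with pattern."""
    return tok.txt.startswith(pattern)


def equals(tok: Token, pattern: str) -> bool:
    """Models #pattern: true if the token text equals pattern."""
    return tok.txt == pattern


def has_type(tok: Token, typ: str) -> bool:
    """Models @typ: true if the token type is typ (equality is an assumption)."""
    return tok.typ == typ


if __name__ == "__main__":
    tok = Token(txt="yylex", typ="ident")
    print(contains(tok, "yy"), starts_with(tok, "yy"),
          equals(tok, "yy"), has_type(tok, "ident"))
```

Because the helpers take the token as an explicit argument, the same checks can be applied to any token, which mirrors the advice above to use the string functions for token references held in variables.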

The prefix operator # requires caution, since the # symbol also starts a comment. The rule is that a # symbol followed by a space or by another # symbol starts a comment, while a # symbol followed immediately by text is the unary operator. The following example uses both forms:

	%{
		if (#malloc)	# check for tokens named "malloc"
		{	print .fnm ":" .lnr ": call to malloc\n";
		}
	%}
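
The rule for telling the two uses of # apart can be sketched as a small Python model. This is only an illustration of the rule as stated above, not Cobra's actual lexer. The function name classify_hash is hypothetical, and treating a tab or the end of the line like a space is an assumption that the text does not confirm.

```python
def classify_hash(line: str, pos: int) -> str:
    """Classify the '#' at line[pos] as 'comment' or 'operator'.

    Rule from the text: '#' followed by a space or another '#' starts a
    comment; '#' followed immediately by text is the unary operator.
    Assumption: a tab or the end of the line is treated like a space.
    """
    if pos >= len(line) or line[pos] != "#":
        raise ValueError("no '#' at position %d" % pos)
    nxt = line[pos + 1:pos + 2]  # empty string if '#' is the last character
    if nxt == "" or nxt.isspace() or nxt == "#":
        return "comment"
    return "operator"


if __name__ == "__main__":
    line = 'if (#malloc) # check for tokens named "malloc"'
    first = line.index("#")              # the '#' in '#malloc'
    second = line.index("#", first + 1)  # the '#' that starts the comment
    print(classify_hash(line, first))
    print(classify_hash(line, second))
```

The model deliberately ignores # symbols inside string literals; a real lexer would have to skip those before applying the rule.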

Function calls and function declarations are described separately.

String Functions and Regular Expressions

There are three predefined functions that operate on strings. They are:

	match(s1, s2)	-- true if string s1 matches s2, where s2 can be a regular expression
	strlen(s)	-- returns the length of string s
	substr(s, n, m)	-- returns the m-character substring of s starting at n

The second argument of the match function can be, but need not be, a regular expression. If the first character of the argument is a forward slash, the argument is interpreted as a regular expression. To match a literal leading slash instead, escape the slash with a backslash. Some examples are as follows:

	%{
		if (match(.txt, "/[Yy][Yy]"))	# regex
		{	# matches if .txt contains YY yy Yy or yY
			print .fnm ":" .lnr " " .txt "\n";
		}
		if (match(.fnm, "//usr"))	# regex
		{	# matches if the filename contains /usr
			.mark++;
		}
		if (match(.fnm, "\/usr"))	# not a regex
		{	# matches if the filename equals /usr
			cnt++;
		}
		if (.txt == "/usr")		# not a regex
		{	# matches if the token text equals /usr
			first_t.mark++;
		}
	%}

The regular expression can match anywhere in the token text; it need not start at the beginning or end at the end of that text. A match can be forced to the start or end of the token text with meta-symbols ^ and $, as in standard Unix-style regular expressions, for instance:

	%{ if (match(.txt, "/^m")) { print .fnm ":" .lnr ": " .txt "\n"; } %}

to print all tokens with a text starting with the letter m. Similarly, to print all token texts starting with m and ending with the letter y, we can use:

	%{ if (match(.txt, "/^m.*y$")) { print .fnm ":" .lnr ": " .txt "\n"; } %}

Here we also make use of the meta-character . to match any single character, and * to match zero or more repetitions of the preceding character. The meta-symbols can be escaped with a backslash to remove their special meaning.
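
The way match chooses between a regular expression and a plain comparison can be modeled in a few lines of Python. This is a sketch of the behavior described in this section, not the Cobra implementation. It assumes that a non-regex second argument is compared for exact equality (as the comments in the examples above suggest), and it uses Python's re module, whose syntax may differ in detail from the regular expression library that Cobra uses.

```python
import re


def match(s1: str, s2: str) -> bool:
    """Python model of match(s1, s2) as described in this section."""
    if s2.startswith("/"):
        # Leading slash: the rest is a regular expression that may
        # match anywhere in s1 unless anchored with ^ or $.
        return re.search(s2[1:], s1) is not None
    if s2.startswith("\\/"):
        # Escaped slash: drop the backslash, keep the literal slash.
        s2 = s2[1:]
    # Assumption: without a regular expression, the strings must be equal.
    return s1 == s2


if __name__ == "__main__":
    assert match("yylex", "/[Yy][Yy]")               # regex, matches anywhere
    assert match("/usr/include/stdio.h", "//usr")    # regex for the text /usr
    assert not match("/usr/include/stdio.h", "\\/usr")  # literal, must be equal
    assert match("/usr", "\\/usr")
    assert match("malloc", "/^m")                    # anchored at the start
    assert match("memcpy", "/^m.*y$")                # starts with m, ends with y
    assert not match("malloc", "/^m.*y$")
    print("all checks passed")
```

Note that the Python literal "\\/usr" is the five-character string \/usr, which corresponds to the escaped form shown in the examples above.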


	Inline Programs Manual Tutorial		(Last Updated: 11 May 2019)