Cobra Inline Programs token fields and references

NAME

token fields and references

DESCRIPTION

A Cobra inline program is by default executed once for every token in the input sequence. That means that when the end of the program is reached (the closing symbol sequence %}), Cobra moves the current token position, dot (written: .), one position forward and repeats the execution of the program.

The key data object that can be interrogated in an inline program is therefore the current token itself: its position and the various fields that it contains. Three different types of token fields can be referred to, depending on the type of value that they hold. They are:

	text strings:
		.fct		# name of containing function, or "global"
		.fnm		# file name
		.txt		# token text
		.typ		# token type

	integer values:
		.round		# nesting level of ()
		.bracket	# nesting level of []
		.curly		# nesting level of {}
		.len		# length of token text
		.lnr		# line number
		.mark		# user-definable integer value
		.seq		# token sequence number
		.range		# the number of lines in the associated range, if any

	token positions:
		.		# the current token
		.nxt		# the immediately following token
		.prv		# the immediately preceding token
		.jmp		# the token at the other end of the range, e.g., from { to } or back, if any
		.bound		# link to bound symbol reference, if any (user-defined)
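
As a minimal sketch of how these fields combine, the following inline program reports every identifier longer than 30 characters, with its file name, line number, and containing function. The threshold of 30 is an arbitrary value chosen for illustration only, and the print syntax follows the examples further down on this page:

	%{
		# 30 is a hypothetical threshold, not a Cobra default
		if (@ident && .len > 30)
		{	print .fnm ":" .lnr ": " .txt " (" .len " chars) in " .fct "\n";
		}
	%}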

A token position can be assigned to a variable, and then the token fields can be referred to using that variable name. For instance:

	if (.txt == "{")
	{	q = .jmp;	# q points to the matching } token
		r = q.jmp;	# r should now point back at the {
		assert(r == .);	# aborts the program if false
	}

Note that only one level of dereferencing is supported, so:

	r = .nxt.nxt;		# would give a syntax error
	r = .nxt; r = r.nxt;	# works
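
The same pattern applies when looking backward. The following sketch, which assumes only the fields listed above, stores the preceding token in a variable p (a name chosen for illustration) before inspecting its text:

	if (.txt == "(")
	{	p = .prv;		# p.txt works, .prv.txt would not
		if (p.txt == "sizeof")
		{	print .fnm ":" .lnr ": sizeof followed by (\n";
		}
	}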

The .jmp field of a token is only set for tokens that are associated with predefined ranges, such as {, }, [, ], (, and ). Similarly, the .bound field is by default never set, but can be assigned a token reference by the user, in inline programs. This can, for instance, be useful to define the target of a goto, or to connect an if keyword to a possibly associated else. It's not hard to write a short inline program that does these assignments: some examples are in the cobra/rules/play directory. (These specific assignments to the .bound fields can also be made with the predefined query command setlinks.)

In many cases you may want to check whether the .jmp or .bound field holds a token reference. Alas, we cannot simply check the field itself for a zero value, so we have to check the value of a relevant field of the referenced token instead -- keeping in mind that we can only use one level of dereferencing on tokens (so typing .jmp.seq will give a syntax error):

	tmp = .jmp;
	if (tmp.seq != 0)	# or (tmp.lnr > 0), or (tmp.fnm != "")
	{	print "the .jmp field holds a valid token reference\n";
	} else
	{	print "the .jmp field was not set\n";
	}

The same check works for the .bound field. Note that all valid tokens have a sequence number and a line number greater than zero, except for tokens that are explicitly allocated with a call to the predefined function newtok().

A couple of token positions are predefined. They are:

	Begin	# the first token in the sequence processed by this core
	End	# the last token in that sequence
	first_t	# the very first token in the complete sequence
	last_t	# the last token in the complete sequence

Note that in multi-core processing (when ncore is larger than one) the token sequence is split into as many segments as there are cores, so that in this case each core only sees a subset of all tokens.
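
To see which segment a core was given, a sketch along the following lines may help. It assumes that Begin and End can be assigned to variables like any other token position (b and e are names chosen for illustration), and it uses Stop, described below, so that the line is printed only once per run of the program:

	%{
		b = Begin;	# first token seen by this core
		e = End;	# last token seen by this core
		print "segment: " b.fnm ":" b.lnr " to " e.fnm ":" e.lnr "\n";
		Stop;
	%}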

There are also two predefined functions that relate to the token being processed. They are:

	Next	# stops processing the current token, and moves to the next
		# token in the sequence to restart the inline program
	Stop	# stops processing the current token, and does not
		# proceed to the next token

There is an implied call to Next whenever the end of the inline program text is reached (the closing %} of the program). The use of Stop is illustrated by the following program that first collects data over the entire token sequence, and then prints the result with a single call:

	%{
		# check the identifier length for all tokens
		# and remember the longest in q

		if (@ident && .len > q.len)
		{	q = .;
		}
	%}
	%{
		print "longest identifier: " q.txt " = " q.len " chars\n";
		Stop;	# stops the second run after the line is printed
	%}

We could in principle achieve the same effect with a single inline program as follows:

	%{
		# check the identifier length for all tokens
		# and remember the longest in q

		if (@ident && .len > q.len)
		{	q = .;
		}
		if (. == End)
		{	print "longest identifier: " q.txt " = " q.len " chars\n";
		}
	%}

but this comes at a cost: the program now checks whether we're at the end of the token sequence once for every token visited. The separate small program avoids this, with the Stop command preventing that part from also executing once for every token in the sequence.
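
An explicit call to Next can be used in a similar way to skip tokens early. As a sketch, the variant below restricts the search to identifiers inside function bodies. It relies on the .fct field holding "global" for tokens outside any function, as described in the field list above:

	%{
		if (.fct == "global")
		{	Next;	# skip tokens outside function bodies
		}
		if (@ident && .len > q.len)
		{	q = .;
		}
	%}
	%{
		print "longest identifier inside a function: " q.txt " = " q.len " chars\n";
		Stop;
	%}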

(Last Updated: 11 May 2019)