Cobra Reference - inline programs: tokens

NAME

token fields and references

DESCRIPTION

A Cobra inline program is by default executed once for every token in the input sequence. That means that when the end of the program is reached (the closing symbol sequence %}), Cobra moves the current token position, the dot (written: .), one position forward and repeats the execution of the program.

The key data object that can be interrogated in an inline program is therefore the current token itself: its position and the various fields that it contains. There are three different types of token fields that can be referred to, depending on the type of value that they hold. They are:

	text strings:
		.fnm		# file name
		.txt		# token text
		.typ		# token type
		.fct		# name of containing function, or "global"

	integer values:
		.round		# nesting level of ()
		.bracket	# nesting level of []
		.curly		# nesting level of {}
		.len		# length of token text
		.lnr		# line number
		.mark		# user-definable integer value
		.seq		# token sequence number
		.range		# the number of lines in the associated range, if any

	token positions:
		.		# the current token
		.nxt		# the immediately following token
		.prv		# the immediately preceding token
		.jmp		# the other end of the range, e.g. from { to } or back, if any
		.bound		# link to bound symbol reference, if any (user-defined)
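
As an illustration of how the three kinds of fields combine, the following sketch reports every identifier longer than 30 characters that appears inside a pair of curly braces. The threshold of 30 and the wording of the message are arbitrary choices made for this example, and the sketch assumes that identifier tokens carry the type string "ident":

	if (.typ == "ident" && .len > 30 && .curly > 0)
	{	# text fields and integer fields can be mixed freely in a print
		print .fnm ":" .lnr ": long name " .txt " in " .fct "\n";
	}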

The dot symbol that is used here refers to the current token being processed. A token position can be assigned to a variable, and then the token fields can be referred to using that variable name. For instance:

	if (.txt == "{")
	{	q = .jmp;	# q points to the matching } token
		r = q.jmp;	# r should now point back at the {
		assert(r == .);	# aborts the program if false
	}

Note that only one level of dereferencing is supported, so:

	r = .nxt.nxt;		# would give a syntax error
	r = .nxt; r = r.nxt;	# works
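
The same technique can be used to look several tokens ahead. The sketch below, which is only an illustration and assumes that identifier tokens carry the type string "ident", reports a name that is followed by an empty pair of parentheses. The variable names a and b are arbitrary:

	if (.typ != "ident")
	{	Next;		# not an identifier: move on to the next token
	}
	a = .nxt;		# one level of dereferencing on the current token
	if (a.txt != "(")
	{	Next;
	}
	b = a.nxt;		# the second step goes through the variable a
	if (b.txt == ")")
	{	print .fnm ":" .lnr ": " .txt "() has no arguments\n";
	}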

The .jmp field of a token is only set for tokens that are associated with predefined ranges, such as {, }, [, ], (, and ). Similarly, the .bound field is by default never set, but can be assigned a token reference by the user, in inline programs. This can, for instance, be useful to define the target of a goto, or to connect an if keyword to a possibly associated else. It's not hard to write a short inline program that does these assignments: some examples are in the cobra/rules/play directory. (A simple way to make these specific assignments to the .bound fields is also to use the predefined query command setlinks.)
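
A rough sketch of the second case, connecting an if keyword to its else, could look as follows. It is not taken from the cobra/rules/play directory and it makes simplifying assumptions: it only handles an if whose then-part is enclosed in curly braces, and it leaves the .bound field unset in all other cases. The variable name c is arbitrary:

	if (.txt == "if")
	{	c = .nxt;		# expected to be the ( of the condition
		if (c.txt != "(")
		{	Next;
		}
		c = c.jmp;		# the matching )
		c = c.nxt;		# first token of the then-part
		if (c.txt == "{")
		{	c = c.jmp;	# the matching }
			c = c.nxt;	# the token that follows the then-part
			if (c.txt == "else")
			{	.bound = c;	# link the if to its else
			}
		}
	}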

In many cases you may want to check whether the .jmp or .bound field holds a token reference. We cannot simply test the field itself for a zero value, so instead we test a field of the token it refers to, keeping in mind that only one level of dereferencing is supported on tokens (so typing .jmp.seq will give a syntax error):

	tmp = .jmp;
	if (tmp.seq != 0)	# or (tmp.lnr > 0), or (tmp.fnm != "")
	{	print "the .jmp field holds a valid token reference\n";
	} else
	{	print "the .jmp field was not set\n";
	}

and similarly for the .bound field. Note that all valid tokens will have a sequence number and line number field greater than zero, except for tokens that are explicitly allocated with a call to the predefined function newtok().

Four token positions are predefined. They are:

	Begin	# the first token in the sequence processed by this core
	End	# the last token in that sequence
	first_t	# the very first token in the complete sequence
	last_t	# the last token in the complete sequence

They are described in more detail on the Begin page.

There are also two predefined functions that relate to the token being processed. They are:

	Next	# stops processing the current token, and moves to the next
		# token in the sequence to restart the inline program
	Stop	# stops processing the current token, and does not
		# proceed to the next token

They are described in more detail on the Next page.
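
A minimal sketch of how the two can be used together is shown below. The first program counts goto statements, and the second one prints the total once. The sketch assumes that the variable cnt (a name chosen for this example) starts out at zero, that its value is still available to the second program, and that the check runs on a single core:

	%{
		if (.txt != "goto")
		{	Next;		# not of interest: restart at the next token
		}
		cnt++;
	%}
	%{
		print "goto statements: " cnt "\n";
		Stop;			# report once, do not visit further tokens
	%}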


	(Last Updated: 17 August 2023)