Cobra Interactive Query Language pset

NAME

pset -- builtin function for Cobra programs

SYNTAX

	lst = pset(N)
	lst = pset(N) with (expr)
	lst = pset(N) with "string"
	list2set(lst, "M")

DESCRIPTION

The builtin function
	lst = pset(N);
copies pattern set N (which must exist) and makes it available through the (newly created) list of lexical tokens lst.
Starting with Version 4.4, argument N can be given in one of three ways: as a name, as in this example; as a quoted string, like "N"; or as a variable that holds a name, like:
	vn = "N";
	lst = pset(vn);
Each element in lst points to one complete match of the pattern. Two sub-fields give access to the start and end of that match. For instance, for the first match returned:
	lst.p_start 
	lst.p_end
point to the first and last lexical token of that match.

Bound Variables
A bound variable, if present, can be accessed through two additional fields:

	lst.p_bdef
	lst.p_bref
If these fields are non-zero, they point to the definition and the reference (i.e., the first use), respectively, of a bound variable from the pattern. Both are token references. Note that not every pattern uses bound variables.

Each element in the list returned is guaranteed to have a non-zero sequence number field lst.seq, which can be used to identify the end of the list when traversing it (see Examples).

Wildcards
The argument * can be used to cause the function to create a single list of all matches from all pattern sets that are currently defined, as in:

	all = pset(*);
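A minimal sketch of how the combined list could be used, here to count the matches across all sets. It relies only on the seq and nxt fields described on this page; the variable name cnt is arbitrary, and the inline program assumes that at least one pattern set has been defined:

	: %{
		all = pset(*);		# one list with the matches from every set
		cnt = 0;
		while (all.seq != 0)	# seq is non-zero for every match
		{	cnt++;
			all = all.nxt;	# next match in the list
		}
		print cnt " matches\n";
		Stop;			# don't repeat for every token in the input
	%}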

Constraints: Conditions
Starting with Cobra version 4.8, it is possible to define constraints on the matches that are included in the list, by adding a with (expr) or a with "string" suffix.
For instance,

	lst1 = pset(N) with (.fnm == "cobra_te.c" && .lnr > 100);
extracts only those matches in the file cobra_te.c that start on a line number greater than 100. A constraint can refer to any token attribute of the starting lexical token of each match in the pattern set, including in particular the attribute .range, which specifies the length of the match in lines. For instance,
	lst3 = pset(N) with (.range >= 10);
selects all matches that are at least ten source lines long.
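As a sketch, a constrained list can be traversed like any other pset list, for instance to report where each long match starts. The variable names lng and s are hypothetical, set N is assumed to exist, and the loop assumes that the seq field is zero when no match satisfies the constraint:

	: %{
		lng = pset(N) with (.range >= 10);
		while (lng.seq != 0)
		{	s = lng.p_start;	# first token of the match
			print s.fnm ":" s.lnr "\n";
			lng = lng.nxt;		# next match in the list
		}
		Stop;
	%}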

Constraints: Strings
A constraint can also be given as a string, which then causes the pset function to only select matches in the pattern set that contain the given string in the .txt field of any token in the match. For example

	lst4 = pset(N) with "plst";
selects only matches that contain the word plst anywhere among the matched tokens, and
	lst5 = pset(N) with "else";
selects matches that contain the keyword else.

Set Conversion
The result of a call to pset can be converted back into a regular pattern set with the builtin function list2set, as in:

	list2set(lst, "V");
which creates pattern set V if it does not yet exist, or otherwise adds the matches from lst to it, provided that list lst is not empty.
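For instance, a constraint and a conversion can be combined to split off part of a pattern set into a set of its own. This is only a sketch: it assumes that set A exists and that at least one of its matches contains the keyword else; the names sel and E are arbitrary:

	: %{
		sel = pset(A) with "else";	# only matches that contain else
		list2set(sel, "E");		# store them in pattern set E
		Stop;
	%}
	: dp E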

EXAMPLES

Define a pattern set A and copy the matches into a list to make the matches accessible to the inline program:
	$ cobra cobra_list.c
	1 core, 1 files, 1192 tokens
	: pe A: while ( .* )	# create a sample pattern set A
	1 patterns stored in set 'A'
	: %{
		x = pset(A);	# copy the set into list x
		# alternative forms (Version 4.4 or later):
		#	x = pset("A");
		# or using a string variable:
		#	vn = "A";
		#	x = pset(vn);
		Stop; # don't repeat for every token in the input
	%}
	: dp A
	1: cobra_list.c:7..7
		7  #include "cobra.h"
	2: cobra_list.c:206..206
		206          while (p)
	2 patterns
Next, traverse the matches of that pattern set, now stored in list x.
Because x is a global variable, it persists across inline programs within a Cobra session.

The simplest way to traverse the matches in a pattern set is to use a foreach statement directly on the named set (i.e., without using pset):

	: %{
		foreach (m in A)
		{	print m.fnm ":" m.lnr " pattern of " m.range " lines";
			cnt = 0;
			foreach (t in m)
			{	cnt++;
			}
			print " and " cnt " tokens\n";
		}
		Stop;
	%}
	cobra_list.c:208 pattern of 7 lines and 41 tokens
This can also be done more explicitly through a pset generated list:
	: %{
		# x = pset(A);		# as set before
		while (x.seq != 0)	# the seq tag is non-zero
		{	y = x.p_start;	# first token in pattern
			z = x.p_end;	# last token in pattern
			print x.seq ":" y.fnm ":" y.lnr "::";
			while (y <= z)	# print the pattern
			{	print " " y.txt;
				y = y.nxt;
			}
			print "\n";

			# locate the first bound variable, if present:
			v = x.p_bdef;
			w = x.p_bref;
			if (v.seq != 0 && w.seq != 0)
			{ print "\tbound variable def: " v.fnm ":" v.lnr ": " v.txt "\n";
			  print "\tbound variable ref: " w.fnm ":" w.lnr ": " w.txt "\n";
			}
			x = x.nxt;	# next pattern in sequence
		}
		Stop;
	%}
	7:cobra_list.c:208:: while ( p )
	: q
	$

NOTES

The list returned by a pset call exists separately from the pattern set it was copied from. If the pattern set later changes (by adding or removing some of the matches), the list remains unchanged and equal to the contents of the set at the time the list was created. Similarly, making changes in the list does not change the contents of the pattern set. A list can be converted back into a named pattern set with the list2set function.

If there are multiple bound variables, which is not common, only the first one is accessible.

Internally, the list of matches returned by a call to pset is defined as a sequence of token references, with the names p_start and p_end defined as synonyms to the standard token attributes jmp and bound.

SEE ALSO

add_pattern, functions_predefined

(Last Updated: 22 June 2024)