Cobra Reference - inline programs: string functions

	Cobra	Inline Programs	functions

NAME

string functions

DESCRIPTION

The following predefined functions are defined for operations on text strings:

	disambiguate(s)	-- returns a modified copy of s in which look-alike characters such as O, 0, and o are mapped to a single unique character
	gsub(s1, s2, s3) -- returns a copy of s3 with all occurrences of string s1 replaced with string s2
	hash(e)  	-- returns an integer with the hash value for string e
	hasharray(a, i)	-- returns the hash value for the strings stored in array a, starting at index i
	itostr(e)	-- returns a string representation of expression e (e.g., of a value)
	match(s1, s2)	-- true if string s1 matches s2, where s2 can be a regular expression
	split(s, a)  	-- returns the number of comma-separated fields in string s and assigns them to array a
	split(s, c, a) 	-- returns the number of fields separated by 1-character string c in string s and assigns them to array a
	strstr(s1, s2)	-- returns 1+ the index in string s1 of the first occurrence of string s2, or 0
	strrstr(s1, s2)	-- returns 1+ the index in string s1 of the last occurrence of string s2, or 0
	strcmp(s1, s2)	-- compares strings s1 and s2, returning -1, 0 (when equal), or +1
	strlen(s)	-- returns the length of string s
	substr(s, n, m)	-- returns the m-character substring of s starting at index n

The function disambiguate(s) takes a string s and replaces closely related characters with a fixed unique representation. Specifically, I and l are mapped to 1, S to 5, Z to 2, B to 8, and O and o to 0 (zero). The function is typically used by comparing the modified string to the original: if the two differ, the original contains potentially confusing characters and a warning can be issued. This supports compliance with a MISRA coding rule to avoid easily confused names (M2.20.1).
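The mapping and the comparison step can also be modeled outside Cobra. The Python sketch below only illustrates the behavior described above; it is not Cobra's implementation, and the names CONFUSABLE and is_confusable are chosen for this example.

```python
# Model of the mapping described above: I and l -> 1, S -> 5, Z -> 2,
# B -> 8, O and o -> 0. Illustration only, not Cobra's implementation.
CONFUSABLE = str.maketrans({
    "I": "1", "l": "1",
    "S": "5",
    "Z": "2",
    "B": "8",
    "O": "0", "o": "0",
})


def disambiguate(s: str) -> str:
    """Return a copy of s with the look-alike characters replaced."""
    return s.translate(CONFUSABLE)


def is_confusable(name: str) -> bool:
    """True if name changes under disambiguation, i.e. it contains
    at least one character from the mapping."""
    return disambiguate(name) != name


if __name__ == "__main__":
    for name in ("BOOL", "Island", "x_max"):
        print(name, disambiguate(name), is_confusable(name))
```

Because the check flags any name that contains a mapped character, a script would normally apply it only to the tokens of interest, such as identifiers.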

The function hash(e) expects a string, or a variable holding a string, as an argument and returns an integer number with the hash value calculated for that string. If the argument is an associative array, the function assumes that the array stores only string values and it will compute the hash over a concatenation of all the strings stored, starting at array index 0. In creating the concatenation the function places a single space in between each string collected.
If a starting index other than zero is needed for an associative array, the function hasharray(a,i) can be used instead. Its second argument specifies the starting index, which should fall within the size of the array, and its first argument a must be an array. The call hasharray(a,0) is equivalent to hash(a).
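The concatenation rule can be sketched in Python as follows. This is a model under stated assumptions, not Cobra's code: it assumes the array is indexed by consecutive integers and that collection stops at the first missing index, and it uses zlib.crc32 purely as a stand-in because the hash function Cobra applies is not specified here. The helper names concat_from and hash_of_array are hypothetical.

```python
import zlib


def concat_from(a: dict, start: int = 0) -> str:
    """Join the strings stored at indices start, start+1, ... with a
    single space between them (assumes consecutive integer indices)."""
    parts = []
    i = start
    while i in a:
        parts.append(a[i])
        i += 1
    return " ".join(parts)


def hasharray(a: dict, start: int) -> int:
    """Hash over the concatenation, beginning at index start.
    zlib.crc32 is a stand-in for the unspecified Cobra hash."""
    return zlib.crc32(concat_from(a, start).encode("utf-8"))


def hash_of_array(a: dict) -> int:
    """Model of hash(a) for an array argument: same as hasharray(a, 0)."""
    return hasharray(a, 0)


if __name__ == "__main__":
    a = {0: "red", 1: "green", 2: "blue"}
    print(repr(concat_from(a)))       # the string hashed by hash(a)
    print(repr(concat_from(a, 1)))    # the string hashed by hasharray(a, 1)
    print(hash_of_array(a) == hasharray(a, 0))
```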

The function itostr(e) returns a string representation of an integer value, which can be useful when the type has to be forced into a string representation. (Normally such conversions are done automatically, based on context, so this function should rarely be needed.)

Function match(s1, s2) takes two string arguments and returns true when they match. The second argument s2 can also be given as a regular expression. The regular expression can contain the meta-character ^ to match the start of the string and $ to indicate its end. Similarly, a range of possible character matches can be indicated with [...], and a negated range with [^...].
Note that this standard regular expression notation differs from the convention used in pattern expressions (the pe command).
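The meta-characters listed above behave the same way in most regular expression engines. The Python sketch below uses Python's re module as a stand-in to show their effect; Cobra's own regular expression dialect may differ in details, and the way a Cobra string is marked as a regular expression is not modeled here. The helper name matches is hypothetical.

```python
import re


def matches(text: str, pattern: str) -> bool:
    """True if pattern is found anywhere in text.
    Use ^ and $ in the pattern to anchor it to the start or end."""
    return re.search(pattern, text) is not None


if __name__ == "__main__":
    # ^ anchors the match to the start of the string
    print(matches("yydebug", "^[Yy][Yy]"))   # True
    print(matches("lex.yy.c", "^[Yy][Yy]"))  # False: yy is not at the start
    # $ anchors the match to the end of the string
    print(matches("main.c", "[.]c$"))        # True
    # [^...] matches any character not in the range
    print(matches("x1", "^[^0-9]"))          # True: first char is not a digit
    print(matches("1x", "^[^0-9]"))          # False: first char is a digit
```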

Indices for strings used in the substr function start at 0, so substr(.txt, 0, 1) returns a string containing the first character of the token text.
Other examples are:

	%{
		if (match(.txt, "/[Yy][Yy]"))	# regex
		{	# matches if .txt contains YY yy Yy or yY
			print .fnm ":" .lnr " " .txt "\n";
		}
		if (match(.fnm, "//usr"))	# regex
		{	# matches if the filename contains /usr
			.mark++;
		}
		if (match(.fnm, "\/usr"))	# not a regex
		{	# matches if the filename equals /usr
			cnt++;
		}
		if (.txt == "/usr")		# not a regex
		{	# matches if the token text equals /usr
			first_t.mark++;
		}
		A["appel", "pear", "lemon"]++;
		for (i in A) {
			# supported in Cobra version 4.4 or later (jan '23)
			n = split(i.txt, fields);	# (re)create array fields with 3 elements,
							# each holding one of the index fields
			# in Version 4.7 and later this is equivalent to:
			#	n = split(i, fields);
			# or
			#	n = split(i, ",", fields);
			# returns: 3
			while (n > 0)
			{	n--;
				print "part " n ": " itostr(n) " " fields[n] "\n";
				# prints:
				# part 2: 2 lemon
				# part 1: 1 pear
				# part 0: 0 appel
		}	}
	%}

The following script illustrates the split, strstr, and strrstr functions more specifically:

	%{
		n1 = split("red,green,blue,yellow,", AA);
		n2 = split("/appel/pear/banana/orange", "/", BB);
		n3 = strstr("red,green.blue:velvet", ".");
		n4 = strrstr("red,green.blue:velvet,", ",");
		print n1 "\t" n2 "\t" AA[1] "\t" BB[1] "\n";
		print n3 " :: " n4 "\n";
	
		print ">> " substr("red,green.blue:velvet", n3, 4) "\n";
		Stop;
	%}

Note that the definition of the strstr and strrstr functions differs from the versions used in tools such as awk: they return the index (plus 1, to reserve the return value zero for indicating non-matches) rather than a string starting at the matched character.
The last script, for example, produces the following output:

	4	5	green	appel
	10 :: 22
	>> blue
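The index convention can be checked with a small Python model of the three functions. This is only a sketch of the behavior documented above, not Cobra's implementation; split is left out because its handling of empty fields is not spelled out in the text.

```python
def strstr(s1: str, s2: str) -> int:
    """1 + index of the first occurrence of s2 in s1, or 0 if absent."""
    return s1.find(s2) + 1


def strrstr(s1: str, s2: str) -> int:
    """1 + index of the last occurrence of s2 in s1, or 0 if absent."""
    return s1.rfind(s2) + 1


def substr(s: str, n: int, m: int) -> str:
    """The m-character substring of s starting at 0-based index n."""
    return s[n:n + m]


if __name__ == "__main__":
    n3 = strstr("red,green.blue:velvet", ".")
    n4 = strrstr("red,green.blue:velvet,", ",")
    print(n3, "::", n4)                                   # 10 :: 22
    print(">>", substr("red,green.blue:velvet", n3, 4))   # >> blue
```

Since the return value is one more than the 0-based index of the match, passing it straight to substr starts the substring at the character just after a 1-character match, which is why the script prints blue rather than .blu.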

If regular expressions aren't required, it can be simpler to use the built-in shortcut operators #, @, ^ or ~ for referring to token text fields (see expressions), as in:

	if (#/usr)	# is the same as:  if (match(.txt, "\/usr"))
	if (@const_int)	# is the same as:  if (match(.typ, "const_int"))

A sequence of one or more # symbols is interpreted to start a comment only if it is immediately followed by a space or tab. In the above example we use # both as an operator and as a comment delimiter.

Finally, the Cobra version of gsub also differs from its counterpart in awk by returning a copy of the target string with the replacements made, rather than doing the replacements in place inside the target string.
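The copy-returning behavior and the argument order can be modeled in Python as shown below. This is a sketch under the assumption that s1 is treated as a plain string, as the function summary states; it is not Cobra's implementation.

```python
def gsub(s1: str, s2: str, s3: str) -> str:
    """Return a copy of s3 with every occurrence of s1 replaced by s2.
    The target string s3 itself is left unchanged."""
    return s3.replace(s1, s2)


if __name__ == "__main__":
    target = "a.b.c"
    result = gsub(".", "::", target)
    print(result)   # a::b::c
    print(target)   # a.b.c  (the original is not modified)
```

A script that wants the in-place effect of the awk version therefore has to assign the result back to the target variable.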


	Inline Programs Manual Tutorial		(Last Updated: 21 February 2025)