Cobra Reference Manual

Code Browser and Analysis Tool

Principle of Operation

Cobra uses a lexical analyzer to scan in the source code in the files given as arguments on the command-line. It then builds a data structure that can be used for querying that source code, either interactively or with predefined scripts.

The internal data structure that Cobra builds is a basic linked list of lexical tokens, annotated with some basic information and links to other tokens, for instance to identify matching pairs of parentheses, brackets and braces. The tool does not attempt to parse the code, which means that it can handle a broad range of possible inputs. Despite the simplicity of the data structure, the tool can be remarkably powerful in quickly locating complex patterns in a code base to assist in peer review, code development, or structural code analysis.
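To make the shape of such a data structure concrete, here is a minimal sketch in Python. It is not Cobra's implementation (Cobra itself is written in C and uses a real lexical analyzer); the names Token, build_token_list, and the rough regular-expression lexer are assumptions made purely for illustration. The sketch builds a doubly linked list of tokens and links each opening parenthesis, bracket, or brace to its matching closing token.

```python
import re
from typing import List, Optional

# Very rough stand-in for a lexer: identifiers, numbers, or any single
# non-space character. It ignores comments and string literals.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
CLOSERS = {")": "(", "]": "[", "}": "{"}


class Token:
    """One node of the (hypothetical) token list."""

    def __init__(self, text: str, line: int) -> None:
        self.text = text
        self.line = line
        self.prev: Optional["Token"] = None
        self.next: Optional["Token"] = None
        self.match: Optional["Token"] = None  # partner of ( [ { or ) ] }


def build_token_list(source: str) -> Optional[Token]:
    """Return the head of a linked token list with bracket pairs linked."""
    head: Optional[Token] = None
    tail: Optional[Token] = None
    open_stack: List[Token] = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for text in TOKEN_RE.findall(line):
            tok = Token(text, lineno)
            if tail is None:
                head = tok
            else:
                tail.next, tok.prev = tok, tail
            tail = tok
            if text in CLOSERS.values():
                open_stack.append(tok)
            elif text in CLOSERS and open_stack and open_stack[-1].text == CLOSERS[text]:
                opener = open_stack.pop()
                opener.match, tok.match = tok, opener
    return head
```

Because nothing is parsed, unbalanced or syntactically flawed input simply leaves some match links unset instead of causing a failure, which is the property the paragraph above relies on.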

There are several ways to write queries. You can use:

Pattern search queries (see summary),
Interactive queries (overview below, or see the index),
Inline programs (see overview),
Standalone checkers (see overview).

Interactive queries are written in a simple command language that can support the most frequent types of searches. When more complex queries need to be handled, requiring anything other than a sequential scan of the token sequence, inline Cobra programs offer a more powerful alternative. For still more complex queries that require the construction of more elaborate data structures, the same infrastructure can be used to write standalone checkers in C that can be linked with the Cobra front-end.

Interactive query commands are by default applied to all tokens in the input sequence, optionally using parallel threads of execution where that can improve performance. The queries can be used to set, move, remove, extend, stretch, or inspect user-defined mark points in the code.

If directed to do so with command-line option -cpp, Cobra will use the standard gcc compiler to preprocess the source code before the data structure is built. This can be useful to make sure that the effects of macros and include files are taken into account when locating patterns of interest in the code. By default no preprocessing is done, which makes it possible to also query code bases that cannot be preprocessed or compiled, for instance because the required directives are unknown, or because the code is syntactically flawed.

Use cobra -help to see a list of all current command-line options that the tool recognizes. The synopsis below summarizes Cobra's interactive query command language. Details of each command are given in separate pages that are linked from this overview.

New Features

The first version of Cobra was developed at JPL in 2016. Since then, the tool has been significantly extended. The original release, for example, counted 9,796 lines of code, while the current version (4.5) counts 22,780 lines of code. Some of the new features are:

A richer scripting language for writing inline programs, supporting arbitrary variables, associative arrays, recursive functions, built-in functions, and concurrency control.
A new pattern matching option based on an intuitive adapted version of regular expressions, making it possible to quickly query code. The pattern matcher allows the definition of bound variables and embedded constraints that can direct and refine a search. Full regular expression matching capability also remains available.
Optional parallel processing of input files, which shortens the startup time especially for larger code bases.
Faster processing of queries. For instance, checking for empty else statements in 18.2 million lines of code took 10 seconds with the original version; with the new Version 2 it takes about 2.5 seconds (in both cases using a single CPU core).
An extended library of predefined scripts and script libraries (including more elaborate checkers for the MISRA guidelines, the CERT coding standard, CWE vulnerabilities, the Power of 10 rules, and the JPL Coding Standard).
Online manual pages for all interactive commands, and for the scripting language that can be used for writing powerful inline programs.
Support for parameterized named def...end scripts.
A series of new commands, including cpp, terse, track, map.
A rethinking and normalization of many command-line options.
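The empty else check mentioned in the list above is a good example of a query that needs only a sequential scan of the token sequence. The Python sketch below shows the idea in isolation. It is an illustration under stated assumptions, not Cobra's pattern matcher: the function name find_empty_else and the rough lexer are hypothetical, and comments or string literals in the source are not handled.

```python
import re
from typing import List, Tuple

# Rough stand-in for a lexer: identifiers, numbers, or single characters.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def find_empty_else(source: str) -> List[int]:
    """Return the line numbers of 'else' tokens directly followed by '{' '}'."""
    tokens: List[Tuple[str, int]] = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        tokens.extend((text, lineno) for text in TOKEN_RE.findall(line))

    hits: List[int] = []
    # Slide a three-token window over the sequence.
    for i in range(len(tokens) - 2):
        window = [text for text, _ in tokens[i:i + 3]]
        if window == ["else", "{", "}"]:
            hits.append(tokens[i][1])
    return hits
```

A single pass like this is linear in the number of tokens, which is consistent with a check of this kind completing in seconds even on a very large code base.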

Overview of Query Commands

Most interactive query commands can be abbreviated to a single letter, which is useful for scripting and compact query formulation. Below is a quick index of the most important interactive query commands, listed in alphabetical order, with their shorthand (if available), the unabbreviated form, and a brief explanation. Each command is linked to a manual page with a more complete description. For a complete list, see the index.

	a append	append an additional source file to the data structure
	b back		move marks back
	c contains	retain mark only if there is a match within the associated range
	e extend	retain mark only if it is followed by the given tokens
	i inspect   	show the lexical tokens for a given source line
	j jump		move mark to the other end of a range
	m mark		mark (or match) tokens if they match a pattern
	n next		move marks forward
	  pe		mark matches of a token pattern expression
	r reset		clear all marks and user-defined ranges
	  re		mark matches of a token regular expression
	< restore	restore all marks using a set 1..3
	> save		save all current marks and ranges in a set 1..3
	s stretch	set a range, starting at marked tokens up to the pattern specified
	u undo		undo the last change made
	  unmark        unmark tokens matching a pattern
	w with		restrict marks to tokens matching an additional constraint

Displaying things:

	d display	show source code context for marked tokens
	h history	show the command history
	l list		list marked tokens
	p pre		show preprocessed source code context for marked tokens
	t track		start or stop redirection of d/l/p output into a named file
	=		print a user-defined string with a value for each marked token

Two interactive commands mimic the Cobra command-line options -expr, and -pat or -pe. They are:

	re  expr	match a code pattern specified as a full regular expression
	pe  expr	similar to re but using a simplified form of regular expressions
	pat expr 	same as pe

Patterns (structured markings) can be manipulated in sets and displayed with the two following commands (which were added in Cobra version 3.6).

	dp args  	display structured patterns (all, or selected ones, optionally from a named set)
	ps args  	pattern set commands (create, delete, convert, list, etc.)

More on the format for expressions used with many of the query commands can be found in the Cobra man-page ($ man cobra), and in the Cobra help summary ($ cobra --).

A few other commands not yet mentioned:

	B		browse the source text of files (see also F and G)
	cfg		show the control flow graph for a given function
	context		show callers and callees for a given function
	cpp             enable or disable preprocessing of the source code
	default		set a command to be executed after an empty command
	fcg		show the function call graph, or a path in the fcg
	fcts		show names of all defined functions
	ff		show the source text for a specific function
	ft		show the source text for a specific structure definition
	F		list open files (see also B and G)
	G		search (grep) open files for a pattern (see also F and B)
	map		map token text to user-defined types
	ncore           change the number of cores to use
	q		(quit) terminate the session
	silent           enable or disable silent output
	terse           enable or disable terse output
	?		list all commands with a brief summary of use
	!		shell escape, to execute a system command (e.g., !date)
	.		(dot) read in the script file specified as an argument
	:		(colon) execute a named script (often the : is unnecessary)
	def n() ...end  define a named command script
	%{...%}         execute an inline program script

A description of expressions and types can be found at these pages:

	expressions	expressions used in combination with m, w, or =
	types		predefined token types
	qualifiers      qualifiers that can be used with some commands

Synopsis

The main command that can be used to define new markings is:

	m mark		mark tokens if they match a pattern

Three commands can be used to move existing marks:

	b back		move marks back
	n next		move marks forward
	j jump		move mark to the other end of a range, e.g., {}, (), or []

The only command to associate a new range with a token (additional to predefined ranges) is:

	s stretch	set a range, starting at marked tokens up to the pattern specified

And, finally, four commands can be used to down-select (remove) existing marks:

	e extend	retain mark only if it is followed by the given tokens
	c contains	retain mark only if there is a match within the associated range
	m & expression	retain mark only if it also matches an expression
	w with		retain mark only if an additional constraint is met

where a pattern can be a name, a type, a regular expression, or a Cobra pattern expression.

The symbol $$ can be used to refer to the currently matched token, e.g., as set with a mark command, in next, back, extend, contains, or stretch commands.
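The way these commands combine can be modeled with a small Python sketch. This is a toy model of the descriptions in the synopsis, not Cobra's actual implementation or exact semantics: the class TokenMarks and its methods are hypothetical, patterns are reduced to literal token text, extend checks a single following token, and dropping marks that have no associated range on jump is an assumption.

```python
from typing import Dict, List, Set

CLOSERS = {")": "(", "]": "[", "}": "{"}


class TokenMarks:
    """Toy model: a token sequence plus a set of marked token positions."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.marks: Set[int] = set()
        self.partner: Dict[int, int] = {}  # links matching (), [], {}
        stack: List[int] = []
        for i, text in enumerate(tokens):
            if text in CLOSERS.values():
                stack.append(i)
            elif text in CLOSERS and stack and tokens[stack[-1]] == CLOSERS[text]:
                j = stack.pop()
                self.partner[i], self.partner[j] = j, i

    def mark(self, text: str) -> None:
        """Define new marks on every token with the given text."""
        self.marks = {i for i, t in enumerate(self.tokens) if t == text}

    def next(self) -> None:
        """Move each mark one token forward."""
        self.marks = {i + 1 for i in self.marks if i + 1 < len(self.tokens)}

    def back(self) -> None:
        """Move each mark one token back."""
        self.marks = {i - 1 for i in self.marks if i > 0}

    def jump(self) -> None:
        """Move each mark to the other end of its range (assumption: else drop it)."""
        self.marks = {self.partner[i] for i in self.marks if i in self.partner}

    def extend(self, text: str) -> None:
        """Retain a mark only if the token that follows it has the given text."""
        self.marks = {i for i in self.marks
                      if i + 1 < len(self.tokens) and self.tokens[i + 1] == text}

    def contains(self, text: str) -> None:
        """Retain a mark only if its associated range contains the given text."""
        def in_range(i: int) -> bool:
            if i not in self.partner:
                return False
            lo, hi = sorted((i, self.partner[i]))
            return text in self.tokens[lo + 1:hi]
        self.marks = {i for i in self.marks if in_range(i)}
```

For example, marking every { token, applying contains with goto, and then jump would, in this model, leave marks on the closing braces of exactly those blocks that contain a goto. Each step either creates marks, moves them, or filters them, which mirrors the grouping of commands in the synopsis.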

Additional information

A brief tutorial.
Overview of Cobra command-line options
- command-line options
Defining and Using Pattern sets
- pattern searches
Setting or Moving Marks
- mark, map, next, back, jump
Changing Selections
- contains, extend, reset
Defining Ranges
- stretch
List, Display, Print
- inspect, list, display, pre, print
Global Queries that do not change markings
- B, cfg, context, default, fcg, fcts, ff, F, ft, G, ?, history, quit, !, track
Scripts, Inline programs, and Standalone programs
- dot, progs, standalone
Save, Restore, and Undo
- save, restore, undo
When to use escape characters
- escapes

Last Update: 29 August 2023