COBRA Reference Manual
Code Browser and Analysis Tool
Cobra uses a lexical analyzer to scan in the source code
in the files given as arguments on the command-line.
It then builds a data structure that can be used for querying that
source code, either interactively or with predefined scripts.
The internal data structure that Cobra builds is a basic linked list of
lexical tokens, annotated with some basic information and links
to other tokens, for instance to identify
matching pairs of parentheses, brackets and braces.
The tool does not attempt to parse the code, which means that
it can handle a broad range of possible inputs.
Despite the simplicity of the data structure, the tool can be
remarkably powerful in quickly locating complex patterns
in a code base to assist in
peer review, code development, or structural code analysis.
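To make the data structure described above concrete, here is a minimal conceptual sketch in Python. It is not Cobra's implementation (Cobra itself is not written in Python), and the names `Token`, `tokenize`, `tokens`, `prv`, `nxt`, and `jmp` as well as the simplified token regular expression are assumptions made for illustration only. It shows a doubly linked list of lexical tokens in which each bracket is linked to its matching partner, built without parsing the code.

```python
import re

# Closing bracket -> the opening bracket it must pair with.
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())

# Simplified lexer (an assumption): identifiers, integers, or any single
# non-space character. A real lexical analyzer handles far more.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


class Token:
    """One lexical token in a doubly linked list (hypothetical layout)."""

    def __init__(self, text, line):
        self.text = text   # token text
        self.line = line   # source line number, starting at 1
        self.prv = None    # previous token in the sequence
        self.nxt = None    # next token in the sequence
        self.jmp = None    # matching bracket, if this token is a bracket


def tokenize(source):
    """Build the linked token list and return its first token (or None)."""
    head = tail = None
    open_stack = []  # opening brackets still waiting for a partner
    for line_no, line in enumerate(source.splitlines(), start=1):
        for text in TOKEN_RE.findall(line):
            tok = Token(text, line_no)
            if tail is None:
                head = tok
            else:
                tail.nxt, tok.prv = tok, tail
            tail = tok
            if text in OPENERS:
                open_stack.append(tok)
            elif (text in PAIRS and open_stack
                  and open_stack[-1].text == PAIRS[text]):
                # Link the pair in both directions.
                opener = open_stack.pop()
                opener.jmp, tok.jmp = tok, opener
            # A closer that does not match is left unlinked; nothing fails.
    return head


def tokens(head):
    """Iterate over the list starting at head."""
    while head is not None:
        yield head
        head = head.nxt
```

Because the sketch only pairs brackets and never parses, input with unbalanced brackets still produces a usable token list; the unmatched tokens simply have no partner link.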
There are several ways to write queries.
You can use:
- Pattern search queries (see summary),
- Interactive queries (overview below, or see the index),
- Inline programs (see overview),
- Standalone checkers (see overview).
Interactive queries are written in a simple command language
that can support the most frequent types of searches.
When more complex queries need to be handled, requiring
anything other than a sequential scan of the token sequence,
inline Cobra programs offer a more powerful alternative.
For still more complex queries that
require the construction of more elaborate data structures,
the same infrastructure can be used to write standalone
checkers in C that can be linked with the Cobra front-end.
Interactive query commands are by default applied to all tokens in the
input sequence, optionally using parallel threads of execution where
that can improve performance.
The queries can be used to set, move, remove,
extend, stretch, or inspect user-defined mark points in the code.
If directed to do so with command-line option -cpp, Cobra
will use the standard gcc compiler to preprocess the source code before
the data structure is built. This can be useful to make sure that the
effects of macros and include files are taken into account when locating patterns of
interest in the code. By default no preprocessing is done, which
makes it possible to also query code bases that cannot be preprocessed
or compiled, for instance because the required directives are unknown,
or because the code is syntactically flawed.
Use cobra -help to see a list of all current command-line
options that the tool recognizes. The synopsis below summarizes
Cobra's interactive query command language. Details of each command
are given in separate pages that are linked from this overview.
New Features
The first version of Cobra was developed at JPL in 2016.
Since then, the tool has been significantly extended.
The original release, for example, counted 9,796 lines of code,
while the current version (4.5) counts 22,780 lines of code.
Some of the new features are:
- A richer scripting language for
writing inline programs, supporting arbitrary variables,
associative arrays, recursive functions, built-in functions, and concurrency control.
- A new pattern matching option based on an intuitive adapted
version of regular expressions, making it possible to quickly query code.
The pattern matcher allows the definition of bound variables and
embedded constraints that can direct and refine a search.
Full regular expression matching capability also remains available.
- Optional parallel processing of input files, which shortens the
startup time especially for larger code bases.
- Faster processing of queries. For instance, checking for empty else
statements in 18.2 million lines of code took 10 seconds with the original version;
with the new Version 2 it takes about 2.5 seconds (in both cases using a single
CPU core).
- An extended library of predefined scripts and script libraries
(including more elaborate checkers for the Misra guidelines,
the CERT coding standard, CWE vulnerabilities,
the Power of 10 rules, and the JPL Coding Standard).
- Online manual pages for all interactive commands, and
for the scripting language that can be used for writing
powerful inline programs.
- Support for parameterized named def...end scripts.
- A series of new commands, including cpp, terse, track, and map.
- A rethinking and normalization of many command-line options.
Most interactive query commands can be abbreviated to a single letter, which is
useful for scripting and compact query formulation.
Below is a quick index of the most important interactive query commands, listed in alphabetical order,
with their shorthand (if available), the unabbreviated form, and a brief explanation.
Each command is linked to a manual page with a more complete description.
For a complete list see: index.
a append append an additional source file to the data structure
b back move marks back
c contains retain mark only if there is a match within the associated range
e extend retain mark only if it is followed by the given tokens
i inspect show the lexical tokens for a given source line
j jump move mark to the other end of a range
m mark mark (or match) tokens if they match a pattern
n next move marks forward
pe mark matches of a token pattern expression
r reset clear all marks and user-defined ranges
re mark matches of a token regular expression
< restore restore all marks using a set 1..3
> save save all current marks and ranges in a set 1..3
s stretch set a range, starting at marked tokens up to the pattern specified
u undo undo the last change made
unmark unmark tokens matching a pattern
w with restrict marks to tokens matching an additional constraint
Displaying things:
d display show source code context for marked tokens
h history show the command history
l list list marked tokens
p pre show preprocessed source code context for marked tokens
t track start or stop redirection of d/l/p output into a named file
= print a user-defined string with a value for each marked token
Two interactive commands mimic the Cobra command-line options -expr and -pat (or -pe). They are:
re expr match a code pattern specified as a full regular expression
pe expr similar to re but using a simplified form of regular expressions
pat expr same as pe
Patterns (structured markings) can be manipulated in sets and displayed with the two following commands
(which were added in Cobra version 3.6).
dp args display structured patterns (all, or selected ones, optionally from a named set)
ps args pattern set commands (create, delete, convert, list, etc.)
More on the format for expressions used with many of the query commands can be found in
the Cobra man-page ($ man cobra), and in the Cobra help summary ($ cobra --).
A few other commands not yet mentioned:
B browse the source text of files (see also F and G)
cfg show the control flow graph for a given function
context show callers and callees for a given function
cpp enable or disable preprocessed or unpreprocessed code
default set a command to be executed after an empty command
fcg show the function call graph, or a path in the fcg
fcts show names of all defined functions
ff show the source text for a specific function
ft show the source text for a specific structure definition
F list open files (see also B and G)
G search (grep) open files for a pattern (see also F and B)
map map token text to user-defined types
ncore change the number of cores to use
q (quit) terminate the session
silent enable or disable silent output
terse enable or disable terse output
? list all commands with a brief summary of use
! shell escape, to execute a system command (e.g., !date)
. (dot) read in the script file specified as an argument
: (colon) execute a named script (often the : is unnecessary)
def n() ...end define a named command script
%{...%} execute an inline program script
A description of expressions and types can be found at these pages:
expressions expressions used in combination with m, w, or =
types predefined token types
qualifiers qualifiers that can be used with some commands
Synopsis
The main command that can be used to define new markings is:
m mark mark tokens if they match a pattern
Three commands can be used to move existing marks:
b back move marks back
n next move marks forward
j jump move mark to the other end of a range, e.g., {}, (), or []
The only command to associate a new range with
a token (additional to predefined ranges) is:
s stretch set a range, starting at marked tokens up to the pattern specified
And, finally, four commands can be used to down-select (remove) existing marks:
e extend retain mark only if it is followed by the given tokens
c contains retain mark only if there is a match within the associated range
m & expression retain mark only if it also matches an expression
w with retain mark only if an additional constraint is met
Here, pattern can be a name, a type, a regular expression, or a Cobra pattern expression.
The symbol $$ can be used to refer to the currently matched token,
e.g., as set with a mark command, in next,
back, extend, contains, or stretch commands.
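The way marks are set, moved, and down-selected in the synopsis above can be modeled in a few lines of Python. This is a conceptual sketch only, not Cobra's command syntax or implementation: the functions `mark`, `move`, `keep_if`, and `find_empty_else` are hypothetical names, tokens are plain strings, and the sketch assumes that a mark moved past either end of the token sequence is dropped. The actual behavior of each command is defined in its manual page.

```python
def mark(tokens, pred):
    """Model of 'mark': mark every token for which pred holds.

    Marks are represented as a set of token indices.
    """
    return {i for i, tok in enumerate(tokens) if pred(tok)}


def move(marks, tokens, step):
    """Model of 'next' (step=1) and 'back' (step=-1).

    Assumption: a mark that would leave the sequence is dropped.
    """
    return {i + step for i in marks if 0 <= i + step < len(tokens)}


def keep_if(marks, tokens, pred):
    """Model of 'with': retain a mark only if its token meets a constraint."""
    return {i for i in marks if pred(tokens[i])}


def find_empty_else(tokens):
    """Chain the steps: mark 'else', step to '{', then step to '}'."""
    marks = mark(tokens, lambda t: t == "else")
    marks = keep_if(move(marks, tokens, 1), tokens, lambda t: t == "{")
    marks = keep_if(move(marks, tokens, 1), tokens, lambda t: t == "}")
    return marks  # indices of the closing brace of each empty else body


if __name__ == "__main__":
    code = "if ( x ) { y ; } else { }".split()
    print(sorted(find_empty_else(code)))  # prints [10]
```

Each step either creates marks, shifts them, or filters them, so a query is a short pipeline over the token sequence; no parse tree is needed to find a pattern such as an empty else body.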
Additional information
- A brief tutorial.
- Overview of Cobra command-line options
- Defining and Using Pattern sets
- Setting or Moving Marks
- Changing Selections
- Defining Ranges
- List, Display, Print
- Global Queries that do not change markings
- B, cfg, context, default, fcg, fcts, ff, F, ft, G, ?, history, quit, !, track
- Scripts, Inline programs, and Standalone programs
- Save, Restore, and Undo
- When to use escape characters
Last Update: 29 August 2023