A program slice is a subset of statements that is obtained from the original program, usually by removing zero or more statements. Slicing is often helpful in debugging and program understanding. For instance, it’s usually easier to locate the source of a variable on a program slice.
A backward slice is constructed from a target in the program, and all data flows in this slice end at the target.
angr has a built-in analysis, called BackwardSlice
, to construct a backward program slice.
This section will act as a how-to for angr’s BackwardSlice
analysis, and followed by some in-depth discussion over the implementation choices and limitations.
To build a BackwardSlice
, you will need the following information as input.
- Required CFG. A control flow graph (CFG) of the program. This CFG must be an accurate CFG (CFGEmulated).
- Required Target, which is the final destination that your backward slice terminates at.
- Optional CDG. A control dependence graph (CDG) derived from the CFG.
angr has a built-in analysis
CDG
for that purpose. - Optional DDG. A data dependence graph (DDG) built on top of the CFG.
angr has a built-in analysis
DDG
for that purpose.
A BackwardSlice
can be constructed with the following code:
>>> import angr
# Load the project
>>> b = angr.Project("examples/fauxware/fauxware", load_options={"auto_load_libs": False})
# Generate a CFG first. In order to generate data dependence graph afterwards, you’ll have to:
# - keep all input states by specifying keep_state=True.
# - store memory, register and temporary values accesses by adding the angr.options.refs option set.
# Feel free to provide more parameters (for example, context_sensitivity_level) for CFG
# recovery based on your needs.
>>> cfg = b.analyses.CFGEmulated(keep_state=True,
... state_add_options=angr.sim_options.refs,
... context_sensitivity_level=2)
# Generate the control dependence graph
>>> cdg = b.analyses.CDG(cfg)
# Build the data dependence graph. It might take a while. Be patient!
>>> ddg = b.analyses.DDG(cfg)
# See where we wanna go... let’s go to the exit() call, which is modeled as a
# SimProcedure.
>>> target_func = cfg.kb.functions.function(name="exit")
# We need the CFGNode instance
>>> target_node = cfg.get_any_node(target_func.addr)
# Let’s get a BackwardSlice out of them!
# `targets` is a list of objects, where each one is either a CodeLocation
# object, or a tuple of CFGNode instance and a statement ID. Setting statement
# ID to -1 means the very beginning of that CFGNode. A SimProcedure does not
# have any statement, so you should always specify -1 for it.
>>> bs = b.analyses.BackwardSlice(cfg, cdg=cdg, ddg=ddg, targets=[ (target_node, -1) ])
# Here is our awesome program slice!
>>> print(bs)
Sometimes it’s difficult to get a data dependence graph, or you may simply want build a program slice on top of a CFG.
That’s basically why DDG is an optional parameter.
You can build a BackwardSlice
solely based on CFG by doing:
>>> bs = b.analyses.BackwardSlice(cfg, control_flow_slice=True)
BackwardSlice (to [(<CFGNode exit (0x10000a0) [0]>, -1)])
Before you go ahead and use BackwardSlice
object, you should notice that the design of this class is fairly arbitrary right now, and it is still subject to change in the near future. We’ll try our best to keep this documentation up-to-date.
After construction, a BackwardSlice
has the following members which describe a program slice:
Member | Mode | Meaning |
---|---|---|
runs_in_slice | CFG-only | A networkx.DiGraph instance showing addresses of blocks and SimProcedures in the program slice, as well as transitions between them |
cfg_nodes_in_slice | CFG-only | A networkx.DiGraph instance showing CFGNodes in the program slice and transitions in between |
chosen_statements | With DDG | A dict mapping basic block addresses to lists of statement IDs that are part of the program slice |
chosen_exits | With DDG | A dict mapping basic block addresses to a list of “exits”. Each exit in the list is a valid transition in the program slice |
Each “exit” in chosen_exit
is a tuple including a statement ID and a list of target addresses.
For example, an “exit” might look like the following:
(35, [ 0x400020 ])
If the “exit” is the default exit of a basic block, it’ll look like the following:
(“default”, [ 0x400085 ])
TODO
Take a look at BackwardSlice.dbg_repr()
!
TODO
TODO
TODO
TODO
TODO