This repository has been archived by the owner on Apr 28, 2023. It is now read-only.

What's Up With Mixins, Anyway?

If you are trying to work more intently with the deeper parts of angr, you will need to understand one of the design patterns we use frequently: the mixin pattern.

In brief, the mixin pattern is where Python's subclassing features are used not to implement IS-A relationships (a Child is a kind of Person) but instead to implement pieces of functionality for a type in different classes, which makes the code more modular and maintainable. Here's an example of the mixin pattern in action:

class Base:
    def add_one(self, v):
        return v + 1
        
class StringsMixin(Base):
    def add_one(self, v):
        coerce = type(v) is str
        if coerce:
            v = int(v)
        result = super().add_one(v)
        if coerce:
            result = str(result)
        return result
        
class ArraysMixin(Base):
    def add_one(self, v):
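        if type(v) is list:
            # Bind super() outside the comprehension: before Python 3.12,
            # zero-argument super() fails inside a list comprehension.
            next_add_one = super().add_one
            return [next_add_one(v_x) for v_x in v]
        else:
            return super().add_one(v)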
        
class FinalClass(ArraysMixin, StringsMixin, Base):
    pass

With this construction, we are able to define a very simple interface in the Base class, and by "mixing in" two mixins, we can create the FinalClass which has the same interface but with additional features. This is accomplished through Python's powerful multiple inheritance model, which handles method dispatch by creating a method resolution order, or MRO, which is unsurprisingly a list which determines the order in which methods are called as execution proceeds through super() calls. You can view a class's MRO like so:

FinalClass.__mro__

(FinalClass, ArraysMixin, StringsMixin, Base, object)

This means that when we take an instance of FinalClass and call add_one(), Python first checks to see if FinalClass defines an add_one, and then ArraysMixin, and so on and so forth. Furthermore, when ArraysMixin calls super().add_one(), Python will skip past ArraysMixin in the MRO, first checking if StringsMixin defines an add_one, and so forth.
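As a minimal sketch of why the order of the mixins matters, the snippet below rebuilds the example classes in compact form and combines them in both orders. The names ArraysFirst, StringsFirst and mro_names are illustrative only and do not appear anywhere in angr.

```python
class Base:
    def add_one(self, v):
        return v + 1

class StringsMixin(Base):
    def add_one(self, v):
        if isinstance(v, str):
            return str(super().add_one(int(v)))
        return super().add_one(v)

class ArraysMixin(Base):
    def add_one(self, v):
        if isinstance(v, list):
            # Bound outside the comprehension so this also runs before Python 3.12.
            next_add_one = super().add_one
            return [next_add_one(x) for x in v]
        return super().add_one(v)

class ArraysFirst(ArraysMixin, StringsMixin, Base):
    pass

class StringsFirst(StringsMixin, ArraysMixin, Base):
    pass

def mro_names(cls):
    """Return the class names in the MRO, which is easier to read than the raw tuple."""
    return [c.__name__ for c in cls.__mro__]

print(mro_names(ArraysFirst))
# The list is split by ArraysMixin, then each element passes through StringsMixin.
print(ArraysFirst().add_one(["1", 2]))  # ['2', 3]

# With the order swapped, StringsMixin sees a list (not a str) and passes it on;
# ArraysMixin then hands the raw string "1" straight to Base, which cannot add 1 to it.
try:
    StringsFirst().add_one(["1", 2])
except TypeError as exc:
    print("StringsFirst failed:", exc)
```

The same two mixins produce a working class in one order and a broken one in the other, purely because of where each sits in the MRO.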

Because multiple inheritance can create strange dependency graphs in the subclass relationship, there are rules for generating the MRO and for determining if a given mix of mixins is even allowed. This is important to understand when building complex classes with many mixins which have dependencies on each other. In short: left-to-right, depth-first, but deferring any base classes which are shared by multiple subclasses (the merge point of a diamond pattern in the inheritance graph) until the last point where they would be encountered in this depth-first search. For example, if you have classes A, B(A), C(B), D(A), E(C, D), then the method resolution order will be E, C, B, D, A (followed by object). If there is any case in which the MRO would be ambiguous, the class construction is illegal and Python raises a TypeError when the class is defined, which in practice usually means at import time.
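The diamond example can be checked directly in an interpreter. The sketch below also defines a deliberately inconsistent class (named Bad here purely for illustration) to show the error raised for an ambiguous ordering.

```python
class A: pass
class B(A): pass
class C(B): pass
class D(A): pass
class E(C, D): pass

# A is shared by both branches, so it is deferred until after D.
print([c.__name__ for c in E.__mro__])
# ['E', 'C', 'B', 'D', 'A', 'object']

# Listing A before its own subclass B is contradictory: the bases ask for
# A before B, but B's own MRO requires B before A.
mro_error = None
try:
    class Bad(A, B):
        pass
except TypeError as exc:
    mro_error = exc

print("class definition rejected:", mro_error)
```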

This is complicated! If you find yourself confused, the canonical document explaining the rationale, history, and mechanics of Python's multiple inheritance can be found here.

Mixins in Claripy Solvers

yan please write something here

Mixins in angr Engines

The main entry point to a SimEngine is process(), but how do we determine what that does?

The mixin model is used in SimEngine and friends in order to allow pieces of functionality to be reused between static and symbolic analyses. The default engine, UberEngine, is defined as follows:

class UberEngine(SimEngineFailure, SimEngineSyscall, HooksMixin, SimEngineUnicorn, SuperFastpathMixin, TrackActionsMixin, SimInspectMixin, HeavyResilienceMixin, SootMixin, HeavyVEXMixin):
    pass

Each of these mixins provides either execution through a different medium or some additional instrumentation feature. Though they are not listed here explicitly, some base classes are implicit in this hierarchy and set up the way the class is traversed. Most of these mixins inherit from SuccessorsMixin, which provides the basic process() implementation. This function sets up the SimSuccessors for the rest of the mixins to fill in, and then calls process_successors(), which is implemented by each mixin that provides some mode of execution. If a mixin can handle the step, it does so and returns; otherwise it calls super().process_successors(). In this way, the MRO of the engine class determines the order of precedence among the engine's pieces.
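The handle-or-delegate chain described above can be sketched with toy classes. Everything below is hypothetical: the Toy* names, the dict used as a state, and the list used in place of SimSuccessors are stand-ins, not angr's real classes or signatures.

```python
class ToySuccessorsBase:
    """Hypothetical stand-in for the base class that provides process()."""

    def process(self, state):
        successors = []  # stand-in for a SimSuccessors object
        self.process_successors(successors, state)
        return successors

    def process_successors(self, successors, state):
        # End of the chain: no mixin claimed this step.
        raise RuntimeError(f"no engine piece could handle {state!r}")

class ToyHookMixin(ToySuccessorsBase):
    def process_successors(self, successors, state):
        if state.get("hooked"):
            successors.append("ran hook")
            return
        super().process_successors(successors, state)

class ToyVexMixin(ToySuccessorsBase):
    def process_successors(self, successors, state):
        if state.get("arch") == "vex":
            successors.append("executed VEX block")
            return
        super().process_successors(successors, state)

class ToyEngine(ToyHookMixin, ToyVexMixin):
    """The hook mixin is listed first, so it takes precedence over the VEX mixin."""

engine = ToyEngine()
print(engine.process({"hooked": True, "arch": "vex"}))  # ['ran hook']
print(engine.process({"arch": "vex"}))                  # ['executed VEX block']
```

Reordering the bases of ToyEngine would flip which piece gets the first chance at a state that both could handle.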

HeavyVEXMixin and friends

Let's take a closer look at the last mixin, HeavyVEXMixin. If you look at the module hierarchy of the angr engines submodule, you will see that the vex submodule has a lot of pieces in it which are organized by how tightly tied to particular state types or data types they are. The heavy VEX mixin is one version of the culmination of all of these. Let's look at its definition:

class HeavyVEXMixin(SuccessorsMixin, ClaripyDataMixin, SimStateStorageMixin, VEXMixin, VEXLifter):
    ...
    # a WHOLE lot of implementation

So, the heavy VEX mixin is meant to provide fully instrumented symbolic execution on a SimState. What does this entail? The mixins tell the tale.

First, the plain VEXMixin. This mixin is designed to provide the barest-bones framework for processing a VEX block. Take a look at its source code. Its main purpose is to perform the preliminary digestion of the VEX IRSB and dispatch processing of it to methods which are provided by mixins - look at the methods which are either pass or return NotImplemented. Notice that absolutely none of its code makes any assumption whatsoever of what the type of state is or even what the type of the data words inside state are. This job is delegated to other mixins, making the VEXMixin an appropriate base class for literally any analysis on VEX blocks.

The next-most interesting mixin is the ClaripyDataMixin, whose source code is here. This mixin actually integrates the fact that we are executing over the domain of Claripy ASTs. It does this by implementing some of the methods which are unimplemented in the VEXMixin, most importantly the ITE expression, all the operations, and the clean helpers.

In terms of what it looks like to actually touch the SimState, the SimStateStorageMixin provides the glue between the VEXMixin's interface for memory and register accesses and SimState's interface for the same operations. It is unremarkable, except for a small interaction between it and the ClaripyDataMixin. The Claripy mixin also overrides the memory/register read/write functions, for the purpose of converting between the bitvector and floating-point types, since the VEX interface expects to be able to load and store floats, but the SimState interface wants to load and store only bitvectors. Because of this, the Claripy mixin must come before the storage mixin in the MRO. This is very much an interaction like the one in the add_one example at the start of this page: one mixin serves as a data filtering layer for another mixin.
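A rough sketch of this filtering relationship follows, under heavy simplification: plain Python ints stand in for bitvectors, a dict stands in for the state, and the method names _store and _load as well as the want_float flag are invented for the illustration. None of this mirrors angr's actual signatures.

```python
import struct

class ToyBlockBase:
    """Hypothetical skeleton: declares the interface, knows nothing about data or state."""

    def _store(self, addr, value):
        raise NotImplementedError

    def _load(self, addr, want_float=False):
        raise NotImplementedError

class ToyStorageMixin(ToyBlockBase):
    """Glue to the 'state' (a dict here), which only accepts integers."""

    def __init__(self):
        self.memory = {}

    def _store(self, addr, value):
        if not isinstance(value, int):
            raise TypeError("storage only accepts integers")
        self.memory[addr] = value

    def _load(self, addr, want_float=False):
        return self.memory[addr]

class ToyFloatFilterMixin(ToyBlockBase):
    """Data layer: turns floats into raw 64-bit patterns going down, and back coming up."""

    def _store(self, addr, value):
        if isinstance(value, float):
            value = struct.unpack("<Q", struct.pack("<d", value))[0]
        super()._store(addr, value)

    def _load(self, addr, want_float=False):
        raw = super()._load(addr, want_float)
        if want_float:
            return struct.unpack("<d", struct.pack("<Q", raw))[0]
        return raw

class ToyEngine(ToyFloatFilterMixin, ToyStorageMixin):
    """The filter comes first in the MRO, so storage only ever sees integers."""

engine = ToyEngine()
engine._store(0x10, 1.5)
print(hex(engine.memory[0x10]))               # raw bit pattern held by the "state"
print(engine._load(0x10, want_float=True))    # 1.5

class WrongOrder(ToyStorageMixin, ToyFloatFilterMixin):
    """Storage first: the float reaches it unconverted."""

try:
    WrongOrder()._store(0x10, 1.5)
except TypeError as exc:
    print("wrong order failed:", exc)
```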

Instrumenting the data layer

Let's turn our attention to a mixin which is not included in the HeavyVEXMixin but rather mixed into the UberEngine formula explicitly: the TrackActionsMixin. This mixin implements "SimActions", which is angr parlance for dataflow tracking. Again, look at the source code. It works by wrapping and unwrapping the data layer to pass around additional information about data flows. Look at how it instruments RdTmp, for instance. It immediately super()-calls to the next method in the MRO, but instead of returning that data directly it returns a tuple of the data and its dependencies. Depending on whether you want temporary variables to be atoms in the dataflow model, those dependencies will be either just the tmp which was read or the dependencies of the value written to that tmp.

This pattern continues for every single method that this mixin touches - any expression it receives must be unpacked into the expression and its dependencies, and any result must be packaged with its dependencies before it is returned. This works because the mixin above it makes no assumptions about what data it is passing around, and the mixin below it never gets to see any dependencies whatsoever. In fact, there could be multiple mixins performing this kind of wrap-unwrap trick and they could all coexist peacefully!
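The wrap-unwrap trick can be sketched on a toy expression evaluator. All names below (ToySkeleton, ToyIntData, ToyTrackDeps, the _handle_* methods, and the tuple-based expression format) are hypothetical and far simpler than angr's VEX handlers. This version treats tmps as atoms of the dataflow model.

```python
class ToySkeleton:
    """Hypothetical dispatcher: walks expressions, assumes nothing about value types."""

    def __init__(self):
        self.tmps = {}

    def eval(self, expr):
        kind = expr[0]
        if kind == "const":
            return self._handle_const(expr[1])
        if kind == "rdtmp":
            return self._handle_rdtmp(expr[1])
        if kind == "add":
            return self._handle_add(self.eval(expr[1]), self.eval(expr[2]))
        raise ValueError(f"unknown expression kind: {kind}")

    def wrtmp(self, tmp, expr):
        self._handle_wrtmp(tmp, self.eval(expr))

class ToyIntData(ToySkeleton):
    """Data domain: plain integers. It never sees any dependency sets."""

    def _handle_const(self, value):
        return value

    def _handle_rdtmp(self, tmp):
        return self.tmps[tmp]

    def _handle_add(self, a, b):
        return a + b

    def _handle_wrtmp(self, tmp, value):
        self.tmps[tmp] = value

class ToyTrackDeps(ToySkeleton):
    """Instrumentation: every value it returns upward is a (value, deps) tuple."""

    def __init__(self):
        super().__init__()
        self.tmp_deps = {}

    def _handle_const(self, value):
        return super()._handle_const(value), frozenset()

    def _handle_rdtmp(self, tmp):
        return super()._handle_rdtmp(tmp), frozenset({tmp})

    def _handle_add(self, a, b):
        (a_val, a_deps), (b_val, b_deps) = a, b      # unwrap
        return super()._handle_add(a_val, b_val), a_deps | b_deps  # rewrap

    def _handle_wrtmp(self, tmp, value):
        val, deps = value                             # unwrap
        self.tmp_deps[tmp] = deps
        super()._handle_wrtmp(tmp, val)

class ToyEngine(ToyTrackDeps, ToyIntData):
    pass

engine = ToyEngine()
engine.wrtmp(0, ("const", 5))
engine.wrtmp(1, ("add", ("rdtmp", 0), ("const", 2)))
print(engine.tmps)      # {0: 5, 1: 7}
print(engine.tmp_deps)  # {0: frozenset(), 1: frozenset({0})}
```

ToyIntData stores and adds plain integers throughout, while the skeleton simply forwards whatever the handlers return.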

Note that a mixin which instruments the data layer in this way is obligated to override every single method which takes or returns an expression value, even if it doesn't perform any operation on the expression other than doing the wrapping and unwrapping. To understand why, imagine that the mixin does not override the _handle_vex_const expression, so immediate value loads are not annotated with dependencies. The expression value which will be returned from the mixin which does provide _handle_vex_const will not be a tuple of (expression, deps), it will just be the expression. Imagine this execution is taking place in the context of a WrTmp(t0, Const(0)). The const expression will be passed down to the WrTmp handler along with the identifier of the tmp to write to. However, since _handle_vex_stmt_WrTmp will be overridden by our mixin which touches the data layer, it expects to be passed the tuple including the deps, and so it will crash when trying to unpack the not-a-tuple value.
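The failure mode described above is easy to reproduce in miniature. In this hypothetical sketch (the class and method names are invented, not angr's), the tracking mixin overrides the WrTmp handler but forgets the constant handler, so a plain value reaches code that expects a tuple.

```python
class ToySkeleton:
    """Hypothetical dispatcher for a single statement shape: WrTmp(tmp, Const(value))."""

    def __init__(self):
        self.tmps = {}

    def wrtmp_const(self, tmp, value):
        self._handle_wrtmp(tmp, self._handle_const(value))

class ToyIntData(ToySkeleton):
    def _handle_const(self, value):
        return value

    def _handle_wrtmp(self, tmp, value):
        self.tmps[tmp] = value

class IncompleteTracker(ToySkeleton):
    # Deliberate bug: _handle_const is NOT overridden, so constants arrive unwrapped.
    def _handle_wrtmp(self, tmp, value):
        val, deps = value  # expects a (value, deps) tuple
        super()._handle_wrtmp(tmp, val)

class BrokenEngine(IncompleteTracker, ToyIntData):
    pass

failure = None
try:
    BrokenEngine().wrtmp_const(0, 0)  # models WrTmp(t0, Const(0))
except TypeError as exc:
    failure = exc

print("unpacking failed:", failure)
```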

In this way, you can sort of imagine that a mixin which instruments the data layer in this way is actually creating a contract within Python's nonexistent typesystem - you are guaranteed to receive back any types you return, but you must pass down any types you receive as return values from below.

Mixins in the memory model

audrey please write something here. or fish, I'm not picky