This is a work in progress and I don't have any formal background in this area, so don't believe everything written here :)

Resources

SymPy

Symbolic math in Python. It should be very useful for data flow analysis.

metasm

A nice framework, but not for ARM. They have some interesting presentations and papers:

Misc

TODO: read them :)

Useful functions

See utils.py

Notations

  • unk_R0, unk_R1...: symbols loaded into R0, R1... before emulating a piece of ARM code (the unknown initial register values)
  • arg0 ... arg3: symbols loaded into R0...R3 when emulation starts from the first instruction of a function
  • sp0: stack pointer at the beginning of the function being analyzed
  • unhandled.R1...: an instruction not implemented in the ARM emulation code referenced R1 (or another register)
  • MEM(0x1234): memory access at address 0x1234 (pointer dereference operator)
  • ret_myfunc_0xFFFF1234: value returned by "myfunc" when it was called from address 0xFFFF1234
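A minimal sketch of how these notations could be represented with SymPy, assuming plain symbols for the unknowns and an uninterpreted SymPy function for MEM. This is only an illustration; it is not necessarily how utils.py does it, and the variable names are hypothetical.

```python
import sympy as sp

# Hypothetical mapping of the notations onto SymPy objects.
unk_R0, unk_R1 = sp.symbols("unk_R0 unk_R1")                # unknown initial registers
arg0, arg1, arg2, arg3 = sp.symbols("arg0 arg1 arg2 arg3")  # function arguments in R0..R3
sp0 = sp.Symbol("sp0")                                      # SP at function entry

# MEM is left uninterpreted: MEM(x) just stands for "the value stored at x".
MEM = sp.Function("MEM")

# Value returned by a call to myfunc made from 0xFFFF1234.
ret_myfunc = sp.Symbol("ret_myfunc_0xFFFF1234")

# An expression mixing a dereference and a return value.
r2 = MEM(arg0 + 4) + ret_myfunc
# A value read from the stack frame, e.g. a spilled local variable.
r3 = MEM(sp0 - 8)
print(r2, r3)
```

Because MEM has no definition, SymPy never tries to evaluate it, but it still simplifies the address expressions inside it.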

Techniques

In theory, there is no difference between theory and practice. But, in practice, there is. [1]

Code flow analysis (code paths)

Look at conditional jumps and generate a list of possible code paths.

A path is a unique sequence of branches from the function entry to the exit [2]. Any module with a succession of n decisions in it can have up to 2^n paths within it [3].
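A minimal sketch of path enumeration. It assumes the control-flow graph was already extracted into a dict that maps each basic block to its successors (a conditional jump gives two successors) and that the graph has no loops. The block names and helper functions are hypothetical. A depth-first search lists every entry-to-exit path, and a chain of n if/else decisions produces 2^n of them.

```python
def enumerate_paths(cfg, entry, exit_block):
    """Return every path (list of block names) from entry to exit_block.

    Assumes an acyclic CFG; loops would need a bound on iterations.
    """
    paths = []

    def dfs(block, path):
        path = path + [block]
        if block == exit_block:
            paths.append(path)
            return
        for succ in cfg.get(block, []):
            dfs(succ, path)

    dfs(entry, [])
    return paths


def diamond_chain(n):
    """Build a CFG with n consecutive if/else decisions (hypothetical test graph)."""
    cfg = {}
    for i in range(n):
        cfg[f"d{i}"] = [f"t{i}", f"f{i}"]  # conditional jump: taken / not taken
        nxt = f"d{i + 1}" if i + 1 < n else "exit"
        cfg[f"t{i}"] = [nxt]
        cfg[f"f{i}"] = [nxt]
    return cfg


paths = enumerate_paths(diamond_chain(3), "d0", "exit")
print(len(paths))  # 8, i.e. 2**3
```

Each path can then be fed to the emulator separately. The exponential growth is the reason this gets slow on functions with many branches.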

Symbolic emulation of a code path

Since we don't (always) know the initial conditions, ARM emulators are not very helpful (at least not for me). Symbolic emulation treats the values we don't know as unknowns, and this is where SymPy shows its mighty power :D
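A minimal sketch of symbolic emulation along one code path. It assumes a toy instruction format (Python tuples) and supports only MOV, ADD, SUB and a simple LDR with an immediate offset; real ARM details like shifts, flags and conditional execution are left out. Registers start as the unk_R* symbols, loads become MEM(...) terms, and anything else is marked unhandled, loosely following the notations above. The function names are hypothetical.

```python
import sympy as sp

MEM = sp.Function("MEM")  # pointer dereference, left uninterpreted


def operand(regs, x):
    """An operand is either a register name or an immediate integer."""
    return regs[x] if isinstance(x, str) else sp.Integer(x)


def emulate(path):
    # Unknown initial state: R0..R3 hold unk_R0..unk_R3.
    regs = {f"R{i}": sp.Symbol(f"unk_R{i}") for i in range(4)}
    for op, dst, *src in path:
        if op == "MOV":
            regs[dst] = operand(regs, src[0])
        elif op == "ADD":
            regs[dst] = operand(regs, src[0]) + operand(regs, src[1])
        elif op == "SUB":
            regs[dst] = operand(regs, src[0]) - operand(regs, src[1])
        elif op == "LDR":  # LDR dst, [base, #offset]
            regs[dst] = MEM(operand(regs, src[0]) + operand(regs, src[1]))
        else:
            # Instruction not modeled: the destination becomes unhandled.<reg>.
            regs[dst] = sp.Symbol(f"unhandled.{dst}")
    return regs


path = [
    ("ADD", "R0", "R0", 8),
    ("SUB", "R0", "R0", 8),     # cancels out: R0 is back to unk_R0
    ("LDR", "R2", "R1", 4),     # R2 = MEM(unk_R1 + 4)
    ("ADD", "R3", "R2", "R2"),  # R3 = 2*MEM(unk_R1 + 4)
]
regs = emulate(path)
print(regs["R0"], regs["R2"], regs["R3"])
```

SymPy does the bookkeeping for free: the +8/-8 pair disappears, and R2 + R2 is collected into a single term without any extra code in the emulator.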

Backtracing

This is useful for guessing argument values in function calls. It works much better than the old method, but it is also much slower.
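A minimal sketch of the backtracing idea. It assumes a straight-line sequence of toy instructions before the call (no branches) and only MOV/ADD/SUB; it is not the actual implementation. To guess the value of R0 at a BL, it walks backwards to the last instruction that wrote R0 and recursively resolves that instruction's sources. Registers never written before the call stay unknown (None here). The function name `backtrace` is hypothetical.

```python
def backtrace(code, index, operand):
    """Guess the value of `operand` just before code[index] executes.

    `operand` is a register name ("R0") or an immediate int.
    Returns an int, or None if the value cannot be determined.
    """
    if isinstance(operand, int):
        return operand
    # Walk backwards to the most recent instruction that wrote this register.
    for i in range(index - 1, -1, -1):
        op, *args = code[i]
        if op == "BL":
            # A call may clobber caller-saved registers (ARM calling convention).
            if operand in ("R0", "R1", "R2", "R3", "R12", "LR"):
                return None
            continue
        dst, *src = args
        if dst != operand:
            continue
        vals = [backtrace(code, i, s) for s in src]
        if None in vals:
            return None
        if op == "MOV":
            return vals[0]
        if op == "ADD":
            return vals[0] + vals[1]
        if op == "SUB":
            return vals[0] - vals[1]
        return None  # unsupported instruction wrote this register
    return None  # never written: initial value unknown (unk_R*)


code = [
    ("MOV", "R1", 0x100),
    ("ADD", "R0", "R1", 0x20),
    ("MOV", "R2", "R5"),  # R5 is never set, so R2 stays unknown
    ("BL", "myfunc"),
]
print(hex(backtrace(code, 3, "R0")))  # 0x120
print(backtrace(code, 3, "R2"))       # None
```

Returning symbols instead of None (as in the symbolic emulation) would give partial answers like "arg0 + 0x20" instead of giving up. Combining this with the code path list is what makes it slow.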


Results

Community content is available under CC-BY-SA unless otherwise noted.