This is work in progress and I don't have any formal background in this area, so don't believe all the stuff from here :)
Resources[]
SymPy[]
Symbolic math in Python. Will be very useful for data flow analysis.
metasm[]
A nice framework, but not for ARM. They have some interesting presentations and papers:
- http://metasm.cr0.org/docs/2009-guillot-gazet-hitb-deprotection.pdf
- http://metasm.cr0.org/docs/sstic09-metasm-jcv.pdf
- http://metasm.cr0.org/docs/sstic08-metasm-jcv.pdf
Misc[]
- http://www.zynamics.com/downloads/csw09.pdf
- http://strategoxt.org/Transform/WebHome
- CiteSeerX has tons of papers on reverse engineering, decompilation and stuff like that (I like this one)
- OO Reengineering Patterns book by Serge Demeyer (part II: reverse engineering; lots of text and little info)
TODO: read them :)
Useful functions[]
See utils.py
Notations[]
- unk_R0, unk_R1...: before emulating a piece of ARM code, those symbols are loaded into registers
- arg0 ... arg3: those symbols are loaded into R0...R3 when starting emulation from the first line of a function
- sp0: stack pointer at the beginning of the function being analyzed
- unhandled.R1...: there was some instruction not implemented in the ARM emulation code, which referenced R1 (or other registers)
- MEM(0x1234): memory addressing (pointer dereference operator)
- ret_myfunc_0xFFFF1234: value returned by "myfunc" when it was called at addr 0xFFFF1234
Techniques[]
In theory, there is no difference between theory and practice. But, in practice, there is. [1]
Code flow analysis (code paths)[]
Look at conditional jumps and generate a list of possible code paths.
A path is a unique sequence of branches from the function entry to the exit [2]. Any module with a succession of n decisions in it can have up to 2^n paths within it [3].
Symbolic emulation of a code path[]
Since we don't (always) know the initial condititions, ARM emulators are not very helpful (at least not for me). Symbolic emulation assumes there are some unknowns there, and here SymPy shows it's mighty power :D
Backtracing[]
This is useful for guessing argument values in function calls. Much better than the old method, but also much slower.