This is work in progress and I don't have any formal background in this area, so don't believe all the stuff from here :)
Resources[edit | edit source]
SymPy[edit | edit source]
Symbolic math in Python. Will be very useful for data flow analysis.
A nice framework, but not for ARM. They have some interesting presentations and papers:
Misc[edit | edit source]
- CiteSeerX has tons of papers on reverse engineering, decompilation and stuff like that (I like this one)
- OO Reengineering Patterns book by Serge Demeyer (part II: reverse engineering; lots of text and little info)
TODO: read them :)
Useful functions[edit | edit source]
Notations[edit | edit source]
- unk_R0, unk_R1...: before emulating a piece of ARM code, those symbols are loaded into registers
- arg0 ... arg3: those symbols are loaded into R0...R3 when starting emulation from the first line of a function
- sp0: stack pointer at the beginning of the function being analyzed
- unhandled.R1...: there was some instruction not implemented in the ARM emulation code, which referenced R1 (or other registers)
- MEM(0x1234): memory addressing (pointer dereference operator)
- ret_myfunc_0xFFFF1234: value returned by "myfunc" when it was called at addr 0xFFFF1234
Techniques[edit | edit source]
In theory, there is no difference between theory and practice. But, in practice, there is. 
Look at conditional jumps and generate a list of possible code paths.
Since we don't (always) know the initial condititions, ARM emulators are not very helpful (at least not for me). Symbolic emulation assumes there are some unknowns there, and here SymPy shows it's mighty power :D
This is useful for guessing argument values in function calls. Much better than the old method, but also much slower.