This is useful for analyzing a single code path (without loops!) when we don't know the initial conditions. We can set them to symbolic variables and go step by step through the code to get the data flow.
SymPy can define symbolic variables (unknowns) and even symbolic functions (useful, for example, for modelling memory access, e.g. MEM(0x1234)).
x = Symbol("x") y = Symbol("y") print x + 2 + y + 5 => 7 + x + y
Code is in emusym.py, but the api is not yet stable. It works by translating asm instructions to Python code and running it. Example (this is more like pseudocode, since the actual implementation is a bit more complex and needs to be cleaned up):
MOV a,b => a = b ADD a,b,c => a = b + c LDR R1,[R2+1] => R1 = MEM(R2+1) STR R1,[R2+1] => STOREMEM(R2+1, R1) # Python allows to implement MEM(R2+1) = R1, but I'm too lazy to do that...
So, MEM(addr) is the memory access function (or, if you prefer, the pointer dereference operator). If the addr is inside the ROM, the result is known (obvious). Otherwise, usually the result is not known (it remains a symbol, e.g. MEM(0x1234) or MEM(R1+0x123)), unless the code wrote before something to that address. To track memory writes, STOREMEM uses a dict object.
In theory, this can follow complex computations and get the formula for the result. In practice, the formula may be too long to be usable. There are routines for simplifications, and you can define your own simplifying rules, but that's not always effective, and not easy to implement. Help is welcome :)
Arm.Indy says this is going to be a virtual machine for ARM. I'd say it's not. Good ARM emulators can be found online (Qemu, Pel's armu) and maybe others I'm not aware of. They are very useful in the right hands. However, to use these effectively, you have to know a lot: initial values of parameters, registers and maybe RAM contents, how the external devices work (camera sensors, coprocessors etc).
So, my implementation does not replace an ARM emulator, but rather it complements it (or at least this is my intention).
Yes, the virtual machine can be thought as a particular case of symbolic emulation (when everything is known). In theory only...