For the impatient, results are here.
Overview[]
As I've guessed from [1], this is a technique which can retrieve register values at a given point, by going a little back in the instruction flow and emulating a small section of code at symbolic level.
So let's apply this to function call tracing. Consider this example:
... ROM:FF3EBCD0 STR R10, [R0,#0x6C] ROM:FF3EBCD4 ADD R4, R0, #8 ROM:FF3EBCD8 STR R9, [R0,#0x74] ROM:FF3EBCDC LDR R7, =off_FF53C790 ROM:FF3EBCE0 STR R8, [R0,#0x8C]! ROM:FF3EBCE4 STR R9, [R0,#8] ROM:FF3EBCE8 LDR R0, [R7] ; name ROM:FF3EBCEC MOV R1, #0 ; initial_value_maybe ROM:FF3EBCF0 BL CreateBinarySemaphore
At the beginning of emulation, we'll assign unknown symbolic values to the registers, e.g. unk_r0, unk_r1 and so on.
First, let's try to emulate just the instruction before the call:
MOV R1, #0
R0 = unk_r0 R1 = 0x0 (0)
Got R1, but R0 is unknown. Let's try the last two:
LDR R0, [R7] MOV R1, #0
R0 = *(unk_r7) R1 = 0x0 (0)
We have a hint about R0, but R7 is unknown. This is set at FF3EBCDC, so we have to emulate the last 5 instructions:
LDR R7, =off_FF53C790 STR R8, [R0,#0x8C]! STR R9, [R0,#8] LDR R0, [R7] MOV R1, #0
R7 = 0xFF53C790 *(0x8C + unk_r0) = unk_r8 R0 = 0x8C + unk_r0 *(0x8C+8 + unk_r0) = unk_r9 R0 = *(0xFF53C790) 'CFSemInterrupt' R1 = 0x0 (0)
Found R0 and R1 => the analysis stops here.
Algorithm[]
code path = [last instr. before the call] while not stop_condition: init arm state (with unknown symbols loaded into registers) emulate the code path symbolically check if R0 ... Rn are fully known (n <= 3) if yes => DONE! else, add the previous instruction to the code path
TODO: what's a good stop condition? The arguments can't be found in all cases...
The algorithm is pretty dumb, but works (more or less). It's very slow; I thought it's O(n^2), but 100 loop iterations take around 1 minute...*)
*) After porting this to GPL tools [2], the same code takes around 1 second. Still slow for my taste :)
Idea: why not just go back until the top of the current function and trace for there?
- Plus: it's much faster, and sometimes more accurate (but not always! the dumb algorithm can trace deeper into the call stack)
- Minus: sometimes it gets stuck in loops. Here, the old algorithm might (or might not) help.
- PROBLEM: how to handle loops? My initial guess is to identify them and run them 1, 2 and 3 times to see what's changing and what not. Some loops have more than one entry (or exit) point, and that's ugly...
Also, there may be more than one code path backwards. I did not consider this case, so there will be missed calls.