
For the impatient, results are here.

## Overview

As I guessed from [1], this is a technique that can retrieve register values at a given point, by going back a little in the instruction flow and emulating a small section of code at the symbolic level.

So let's apply this to function call tracing. Consider this example:

```
...
ROM:FF3EBCD0                 STR     R10, [R0,#0x6C]
ROM:FF3EBCD8                 STR     R9, [R0,#0x74]
ROM:FF3EBCDC                 LDR     R7, =off_FF53C790
ROM:FF3EBCE0                 STR     R8, [R0,#0x8C]!
ROM:FF3EBCE4                 STR     R9, [R0,#8]
ROM:FF3EBCE8                 LDR     R0, [R7]        ; name
ROM:FF3EBCEC                 MOV     R1, #0          ; initial_value_maybe
ROM:FF3EBCF0                 BL      CreateBinarySemaphore
```

At the beginning of emulation, we'll assign unknown symbolic values to the registers, e.g. unk_r0, unk_r1 and so on.

First, let's try to emulate just the instruction before the call:

```
MOV     R1, #0
```
```
R0 =  unk_r0
R1 =  0x0 (0)
```

Got R1, but R0 is unknown. Let's try the last two:

```
LDR     R0, [R7]
MOV     R1, #0
```
```
R0 =  *(unk_r7)
R1 =  0x0 (0)
```

We have a hint about R0, but R7 is unknown. R7 is set at FF3EBCDC, so we have to emulate the last 5 instructions:

```
LDR     R7, =off_FF53C790
STR     R8, [R0,#0x8C]!
STR     R9, [R0,#8]
LDR     R0, [R7]
MOV     R1, #0
```
```
R7 = 0xFF53C790
*(0x8C   + unk_r0) = unk_r8
R0 = 0x8C + unk_r0
*(0x8C+8 + unk_r0) = unk_r9
R0 =  *(0xFF53C790) 'CFSemInterrupt'
R1 =  0x0 (0)
```

Found R0 and R1 => the analysis stops here.
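The walkthrough above can be condensed into a toy symbolic evaluator. This is a hypothetical sketch, not the actual tool used here: the `Sym` class, the instruction encoding, and the `emulate` helper are all made up for illustration. Unknown registers start as opaque symbols, stores go into a symbolic memory keyed by the address expression, and known ROM data (the `off_FF53C790` pointer) is preloaded.

```python
class Sym:
    """An opaque symbolic expression, e.g. unk_r0 or 0x8c + unk_r0."""
    def __init__(self, text): self.text = text
    def __repr__(self): return self.text

def addr_expr(base, off):
    """Textual address expression for [Rn,#off]."""
    return f"{off:#x} + {base!r}" if isinstance(base, Sym) else f"{base + off:#x}"

def emulate(path, mem):
    # All registers start out as unknown symbols.
    regs = {f"R{i}": Sym(f"unk_r{i}") for i in range(13)}
    for ins in path:
        op = ins[0]
        if op == "MOV_IMM":                 # MOV Rd, #imm
            _, rd, imm = ins; regs[rd] = imm
        elif op == "LDR_LIT":               # LDR Rd, =addr (literal pool)
            _, rd, addr = ins; regs[rd] = addr
        elif op == "LDR":                   # LDR Rd, [Rn]
            _, rd, rn = ins
            key = addr_expr(regs[rn], 0)
            regs[rd] = mem.get(key, Sym(f"*({key})"))
        elif op == "STR":                   # STR Rd, [Rn,#off] (wb = '!' writeback)
            _, rd, rn, off, wb = ins
            key = addr_expr(regs[rn], off)
            mem[key] = regs[rd]
            if wb:                          # pre-indexed: Rn gets the new address
                regs[rn] = regs[rn] + off if isinstance(regs[rn], int) else Sym(key)
        else:
            raise ValueError(op)
    return regs

# ROM data assumed known: off_FF53C790 points at the semaphore name string.
rom = {f"{0xFF53C790:#x}": Sym("'CFSemInterrupt'")}
path = [
    ("LDR_LIT", "R7", 0xFF53C790),      # LDR R7, =off_FF53C790
    ("STR", "R8", "R0", 0x8C, True),    # STR R8, [R0,#0x8C]!
    ("STR", "R9", "R0", 8, False),      # STR R9, [R0,#8]
    ("LDR", "R0", "R7"),                # LDR R0, [R7]
    ("MOV_IMM", "R1", 0),               # MOV R1, #0
]
regs = emulate(path, rom)
print("R0 =", regs["R0"])
print("R1 =", regs["R1"])
```

Running this reproduces the trace above: R0 resolves to the 'CFSemInterrupt' string and R1 to 0, while the two stores land in symbolic memory at addresses expressed in terms of unk_r0.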

## Algorithm

```
code path = [last instr. before the call]

while not stop_condition:
    init arm state (with unknown symbols loaded into registers)
    emulate the code path symbolically
    check if R0 ... Rn are fully known (n <= 3)
    if yes => DONE!
    else, add the previous instruction to the code path
```

TODO: what's a good stop condition? The arguments can't be found in all cases...
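The loop can be sketched in Python as follows. Everything here is hypothetical: `emulate` is a caller-supplied callback standing in for a real symbolic engine, and the mock below just marks a register as known once its defining instruction is in the path. The only stop condition implemented is a crude cap on how far back to search.

```python
def find_args(instructions, call_index, emulate, n_args=2, max_back=100):
    """Grow the code path one instruction backwards at a time until
    the first n_args argument registers are fully known."""
    path = []
    i = call_index
    while i > 0 and len(path) < max_back:   # crude stop condition: cap the search
        i -= 1
        path.insert(0, instructions[i])     # prepend the previous instruction
        regs = emulate(path)                # fresh unknown state every iteration
        if all(not str(regs[f"R{k}"]).startswith("unk") for k in range(n_args)):
            return regs, path               # DONE!
    return None, path                       # gave up: arguments not recovered

# Toy emulate callback: a register becomes known once its "set" is in the path.
def mock_emulate(path):
    regs = {"R0": "unk_r0", "R1": "unk_r1"}
    for ins in path:
        if ins.startswith("set "):
            regs[ins[4:]] = 0
    return regs

instructions = ["set R0", "nop", "set R1"]  # the call itself would sit at index 3
regs, path = find_args(instructions, 3, mock_emulate)
print(regs, len(path))
```

Note that the path is re-emulated from scratch on every iteration, which is exactly why the dumb algorithm behaves quadratically: a path of length k costs k emulated instructions, summed over all k up to n.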

The algorithm is pretty dumb, but it works (more or less). It's very slow; I thought it would be O(n^2), but 100 loop iterations take around 1 minute...*)

*) After porting this to GPL tools [2], the same code takes around 1 second. Still slow for my taste :)

Idea: why not just go back to the top of the current function and trace from there?

* Plus: it's much faster, and sometimes more accurate (but not always! the dumb algorithm can trace deeper into the call stack).
* Minus: sometimes it gets stuck in loops. Here, the old algorithm might (or might not) help.
* PROBLEM: how to handle loops? My initial guess is to identify them and run them 1, 2 and 3 times to see what changes and what doesn't. Some loops have more than one entry (or exit) point, and that's ugly...
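The run-it-1-2-3-times guess can be sketched as a stability check. This is purely illustrative: `emulate_with_unroll` is a made-up callback standing in for an engine that emulates the function with the loop body unrolled k times; registers whose value is the same for every k are treated as loop-invariant.

```python
def stable_registers(emulate_with_unroll, ks=(1, 2, 3)):
    """Emulate with the loop body unrolled k times for each k in ks;
    keep the registers whose value never changes across unroll counts."""
    states = [emulate_with_unroll(k) for k in ks]
    return {r for r in states[0]
            if all(s[r] == states[0][r] for s in states[1:])}

# Toy example: R0 is set before the loop, R1 acts as the loop counter.
stable = stable_registers(lambda k: {"R0": 0x1234, "R1": k})
print(stable)  # only R0 is loop-invariant
```

This only detects values that settle within a few iterations; loops with multiple entry or exit points would still need real control-flow analysis first.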

Also, there may be more than one code path backwards. I did not consider this case, so there will be missed calls.

## Results
