Remember the C# compiler message: not all code paths return a value?
So what's a code path?
A path is a unique sequence of branches from the function entry to the exit [1]. Any module with a succession of n decisions in it can have up to 2^n paths within it [2].
The basics[]
In order to perform static analysis of a function, it's useful to identify the code paths. This can be done by considering jumps (especially conditional ones).
Conditional suffixes in ARM ASM are [here]. You might have noticed that each suffix has an opposite one, like EQ -- NE or LS -- GE.
A conditional branch looks like this:
CMP R0, #0x1F LDREQ R0, [R4,#4] CMPEQ R0, #0 BNE a BL b
which is equivalent to (in Python-like pseudocode):
if R0 == 0x1F: R0 = RAM(R4 + 4) if R0 != 0: goto a b()
Other example:
CMP R0, 0 BNE a BEQ b
This is an if-then-else construct.
All conditional suffixes (EQ, NE etc.) look at the flags to decide if a branch is taken or not. Flags are changed by CMP, CMN, TST, TEQ and MSR (afaik), and also by the instructions with the S suffix. If you know more instructions which change the flags, please edit.
So, when one of those instructions is encountered, we have another branch.
Code is here. It's a mess, so if you find bugs, please report them here.
A code path is stored as a list of RAM addresses (long integers) and is built recursively by taking (or not taking) each conditional branch.
To test this, GraphViz can create a flow chart from all the code paths:
Tricky cases[]
Complex branch conditions[]
A single "if" and two conditions which are not mutually exclusive: MI (opposite of PL) and CS (opposite of CC). I think both can be true or false or one true/other false simultaneously, so in this case there are 4 possible paths from the same point.
ROM:FF44F818 MOVS R2, R2,LSL#31 ROM:FF44F81C LDRMIB R2, [R1],#1 ROM:FF44F820 LDRCSB R3, [R1],#1 ROM:FF44F824 LDRCSB R12, [R1],#1 ROM:FF44F828 STRMIB R2, [R0],#1 ROM:FF44F82C STRCSB R3, [R0],#1 ROM:FF44F830 STRCSB R12, [R0],#1
Solution: when considering which branch to take, instead of saying "take the MI branch", it's better to say "take the MI or CS branch" (without limiting the number of non-opposite branch conditions). The list of conditions should be reset when encountering an instruction which changes the flags.
- if you understand this and are native English speaker, please rephrase it
Loops[]
Those are really tricky. They can be found easily (in theory) by detecting a jump to a visited address (instruction).
Trick: there can be visited instructions which are not executed (because of the opposite branch condition. So there must be two lists: one with executable instructions (the actual code path) and another one with visited addresses (which I have to fix in the implementation).