This script matches functions and data addresses across several IDC databases.
Usage
[code will be uploaded soon]
Dependencies
- Python (I use 2.6 under Linux). Just to be sure, install numpy/scipy and ipython.
- arm-elf-gcc in your PATH (see Build instructions/550D for how to do that)
- It requires neither IDA nor IDAPython, just the IDC files.
Preparing input files
Prepare a working directory where you will put the input files. You will need:
- Some dumps, with the .bin extension. Include the load address in the dump name.
- Some IDC files. Try to give them names somewhat similar to the dumps, to help the autodetection.
- The script (currently called match.py), in the same folder (or in PATH, if you like)
For example, these names are valid:
5D_204_06_0xff810000.bin
550D_108_05_0xff010000.bin
500D_0xff010000.bin
5D 204 AJROM0.idc
550D_108_20101116_indy_ROM0.idc
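The naming rules above can be sketched in a few lines. This is not the code from match.py: parse_load_address, model_token and pair_idc are hypothetical helpers, and pairing files by their leading model token is only one plausible way the autodetection could work.

```python
import re

def parse_load_address(dump_name):
    """Return the load address embedded in a dump file name, or None."""
    m = re.search(r"0x([0-9a-f]+)", dump_name, re.IGNORECASE)
    return int(m.group(1), 16) if m else None

def model_token(file_name):
    """First alphanumeric token of a file name, e.g. '550D' (assumed to be the model)."""
    tokens = re.findall(r"[A-Za-z0-9]+", file_name)
    return tokens[0].upper() if tokens else ""

def pair_idc(dump_name, idc_names):
    """Pick the first IDC file that starts with the same model token, or None."""
    wanted = model_token(dump_name)
    for idc in idc_names:
        if model_token(idc) == wanted:
            return idc
    return None

idcs = ["5D 204 AJROM0.idc", "550D_108_20101116_indy_ROM0.idc"]
dumps = ["5D_204_06_0xff810000.bin", "550D_108_05_0xff010000.bin", "500D_0xff010000.bin"]
pairs = dict((d, pair_idc(d, idcs)) for d in dumps)
```

With the example names, pairs maps the 5D and 550D dumps to their IDC files and the 500D dump to None, since no IDC file was supplied for it.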
Running
Then just run:
python match.py
and you should get something like this:
Input files:
===============================================================================
 Binary dump (*.bin)          LoadAddr   IDC database (*.idc)
===============================================================================
 5D_204_06_0xff810000.bin     FF810000   5D 204 AJROM0.idc
 500D_0xff010000.bin          FF010000   n/a
 550D_108_05_0xff010000.bin   FF010000   550D_108_20101116_indy_ROM0.idc
===============================================================================
Disassembling 5D_204_06_0xff810000.bin <ff810000>... ok
Disassembling 500D_0xff010000.bin <ff010000>... ok
Disassembling 550D_108_05_0xff010000.bin <ff010000>... ok
Parsing 5D 204 AJROM0.idc... found 40692 MakeName's and 19191 MakeFunction's
Parsing 550D_108_20101116_indy_ROM0.idc... found 56768 MakeName's and 18053 MakeFunction's
Parsing disassembly of 5D_204_06_0xff810000.bin... found 1263894 lines
Parsing disassembly of 500D_0xff010000.bin... found 1171162 lines
Parsing disassembly of 550D_108_05_0xff010000.bin... found 1395198 lines
Creating codesigs for 5D_204_06_0xff810000.bin...
Creating codesigs for 550D_108_05_0xff010000.bin...
saving cache... ok
Found 6623 raw code matches between 550D_108_05_0xff010000.bin and 5D_204_06_0xff810000.bin.
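The MakeName and MakeFunction counts in the log come from the IDC files. Here is a minimal sketch of how such a file might be scanned, assuming the usual exported statements MakeName(0XADDR, "name"); and MakeFunction(0XSTART, 0XEND);. The parse_idc helper and the sample text are illustrative only (the function end address and the string label are made up) and are not taken from match.py.

```python
import re

# Assumed statement shapes in an IDA-exported IDC file.
MAKE_NAME = re.compile(r'MakeName\s*\(\s*(0[xX][0-9A-Fa-f]+)\s*,\s*"([^"]*)"\s*\)')
MAKE_FUNC = re.compile(r'MakeFunction\s*\(\s*(0[xX][0-9A-Fa-f]+)\s*,\s*(0[xX][0-9A-Fa-f]+)\s*\)')

def parse_idc(text):
    """Return (names, funcs): address -> name, and function start -> end."""
    names = {}
    funcs = {}
    for m in MAKE_NAME.finditer(text):
        names[int(m.group(1), 16)] = m.group(2)
    for m in MAKE_FUNC.finditer(text):
        funcs[int(m.group(1), 16)] = int(m.group(2), 16)
    return names, funcs

# Hypothetical IDC fragment, for illustration only.
sample = '''
    MakeName        (0XFF0673EC,    "DebugMsg");
    MakeFunction    (0XFF0673EC,0XFF067560);
    MakeName        (0XFF011E1C,    "aAkashimorino");
'''
names, funcs = parse_idc(sample)
summary = "found %d MakeName's and %d MakeFunction's" % (len(names), len(funcs))
```

Keying both dictionaries by integer address makes later lookups (name of an address, bounds of a function) a single dictionary access.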
Results
To find the results, just sort the working directory by modification date.
- match-log.txt: shows detailed info about the matching process, for each pair of functions.
Advanced use and debugging
Interactive console
You can run it in IPython; after the script finishes, you can poke around and make various queries.
$ ipython
In [1]: run match.py
...

In [2]: bins # what dumps we have loaded?
Out[2]: ['550D_108_05_0xff010000.bin', '5D_204_06_0xff810000.bin']

In [3]: t2i, mk2 = bins # give a short name to each one

In [4]: hex(D[t2i].ROM[0xff011e1c]) # read from ROM; only multiples of 4 allowed here
Out[4]: '0x73616b61'

In [5]: BYTE(t2i, 0xff011bde) # this is for any address; reads a single byte from ROM
Out[5]: 143

In [6]: GuessString? # how to get help for a function
...
Definition: GuessString(ROM, a)
...

In [7]: GuessString(t2i, 0xff011e1c) # find a string starting from a known address
Out[7]: 'akashimorino'
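The three kinds of read in this session (aligned word, single byte, string) can be illustrated with a toy model. This is a sketch under assumptions, not the script's implementation: RomSketch and its methods are hypothetical, words are assumed little-endian, and the "dump" is just the 13 bytes of the string placed at its address. It is written for Python 3, while the original script targets Python 2.6.

```python
import struct

class RomSketch(object):
    """Toy stand-in for a loaded dump: raw bytes plus the load address."""

    def __init__(self, data, load_addr):
        self.data = bytearray(data)
        self.load_addr = load_addr

    def _offset(self, addr, size):
        # Translate a ROM address into an offset inside the dump.
        off = addr - self.load_addr
        if off < 0 or off + size > len(self.data):
            raise IndexError("address 0x%x is outside the dump" % addr)
        return off

    def byte(self, addr):
        """Read one byte at any address."""
        return self.data[self._offset(addr, 1)]

    def word(self, addr):
        """Read a little-endian 32-bit word; addr must be a multiple of 4."""
        if addr % 4:
            raise ValueError("word reads must be 4-byte aligned")
        return struct.unpack_from("<I", self.data, self._offset(addr, 4))[0]

    def guess_string(self, addr, max_len=256):
        """Collect printable ASCII from addr until a NUL (or max_len); None if not text."""
        chars = []
        for i in range(max_len):
            try:
                c = self.byte(addr + i)
            except IndexError:
                return None  # ran off the end before finding a terminator
            if c == 0:
                break
            if not 32 <= c < 127:
                return None
            chars.append(chr(c))
        return "".join(chars) or None

rom = RomSketch(b"akashimorino\x00", 0xff011e1c)
word = rom.word(0xff011e1c)
text = rom.guess_string(0xff011e1c)
```

Here word is 0x73616b61, which is the bytes 'a', 'k', 'a', 's' read as a little-endian word, so it agrees with Out[4] and Out[7] above.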
Internals
A dump is identified by its file name, which is used as the key into the script's dictionaries.
Global variables
- bins: list of dumps (i.e. file names with .bin extension)
- loadaddrs: dictionary of load addresses for each dump
- idcs: dictionary of idc file names for each dump
- D: dictionary containing lots of info about dumps: ROM contents, IDC names, functions, signatures...
Functions
- BYTE(bin, addr): read a byte from the ROM, from the dump whose file name is bin
- GuessString(bin, addr): detect a string starting from addr
- funcname(bin, addr): function name extracted from IDC, or sub_ABCD1234 if it's not found
- getname(bin, addr): similar to funcname, but used for other names (not functions).
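The fallback described for funcname can be sketched as a dictionary lookup with a default. The helper below is hypothetical and takes the name table directly rather than a dump file name; the uppercase, 8-digit format simply follows the sub_ABCD1234 example above.

```python
def funcname_sketch(names, addr):
    """Return the IDC name for addr, or an IDA-style sub_XXXXXXXX placeholder."""
    return names.get(addr, "sub_%08X" % addr)

names = {0xff0673ec: "DebugMsg"}        # address -> name, as read from an IDC file
known = funcname_sketch(names, 0xff0673ec)
unknown = funcname_sketch(names, 0xabcd1234)
```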
Functions for interactive use
- find_funcs(bin, regex): find functions using a regex string
In [1]: find_funcs(t2i, r"Flavor[CS]")
ff205b24: FlavorSharpness
ff205c14: FlavorContrast
ff205dac: FlavorSaturation
ff205e9c: FlavorColorTone
- find_refs(bin, value): look for references to a given name or value.
In [1]: find_refs(t2i,0x2b74)
DebugMsg+112:
ff067458: 2a000003 bcs ff06746c <_binary_550D_108_05_0xff010000_bin_start+0x5746c>
ff06745c: e59f00f4 ldr r0, [pc, #244] ; ff067558 <_binary_550D_108_05_0xff010000_bin_start+0x57558>
ff067460: e7901101 ldr r1, [r0, r1, lsl #2]
pointer to 0x2b74
... etc ...

In [2]: find_refs(t2i,"sounddev")
...
- guess_data(bin, value): return a friendly name for value. It detects whether value is a function address, a pointer to a string or a pointer to some other value in ROM (or just a plain number).
In [1]: print guess_data(t2i,0xff05de04)
pointer to 0x2e5b0
In [2]: print guess_data(t2i,0xff011e1c)
'akashimorino'
In [3]: print guess_data(t2i,0xff0673ec)
@DebugMsg
- show_func(bin, f): displays disassembly of a function, given by name or address.
- show_diasam(bin, start, end): displays disassembly of the code between start and end address.
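To make the behaviour of find_funcs and guess_data more concrete, here is a rough sketch on toy data. It is not the code from match.py: the _sketch functions, read_string, the order of the checks and the 17-byte toy ROM are all assumptions; only the function names and addresses in funcs are taken from the examples above.

```python
import re
import struct

def find_funcs_sketch(funcs, pattern):
    """Return (address, name) pairs whose name matches the regex, sorted by address."""
    rx = re.compile(pattern)
    return sorted((addr, name) for addr, name in funcs.items() if rx.search(name))

def read_string(rom, load_addr, addr, min_len=4):
    """Return the NUL-terminated printable ASCII string at addr, or None."""
    off = addr - load_addr
    if not 0 <= off < len(rom):
        return None
    end = rom.find(b"\x00", off)
    if end < 0:
        return None
    raw = rom[off:end]
    if len(raw) < min_len or any(not 32 <= c < 127 for c in raw):
        return None
    return raw.decode("ascii")

def guess_data_sketch(rom, load_addr, funcs, value):
    """Classify value: function, string pointer, pointer to a ROM word, or plain number."""
    if value in funcs:
        return "@" + funcs[value]
    text = read_string(rom, load_addr, value)
    if text is not None:
        return "'%s'" % text
    off = value - load_addr
    if 0 <= off <= len(rom) - 4:
        # Little-endian word stored at that ROM address (assumed byte order).
        return "pointer to 0x%x" % struct.unpack_from("<I", rom, off)[0]
    return "0x%x" % value

LOAD = 0xff010000  # toy load address
rom = bytearray(struct.pack("<I", 0x2e5b0) + b"akashimorino\x00")
funcs = {
    0xff0673ec: "DebugMsg",
    0xff205b24: "FlavorSharpness",
    0xff205c14: "FlavorContrast",
    0xff205dac: "FlavorSaturation",
    0xff205e9c: "FlavorColorTone",
}
flavors = [name for addr, name in find_funcs_sketch(funcs, r"Flavor[CS]")]
guesses = [guess_data_sketch(rom, LOAD, funcs, v)
           for v in (LOAD, LOAD + 4, 0xff0673ec, 42)]
```

Checking the function table first and the string heuristic second is a design choice of this sketch; a real implementation would probably need stricter string detection to avoid treating short runs of printable bytes as text.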