Revision as of 23:18, 19 November 2010
This is the script for matching functions and data addresses between a bunch of IDC databases.
Theory (how it works): IDAPython/Firmware matching
Usage
[code will be uploaded soon]
Dependencies
- Python (I use 2.6 under Linux). To be safe, also install numpy/scipy and IPython.
- arm-elf-gcc in your PATH (see Build instructions/550D for how to do that)
- It requires neither IDA nor IDAPython, just the IDC files.
Preparing input files
Prepare a working directory where you will put the input files. You will need:
- Some dumps, with the .bin extension. Include the load address in the dump name.
- Some IDC files. Try to give them names somewhat similar to the dumps, to help the autodetection.
- The script (called match.py for now), in the same folder (or in your PATH, if you like)
For example, these names are valid:
 5D_204_06_0xff810000.bin
 550D_108_05_0xff010000.bin
 500D_0xff010000.bin
 5D 204 AJROM0.idc
 550D_108_20101116_indy_ROM0.idc
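The load address and the dump/IDC pairing can both be derived from file names like these. Below is a minimal Python 3 sketch, not the script's actual autodetection: `load_address`, `pair_idc` and the 0.4 similarity cutoff are hypothetical, and the pairing simply picks the IDC name with the highest `difflib` similarity to the dump name after lowercasing and turning spaces into underscores.

```python
import difflib
import re

# Hypothetical helpers; the real script's autodetection may work differently.
LOADADDR_RE = re.compile(r"0x([0-9a-fA-F]{8})")

def load_address(bin_name):
    """Return the load address embedded in a dump's file name, or None."""
    m = LOADADDR_RE.search(bin_name)
    return int(m.group(1), 16) if m else None

def _stem(name):
    # Drop the extension and normalize case/spaces before comparing names.
    return name.rsplit(".", 1)[0].lower().replace(" ", "_")

def pair_idc(bin_name, idc_names, cutoff=0.4):
    """Return the IDC file whose name is most similar to the dump's, or None."""
    stems = {_stem(n): n for n in idc_names}
    best = difflib.get_close_matches(_stem(bin_name), list(stems), n=1, cutoff=cutoff)
    return stems[best[0]] if best else None

if __name__ == "__main__":
    idcs = ["5D 204 AJROM0.idc", "550D_108_20101116_indy_ROM0.idc"]
    for name in ["5D_204_06_0xff810000.bin", "550D_108_05_0xff010000.bin"]:
        print(name, "%X" % load_address(name), pair_idc(name, idcs))
```

With the example names, each dump is paired with the IDC file shown next to it in the "Input files" table below. A similarity cutoff is needed so that a dump without a matching IDC (like the 500D one) can end up with no pairing, but the right value depends on how the files are named.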
Running
Then just say:
python match.py
and you should get something like this:
 Input files:
 ===============================================================================
 Binary dump (*.bin)          LoadAddr   IDC database (*.idc)
 ===============================================================================
 5D_204_06_0xff810000.bin     FF810000   5D 204 AJROM0.idc
 500D_0xff010000.bin          FF010000   n/a
 550D_108_05_0xff010000.bin   FF010000   550D_108_20101116_indy_ROM0.idc
 ===============================================================================
 Disassembling 5D_204_06_0xff810000.bin <ff810000>... ok
 Disassembling 500D_0xff010000.bin <ff010000>... ok
 Disassembling 550D_108_05_0xff010000.bin <ff010000>... ok
 Parsing 5D 204 AJROM0.idc... found 40692 MakeName's and 19191 MakeFunction's
 Parsing 550D_108_20101116_indy_ROM0.idc... found 56768 MakeName's and 18053 MakeFunction's
 Parsing disassembly of 5D_204_06_0xff810000.bin... found 1263894 lines
 Parsing disassembly of 500D_0xff010000.bin... found 1171162 lines
 Parsing disassembly of 550D_108_05_0xff010000.bin... found 1395198 lines
 Creating codesigs for 5D_204_06_0xff810000.bin...
 Creating codesigs for 550D_108_05_0xff010000.bin...
 saving cache... ok
 Found 6623 raw code matches between 550D_108_05_0xff010000.bin and 5D_204_06_0xff810000.bin.
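The MakeName/MakeFunction counts in this log come from scanning the IDC files. Here is a rough Python 3 sketch of such a scan. It assumes IDA-style calls like `MakeName(0XFF810000, "name");` and `MakeFunction(0XFF810000, 0XFF810020);` with variable whitespace. `parse_idc` is a hypothetical helper, not the script's parser; it returns dictionaries shaped like the A2N/N2A fields described further down.

```python
import re

# Assumed IDC syntax (IDA-generated scripts vary in whitespace):
#   MakeName(0XFF810000, "some_name");
#   MakeFunction(0XFF810000, 0XFF810020);
MAKENAME_RE = re.compile(r'MakeName\s*\(\s*(0[xX][0-9a-fA-F]+)\s*,\s*"([^"]*)"\s*\)')
MAKEFUNC_RE = re.compile(r'MakeFunction\s*\(\s*(0[xX][0-9a-fA-F]+)')

def parse_idc(text):
    """Return (addr -> name, name -> addr, set of function starts) from IDC text."""
    a2n = {}
    for m in MAKENAME_RE.finditer(text):
        a2n[int(m.group(1), 16)] = m.group(2)  # a later name for the same address wins
    n2a = {name: addr for addr, name in a2n.items()}
    funcs = {int(m.group(1), 16) for m in MAKEFUNC_RE.finditer(text)}
    return a2n, n2a, funcs

if __name__ == "__main__":
    sample = 'MakeName(0XFF810000, "example_name");\nMakeFunction(0XFF810000, 0XFF810020);\n'
    a2n, n2a, funcs = parse_idc(sample)
    print("found %d names and %d functions" % (len(a2n), len(funcs)))
```

`int(..., 16)` accepts the `0X` prefix directly, so the captured hex strings need no further cleanup.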
Results
The results are written as new files in the working directory; to find them, just sort the directory by modification date.
- match-log.txt: shows detailed info about the matching process, for each pair of functions.
Advanced use and debugging
Interactive console
The script invokes IPython at the end of the automatic initial analysis; here you can browse the dump, find/verify matches between firmware versions, and lots of other cool stuff.
 $ python main.py
 ... lots of messages ...
 ARM firmware analysis console ready.
 In [1]:
Internals
[rewriting...]
Global variables
- D: dictionary containing Dump objects, indexed by their file name.
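 In [1]: D
 Out[1]:
 {'550D_108_05_0xff010000.bin': Dump of 550D_108_05_0xff010000.bin,
  '5D_204_06_0xff810000.bin': Dump of 5D_204_06_0xff810000.bin}

 In [2]: D.values()
 Out[2]: [Dump of 5D_204_06_0xff810000.bin, Dump of 550D_108_05_0xff010000.bin]

 In [3]: mk2,t2i = D.values()
The order of D.values() is not guaranteed (in Python 2, dicts are unordered); in this session, mk2 gets the 5D Mark II dump and t2i the 550D (Rebel T2i) dump.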
Classes/objects
Dump
Contains all info about a dump.
Fields (the uppercase ones are dictionaries):
 In [4]: t2i.<TAB>
 t2i.A2N     t2i.Fun     t2i.__class__   t2i.__str__            t2i.loadaddr
 t2i.ARGS    t2i.MNEF    t2i.__doc__     t2i._get_strings       t2i.refs
 t2i.DATA    t2i.N2A     t2i.__init__    t2i._get_strings_work  t2i.strings
 t2i.DISASM  t2i.RAWASM  t2i.__module__  t2i.bin                t2i.strrefs
 t2i.FUNCS   t2i.ROM     t2i.__repr__    t2i.funcs
 In [10]: t2i.bin, t2i.loadaddr
 Out[10]: ('550D_108_05_0xff010000.bin', 4278255616L)

 In [11]: t2i.funcs("japan")
 ff20cf64: get_JapanLang_struct_14c48_2a0
 ff4369ac: StopMnLanguageJapanApp
 ff436d30: StartMnLanguageJapanApp
 ff0978bc: GUI_LimitLangJapan
 ff4369e4: MnLanguageJapan_handler
 ff436de0: language_japan_something

 In [12]: t2i.refs("sounddev")
 SoundDevStartOut+32:
 ff053d70: e51f4b54 ldr r4, [pc, #-2900] ; ff053224 <_binary_550D_108_05_0xff010000_bin_start+0x43224>
 pointer to 0x1ED0 (sounddev)
 ...etc...

 In [13]: t2i.strrefs("^CreateTask$")
 String references to ff06e2c8 'CreateTask':
 createTask_maybe+68:
 ff06e158: 028f0f5a addeq r0, pc, #360 ; *'CreateTask'
 'CreateTask'
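The PC-relative annotations in the refs/strrefs output can be checked by hand. In ARM state the PC reads as the instruction's address plus 8, so 0xff053d70 + 8 - 2900 = 0xff053224 and 0xff06e158 + 8 + 360 = 0xff06e2c8. The following Python 3 sketch decodes just these two PC-relative forms from the raw instruction word. `pc_relative_target` is a hypothetical helper; other addressing modes (register offsets, writeback, Thumb) are deliberately ignored.

```python
def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF

def pc_relative_target(addr, insn):
    """Address referenced by 'ldr rX, [pc, #+/-imm]' or 'add rX, pc, #imm'.

    addr is the instruction's address, insn its 32-bit ARM encoding.
    Returns None for any other instruction.
    """
    pc = addr + 8  # ARM state: PC reads two instructions ahead
    # LDR word, immediate offset, pre-indexed, no writeback, base register = PC
    if (insn & 0x0F7F0000) == 0x051F0000:
        offset = insn & 0xFFF
        return pc + offset if insn & (1 << 23) else pc - offset  # bit 23 = U (add/subtract)
    # ADD with a rotated 8-bit immediate, first operand = PC (any condition, S ignored)
    if (insn & 0x0FEF0000) == 0x028F0000:
        imm8 = insn & 0xFF
        rotate = (insn >> 8) & 0xF
        return pc + ror32(imm8, 2 * rotate)
    return None

if __name__ == "__main__":
    # (address, word) pairs copied from the refs/strrefs output above
    for addr, insn in [(0xff053d70, 0xe51f4b54), (0xff06e158, 0x028f0f5a)]:
        print("%x -> %x" % (addr, pc_relative_target(addr, insn)))
```

For `028f0f5a`, the rotate field is 0xf and the immediate is 0x5a, so the operand is 0x5a rotated right by 30 bits, which is 0x168 (360).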
Fun
An ASM function. Constructors:
- Fun(dump, name_or_addr)
- dump.Fun(name_or_addr)
 In [50]: f = Fun(t2i,"setFiltreOff")
 Unknown function: setFiltreOff. Using closest match: SetFilterOff.

 In [51]: f
 Out[51]: SetFilterOff at 0xff064e98 in 550D_108_05_0xff010000.bin
Fields:
 In [52]: f.<TAB>
 f.__class__  f.__init__    f.__repr__  f._get_end  f._get_size  f.called_by  f.disasm  f.end   f.sig
 f.__doc__    f.__module__  f.__str__   f._get_sig  f.addr       f.calls      f.dump    f.refs  f.size
 In [53]: "%x"%f.addr
 Out[53]: 'ff064e98'

 In [54]: "%x"%f.end
 Out[54]: 'ff064eb8'

 In [55]: f.size
 Out[55]: 32
 In [60]: f.called_by()
 ff064120: sub_FF064114+12
 ff064d30: UnpowerMicAmp+40

 In [61]: f.calls()
 ff064ea8: eb00094f bl @DebugMsg
 ff064eb4: eafffb4e b @audio_ic_write
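To list calls, each `b`/`bl` word presumably has to be turned into a target address. For ARM B/BL, the low 24 bits are a signed word offset from PC (instruction address + 8). The Python 3 sketch below does that decoding; `branch_target` is a hypothetical name, not part of the script. For example, `eb00094f` at ff064ea8 resolves to ff0673ec, the address that guess_data reports as @DebugMsg further down.

```python
def branch_target(addr, insn):
    """Decode an ARM B/BL word located at addr.

    Returns ("b" or "bl", target address), or None if insn is not B/BL.
    """
    if ((insn >> 25) & 0b111) != 0b101:  # bits 27-25 must be 101
        return None
    if (insn >> 28) == 0xF:              # cond 1111 here encodes BLX, not B/BL
        return None
    offset = insn & 0x00FFFFFF           # signed 24-bit word offset
    if offset & 0x00800000:
        offset -= 1 << 24
    target = (addr + 8 + (offset << 2)) & 0xFFFFFFFF  # PC reads as addr + 8
    mnemonic = "bl" if insn & (1 << 24) else "b"      # bit 24 is the link bit
    return mnemonic, target

if __name__ == "__main__":
    # Words copied from the SetFilterOff listing
    for addr, insn in [(0xff064ea8, 0xeb00094f), (0xff064eb4, 0xeafffb4e)]:
        kind, target = branch_target(addr, insn)
        print("%x: %s %x" % (addr, kind, target))
```

This prints `ff064ea8: bl ff0673ec` and `ff064eb4: b ff063bf4`. Running it over every word of a function and keeping the non-None results gives a list similar to calls(); inverting that mapping over all functions would give called_by().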
 In [62]: f.disasm()
 // Start of function: SetFilterOff
 NSTUB(SetFilterOff, ff064e98):
 ff064e98: e92d4010 push {r4, r14}
 ff064e9c: e28f20f4 add r2, pc, #244 ; *SetFilterOff
 ff064ea0: e3a01003 mov r1, #3 ; 0x3
 ff064ea4: e3a00014 mov r0, #20 ; 0x14
 ff064ea8: eb00094f bl @DebugMsg
 ff064eac: e8bd4010 pop {r4, r14}
 ff064eb0: e3a00c31 mov r0, #12544 ; 0x3100
 ff064eb4: eafffb4e b @audio_ic_write
 // End of function: sub_FF064EB4

 In [63]: f.sig
 'push add mov mov bl pop mov b '
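f.sig keeps only the mnemonics of a function. The Python 3 sketch below rebuilds that string from objdump-style lines, then pairs functions whose signature occurs exactly once in each dump. This is one plausible reading of the "raw code matches" reported in the log, not necessarily what the script does; the actual matching is described on the Theory page. `code_sig` and `unique_sig_matches` are hypothetical helpers.

```python
from collections import Counter

def code_sig(disasm_lines):
    """Build a mnemonic-only signature like f.sig from objdump-style lines.

    Assumed line shape: 'ff064e98: e92d4010 push {r4, r14}'; other lines
    (comments, NSTUB labels) are skipped.
    """
    sig = ""
    for line in disasm_lines:
        parts = line.split()
        if len(parts) >= 3 and parts[0].endswith(":"):
            sig += parts[2] + " "
    return sig

def unique_sig_matches(sigs_a, sigs_b):
    """Pair functions whose signature occurs exactly once in each dump.

    sigs_a and sigs_b map function address -> signature string.
    Returns {address in dump A: address in dump B}.
    """
    count_a = Counter(sigs_a.values())
    count_b = Counter(sigs_b.values())
    unique_b = {sig: addr for addr, sig in sigs_b.items() if count_b[sig] == 1}
    return {addr: unique_b[sig] for addr, sig in sigs_a.items()
            if count_a[sig] == 1 and sig in unique_b}
```

Requiring uniqueness on both sides avoids guessing between short, common signatures (a plain `push bl pop` wrapper can appear many times), at the cost of leaving those functions unmatched.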
Functions
...maybe these should be moved into the Dump class?
- BYTE(dump, addr): read a byte from the ROM of the given Dump object
 In [5]: hex(BYTE(t2i, 0xff011e1c))
 Out[5]: '0x61'
- INT32(dump, addr): read a 32-bit word from the ROM; shortcut for D[bin].ROM[addr]
 In [6]: hex(INT32(t2i, 0xff011e1c))
 Out[6]: '0x73616b61'
- GuessString(dump, addr): detect a string starting at addr
 In [7]: GuessString(t2i, 0xff011e1c)
 Out[7]: 'akashimorino'
- funcname(dump, addr): function name extracted from the IDC file, or sub_ABCD1234 if none is found
 In [8]: funcname(t2i, 0xFF28AA58)
 Out[8]: 'GetJpegInfo'
- getname(dump, addr): similar to funcname, but for other names (not functions)
 In [10]: getname(t2i, 0x26284)
 Out[10]: '0x26284 (sd_device)'
- guess_data(dump, value): return a friendly name for value. It detects whether value is a function address, a pointer to a string, a pointer to some other value in ROM, or just a plain number.
 In [1]: print guess_data(t2i,0xff05de04)
 pointer to 0x2e5b0
 In [2]: print guess_data(t2i,0xff011e1c)
 'akashimorino'
 In [3]: print guess_data(t2i,0xff0673ec)
 @DebugMsg
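The 0x73616b61 returned by INT32 is the little-endian reading of the bytes 'a' 'k' 'a' 's' at 0xff011e1c, the start of 'akashimorino'. The Python 3 sketch below reimplements the idea of BYTE, INT32 and GuessString over a raw bytes buffer. It assumes a hypothetical `RomImage` object holding the dump bytes and the load address (the real Dump stores its data differently, e.g. in the ROM dictionary). The names `byte_at`, `int32_at` and `guess_string` are hypothetical, and the printable-ASCII rule and 4-character minimum in `guess_string` are assumptions.

```python
import struct

class RomImage:
    """Hypothetical stand-in for a Dump: raw ROM bytes plus their load address."""
    def __init__(self, data, loadaddr):
        self.data = data
        self.loadaddr = loadaddr

def byte_at(rom, addr):
    """One byte from the ROM, like BYTE(dump, addr)."""
    return rom.data[addr - rom.loadaddr]

def int32_at(rom, addr):
    """A little-endian 32-bit word, like INT32(dump, addr)."""
    return struct.unpack_from("<I", rom.data, addr - rom.loadaddr)[0]

def guess_string(rom, addr, min_len=4):
    """Printable ASCII run starting at addr, or None if shorter than min_len."""
    i = addr - rom.loadaddr
    chars = bytearray()
    while i < len(rom.data) and 32 <= rom.data[i] < 127:
        chars.append(rom.data[i])
        i += 1
    return chars.decode("ascii") if len(chars) >= min_len else None

if __name__ == "__main__":
    # Fake ROM: zeros, then the string seen at 0xff011e1c in the 550D dump
    rom = RomImage(bytes(0x1e1c) + b"akashimorino\x00", 0xff010000)
    print(hex(byte_at(rom, 0xff011e1c)), hex(int32_at(rom, 0xff011e1c)),
          guess_string(rom, 0xff011e1c))
```

This prints `0x61 0x73616b61 akashimorino`, matching the BYTE, INT32 and GuessString results above.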
Functions for interactive use
They return nothing and change nothing; they only print results to the console.
- find_funcs(bin, regex, ratio=1, num=10): find functions using either a regex search or a fuzzy string match.
 In [1]: find_funcs(t2i, r"Flavor[C|S]")    # when ratio=1, it uses a regex search
 ff205b24: FlavorSharpness
 ff205c14: FlavorContrast
 ff205dac: FlavorSaturation
 ff205e9c: FlavorColorTone
 In [2]: find_funcs(mk2, "DebugMsg", 0.5)    # when ratio < 1, it is the minimum similarity ratio accepted by the fuzzy search
 ff86af48: TH_DebugMsg
 ff9b7660: AJ_called_by_DebugMsg
 ff86b22c: AJ_DbgMgr.c
- find_refs(bin, value): look for references to a given name or value.
 In [1]: find_refs(t2i,0x2b74)
 DebugMsg+112:
 ff067458: 2a000003 bcs ff06746c <_binary_550D_108_05_0xff010000_bin_start+0x5746c>
 ff06745c: e59f00f4 ldr r0, [pc, #244] ; ff067558 <_binary_550D_108_05_0xff010000_bin_start+0x57558>
 ff067460: e7901101 ldr r1, [r0, r1, lsl #2]
 pointer to 0x2b74
 ... etc ...
 In [2]: find_refs(t2i,"sounddev")
 ... lots of entries ...
- show_disasm(bin, start, end): displays the disassembly of the code between the start and end addresses.
- show_func(bin, f): displays disassembly of a function, given by name or address.
 In [1]: show_func(t2i, "SetFilterOff")
 // Start of function: SetFilterOff
 NSTUB(SetFilterOff, ff064e98):
 ff064e98: e92d4010 push {r4, r14}
 ff064e9c: e28f20f4 add r2, pc, #244 ; *SetFilterOff
 ff064ea0: e3a01003 mov r1, #3 ; 0x3
 ff064ea4: e3a00014 mov r0, #20 ; 0x14
 ff064ea8: eb00094f bl @DebugMsg
 ff064eac: e8bd4010 pop {r4, r14}
 ff064eb0: e3a00c31 mov r0, #12544 ; 0x3100
 ff064eb4: eafffb4e b @audio_ic_write
 // End of function: sub_FF064EB4
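find_funcs switches between a regex search and a fuzzy match depending on ratio. The Python 3 sketch below shows that switch using `re` and `difflib.SequenceMatcher`. `search_names` is hypothetical: it returns a list instead of printing, and the case-insensitive regex and the scoring are assumptions, not necessarily what the script does. A side note on the find_funcs example: `[C|S]` is a character class, so it also matches a literal `|`; `[CS]` is enough.

```python
import difflib
import re

def search_names(names, pattern, ratio=1, num=10):
    """Return up to num (address, name) pairs from names (address -> name).

    ratio == 1: case-insensitive regex search, in dictionary order.
    ratio < 1:  fuzzy match; keep names whose difflib similarity to pattern
                is at least ratio, best matches first.
    """
    if ratio == 1:
        rx = re.compile(pattern, re.IGNORECASE)
        hits = [(addr, name) for addr, name in names.items() if rx.search(name)]
    else:
        scored = []
        for addr, name in names.items():
            score = difflib.SequenceMatcher(None, pattern.lower(), name.lower()).ratio()
            if score >= ratio:
                scored.append((score, addr, name))
        scored.sort(reverse=True)
        hits = [(addr, name) for _, addr, name in scored]
    return hits[:num]

if __name__ == "__main__":
    names = {0xff205b24: "FlavorSharpness", 0xff205c14: "FlavorContrast",
             0xff0673ec: "DebugMsg"}
    for addr, name in search_names(names, r"flavor[cs]"):
        print("%x: %s" % (addr, name))
```

The same fuzzy scoring could explain the "Using closest match" message from Fun(): take the single best-scoring name when an exact lookup fails.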