API reference for ARM firmware analysis console
Examples are given for the disassembly of autoexec.bin (550D, 1.0.8) and for the 550D dump, which can be obtained by running Magic Lantern on your camera.
Loading some dumps and some names
    In [1]: D = load_dumps("(autoexec|108)")
    Did not find the matching bin for mynames.idc. Ignoring this file.
    Did not find the matching bin for mychanges.idc. Ignoring this file.
    Input files:
    ===============================================================================
     Binary dump (*.bin)        LoadAddr   IDC database (*.idc)
    ===============================================================================
     550d.108.0xff010000.bin    FF010000   550d.108.20101116_indy_ROM0.idc
     autoexec.0x8A000.bin       8A000      n/a
    ===============================================================================
    Disassembling 550d.108.0xff010000.bin <ff010000>... ok
    Disassembling autoexec.0x8A000.bin <8a000>... ok
    Parsing 550d.108.20101116_indy_ROM0.idc... found 56768 MakeName's and 18053 MakeFunction's
    Parsing disassembly of 550d.108.0xff010000.bin... found 1395198 lines
    Indexing references...
    Parsing disassembly of autoexec.0x8A000.bin... found 19564 lines
    Indexing references...
    In [2]: t2i,ml = D # D is sorted by bin file name
    In [3]: ml.load_names("stubs-550d.108.S")
    Found 80 stubs in stubs-550d.108.S.
    In [4]: ml.load_names("autoexec.S")
    Found 8 stubs in autoexec.S.
Magic commands
Shortcuts to misc stuff.
They operate on the currently selected dump, so first you have to select one:
    In [5]: sel ml
Go to address or name:
    In [6]: g 8B194
    8b194: e59fb19c ldr r11, [pc, #412] ; 0x8b338: pointer to 0xc2b8c
    8b198: e59fc19c ldr r12, [pc, #412] ; 0x8b33c: pointer to 0x8B7F4 (bmp_printf)
    8b19c: e58b0000 str r0, [r11]
    8b1a0: e1a01006 mov r1, r6
    8b1a4: e3a02028 mov r2, #40 ; 0x28
    8b1a8: e3a00802 mov r0, #131072 ; 0x20000
    8b1ac: e59f318c ldr r3, [pc, #396] ; **'ML v. %s (%s)\nBuilt on %s by %s\n'
    8b1b0: e88d0090 stm sp, {r4, r7}
    8b1b4: e58dc010 str r12, [sp, #16]
    8b1b8: e58d8008 str r8, [sp, #8]
    In [7]: g fprintf
    NSTUB(fprintf, 8eec0):
    8eec0: e92d000e push {r1, r2, r3}
    8eec4: e92d4070 push {r4, r5, r6, lr}
    8eec8: e24ddf41 sub sp, sp, #260 ; 0x104
    8eecc: e28dcf46 add r12, sp, #280 ; 0x118
    8eed0: e1a05000 mov r5, r0
    8eed4: e1a0300c mov r3, r12
    8eed8: e59d2114 ldr r2, [sp, #276]
    8eedc: e58dc100 str r12, [sp, #256]
    8eee0: e1a0000d mov r0, sp
    8eee4: e59fc03c ldr r12, [pc, #60] ; 0x8ef28: pointer to 0xFF1D6638 (vsnprintf)
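The "pointer to" comments in the listing come from PC-relative loads. As a rough sketch (plain Python 3, independent of the console; the helper name `pc_relative_target` is made up for illustration), the address read by `ldr rX, [pc, #offset]` in ARM mode can be computed like this:

```python
def pc_relative_target(insn_addr, offset):
    """Address read by `ldr rX, [pc, #offset]` in 32-bit ARM mode.

    In ARM state, reading PC yields the address of the current
    instruction plus 8, so the literal lives at insn_addr + 8 + offset.
    """
    return (insn_addr + 8 + offset) & 0xFFFFFFFF


# Values taken from the listings on this page
print(hex(pc_relative_target(0x8B194, 412)))      # 0x8b338
print(hex(pc_relative_target(0xFF053488, -620)))  # 0xff053224
```

Both results agree with the addresses printed in the disassembly comments.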
Search for strings:
    In [8]: s magic
    finding strings...
    91b1c: 'A:/magic.cfg'
    916b8: 'magic lantern init done'
    91628: 'B:/magic.cfg'
    91ff8: 'Magic Lantern install'
    917d4: '# Magic Lantern %s (%s)\n# Build on %s by %s\n'
    915fc: 'Magic Lantern %s (%s)'
    915f0: '[MAGIC] '
Search for references:
    In [9]: r additional_version
    ROMBASE+0x10f8:
    8b0f8: e59f3220 ldr r3, [pc, #544] ; 0x8b320: pointer to 0x15094 (additional_version)
    0x15094 (additional_version)
    In [10]: r 0x1ED0
    ROMBASE+0x5b54:
    8fb54: e59f40bc ldr r4, [pc, #188] ; 0x8fc18: pointer to 0x1ED0 (sounddev)
    0x1ED0 (sounddev)
Classes/objects
Dump
Contains all info about a dump.
    In [11]: ml.bin
    Out[11]: autoexec.0x8A000.bin
    In [12]: hex(ml.loadaddr), hex(ml.minaddr), hex(ml.maxaddr)
    Out[12]: ('8A000', '8A000', 'B30BC')
    In [13]: t2i.funcs(r"Flavor[C|S]") # when ratio=1, it uses a regex search
    ff205b24: FlavorSharpness
    ff205c14: FlavorContrast
    ff205dac: FlavorSaturation
    ff205e9c: FlavorColorTone
    In [14]: t2i.funcs("DebugMsg", 0.5) # when ratio < 1, this is the min. allowed ratio for fuzzy search
    ff0673ec: DebugMsg
    ff2da3e0: _DebugSignal
    ff08b350: EnableDebugMon
    ff0cbac8: DpSetDebugMode
    In [15]: t2i.refs("sounddev")
    ...
    sounddev_task+28:
    ff053488: e51f426c ldr r4, [pc, #-620] ; 0xff053224: pointer to 0x1ED0 (sounddev)
    0x1ED0 (sounddev)
    ...
    In [16]: t2i.refs("sounddev", context=2)
    ...
    sounddev_task+28:
    ff053480: e3a01000 mov r1, #0 ; 0x0
    ff053484: eb006b5e bl @create_binary_semaphore
    ff053488: e51f426c ldr r4, [pc, #-620] ; 0xff053224: pointer to 0x1ED0 (sounddev)
    ff05348c: e3a02000 mov r2, #0 ; 0x0
    ff053490: e5840058 str r0, [r4, #88]
    0x1ED0 (sounddev)
    ...
    In [17]: t2i.strings("AudioLevel")
    finding strings...
    ff544628: 'AudioLevelStateSignature'
    ff1ab830: 'SoundDevice\\AudioLevel.c'
    ff544670: 'AudioLevel'
    In [18]: t2i.strrefs("^CreateTask$")
    String references to ff06e2c8 'CreateTask':
    createTask_maybe+68:
    ff06e158: 028f0f5a addeq r0, pc, #360 ; *'CreateTask'
    'CreateTask'
    In [19]: t2i.disasm(0xff010000, 0xff01000f)
    ff010000: e59ff0bc ldr pc, [pc, #188] ; 0xff0100c4: pointer to 0xff01000c
    NSTUB($ fr603e.var_4C, ff010004):
    ff010004: 6e6f6167 powvsez f6, f7, f7
    ff010008: 796f7369 stmdbvc pc!, {r0, r3, r5, r6, r8, r9, r12, sp, lr}^
    ff01000c: e3a00103 mov r0, #-1073741824 ; 0xc0000000

    In [20]: ml.MakeName(0x8D070, "config_parse_file")
    In [21]: ml.MakeFunction(0x8D070) # will guess the end address
    Size: 108
    In [22]: ml.MakeFunction(0x8D070, 0x8D0DC) # explicit end address
    In [23]: ml.MakeFunction(0x8D070, 0x8D0DC, "config_parse_file") # explicit end address and name
    Overwriting name config_parse_file
    In [24]: ml.MakeFunction(0x8D070, name="config_parse_file") # will guess the end address and set the given name
    Overwriting name config_parse_file
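The two search modes of funcs (regex when ratio is 1, fuzzy below that) could be approximated as follows. This is only a sketch in plain Python 3: it assumes a similarity score like the one from the standard `difflib` module, which may differ from what the console actually uses, and `find_funcs` and the `names` table are hypothetical.

```python
import difflib
import re


def find_funcs(names, pattern, ratio=1.0):
    """names: dict mapping address -> function name (hypothetical layout)."""
    if ratio >= 1:
        # Exact mode: treat the pattern as a regular expression
        return {a: n for a, n in names.items() if re.search(pattern, n)}

    # Fuzzy mode: keep names whose similarity to the pattern is >= ratio
    def similarity(name):
        return difflib.SequenceMatcher(None, pattern.lower(), name.lower()).ratio()

    return {a: n for a, n in names.items() if similarity(n) >= ratio}


names = {
    0xFF0673EC: "DebugMsg",
    0xFF205B24: "FlavorSharpness",
    0xFF205C14: "FlavorContrast",
    0xFF064E98: "SetFilterOff",
}
print(find_funcs(names, r"Flavor[CS]"))     # the two Flavor* entries
print(find_funcs(names, "DebugMsg", 0.5))   # only DebugMsg in this small table
# A misspelled name can be resolved to its closest match in the same way:
print(difflib.get_close_matches("setFiltreOff", list(names.values()), n=1))
```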
Fun
ASM function. Constructor:
- dump.Fun(name_or_addr)
    In [25]: t2i.Fun(0xFF1C924C)
    Out[25]: ASM function: dispcheck at 0xff1c924c in 550d.108.0xff010000.bin
    In [26]: f = t2i.Fun("setFiltreOff")
    Unknown function: setFiltreOff. Using closest match: SetFilterOff.
    In [27]: f
    Out[27]: ASM function: SetFilterOff at 0xff064e98 in 550d.108.0xff010000.bin
    In [28]: f.name
    Out[28]: SetFilterOff
    In [29]: "%x"%f.addr
    Out[29]: ff064e98
    In [30]: "%x"%f.end
    Out[30]: ff064eb8
    In [31]: f.size
    Out[31]: 32
    In [32]: f.called_by()
    ff064d30: UnpowerMicAmp+40
    ff064120: sub_FF064114+12
    In [33]: f.calls()
    ff064ea8: eb00094f bl @DebugMsg
    ff064eb4: eafffb4e b @audio_ic_write
    // End of function: SetFilterOff
    In [34]: f.disasm()
    // Start of function: SetFilterOff
    NSTUB(SetFilterOff, ff064e98):
    ff064e98: e92d4010 push {r4, lr}
    ff064e9c: e28f20f4 add r2, pc, #244 ; *'SetFilterOff'
    ff064ea0: e3a01003 mov r1, #3 ; 0x3
    ff064ea4: e3a00014 mov r0, #20 ; 0x14
    ff064ea8: eb00094f bl @DebugMsg
    ff064eac: e8bd4010 pop {r4, lr}
    ff064eb0: e3a00c31 mov r0, #12544 ; 0x3100
    ff064eb4: eafffb4e b @audio_ic_write
    // End of function: SetFilterOff
    In [35]: f.sig
    Out[35]: push add mov mov bl pop mov b
    In [36]: f.refs()
    SetFilterOff+4:
    ff064e9c: e28f20f4 add r2, pc, #244 ; *'SetFilterOff'
    'SetFilterOff'
    ...
    In [37]: f.strings()
    SetFilterOff
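The value of f.sig looks like the sequence of mnemonics of the function. A minimal sketch of deriving such a signature from listing text (plain Python 3; the function `signature` and its parsing rule are assumptions for illustration, not the console's own code):

```python
def signature(disasm_lines):
    """Join the mnemonics of lines shaped like 'addr: opcode mnemonic operands'."""
    mnemonics = []
    for line in disasm_lines:
        fields = line.split()
        # Code lines start with '<address>:' and have at least three fields;
        # comment lines and NSTUB labels do not, so they are skipped.
        if len(fields) >= 3 and fields[0].endswith(":"):
            mnemonics.append(fields[2])
    return " ".join(mnemonics)


listing = """\
// Start of function: SetFilterOff
NSTUB(SetFilterOff, ff064e98):
ff064e98: e92d4010 push {r4, lr}
ff064e9c: e28f20f4 add r2, pc, #244 ; *'SetFilterOff'
ff064ea0: e3a01003 mov r1, #3 ; 0x3
ff064ea4: e3a00014 mov r0, #20 ; 0x14
ff064ea8: eb00094f bl @DebugMsg
ff064eac: e8bd4010 pop {r4, lr}
ff064eb0: e3a00c31 mov r0, #12544 ; 0x3100
ff064eb4: eafffb4e b @audio_ic_write
// End of function: SetFilterOff
""".splitlines()

print(signature(listing))  # push add mov mov bl pop mov b
```

A signature of this kind ignores registers and addresses, which is what makes it usable for comparing functions across firmware versions.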
Modules
match
Matches functions and addresses between different versions of firmware, or different camera models.
See GPL Tools/match.py.
guessfunc
This module guesses function locations from calls (BL, BX), PUSH instructions, and loaded names which are not marked as functions. Function size is guessed with emusym.GuessFunction.
The main entry point is guessfunc.run. I prefer to run it before generating the HTML output.
    In [38]: guessfunc.run(ml)
    Function 0xff82399c is outside ROM => skipping
    Function 0xff8922a4 is outside ROM => skipping
    ...
These are the small workers:
    In [39]: analyze_names(dump)
    In [40]: analyze_push(dump)
    In [41]: analyze_bl(dump)
    In [42]: analyze_bx(dump)
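To illustrate the BL part of the idea: every BL instruction encodes its target, so collecting those targets gives candidate function starts. The sketch below is plain Python 3 using the standard ARM-mode B/BL encoding; the helper names and the `(address, opcode)` input format are assumptions, not the module's real interface.

```python
def branch_target(insn_addr, opcode):
    """Target of an ARM-mode B/BL instruction (signed 24-bit word offset)."""
    imm24 = opcode & 0x00FFFFFF
    if imm24 & 0x800000:       # sign-extend the 24-bit field
        imm24 -= 0x1000000
    return (insn_addr + 8 + imm24 * 4) & 0xFFFFFFFF


def is_bl(opcode):
    """True for a conditional or unconditional BL (not B, not BLX imm)."""
    is_branch = ((opcode >> 25) & 0x7) == 0b101
    has_link = ((opcode >> 24) & 1) == 1
    return is_branch and has_link and (opcode >> 28) != 0xF


def guess_function_starts(code):
    """code: iterable of (address, opcode) pairs; returns the set of BL targets."""
    return {branch_target(addr, op) for addr, op in code if is_bl(op)}


# Opcodes of SetFilterOff, copied from the listing on this page
code = [
    (0xFF064E98, 0xE92D4010), (0xFF064E9C, 0xE28F20F4),
    (0xFF064EA0, 0xE3A01003), (0xFF064EA4, 0xE3A00014),
    (0xFF064EA8, 0xEB00094F), (0xFF064EAC, 0xE8BD4010),
    (0xFF064EB0, 0xE3A00C31), (0xFF064EB4, 0xEAFFFB4E),
]
print([hex(a) for a in guess_function_starts(code)])  # ['0xff0673ec']
```

The single target found, ff0673ec, is the address of DebugMsg in the ROM dump. The trailing `b @audio_ic_write` is a plain branch, so this sketch does not count it.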
html
Exports the disassembly to a (huge) set of HTML files, which can be browsed offline (like this example).
It can also do some automated firmware analysis.
    In [43]: html.quick(ml) # quick disassembly of dump ml
    Disassembling autoexec.0x8A000.bin...
    ...
    Raw disassembly... [19% done, ETA 0:00:05]...

    In [44]: html.full(ml) # quick(...) + symbolic function analysis
    Disassembling autoexec.0x8A000.bin...
    ...
    Running symbolic analysis for autoexec.0x8A000.bin...
    ...
    Function analysis... [91% done, ETA 0:00:19]...

    In [45]: html.quick(D) # process all dumps from list D
    In [46]: html.full(D)

    In [47]: html.update(ml) # only re-generate modified pages (experimental).
Some small workers:
    In [48]: html.link2addr(0x8B194)
    Out[48]: <a href="0008a000.htm#_8B194">8B194</a>
    In [49]: html.link2func(ml.Fun("fprintf"))
    Out[49]: <a href="sub_0008eec0.htm">fprintf</a>
    In [50]: html.link2funcoff(0x8eee0)
disasm
Disassembly and code browsing.
    In [51]: hex(BYTE(ml, 0x916b8))
    Out[51]: 6D
    In [52]: hex(UINT32(ml, 0x916b8))
    Out[52]: 6967616D
    In [53]: GuessString(ml, 0x916b8)
    Out[53]: magic lantern init done

    In [54]: funcname(ml, 0x8eec0)
    Out[54]: fprintf
    In [55]: funcname(ml, 0x8b000)
    Out[55]: sub_8B000
    In [56]: dataname(ml, 0x1ED0)
    Out[56]: 0x1ED0 (sounddev)

    In [57]: guess_data(ml,0x8c228)
    Out[57]: 0x8c228: pointer to 0x2E5B0 (bmp_vram_info)
    In [58]: guess_data(ml,0x8e748)
    Out[58]: 0x8e748: pointer to 'They try to set code to %d'
    In [59]: guess_data(ml,0x8dca8)
    Out[59]: @menu_add
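The BYTE, UINT32 and GuessString results above are consistent with little-endian reads at an offset relative to the load address. A small sketch of that reading (plain Python 3; `RawDump` and the lowercase helpers are hypothetical stand-ins, not the console's classes):

```python
import struct


class RawDump:
    """Hypothetical stand-in for a dump: raw bytes plus a load address."""

    def __init__(self, data, loadaddr):
        self.data = data
        self.loadaddr = loadaddr


def byte_at(dump, addr):
    return dump.data[addr - dump.loadaddr]


def uint32_at(dump, addr):
    # '<I' means little-endian unsigned 32-bit
    return struct.unpack_from("<I", dump.data, addr - dump.loadaddr)[0]


def guess_string(dump, addr):
    # Read bytes up to the NUL terminator and decode them as ASCII
    start = addr - dump.loadaddr
    end = dump.data.index(b"\x00", start)
    return dump.data[start:end].decode("ascii")


demo = RawDump(b"magic lantern init done\x00", 0x916B8)
print("%X" % byte_at(demo, 0x916B8))    # 6D
print("%X" % uint32_at(demo, 0x916B8))  # 6967616D
print(guess_string(demo, 0x916B8))      # magic lantern init done
```

The word 6967616D is simply the bytes 'm', 'a', 'g', 'i' read in little-endian order.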
fileutil
Utils for working with files.
    In [60]: change_ext("foo.py", ".jpg")
    Out[60]: foo.jpg

    s = capture(func, *args, **kwargs) # capture output and result of func
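For readers without the console at hand, helpers of this kind can be sketched with the standard library (Python 3). The return shape of `capture` below, a `(printed_output, return_value)` tuple, is an assumption; the real function may return something different.

```python
import contextlib
import io
import os


def change_ext(path, new_ext):
    """Replace the extension of path with new_ext (which includes the dot)."""
    root, _old_ext = os.path.splitext(path)
    return root + new_ext


def capture(func, *args, **kwargs):
    """Run func and return (text it printed to stdout, its return value)."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        result = func(*args, **kwargs)
    return buf.getvalue(), result


def noisy_add(a, b):
    print("adding")
    return a + b


print(change_ext("foo.py", ".jpg"))  # foo.jpg
print(capture(noisy_add, 2, 3))      # ('adding\n', 5)
```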
stats
Sorts functions by number of calls to them:
    In [61]: stats.calls_to(t2i)
    ...
    1753: dialog_label_item
    2555: assert_0
    16694: DebugMsg
Or by how many other functions they call:
    In [62]: stats.calls_from(t2i)
    ...
    103: sub_FF300BA0 sub_FF2FF400,JudgeBottomInfoDispT ...
    133: sub_FF0340B4 release_mem,sub_FF031E08,LVCAF_Re ...
    215: IDLEHandler release_mem,EndMovieRecSequence_N ...
The most called functions seem to be DebugMsg and assert_0.
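The ranking itself is a plain frequency count. A sketch in Python 3, assuming the call graph is available as `(caller, callee)` pairs (a made-up input format; the sample pairs are invented too):

```python
from collections import Counter


def calls_to(call_pairs):
    """Return [(count, callee), ...] sorted so the most called comes last."""
    counts = Counter(callee for _caller, callee in call_pairs)
    return sorted((n, name) for name, n in counts.items())


# Invented sample data, for illustration only
pairs = [
    ("task_a", "DebugMsg"),
    ("task_b", "DebugMsg"),
    ("task_a", "assert_0"),
]
for n, name in calls_to(pairs):
    print("%d: %s" % (n, name))
```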
idapy
IDAPython compatibility layer. Also contains functions from IDAPython/utils.py and other small stuff.
These functions are NOT fully compatible with IDAPython! I only wrote them to help porting my existing scripts.
The first thing you have to do before using any of those functions is:
    In [63]: select_dump(dump)
where dump is a Dump object; for example:
    In [64]: select_dump(t2i)
Or use the magic command sel, which does the same thing.
    In [65]: GetFirstWord("abc def")
    Out[65]: abc
    In [66]: getRegs03("R1, R2 and maybe R5")
    Out[66]: [1, 2]
    In [67]: getRegsS("R1, R2 and maybe R5")
    Out[67]: ['R1', 'R2', 'R5']
    In [68]: filter_non_printable("\x07buzz!")
    Out[68]: buzz!
    In [69]: print GetDisasm(0xff064bbc)
    subeq r1, pc, #2944 ; 0xb80
    In [70]: GetMnem(0xff064bbc)
    Out[70]: SUB
    In [71]: GetMnef(0xff064bbc)
    Out[71]: SUBEQ
    In [72]: GetOpnd(0xff064bbc, 0)
    Out[72]: R1
    In [73]: GetOpnd(0xff064bbc, 1)
    Out[73]: PC
    In [74]: GetOpnd(0xff064bbc, 2)
    Out[74]: #2944
    In [75]: GetOpType(0xff064bbc, 0)
    Out[75]: 1
    In [76]: GetOpType(0xff064bbc, 1)
    Out[76]: 1
    In [77]: GetOpType(0xff064bbc, 2)
    Out[77]: 5
    In [78]: GetOperandValue(0xff064bbc, 0)
    Out[78]: 1
    In [79]: GetOperandValue(0xff064bbc, 1)
    Out[79]: 15
    In [80]: GetOperandValue(0xff064bbc, 2)
    Out[80]: 2944

    In [81]: print GetDisasm(0xff48e5a4)
    stmiane r5!, {r6, r10, r12, lr}
    In [82]: GetModeSuffix(0xff48e5a4)
    Out[82]: IA
    In [83]: GetCondSuffix(0xff48e5a4)
    Out[83]: NE
    In [84]: OppositeSuffix("EQ")
    Out[84]: NE

    In [85]: print GetDisasm(0xff064b90)
    ldrb r0, [r0, #16]
    In [86]: GetExtraSuffixes(0xff064b90)
    Out[86]: B
    In [87]: GetByteSuffix(0xff064b90)
    Out[87]: B
    In [88]: GetHalfwordSuffix(0xff064b90)
    In [89]: GetFlagSuffix(0xff064b90)

    In [90]: print GetDisasm(0xff064bb4)
    cmpne r0, #5 ; 0x5
    In [91]: ChangesFlags(0xff064bb4)
    Out[91]: True

    In [92]: GetString(0xff011e1c)
    Out[92]: akashimorino

    In [93]: isFuncStart(0xff064e98)
    Out[93]: True
    In [94]: GetFunctionName(0xff064e98)
    Out[94]: SetFilterOff
    In [95]: GetFuncOffset(0xff064e98)
    Out[95]: SetFilterOff
    In [96]: FuncItems(0xff064e98)
    Out[96]: [4278603416L, 4278603420L, 4278603424L, 4278603428L, 4278603432L, 4278603436L, 4278603440L, 4278603444L]

    In [97]: SegStart(0xff064e98)
    Out[97]: 4278255616
    In [98]: SegEnd(0xff064e98)
    Out[98]: 4283881244

    In [99]: CodeRefsTo(0xff064e98)
    Out[99]: [4278603056L, 4278599968L]
    In [100]: DataRefsTo(0x1ED0)
    ...
    In [101]: CodeRefsFrom(0xff064e98)
    Out[101]: [4278612972L, 4278598644L]
    In [102]: DataRefsFrom(0xff064e98)
    Out[102]: [4278603672L, 1182033235, 3, 20, 3912040463L, 12544, 3912056945L]
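The condition-suffix helpers rest on the fact that ARM condition codes come in complementary pairs. A sketch of an OppositeSuffix-like lookup (plain Python 3; `opposite_suffix` is an illustrative name and not the idapy function itself):

```python
# Complementary ARM condition codes; HS/LO are alternate spellings of CS/CC.
_PAIRS = [
    ("EQ", "NE"), ("CS", "CC"), ("HS", "LO"), ("MI", "PL"),
    ("VS", "VC"), ("HI", "LS"), ("GE", "LT"), ("GT", "LE"),
]
_OPPOSITE = {}
for a, b in _PAIRS:
    _OPPOSITE[a] = b
    _OPPOSITE[b] = a


def opposite_suffix(cond):
    """Return the condition that is true exactly when cond is false.

    Raises KeyError for suffixes without an opposite, such as AL.
    """
    return _OPPOSITE[cond.upper()]


print(opposite_suffix("EQ"))  # NE
print(opposite_suffix("ls"))  # HI
```

The addresses printed as decimals by the reference functions can be turned back into the familiar form with `"%x" % value`; for example, 4278603416 is ff064e98.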
bunch
Something like a struct.
    In [103]: b = Bunch(foo=5, baz="hello")
    In [104]: b.foo
    Out[104]: 5
    In [105]: b.baz
    Out[105]: hello
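A struct-like class of this kind is commonly written as below. This is a generic Python 3 sketch of the pattern and not necessarily identical to the module's own Bunch.

```python
class Bunch:
    """Container whose keyword arguments become attributes."""

    def __init__(self, **fields):
        self.__dict__.update(fields)

    def __repr__(self):
        items = ", ".join("%s=%r" % kv for kv in sorted(self.__dict__.items()))
        return "Bunch(%s)" % items


b = Bunch(foo=5, baz="hello")
print(b.foo)  # 5
print(b.baz)  # hello
b.extra = [1, 2]  # fields can also be added later
print(b)
```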
emusym
Symbolic emulation of ASM code, based on SymPy. Theory is here: IDAPython/Static_analysis
These functions operate on the currently selected dump, therefore you have to select one first.
    In [106]: sel ml
    In [107]: CP = emusym.find_code_paths(0x8dac8) # extract code paths by doing a recursive branch analysis
    In [108]: emusym.create_graph(CP, "myfunc.svg") # generate a graph of code flow, with graphviz

    In [109]: cp = CP[0] # select first code path
    In [110]: emusym.resetArm() # reset the symbolic ARM registers and memory contents
    In [111]: emusym.emusym_code_path(cp) # run one code path under symbolic emulation
    In [112]: emusym.emusym_code_path(cp, codetree=True) # return a code tree, useful for decompiling
    Out[112]: SEQ(IF(MEM(8 + MEM(unk_R3)), IFB(NE, SEQ(IF(MEM(4 + MEM(unk_R3)), IFB(TRUE, SEQ(MEMWRITE( ...
    In [113]: emusym.split_code_path_ea_cond(cp) # split the code path in two plain lists
    Out[113]: ([580296, 580300, 580308, 580312, 580316, 580320, 580328, 580332, 580336, 580344, 580344], [[], [], ['NE'], ['NE'], ['NE'], [], ['EQ'], ['EQ'], ['EQ'], ['EQ'], ['EQ']])

    In [114]: emusym.GuessFunction(0x8ea48) # guesses the end of the function, when you know the start
    Size: 424
    Out[114]: 584684
bkt
Backtracing: solves the contents of ARM registers / memory addresses by symbolic emulation of ASM code, going backwards until all requested unknowns are found. Theory is here: IDAPython/Backtracing.
    In [115]: ea = 0x8CE98
    In [116]: g ea-28
    8ce7c: e1a0700a mov r7, r10
    8ce80: e3a00032 mov r0, #50 ; 0x32
    8ce84: e3a01003 mov r1, #3 ; 0x3
    8ce88: e59f2090 ldr r2, [pc, #144] ; **'%s: ERROR Deleting config'
    8ce8c: e59f3074 ldr r3, [pc, #116] ; **'config_parse'
    8ce90: e59fa074 ldr r10, [pc, #116] ; 0x8cf0c: pointer to 0xFF0673EC (DebugMsg)
    8ce94: e1a0e00f mov lr, pc
    8ce98: e12fff1a bx r10
    8ce9c: e3570000 cmp r7, #0 ; 0x0
    8cea0: 159f507c ldrne r5, [pc, #124] ; 0x8cf24: pointer to 0xFF0182CC (free)
    In [117]: bkt.back_solve(ea, ['ARM.R0','ARM.R1']) # a faster heuristic
    Out[117]: [50, 3]
    In [118]: bkt.back_solve_slow(ea, ['ARM.R0','ARM.R1']) # slow, dumb, but reliable
    Out[118]: [50, 3]
    In [119]: bkt.go_back([ea])
    Out[119]: [577172, 577176]
    In [120]: bkt.go_back([ea-4,ea])
    Out[120]: [577168, 577172, 577176]
    In [121]: bkt.find_func_call(ea, 4)
    Out[121]: (4278612972L, 'DebugMsg', "(0x32, 3, '%s: ERROR Deleting config', 'config_parse')")
    In [122]: bkt.trace_calls_to("DebugMsg", 4)
    Function found at ff0673ec
    Out[122]: {570628: (4278612972L, 'DebugMsg', "(0x12, 3, '%s created (and exiting)', 'null_task')"), ...
    ...
These functions find the function which is called at a given address, using backtracing if needed.
    In [123]: bkt.subaddr_bl(0x8CD74)
    Out[123]: 575428
    In [124]: bkt.subaddr_bx(0x8CE98)
    Out[124]: 4278612972
    In [125]: bkt.subaddr_mov(addr) # for MOV PC, ...
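To give a feel for what "going backwards until all requested unknowns are found" means, here is a deliberately tiny toy in Python 3. It is not how bkt works (bkt uses symbolic emulation): it only understands `mov rN, #imm` in straight-line code and ignores every other way a register can be written. The function name and the `listing` format are made up.

```python
import re

# Matches e.g. "mov r0, #50 ; 0x32" and captures the register and the constant
MOV_IMM = re.compile(r"mov\s+(r\d+),\s*#(\d+)")


def back_solve_consts(listing, ea, wanted):
    """listing: dict address -> disassembly text; wanted: names like 'r0'.

    Walks backwards from ea, 4 bytes at a time, until each wanted register
    has been seen in a `mov rN, #imm` or the listing runs out.
    """
    found = {}
    addr = ea - 4
    while addr in listing and len(found) < len(wanted):
        m = MOV_IMM.match(listing[addr])
        if m and m.group(1) in wanted and m.group(1) not in found:
            found[m.group(1)] = int(m.group(2))
        addr -= 4
    return [found.get(reg) for reg in wanted]


# Instructions before the `bx r10` at 0x8CE98, copied from the listing above
listing = {
    0x8CE7C: "mov r7, r10",
    0x8CE80: "mov r0, #50 ; 0x32",
    0x8CE84: "mov r1, #3 ; 0x3",
    0x8CE88: "ldr r2, [pc, #144]",
    0x8CE8C: "ldr r3, [pc, #116]",
    0x8CE90: "ldr r10, [pc, #116]",
    0x8CE94: "mov lr, pc",
}
print(back_solve_consts(listing, 0x8CE98, ["r0", "r1"]))  # [50, 3]
```

A register that cannot be resolved this way comes back as None, which is where a real backtracer has to follow loads, arithmetic and branches.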
deco
Experimental decompiler based on SymPy (see module emusym). Only works for functions without loops.
    In [126]: print deco.decompile(0x8D070)
    *(-4 + sp0) = lr0
    *(-8 + sp0) = unk_R6
    *(-12 + sp0) = unk_R5
    *(-16 + sp0) = unk_R4
    FIO_Open(arg0, 0x1000, arg2, ...) => ret_FIO_Open_8D084
    strcpy(0xb2818, arg0, arg2, ...) => ret_strcpy_8D0A0
    if ret_FIO_Open_8D084 == -1:
        *(0xb27d4) = 0xb2854
        return 0xb27d4
    config_parse(ret_FIO_Open_8D084, arg0, arg2, strcpy) => ret_config_parse_8D0B8
    FIO_CloseFile(handle=ret_FIO_Open_8D084) => ret_FIO_CloseFile_8D0CC
    *(0xb27d4) = ret_config_parse_8D0B8
    return 0xb27d4
    !end
cache
Stores the result of lengthy computations.
    In [127]: cache.access("my_computation", func_which_takes_ages_to_run)
    In [128]: cache.disable()
    In [129]: cache.enable()
    In [130]: cache.clear()
There is also some experimental support for persistent cache, but it's disabled by default.
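The four calls above suggest a keyed memoization store. A possible in-memory sketch (Python 3; the class `Cache` and the exact semantics of disable, namely "always recompute", are assumptions):

```python
class Cache:
    """Keyed store for results of expensive, argument-free callables."""

    def __init__(self):
        self._store = {}
        self._enabled = True

    def access(self, key, func):
        if not self._enabled:
            return func()            # caching off: always recompute
        if key not in self._store:
            self._store[key] = func()
        return self._store[key]

    def enable(self):
        self._enabled = True

    def disable(self):
        self._enabled = False

    def clear(self):
        self._store.clear()


runs = []


def slow_computation():
    runs.append(1)  # record each real execution
    return 42


cache = Cache()
cache.access("my_computation", slow_computation)
cache.access("my_computation", slow_computation)
print(len(runs))  # 1: the second access was served from the cache
```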
progress
Simple progress indicator; also computes ETA.
    In [131]: progress("Doing some useless stuff...")
    In [132]: time.sleep(1)
    In [133]: progress(0.25)
    Doing some useless stuff... [25% done, ETA 0:00:03]...
    In [134]: time.sleep(2)
    In [135]: progress(0.75)
    Doing some useless stuff... [75% done, ETA 0:00:01]...
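The ETA values in this session match a simple linear extrapolation: remaining time = elapsed * (1 - fraction) / fraction. A sketch in Python 3 (the helper names are made up, and the real module may compute the ETA differently):

```python
import datetime


def eta_seconds(elapsed, fraction):
    """Remaining time, assuming the work proceeds at a constant rate."""
    return elapsed * (1.0 - fraction) / fraction


def format_progress(message, elapsed, fraction):
    eta = datetime.timedelta(seconds=round(eta_seconds(elapsed, fraction)))
    return "%s [%d%% done, ETA %s]..." % (message, round(fraction * 100), eta)


# 25% after 1 second, then 75% after 3 seconds in total, as in the session
print(format_progress("Doing some useless stuff...", 1, 0.25))
print(format_progress("Doing some useless stuff...", 3, 0.75))
```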
doc
Make Wiki docs from IPython commands included in main text. Something like auto-generating docs from docstrings.
You write:
This is an example:
    2+2
and you get:
This is an example:
 In [1]: '''2+2'''
 Out[1]: 4
All the code to be executed in IPython is prefixed by:
- four spaces: normal code, i.e. show input command, run it and show output.
- three spaces and %: show input command, run it, but do not show output.
- three spaces and ~: show input command, but do not run it.
- three spaces and #: run the command, but do not show anything.
- three spaces and [list of indices]: display only those lines of output; use a string instead of index to display anything you like.
e.g.:
   [1,2,"...",-1]ml.strings
displays the first line, second line, ellipsis and then the last line. Also, in this mode, max line length is limited to 100.
Spacing is critical, so pay attention! Also, 4 spaces and % is a magic command.
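The prefix rules above can be summarised as a small classifier. This is a sketch in Python 3 of how such lines might be recognised; the mode names and the function `classify` are invented, and the real doc module may parse its input differently.

```python
import ast


def classify(line):
    """Return (mode, command, indices) for a code line, or None for plain text."""
    if line.startswith("    "):
        # Four spaces: normal code. A leading % here is an IPython magic command.
        return ("run_show_all", line[4:], None)
    if not line.startswith("   ") or len(line) < 4:
        return None
    marker, rest = line[3], line[4:]
    if marker == "%":
        return ("run_hide_output", rest, None)
    if marker == "~":
        return ("show_only", rest, None)
    if marker == "#":
        return ("run_silent", rest, None)
    if marker == "[" and "]" in line:
        end = line.index("]", 3)
        indices = ast.literal_eval(line[3:end + 1])  # e.g. [1, 2, "...", -1]
        return ("run_show_lines", line[end + 1:], indices)
    return None


print(classify("    2+2"))
print(classify("   %x = 5"))
print(classify('   [1,2,"...",-1]ml.strings'))
print(classify("Just text"))
```

Checking for four spaces before three is what keeps "four spaces and %" a magic command instead of a hidden-output line.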
To make a doc:
doc.run("scripts/doc/api-ref.wikipy")
Result is in scripts/doc/api-ref.wiki, with Wiki markup codes.
Enjoy!
--Alexdu 16:56, December 1, 2010 (UTC)