Magic Lantern Firmware Wiki
 
(33 intermediate revisions by the same user not shown)
Line 1: Line 1:
 
This is the script for matching functions and data addresses between a bunch of IDC databases.
 
This is the script for matching functions and data addresses between a bunch of IDC databases.
   
  +
Alternative tools which do (more or less) the same job:
==Usage==
 
  +
* [[Gensig_finsig|Gensig/Finsig from CHDK]]
  +
* [http://code.google.com/p/patchdiff2/ patchdiff2]
  +
* [http://corelabs.coresecurity.com/index.php?module=Wiki&action=view&type=tool&name=turbodiff turbodiff]
   
  +
==Theory==
[code will be uploaded soon]
 
  +
Also check [[IDAPython/Firmware matching]]
   
  +
===Function signature===
====Dependencies====
 
  +
It's the sequence of mnemonics of each instruction, also with flags. To see it, you may use a Fun object:
* Python (I use 2.6 under Linux). Just to be sure, install numpy/scipy and ipython.
 
* arm-elf-gcc in your PATH (see [[Build instructions/550D]] for how to do that)
 
* '''It doesn't require IDAPython nor IDA''', just the IDC files.
 
   
  +
In [1]: '''f = Fun(t2i,"setFiltreOff")'''
====Preparing input files====
 
  +
Unknown function: setFiltreOff. Using closest match: SetFilterOff.
Prepare a working directory where you will put the input files. You will need:
 
  +
* Some dumps, with the .bin extension. Include the load address in the dump name.
 
  +
In [2]: '''f.sig'''
* Some IDC files. Try to give them names somewhat similar to the dumps, to help the autodetection.
 
  +
Out[2]: push add mov mov bl pop mov b
* The script (called for now ''match.py''), in the same folder (or in PATH, if you like)
 
   
  +
Exact matches are easiest to find, since they can be indexed. If the function start and end addresses are known, that helps a lot. If not, the Aho-Corasick algorithm is *extremely* fast, but my code is a bit buggy.
For example, those names are valid:
 
5D_204_06_0xff810000.bin
 
550D_108_05_0xff010000.bin
 
500D_0xff010000.bin
 
5D 204 AJROM0.idc
 
550D_108_20101116_indy_ROM0.idc
 
   
====Running====
+
===Evaluating matches===
  +
Each match found is evaluated with a score. The bigger the score, the better the match. Ideally, matches with positive score should be OK, and matches with negative score should be bad.
   
  +
====Function pairs with identical signature====
Then just say:
 
python match.py
 
   
  +
Scoring for those functions is computed in ''data_match_funcpair''. Each line has one or two data references (first is a number, and then, if that number is a ROM address, the second data ref is the contents of that address).
and you should get something like this:
 
   
  +
We can divide references into '''small numbers''' (let's say < 0x1000), which are offsets, small constants and other uninteresting stuff, and '''large numbers''', which can yield data matches. If the reference is actually a string, we'll consider it a separate case. Now let's compute:
Input files:
 
===============================================================================
 
Binary dump (*.bin) LoadAddr IDC database (*.idc)
 
===============================================================================
 
5D_204_06_0xff810000.bin FF810000 5D 204 AJROM0.idc
 
500D_0xff010000.bin FF010000 n/a
 
550D_108_05_0xff010000.bin FF010000 550D_108_20101116_indy_ROM0.idc
 
===============================================================================
 
Disassembling 5D_204_06_0xff810000.bin <ff810000>... ok
 
Disassembling 500D_0xff010000.bin <ff010000>... ok
 
Disassembling 550D_108_05_0xff010000.bin <ff010000>... ok
 
Parsing 5D 204 AJROM0.idc... found 40692 MakeName's and 19191 MakeFunction's
 
Parsing 550D_108_20101116_indy_ROM0.idc... found 56768 MakeName's and 18053 MakeFunction's
 
Parsing disassembly of 5D_204_06_0xff810000.bin...
 
found 1263894 lines
 
Parsing disassembly of 500D_0xff010000.bin...
 
found 1171162 lines
 
Parsing disassembly of 550D_108_05_0xff010000.bin...
 
found 1395198 lines
 
Creating codesigs for 5D_204_06_0xff810000.bin...
 
Creating codesigs for 550D_108_05_0xff010000.bin...
 
saving cache... ok
 
Found 6623 raw code matches between 550D_108_05_0xff010000.bin and 5D_204_06_0xff810000.bin.
 
   
  +
<math>smallmatch = \dfrac{smallmatch_{OK}}{smallmatch_{total}}</math>
====Results====
 
To find the results, just sort the working directory by modification date.
 
* match-log.txt: shows detailed info about the matching process, for each pair of functions.
 
   
  +
<math>bigmatch = \dfrac{bigmatch_{OK}}{bigmatch_{total}}</math>
   
  +
<math>stringmatch = \dfrac{stringmatch_{OK}}{stringmatch_{total}}</math>
==Advanced use and debugging==
 
   
  +
A string match is much more important than a simple number match, so let's weight them like this:
===Interactive console===
 
You can run it in IPython; after the script finishes, you can poke around and make various queries.
 
   
  +
<math>score_m = (smallmatch - 0.5) \sqrt{smallmatch_{total}} + (bigmatch - 0.5) \sqrt{smallmatch_{total}} + 20 \cdot (stringmatch - 0.5) \sqrt{stringmatch_{total}}</math>
$ ipython
 
In [1]: '''run match.py'''
 
...
 
   
  +
If the strings are almost identical, their match ratio would be also added to <math>stringmatch_{OK}</math>.
   
  +
Further hints may be given by how many times the function is called, and how many function it calls. These numbers don't have to match exactly, so instead let's compute a mismatch:
===Internals===
 
  +
<math>\mathrm{mismatch}(a,b) = \dfrac{|a-b|}{(a+b)/2}</math>
   
  +
With this, the score of a function pair becomes:
A dump is identified by its file name, used as index into the various dictionaries used.
 
   
  +
<math> score = score_m - 2 \cdot \mathrm{mismatch}(refsto_1, refsto_2) - 2 \cdot \mathrm{mismatch}(refsfrom_1, refsfrom_2) </math>
====Global variables====
 
  +
* bins: list of dumps (i.e. file names with .bin extension)
 
  +
Does it work? We'll see. Is there a better formula? Sure, it just waits for you to find it :)
In [2]: '''bins''' # what dumps we have loaded?
 
  +
Out[2]: ['550D_108_05_0xff010000.bin', '5D_204_06_0xff810000.bin']
 
  +
====Function pairs with different signature====
  +
  +
That's more difficult.
  +
  +
====Data address pairs====
  +
  +
These pairs are obtaining by comparing two functions with identical signature, and matching their referenced addresses line by line.
  +
  +
For those addresses, we have these sources of information:
  +
* the quality of the pair(s) from which a given data pair was obtained
  +
* analyzing how is the address referenced throughout the firmware
  +
  +
Since the score of a function pair is additive, we can try to add the scores of all the pairs which yielded this match. If a data match was detected from two matching function pairs, or more, that's much better than being detected from only one function pair. Also, if the data pair was detected from 10 functions or more, it's almost certain it's a good match.
  +
  +
So let's try this formula:
  +
  +
<math>score_{datapair} = AVG(funcpairs) \cdot \sqrt{COUNT(funcpairs)-0.95}</math>
  +
  +
  +
==Usage==
  +
This script is included in [[GPL Tools/ARM console]].
  +
  +
As a tutorial, let's try to generate a stub for 550D 1.0.9. Let's say we already have the stubs and firmware dumps for 5d2 2.0.4 and 550d 1.0.8. No IDC files are needed (only the stubs).
  +
  +
In [3]: '''D = load_dumps("(108|109|204)")'''
 
 
In [3]: '''t2i, mk2 = bins''' # give a short name to each one
+
In [4]: '''t108, t109, mk2 = D'''
   
  +
In [5]: '''M = match.CodeMatch(D)'''
* loadaddrs: dictionary of load addresses for each dump
 
  +
Creating codesigs for 550d.108.0xff010000.bin...
* idcs: dictionary of idc file names for each dump
 
  +
Creating codesigs for 550d.109.0xff010000.bin...
* D: dictionary containing lots of info about dumps: ROM contents, IDC names, functions, signatures...
 
  +
Creating codesigs for 5d2.204.0xff810000.bin...
In [4]: '''D[t2i].ROM[0xff011e1c]'''
 
  +
...
Out[4]: 1935764321
 
  +
Remaining 14915 code matches between 550d.108.0xff010000.bin and 550d.109.0xff010000.bin.
  +
Remaining 9752 code matches between 550d.109.0xff010000.bin and 5d2.204.0xff810000.bin.
  +
Remaining 9906 code matches between 550d.108.0xff010000.bin and 5d2.204.0xff810000.bin.
   
  +
In [6]: '''DM, SRC = match.DataMatch(M)'''
  +
Found 20824 possible code matches between 550d.108.0xff010000.bin and 550d.109.0xff010000.bin.
  +
Found 1053 data matches between 550d.108.0xff010000.bin and 550d.109.0xff010000.bin.
  +
Found 37765 possible code matches between 550d.109.0xff010000.bin and 5d2.204.0xff810000.bin.
  +
...
  +
Found 4010 data matches between 550d.109.0xff010000.bin and 5d2.204.0xff810000.bin.
  +
Found 37768 possible code matches between 550d.108.0xff010000.bin and 5d2.204.0xff810000.bin.
  +
Found 4001 data matches between 550d.108.0xff010000.bin and 5d2.204.0xff810000.bin.
   
  +
In [7]: '''match.sync(D, M, DM, idc=True, stub=False)'''
====Functions====
 
  +
...
* BYTE(bin, addr): read a byte from the ROM, from the dump whose file name is ''bin''
 
  +
# this will choose the best match for each function and save the results. IDC format is recommended, since it also contains function definitions, not just stubs.
In [5]: '''hex(BYTE(t2i, 0xff011e1c))'''
 
Out[5]: '0x61'
 
   
  +
In [8]: '''match.MakeStubs(t109, D, M, DM, SRC)'''
* INT32(bin, addr): shortcut for D[bin].ROM[addr]
 
  +
NSTUB(0xFF0673EC, DebugMsg) // OK (dumps=2, score=8.2e+02)
In [6]: '''hex(INT32(t2i, 0xff011e1c))'''
 
  +
NSTUB(0xFF1C69EC, FIO_CloseFile) // OK (dumps=2, score=79)
Out[6]: '0x73616b61'
 
  +
...
  +
NSTUB(0xFF1D6638, vsnprintf) // OK (dumps=2, score=51)
   
* GuessString(bin, addr): detect a string starting from addr
 
In [7]: '''GuessString(t2i, 0xff011e1c)'''
 
Out[7]: 'akashimorino'
 
   
  +
The generated stub is displayed at the terminal (just copy/paste it to its final destination).
* funcname(bin, addr): function name extracted from IDC, or sub_ABCD1234 if it's not found
 
  +
You'll also want to check the logs: there's a big, colorful one called ''MakeStubs-diff.htm'', which shows side-by-side diffs for matched functions. There are a few other text logs; see the reference section for details.
In [8]: '''funcname(t2i, 0xFF28AA58)'''
 
Out[8]: 'GetJpegInfo'
 
   
  +
==API reference==
* getname(bin, addr): similar to funcname, but used for other names (not functions).
 
In [10]: '''getname(t2i, 0x26284)'''
 
Out[10]: '0x26284 (sd_device)'
 
   
  +
===CodeMatch*===
====Functions for interactive use====
 
  +
These functions implement some code (i.e. function) matching algorithms.
   
  +
The first one search for matches only in the functions marked in the database (usually loaded from IDC). It's fast, but it relies on functions being correctly identified for all the firmware dumps, so it may miss many functions.
* '''find_funcs(bin, regex, ratio=1, num=10)''': Find functions using either a regex search, or a fuzzy string match
 
  +
In [8]: '''M = match.CodeMatch_dbfuncs(D)'''
In [1]: '''find_funcs(t2i, r"Flavor[C|S]")''' # when ratio=1, it uses a regex search
 
ff205b24: FlavorSharpness
 
ff205c14: FlavorContrast
 
ff205dac: FlavorSaturation
 
ff205e9c: FlavorColorTone
 
   
  +
This one performs full-text signature search using an Aho-Corasick tree. It needs the functions to be correctly defined only in the source firmwares.
In [2]: '''find_funcs(mk2, "DebugMsg", 0.5)''' # when ratio < 1, this is the min. allowed ratio for fuzzy search
 
ff86af48: TH_DebugMsg
 
ff9b7660: AJ_called_by_DebugMsg
 
ff86b22c: AJ_DbgMgr.c
 
   
  +
My interpretation of aho-corasick search results is very slow (I'm counting spaces in the signature to find the function offsets) and will be optimized.
* '''find_refs(bin, value)''': look for references to a given name or value.
 
  +
  +
In [9]: '''M = match.CodeMatch_corasick(D)'''
  +
  +
Not implemented yet: a frontend to gensig/finsig. Useful for comparison and for creating side-by-side diffs for gensig/finsig matches.
  +
In [10]: '''M = match.CodeMatch_finsig(D)'''
  +
  +
The last one calls all the above algorithms and combines the results (it hopefully extracts the best from all algorithms):
  +
In [11]: '''M = match.CodeMatch(D)'''
  +
  +
All those functions return a dictionary which looks like this:
  +
In [12]: '''M.keys()'''
  +
Out[12]:
  +
[(Dump of 550d.108.0xff010000.bin, Dump of 550d.109.0xff010000.bin),
  +
(Dump of 550d.109.0xff010000.bin, Dump of 5d2.204.0xff810000.bin),
  +
(Dump of 550d.108.0xff010000.bin, Dump of 5d2.204.0xff810000.bin)]
 
 
In [1]: '''find_refs(t2i,0x2b74)'''
+
In [13]: '''len(M[t108,mk2])'''
  +
Out[13]: 9906
 
 
  +
In [14]: '''pair = M[t108,mk2][5]'''
DebugMsg+112:
 
ff067458: 2a000003 bcs ff06746c <_binary_550D_108_05_0xff010000_bin_start+0x5746c>
 
ff06745c: e59f00f4 ldr r0, [pc, #244] ; ff067558 <_binary_550D_108_05_0xff010000_bin_start+0x57558>
 
ff067460: e7901101 ldr r1, [r0, r1, lsl #2]
 
pointer to 0x2b74
 
 
 
  +
In [15]: '''funcpair((t108,mk2), pair)'''
... etc ...
 
  +
Out[15]: 0xff33cff8:sdcomSendData <-------> 0xffa1d5fc:AJ_sdcomSendData
  +
  +
In [16]: '''Score((t108,mk2), pair)'''
  +
Out[16]: 61.9513542013
  +
  +
These functions create some logs: code-matches-*.txt and raw-match-log-*.txt.
  +
  +
===DataMatch===
  +
Two matched function which identical code structure can yield matches for data addresses (like sounddev, additional_version etc.). They can also suggest matches for functions which are not identical, but they are called from the same place.
   
  +
This is the low-level function, which analyze only a pair of functions (and also provides a score):
In [2]: '''find_refs(t2i,"sounddev")'''
 
  +
In [17]: '''data_match_funcpair((t108,mk2), pair)'''
 
...
 
...
  +
big numbers match: 0.82 (212 / 260)
  +
STRING MATCH: 1 (27 / 27)
  +
score: 62
  +
Out[17]: (61.951354201275649, [(4278638820L, 4287044188L), (4281582832L, 4288793332L), (133060, 827 ...
   
  +
And this analyzes all the matches:
  +
In [18]: '''DM, SRC = match.DataMatch(M)'''
   
  +
Outputs: a dictionary of data matches (DM) and a dictionary of sources (SRC, i.e. matched functions from which the match was deducted).
* '''guess_data(bin, value)''': return a friendly name for ''value''. It detects whether ''value'' is a function address, a pointer to a string or a pointer to some other value in ROM (or just a plain number).
 
  +
This function also creates data-matches.txt and possible-code-matches.txt (one for data addresses and other for functions).
In [1]: '''print guess_data(t2i,0xff05de04)'''
 
  +
pointer to 0x2e5b0
 
  +
===Scoring===
  +
In [19]: '''Score(dumpair, addrpair)'''
 
 
In [2]: '''print guess_data(t2i,0xff011e1c)'''
+
In [20]: '''Score_refmatch(dumpair, addrpair, srcscores)'''
'akashimorino'
 
 
In [3]: '''print guess_data(t2i,0xff0673ec)'''
 
@DebugMsg
 
   
  +
In [21]: '''combined_score([1,2,10,20])'''
* '''show_diasam(bin, start, end)''': displays disassembly of the code between start and end address.
 
  +
Out[21]: 14.4080055872
* '''show_func(bin, f)''': displays disassembly of a function, given by name or address.
 
In [1]: '''show_func(t2i, "SetFilterOff")'''
 
 
 
  +
In [22]: '''combined_score([5])'''
// Start of function: SetFilterOff
 
  +
Out[22]: 1.11803398875
NSTUB(SetFilterOff, ff064e98):
 
  +
ff064e98: e92d4010 push {r4, r14}
 
  +
In [23]: '''combined_score([5,5])'''
ff064e9c: e28f20f4 add r2, pc, #244 ; *SetFilterOff
 
  +
Out[23]: 5.12347538298
ff064ea0: e3a01003 mov r1, #3 ; 0x3
 
  +
ff064ea4: e3a00014 mov r0, #20 ; 0x14
 
  +
===MakeStubs===
ff064ea8: eb00094f bl @DebugMsg
 
  +
ff064eac: e8bd4010 pop {r4, r14}
 
  +
This is used for creating stubs for a new firmware version. See the Usage section.
ff064eb0: e3a00c31 mov r0, #12544 ; 0x3100
 
  +
ff064eb4: eafffb4e b @audio_ic_write
 
  +
===SyncIDC===
// End of function: sub_FF064EB4
 
  +
Not implemented yet. It will write some IDCs for name syncing purposes.
  +
  +
==Logs==
  +
====MakeStubs-diff.htm====
  +
It is created by ''match.MakeStubs'' and contains the following:
  +
* for each function pair: a side-by-side diff
  +
* for each data pair: side-by-side diffs of the functions from which that data pair was found.
  +
====raw-match-log-*.txt====
  +
It shows detailed info about the matching process, for each pair of functions.
  +
====code-matches-*.txt====
  +
This contains the matched function addresses for all the firmware pairs, sorted by score.
  +
====data-matches.txt====
  +
This contains the matched data addresses (i.e. not functions) for all the firmware pairs.
  +
====possible-code-matches.txt====
  +
This contains matched functions found with the data matching method. They are not verified (yet) and may contain many bad matches.
  +
  +
  +
Happy Matching!
  +
  +
--[[User:Alexdu|Alexdu]] 18:12, December 1, 2010 (UTC)

Latest revision as of 09:14, 17 November 2011

This is the script for matching functions and data addresses between a bunch of IDC databases.

Alternative tools which do (more or less) the same job:

Theory[]

Also check IDAPython/Firmware matching

Function signature[]

It's the sequence of mnemonics of each instruction, also with flags. To see it, you may use a Fun object:

In [1]: f = Fun(t2i,"setFiltreOff")
Unknown function: setFiltreOff. Using closest match: SetFilterOff.

In [2]: f.sig
Out[2]: push add mov mov bl pop mov b 

Exact matches are easiest to find, since they can be indexed. If the function start and end addresses are known, that helps a lot. If not, the Aho-Corasick algorithm is *extremely* fast, but my code is a bit buggy.

Evaluating matches[]

Each match found is evaluated with a score. The bigger the score, the better the match. Ideally, matches with positive score should be OK, and matches with negative score should be bad.

Function pairs with identical signature[]

Scoring for those functions is computed in data_match_funcpair. Each line has one or two data references (first is a number, and then, if that number is a ROM address, the second data ref is the contents of that address).

We can divide references into small numbers (let's say < 0x1000), which are offsets, small constants and other uninteresting stuff, and large numbers, which can yield data matches. If the reference is actually a string, we'll consider it a separate case. Now let's compute:

A string match is much more important than a simple number match, so let's weight them like this:

If the strings are almost identical, their match ratio would be also added to .

Further hints may be given by how many times the function is called, and how many function it calls. These numbers don't have to match exactly, so instead let's compute a mismatch:

With this, the score of a function pair becomes:

Does it work? We'll see. Is there a better formula? Sure, it just waits for you to find it :)

Function pairs with different signature[]

That's more difficult.

Data address pairs[]

These pairs are obtaining by comparing two functions with identical signature, and matching their referenced addresses line by line.

For those addresses, we have these sources of information:

  • the quality of the pair(s) from which a given data pair was obtained
  • analyzing how is the address referenced throughout the firmware

Since the score of a function pair is additive, we can try to add the scores of all the pairs which yielded this match. If a data match was detected from two matching function pairs, or more, that's much better than being detected from only one function pair. Also, if the data pair was detected from 10 functions or more, it's almost certain it's a good match.

So let's try this formula:


Usage[]

This script is included in GPL Tools/ARM console.

As a tutorial, let's try to generate a stub for 550D 1.0.9. Let's say we already have the stubs and firmware dumps for 5d2 2.0.4 and 550d 1.0.8. No IDC files are needed (only the stubs).

In [3]: D = load_dumps("(108|109|204)")

In [4]: t108, t109, mk2 = D
In [5]: M = match.CodeMatch(D)
Creating codesigs for 550d.108.0xff010000.bin...
Creating codesigs for 550d.109.0xff010000.bin...
Creating codesigs for 5d2.204.0xff810000.bin...
...
Remaining 14915 code matches between 550d.108.0xff010000.bin and 550d.109.0xff010000.bin.
Remaining 9752 code matches between 550d.109.0xff010000.bin and 5d2.204.0xff810000.bin.
Remaining 9906 code matches between 550d.108.0xff010000.bin and 5d2.204.0xff810000.bin.
In [6]: DM, SRC = match.DataMatch(M)
Found 20824 possible code matches between 550d.108.0xff010000.bin and 550d.109.0xff010000.bin.
Found 1053 data matches between 550d.108.0xff010000.bin and 550d.109.0xff010000.bin.
Found 37765 possible code matches between 550d.109.0xff010000.bin and 5d2.204.0xff810000.bin.
...
Found 4010 data matches between 550d.109.0xff010000.bin and 5d2.204.0xff810000.bin.
Found 37768 possible code matches between 550d.108.0xff010000.bin and 5d2.204.0xff810000.bin.
Found 4001 data matches between 550d.108.0xff010000.bin and 5d2.204.0xff810000.bin.
In [7]: match.sync(D, M, DM, idc=True, stub=False)
...
# this will choose the best match for each function and save the results. IDC format is recommended, since it also contains function definitions, not just stubs.
In [8]: match.MakeStubs(t109, D, M, DM, SRC)
NSTUB(0xFF0673EC, DebugMsg)                       // OK (dumps=2, score=8.2e+02)
NSTUB(0xFF1C69EC, FIO_CloseFile)                  // OK (dumps=2, score=79)
...
NSTUB(0xFF1D6638, vsnprintf)                      // OK (dumps=2, score=51)


The generated stub is displayed at the terminal (just copy/paste it to its final destination). You'll also want to check the logs: there's a big, colorful one called MakeStubs-diff.htm, which shows side-by-side diffs for matched functions. There are a few other text logs; see the reference section for details.

API reference[]

CodeMatch*[]

These functions implement some code (i.e. function) matching algorithms.

The first one search for matches only in the functions marked in the database (usually loaded from IDC). It's fast, but it relies on functions being correctly identified for all the firmware dumps, so it may miss many functions.

In [8]: M = match.CodeMatch_dbfuncs(D)

This one performs full-text signature search using an Aho-Corasick tree. It needs the functions to be correctly defined only in the source firmwares.

My interpretation of aho-corasick search results is very slow (I'm counting spaces in the signature to find the function offsets) and will be optimized.

In [9]: M = match.CodeMatch_corasick(D)

Not implemented yet: a frontend to gensig/finsig. Useful for comparison and for creating side-by-side diffs for gensig/finsig matches.

In [10]: M = match.CodeMatch_finsig(D)

The last one calls all the above algorithms and combines the results (it hopefully extracts the best from all algorithms):

In [11]: M = match.CodeMatch(D)

All those functions return a dictionary which looks like this:

In [12]: M.keys()
Out[12]: 
[(Dump of 550d.108.0xff010000.bin, Dump of 550d.109.0xff010000.bin),
 (Dump of 550d.109.0xff010000.bin, Dump of 5d2.204.0xff810000.bin),
 (Dump of 550d.108.0xff010000.bin, Dump of 5d2.204.0xff810000.bin)]

In [13]: len(M[t108,mk2])
Out[13]: 9906

In [14]: pair = M[t108,mk2][5]

In [15]: funcpair((t108,mk2), pair)
Out[15]:                                     0xff33cff8:sdcomSendData <-------> 0xffa1d5fc:AJ_sdcomSendData

In [16]: Score((t108,mk2), pair)
Out[16]: 61.9513542013

These functions create some logs: code-matches-*.txt and raw-match-log-*.txt.

DataMatch[]

Two matched function which identical code structure can yield matches for data addresses (like sounddev, additional_version etc.). They can also suggest matches for functions which are not identical, but they are called from the same place.

This is the low-level function, which analyze only a pair of functions (and also provides a score):

In [17]: data_match_funcpair((t108,mk2), pair)
...
big numbers match: 0.82 (212 / 260)
STRING MATCH: 1 (27 / 27)
score: 62
Out[17]: (61.951354201275649, [(4278638820L, 4287044188L), (4281582832L, 4288793332L), (133060, 827 ...

And this analyzes all the matches:

In [18]: DM, SRC = match.DataMatch(M)

Outputs: a dictionary of data matches (DM) and a dictionary of sources (SRC, i.e. matched functions from which the match was deducted). This function also creates data-matches.txt and possible-code-matches.txt (one for data addresses and other for functions).

Scoring[]

In [19]: Score(dumpair, addrpair)

In [20]: Score_refmatch(dumpair, addrpair, srcscores)
In [21]: combined_score([1,2,10,20])
Out[21]: 14.4080055872

In [22]: combined_score([5])
Out[22]: 1.11803398875

In [23]: combined_score([5,5])
Out[23]: 5.12347538298

MakeStubs[]

This is used for creating stubs for a new firmware version. See the Usage section.

SyncIDC[]

Not implemented yet. It will write some IDCs for name syncing purposes.

Logs[]

MakeStubs-diff.htm[]

It is created by match.MakeStubs and contains the following:

  • for each function pair: a side-by-side diff
  • for each data pair: side-by-side diffs of the functions from which that data pair was found.

raw-match-log-*.txt[]

It shows detailed info about the matching process, for each pair of functions.

code-matches-*.txt[]

This contains the matched function addresses for all the firmware pairs, sorted by score.

data-matches.txt[]

This contains the matched data addresses (i.e. not functions) for all the firmware pairs.

possible-code-matches.txt[]

This contains matched functions found with the data matching method. They are not verified (yet) and may contain many bad matches.


Happy Matching!

--Alexdu 18:12, December 1, 2010 (UTC)