Originally Posted by
Empted
I have nearly the same idea: avoid any hardcoded function offsets. I'm currently trying to build function signatures based on assembly analysis: conditional jump count, hardcoded constants, number of calls and returns, parameter types, and so on. This should make offset finding more robust, though with some global changes even this kind of search will fail. Implementing something like a match measure could probably solve this. The process won't be completely automatic; it will involve choosing the function that best fits the signature.
P.S. The idea was born to match functions from Mac binaries with the PC versions, but that's twice as hard because of different assembly styles, parameter passing, and so on.
This sounds promising for finding correspondences between Mac and PC binaries of the same version, but consider, for example, Spell_C_Failed, which gained an additional parameter in the last patch. Still, this would be much more reliable than simple pattern matching.
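To make the "match measure" idea from the quoted post a bit more concrete, here is a rough Python sketch. The feature set, the `FuncSignature` class, and the scoring formula are all my own assumptions for illustration, not anything from an existing tool: each numeric feature is scored by the ratio of the smaller to the larger count, the constants by set overlap, and the results are averaged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FuncSignature:
    # Hypothetical feature set, based on the features listed in the post.
    cond_jumps: int
    calls: int
    returns: int
    params: int
    constants: frozenset  # hardcoded immediates seen in the function body


def match_measure(a: FuncSignature, b: FuncSignature) -> float:
    """Return a similarity score in [0, 1]; 1.0 means identical features."""
    scores = []
    for x, y in ((a.cond_jumps, b.cond_jumps), (a.calls, b.calls),
                 (a.returns, b.returns), (a.params, b.params)):
        hi = max(x, y)
        # Ratio of the smaller to the larger count; two zeros count as equal.
        scores.append(1.0 if hi == 0 else min(x, y) / hi)
    union = a.constants | b.constants
    # Jaccard overlap of the constant sets.
    scores.append(1.0 if not union
                  else len(a.constants & b.constants) / len(union))
    return sum(scores) / len(scores)


def rank_candidates(target: FuncSignature, candidates: dict) -> list:
    """Return (score, name) pairs, best match first."""
    return sorted(((match_measure(target, sig), name)
                   for name, sig in candidates.items()), reverse=True)
```

Because the score degrades gradually, a function that gained one parameter (like Spell_C_Failed) would still rank near the top instead of failing outright, and the final pick can be left to a human.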
Originally Posted by
boredevil
Awesome reads on that topic. I was pondering in the same direction as the blog post, although the techniques mentioned might not be generic enough to diff between patches.
We don't need exact matches, rather some way to wildcard instructions.
Perhaps one could intermingle it with some nondeterminism/optional nodes.
Meanwhile I am reconsidering pattern matching, but with a more nondeterministic notion, perhaps with a tree/graph structure, to model data flows. E.g.:
Code:
useeax = seq("mov eax, <x>", "mov eax, [eax+<y>]")
useebx = seq("mov ebx, <x>", "call [ebx]")
res = seq(all(useeax, useebx), "add eax, 5")
Where the instructions are replaced with their respective byte patterns.
seq requires the matches to be in sequential order, with 0..n bytes in between. I figure there might be other matchers like seq_strict(...), any(...), all(...), many(m) [1..n], opt(m) [0..1].
"ab d9" is actually shorthand for seq_strict(0xab, 0xd9). I wonder if this can be implemented efficiently enough.
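A minimal sketch of how such matchers might look in Python, to get a feel for the cost. Everything here is an assumption on my part: matchers are generators that yield every possible end offset (that is the nondeterminism), `??` is a wildcard byte, the gap in `seq` is capped by a made-up `max_gap` parameter, and `any` is called `any_of` to avoid shadowing the builtin. Only `seq_strict`, `seq`, `any_of` and `opt` are covered; `all` and `many` are left out.

```python
from typing import Callable, Iterator, Optional, Tuple

# A matcher takes (data, pos) and yields every offset where a match can end.
Matcher = Callable[[bytes, int], Iterator[int]]


def pat(spec: str) -> Matcher:
    """Strict byte pattern such as "b8 ?? ?? ?? ??"; "??" matches any byte."""
    want = [None if t == "??" else int(t, 16) for t in spec.split()]

    def m(data, pos):
        chunk = data[pos:pos + len(want)]
        if len(chunk) == len(want) and all(
                w is None or w == c for w, c in zip(want, chunk)):
            yield pos + len(want)
    return m


def seq_strict(*ms: Matcher) -> Matcher:
    """All matchers back to back, with no bytes in between."""
    def m(data, pos):
        if not ms:
            yield pos
            return
        for end in ms[0](data, pos):
            yield from seq_strict(*ms[1:])(data, end)
    return m


def gap(max_gap: int) -> Matcher:
    """Skip 0..max_gap arbitrary bytes."""
    def m(data, pos):
        yield from range(pos, min(pos + max_gap, len(data)) + 1)
    return m


def seq(*ms: Matcher, max_gap: int = 16) -> Matcher:
    """Matchers in order, with 0..max_gap bytes between neighbours."""
    parts = []
    for i, x in enumerate(ms):
        if i:
            parts.append(gap(max_gap))
        parts.append(x)
    return seq_strict(*parts)


def any_of(*ms: Matcher) -> Matcher:
    """Match if at least one alternative matches at this position."""
    def m(data, pos):
        for x in ms:
            yield from x(data, pos)
    return m


def opt(x: Matcher) -> Matcher:
    """Match x zero or one time."""
    def m(data, pos):
        yield pos
        yield from x(data, pos)
    return m


def search(m: Matcher, data: bytes) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first match found, or None."""
    for start in range(len(data)):
        for end in m(data, start):
            return start, end
    return None


# "mov eax, imm32" followed, somewhere later, by "mov eax, [eax+disp8]".
useeax = seq(pat("b8 ?? ?? ?? ??"), pat("8b 40 ??"))
```

This is plain backtracking, so the worst case grows with the gap size and the number of chained matchers; capping the gap and scanning function by function rather than over the whole image would probably keep it manageable.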
Edit: I'm baffled. asmDiff works really well on Spell_C_Failed. I'll try some other samples and then probably look into how I could leverage that.