PICList
Thread
'[OT] batch hex file similarity comparison tool?'
2006\01\13@141417
by
William Couture
Hi all,
The company I work for is suing a Taiwanese company for making an illegal
"clone" of one of our products. I'm trying to help prove that the code in the
ROM is actually ours. This started before I joined the company, so I'm not
familiar with all the details.
There are a lot of versions of code for this, and I need to find the one they
copied.
Today, however, my boss suddenly told me that they made some changes,
it's not an exact copy.
So, I need to do a "similarity search" over several hundred hex files scattered
throughout a file tree to find the one that they derived their code from.
Has anyone ever heard of such a tool?
Thanks,
Bill
--
Psst... Hey, you... Buddy... Want a kitten? straycatblues.petfinder.org
2006\01\13@143031
by
Alex Harford
On 1/13/06, William Couture wrote:
>
> Today, however, my boss suddenly told me that they made some changes,
> it's not an exact copy.
>
> So, I need to do a "similarity search" over several hundred hex files scattered
> throughout a file tree to find the one that they derived their code from.
>
> Has anyone ever heard of such a tool?
No but I've been interested in this as well because I've been working
on reverse engineering the GM ECMs and I'd like to have a nice way of
automating the search for large chunks of identical (or very similar)
binary data. I'm sure GM reused a large amount of code across the
various ECMs but I just haven't gotten a round tuit yet.
AFAIK the unix 'diff' program only works on text, and I haven't had
any luck with the binary diff tools I've found on the net.
Since I'm not a computer scientist I'm sure I'm missing out on a more
efficient algorithm, but a simple program could probably be written to
read a chunk of binary data (4 bytes, let's say) and search for the same
pattern in another file. If that pattern is found, check the next
byte in both files until they don't match. Do this for every 4-byte
chunk in the original file and report the large chunks that were
found.
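The seed-and-extend search described above could be sketched roughly as follows. This is only an illustration in Python: the function name `common_runs` and the `seed` and `min_len` parameters are made up for the example and do not come from any existing tool. It indexes every 4-byte chunk of the second file, then walks the first file and extends each hit byte by byte.

```python
def common_runs(a: bytes, b: bytes, seed: int = 4, min_len: int = 16):
    """Return (offset_in_a, offset_in_b, length) for long matching runs.

    Each seed-byte chunk of `a` is looked up in `b`; every hit is extended
    one byte at a time until the two files stop matching.
    """
    # Index every seed-byte chunk of b by the offsets where it starts.
    index = {}
    for j in range(len(b) - seed + 1):
        index.setdefault(b[j:j + seed], []).append(j)

    runs = []
    i = 0
    while i <= len(a) - seed:
        best_len, best_j = 0, -1
        for j in index.get(a[i:i + seed], ()):
            n = seed
            while i + n < len(a) and j + n < len(b) and a[i + n] == b[j + n]:
                n += 1
            if n > best_len:
                best_len, best_j = n, j
        if best_len >= min_len:
            runs.append((i, best_j, best_len))
            i += best_len  # skip past the run that was just reported
        else:
            i += 1
    return runs
```

Long stretches of identical filler (0xFF in an erased ROM, for instance) give every seed a great many hits and slow this down badly, so it may help to skip such regions first.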
Alex
2006\01\13@143537
by
Robert Rolf
I would convert the hex files to a binary image to make the
similarities more visible.
Then use Symantec 'filecompare' in forced ASCII mode.
That way identical sequences will be detected and highlighted.
The other way would be to convert the binary files back into
source using an appropriate disassembler (they are everywhere on
the web) and again do a file compare.
Trying to do this with the hex files is a pointless exercise in
futility since the hex encoding process will make changes less visible.
i.e. referenced addresses will be different, but with source, you can
see the intent is identical.
Robert
2006\01\13@144533
by
Herbert Graf
On Fri, 2006-01-13 at 14:14 -0500, William Couture wrote:
> Hi all,
>
> The company I work for is suing a Taiwanese company for making an illegal
> "clone" of one of our products. I'm trying to help prove that the code in the
> ROM is actually ours. This started before I joined the company, so I'm not
> familiar with all the details.
>
> There are a lot of versions of code for this, and I need to find the one they
> copied.
>
> Today, however, my boss suddenly told me that they made some changes,
> it's not an exact copy.
>
> So, I need to do a "similarity search" over several hundred hex files scattered
> throughout a file tree to find the one that they derived their code from.
>
> Has anyone ever heard of such a tool?
Don't know what OS you run, but diff under linux/unix will do that very
simply. I'm sure there's something similar for the OS you're interested
in.
TTYL
-----------------------------
Herbert's PIC Stuff:
http://repatch.dyndns.org:8383/pic_stuff/
2006\01\13@151443
by
Wouter van Ooijen
> So, I need to do a "similarity search" over several hundred
> hex files scattered
> throughout a file tree to find the one that they derived
> their code from.
>
> Has anyone ever heard of such a tool?
Maybe concatenate the candidate .hex file with the one from the pirate,
and zip the combination. A copy or partial copy will compress better
than a different application. You might need some experimenting to
interpret the results. But at least you can automate the process.
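A rough sketch of the compress-the-concatenation idea, scored as a normalized compression distance. Two things here are assumptions and not part of the suggestion above: the helper names `compressed_size` and `ncd`, and the use of Python's `lzma` module in place of zip. The reason for the swap is that zip's deflate only looks back 32 KiB, so with a 64 KiB image the second file could never be matched against the first.

```python
import lzma


def compressed_size(data: bytes) -> int:
    """Size in bytes of the LZMA-compressed data."""
    return len(lzma.compress(data))


def ncd(a: bytes, b: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Close to 0 when one input is largely a copy of the other,
    close to 1 when they share nothing the compressor can exploit.
    """
    ca = compressed_size(a)
    cb = compressed_size(b)
    cab = compressed_size(a + b)
    return (cab - min(ca, cb)) / max(ca, cb)
```

Running `ncd(pirate_image, candidate_image)` over every candidate and sorting by the result puts the most likely ancestor first. It is better to feed it binary images than raw .hex text, since the text framing (addresses, checksums) adds similarity that has nothing to do with the code.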
Wouter van Ooijen
-- -------------------------------------------
Van Ooijen Technische Informatica: http://www.voti.nl
consultancy, development, PICmicro products
docent Hogeschool van Utrecht: http://www.voti.nl/hvu
2006\01\13@154236
by
William Chops Westfield
>> Today, however, my boss suddenly told me that they made some changes,
>> it's not an exact copy.
>>
>> So, I need to do a "similarity search" over several hundred hex
>> files scattered throughout a file tree to find the one that they
>> derived their code from.
For something like PIC code, where you HAVE the original source
code, I think you'd be best off disassembling the code, running
it through some sort of symbol assignment based on your source,
and comparing the results...
BillW
2006\01\13@164145
by
Michael Dipperstein
> From: piclist-bounces On Behalf Of William Couture
> So, I need to do a "similarity search" over several hundred hex files
> scattered throughout a file tree to find the one that they derived
> their code from.
>
> Has anyone ever heard of such a tool?
The people who do DNA sequence alignment have tools for calculating
similarity between DNA sequences. One of the algorithms that they use
is called PAM (Point Accepted Mutation). I don't remember much
about it, but I've heard of people adopting it for spell checkers. It
might be useful for strings of data bytes too.
You also need to be careful about the hex files you compare. There's
nothing magical about the order that the lines appear in or the number
of bytes on a line. Unprogrammed locations don't have to be included
either.
My first thought at handling hex file variations is to reformat them
using a program that reads the hex file data into an array the size of the
PIC's memory, then writes out the data bytes in address order to a file to
be used for the comparison.
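A minimal sketch of such a normalizer for Intel HEX input, written in Python. The function name `load_intel_hex`, the 64 KiB default size and the 0xFF fill value for unprogrammed locations are all assumptions for the example; adjust them to the part in question. It handles data, end-of-file and the two extended-address record types, and ignores start-address records.

```python
def load_intel_hex(text: str, size: int = 0x10000, fill: int = 0xFF) -> bytes:
    """Expand Intel HEX text into a flat memory image of `size` bytes."""
    image = bytearray([fill]) * size   # unprogrammed locations keep `fill`
    base = 0                           # set by extended-address records
    for lineno, line in enumerate(text.splitlines(), 1):
        line = line.strip()
        if not line:
            continue
        if not line.startswith(":"):
            raise ValueError(f"line {lineno}: record does not start with ':'")
        rec = bytes.fromhex(line[1:])
        if len(rec) < 5 or rec[0] != len(rec) - 5:
            raise ValueError(f"line {lineno}: bad record length")
        if sum(rec) & 0xFF:
            raise ValueError(f"line {lineno}: bad checksum")
        count, addr, rtype = rec[0], int.from_bytes(rec[1:3], "big"), rec[3]
        data = rec[4:4 + count]
        if rtype == 0x00:              # data record
            start = base + addr
            if start + count > size:
                raise ValueError(f"line {lineno}: data outside image")
            image[start:start + count] = data
        elif rtype == 0x01:            # end of file
            break
        elif rtype == 0x02:            # extended segment address
            base = int.from_bytes(data, "big") << 4
        elif rtype == 0x04:            # extended linear address
            base = int.from_bytes(data, "big") << 16
    return bytes(image)
```

Because every byte lands at its own address, two hex files that differ only in line order, line length or omitted blank regions produce identical images.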
-Mike
2006\01\13@172127
by
andrew kelley
Or as an alternative to all those, write a utility which will read in
the opcodes and search for pattern matches between the two.. (because
the operands will most likely differ at least for jumps (if they
shuffled the code around)..) and also for memory if they changed that
around too.. But basically comparing code sequences.. As a start I
can send you some code that I have for processing PIC opcodes (in C).
Email me offlist if you are interested.. ( I could even write the
pattern matching code...)
andrew
2006\01\13@182254
by
James Newtons Massmind
Resynchronizing hex file comparison is available in:
Hex Workshop from BreakPoint Software
010 Editor from SweetScape Software
---
James.
2006\01\13@195226
by
Rolf
A while ago there was a big dispute between PearPC and CherryOS (Mac
emulators).
Do a Google search for that, because people used some very fancy mechanisms
to identify common code patterns between them (there were lots).
The "drunken-blog" was the best reference, but it is no longer
available, although it is still in the Google cache.
Rolf
2006\01\13@211045
by
Jose Da Silva
On January 13, 2006 11:14 am, William Couture wrote:
> The company I work for is suing a Taiwanese company for making an
> illegal "clone" of one of our products. I'm trying to help prove
> that the code in the ROM is actually ours. This started before I
> joined the company, so I'm not familiar with all the details.
>
> There are a lot of versions of code for this, and I need to find the
> one they copied.
1. Use a disassembler to reverse the code into source code.
2. Use just the command sequence to quickly find-out which one they
copied by using the unix "diff" command:
xxxx movlw xxx <-(the xxx stuff throws off a diff too quick)
xxxx addwf xxx
gets stripped-down to this:
movlw
addwf
etc...
This should probably give you the quickest answer to find the version
since you really don't care too much if they swapped RAM file registers
(this affects the byte values in the hex code) which is going to
throw-off your diff command quite quickly.
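The strip-and-compare step can also be scripted instead of run through diff by hand. The sketch below is in Python and assumes a listing format of one instruction per line with the address in the first column and the mnemonic in the second ("0012 movlw 0x20"); real disassembler output will differ and the parsing would need adjusting. The function names `mnemonics` and `similarity` are made up for the example.

```python
import difflib


def mnemonics(listing: str) -> list:
    """Reduce a disassembly listing to its sequence of mnemonics.

    Assumes each instruction line looks like '<address> <mnemonic> [operands]';
    anything after ';' is treated as a comment and dropped.
    """
    out = []
    for line in listing.splitlines():
        fields = line.split(";", 1)[0].split()
        if len(fields) >= 2:
            out.append(fields[1].lower())
    return out


def similarity(listing_a: str, listing_b: str) -> float:
    """Score from 0.0 to 1.0 for how alike the two mnemonic sequences are."""
    matcher = difflib.SequenceMatcher(
        None, mnemonics(listing_a), mnemonics(listing_b), autojunk=False
    )
    return matcher.ratio()
```

Scoring the pirate's listing against each candidate version and taking the highest ratio points at the likely source, even when register addresses and jump targets have moved.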
Let me know off-list if you want help.
http://www.JoesCat.com/micro/picchip.htm
2006\01\14@083709
by
William Couture
On 1/13/06, Jose Da Silva wrote:
> On January 13, 2006 11:14 am, William Couture wrote:
> > There are a lot of versions of code for this, and I need to find the
> > one they copied.
>
> 1. Use a disassembler to reverse the code into source code.
> 2. Use just the command sequence to quickly find-out which one they
> copied by using the unix "diff" command:
> xxxx movlw xxx <-(the xxx stuff throws off a diff too quick)
> xxxx addwf xxx
> gets stripped-down to this:
> movlw
> addwf
> etc...
>
> This should probably give you the quickest answer to find the version
> since you really don't care too much if they swapped RAM file registers
> (this affects the byte values in the hex code) which is going to
> throw-off your diff command quite quickly.
Some more info (some of it learned yesterday afternoon, after my
initial question):
The original source code is C for a 68HC11. They did their changes
from a ROM. So, I have to figure out if they just patched the ROM
image, or did they disassemble, change, and re-assemble. Then,
maybe I can find the version they pirated... Bleh...
Bill
--
Psst... Hey, you... Buddy... Want a kitten? straycatblues.petfinder.org
2006\01\14@144612
by
Peter
One way to compute the similarity of data is as follows:
1. turn each file into a binary image (a .bin file should work)
2. compute the DFFT (discrete fast Fourier transform) of the file using a
fixed 'window'
3. compute the CoG (center of gravity) of each normalized DFFT, using
frequency and amplitude as a 2-D space
4. sort the results from step 3, using the step-3 result of the file to be
compared against as the reference.
Maybe I am not very clear in my explanation; ask for more. The algorithm
is used in image processing among other things (e.g. image recognition),
and also in speech recognition and general pattern matching. Such an
algorithm should exist somewhere anyway. Step 3 can be replaced by other
types of calculations.
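One possible reading of those four steps, as a small Python sketch. Several choices here are assumptions and not part of the description above: non-overlapping 64-byte windows, averaging the magnitude spectrum over all windows, skipping the DC bin, normalizing to the peak, and taking the CoG as the centroid of the area under the spectrum. A plain DFT is used in place of an FFT to keep the example free of external libraries. The names `mean_spectrum`, `centre_of_gravity` and `rank` are made up.

```python
import cmath
import math


def mean_spectrum(data: bytes, window: int = 64) -> list:
    """Average DFT magnitude per bin (DC bin skipped) over fixed windows."""
    half = window // 2
    twiddle = [[cmath.exp(-2j * cmath.pi * k * n / window) for n in range(window)]
               for k in range(1, half)]
    totals = [0.0] * (half - 1)
    windows = 0
    for start in range(0, len(data) - window + 1, window):
        chunk = data[start:start + window]
        for i, row in enumerate(twiddle):
            totals[i] += abs(sum(x * w for x, w in zip(chunk, row)))
        windows += 1
    return [t / windows for t in totals] if windows else totals


def centre_of_gravity(data: bytes, window: int = 64) -> tuple:
    """(frequency, amplitude) centroid of the peak-normalized mean spectrum."""
    spectrum = mean_spectrum(data, window)
    peak = max(spectrum)
    if peak == 0:
        return (0.0, 0.0)
    norm = [a / peak for a in spectrum]
    area = sum(norm)
    freq = sum(k * a for k, a in enumerate(norm, start=1)) / area
    amp = sum(a * a for a in norm) / (2 * area)
    return (freq, amp)


def rank(reference: bytes, candidates: dict, window: int = 64) -> list:
    """Sort candidate names by distance of their CoG from the reference's."""
    ref = centre_of_gravity(reference, window)
    scored = []
    for name, data in candidates.items():
        cog = centre_of_gravity(data, window)
        scored.append((name, math.hypot(cog[0] - ref[0], cog[1] - ref[1])))
    return sorted(scored, key=lambda item: item[1])
```

The result is a coarse fingerprint: it can order candidates for closer inspection, but two unrelated images may still land near each other.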
Peter
2006\01\14@160935
by
Jose Da Silva
On January 14, 2006 05:37 am, William Couture wrote:
> On 1/13/06, Jose Da Silva wrote:
> > On January 13, 2006 11:14 am, William Couture wrote:
> > > There are a lot of versions of code for this, and I need to find
> > > the one they copied.
> >
> > 1. Use a disassembler to reverse the code into source code.
> > 2. Use just the command sequence to quickly find-out which one they
> > copied by using the unix "diff" command:
> > xxxx movlw xxx <-(the xxx stuff throws off a diff too quick)
> > xxxx addwf xxx
> > gets stripped-down to this:
> > movlw
> > addwf
> > etc...
> >
> > This should probably give you the quickest answer to find the
> > version since you really don't care too much if they swapped RAM
> > file registers (this affects the byte values in the hex code) which
> > is going to throw-off your diff command quite quickly.
>
> Some more info (some of it learned yesterday afternoon, after my
> initial question):
>
> The original source code is C for a 68HC11. They did their changes
> from a ROM. So, I have to figure out if they just patched the ROM
> image, or did they disassemble, change, and re-assemble. Then,
> maybe I can find the version they pirated... Bleh...
Bleh is correct, since you are dealing with instructions that are 1, 2, 3
or 4 bytes long.
2006\01\23@111031
by
William Couture
I thought I'd follow up on this.
I've found the pirated file -- it turns out that I was searching the
wrong files, the .HEX files in the project directory, despite their
names, had nothing to do with the code (I still don't know why
they are there, the programmer is long gone).
The correct object code was stored in a Motorola S file. Once I
found that out, it was fairly easy to search on them and find the
"correct" file by a simple similarity comparison.
The pirated version had 54 bytes out of 65536 changed, all of them
constant data. I used the linkmap to figure out exactly what they
had changed and to what, and gave all the pertinent information
to my boss.
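For anyone repeating this kind of comparison, here is a hedged Python sketch of loading Motorola S-records into a flat image and listing the offsets that differ. It is not the tool used above; the names `load_srec` and `changed_offsets`, the 64 KiB image size and the 0xFF fill value are assumptions for the example.

```python
def load_srec(text: str, size: int = 0x10000, fill: int = 0xFF) -> bytes:
    """Expand Motorola S-record text into a flat image of `size` bytes."""
    image = bytearray([fill]) * size
    addr_len = {"1": 2, "2": 3, "3": 4}   # address bytes in S1/S2/S3 records
    for lineno, line in enumerate(text.splitlines(), 1):
        line = line.strip()
        # Only S1/S2/S3 carry image data; header, count and termination
        # records (and anything unrecognized) are skipped.
        if len(line) < 4 or line[0] != "S" or line[1] not in addr_len:
            continue
        rec = bytes.fromhex(line[2:])
        if rec[0] != len(rec) - 1:
            raise ValueError(f"line {lineno}: bad byte count")
        if sum(rec) & 0xFF != 0xFF:
            raise ValueError(f"line {lineno}: bad checksum")
        n = addr_len[line[1]]
        addr = int.from_bytes(rec[1:1 + n], "big")
        data = rec[1 + n:-1]
        if addr + len(data) > size:
            raise ValueError(f"line {lineno}: data outside image")
        image[addr:addr + len(data)] = data
    return bytes(image)


def changed_offsets(a: bytes, b: bytes) -> list:
    """Offsets at which two equal-sized images differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
```

The candidate with the shortest `changed_offsets` list is the closest match. In the case reported here that list would have had 54 entries, roughly 0.08% of the 65536-byte image, and the offsets can then be looked up in the linkmap.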
Thanks for the suggestions everyone!
Bill
--
Psst... Hey, you... Buddy... Want a kitten? straycatblues.petfinder.org
2006\01\23@114755
by
Alan B. Pearce
>The pirated version had 54 bytes out of 65536 changed,
The copyright information ???
2006\01\23@115905
by
William Couture
On 1/23/06, Alan B. Pearce wrote:
> >The pirated version had 54 bytes out of 65536 changed,
>
> The copyright information ???
Copyright, unit name, software version, a couple of error
messages, and a few numeric constants (so it did not
"act like" our controller by default).
Bill
--
Psst... Hey, you... Buddy... Want a kitten? straycatblues.petfinder.org