Megalextoria
Retro computing and gaming, sci-fi books, tv and movies and other geeky stuff.

AI and decompilation? [message #403652] Mon, 04 January 2021 06:00
Originally posted by: gareth evans

Thinking back to my first job, nearly 50 years ago now,
when I had to dis-assemble DEC's paper tape BASIC
interpreter in order to enhance it, I guess that
dis-assemblers and decompilers must now be ten-a-penny,
especially for programs running under Windows where
the structure of Windows programs is well-known with
an assumption that C was the source language?

But I wonder if Artificial Intelligence could, after
being fed with numerous instruction sets, take a
block of binary, and analyse its source without
any prior knowledge of the instruction set?

I am particularly interested in the Binary Blob
provided for Raspberry Pi computers, with a view to
getting detailed knowledge of the video processors
employed therein.
Re: AI and decompilation? [message #403654 is a reply to message #403652] Mon, 04 January 2021 06:42
Ahem A Rivet's Shot
On Mon, 4 Jan 2021 11:00:29 +0000
gareth evans <headstone255@yahoo.com> wrote:

> But I wonder if Artificial Intelligence could, after
> being fed with numerous instruction sets, take a
> block of binary, and analyse its source without
> any prior knowledge of the instruction set?

Now *that* would be an interesting AI project to see the results
of. I'm pretty sure the answer to your question is "Nobody knows, please
publish when you find out" or thereabouts.

There's plenty of training material available in the form of open
source compiled for all sorts of platforms; you just need to decide on an
AI architecture that's up to the job (hopefully something short of
AlphaGo Zero), build it (or rent it in "the cloud") and train it. It would
still be useful even if you had to train one for each instruction set (or
family).

The biggest challenge would be comparing the source codes, but code
that compiles to an equivalent binary would be good enough as long as it
didn't cheat (for example, by creating a binary array and calling it).
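
A rough picture of that "compiles to an equivalent binary" test is sketched below. It is only an illustration: Python bytecode stands in for a native binary, and the name `same_binary` is made up for this example. A real harness would have to run the target compiler and would still need the anti-cheating check mentioned above.

```python
def same_binary(original_src: str, candidate_src: str) -> bool:
    """Return True if two expressions compile to the same bytecode and constants.

    Python bytecode is used here as a stand-in for a native binary.
    Identifier names are stored outside the instruction stream, so a
    candidate that differs only in naming still counts as equivalent.
    """
    a = compile(original_src, "<original>", "eval")
    b = compile(candidate_src, "<candidate>", "eval")
    return a.co_code == b.co_code and a.co_consts == b.co_consts


if __name__ == "__main__":
    # Same operation, different variable name.
    print(same_binary("total + 1", "x + 1"))
    # Different operation and constant.
    print(same_binary("total + 1", "total * 2"))
```

Scoring on the compiled output rather than on the source text means the recovered names don't have to match the originals, which is about the best one can hope for anyway.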

--
Steve O'Hara-Smith | Directable Mirror Arrays
C:\>WIN | A better way to focus the sun
The computer obeys and wins. | licences available see
You lose and Bill collects. | http://www.sohara.org/
Re: AI and decompilation? [message #403656 is a reply to message #403652] Mon, 04 January 2021 08:08
Originally posted by: Pancho

On 04/01/2021 11:00, gareth evans wrote:
> Thinking back to my first job, nearly 50 years ago now,
> when I had to dis-assemble DEC's paper tape BASIC
> interpreter in order to enhance it, I guess that
> dis-assemblers and decompilers must now be ten-a-penny,
> especially for programs running under Windows where
> the structure of Windows programs is well-known with
> an assumption that C was the source language?
>
> But I wonder if Artificial Intelligence could, after
> being fed with numerous instruction sets, take a
> block of binary, and analyse its source without
> any prior knowledge of the instruction set?
>
> I am particularly interested in the Binary Blob
> provided for Raspberry Pi computers, with a view to
> getting detailed knowledge of the video processors
> employed therein.
>
I think a lot of the problem is defining the question.

What do you want it to do?
Re: AI and decompilation? [message #403661 is a reply to message #403652] Mon, 04 January 2021 11:05
Dennis Lee Bieber
On Mon, 4 Jan 2021 11:00:29 +0000, gareth evans <headstone255@yahoo.com>
declaimed the following:

> Thinking back to my first job, nearly 50 years ago now,
> when I had to dis-assemble DEC's paper tape BASIC
> interpreter in order to enhance it, I guess that
> dis-assemblers and decompilers must now be ten-a-penny,
> especially for programs running under Windows where
> the structure of Windows programs is well-known with
> an assumption that C was the source language?
>
Actually, I think the use of disassemblers et al has fallen away.
Modern processors have so many peephole optimizations and out-of-order
execution streams that converting an executable back to assembly source is
almost meaningless -- and getting back to a high-level language is near
impossible. One would have to be an expert at the assembly for a processor
to have any chance of understanding the result.


--
Wulfraed Dennis Lee Bieber AF6VN
wlfraed@ix.netcom.com http://wlfraed.microdiversity.freeddns.org/
Re: AI and decompilation? [message #403662 is a reply to message #403661] Mon, 04 January 2021 12:07
Martin Gregorie
On Mon, 04 Jan 2021 11:05:55 -0500, Dennis Lee Bieber wrote:

> On Mon, 4 Jan 2021 11:00:29 +0000, gareth evans <headstone255@yahoo.com>
> declaimed the following:
>
>> Thinking back to my first job, nearly 50 years ago now,
>> when I had to dis-assemble DEC's paper tape BASIC interpreter in order
>> to enhance it, I guess that dis-assemblers and decompilers must now be
>> ten-a-penny,
>> especially for programs running under Windows where the structure of
>> Windows programs is well-known with an assumption that C was the source
>> language?
>>
> Actually, I think the use of disassemblers et al has fallen away.
> Modern processors have so many peephole optimizations and out-of-order
> execution streams that converting an executable back to assembly source
> is almost meaningless -- and getting back to a high-level language is
> near impossible. One would have to be an expert at the assembly for a
> processor to have any chance of understanding the result.

The retro-computing guys - those who are fans of the MC6800 and MC6809
microprocessors, anyway - seem to be getting a rather good
semi-interactive disassembler up and running. So far it understands
executables that run under FLEX and FLEX09 for both 6800 and 6809, and under
UniFlex and OS9/level 1 and 2 on a 6809, and can automatically detect
which OS the binary was compiled for. This is quite impressive, since all
four OSen have very different API call structures despite FLEX09, UniFlex
and OS9 all running on the same chip.


--
--
Martin | martin at
Gregorie | gregorie dot org
Re: AI and decompilation? [message #403666 is a reply to message #403661] Mon, 04 January 2021 12:47
Originally posted by: gareth evans

On 04/01/2021 16:05, Dennis Lee Bieber wrote:
> Actually, I think the use of disassemblers et al has fallen away.
> Modern processors have so many peephole optimizations and out-of-order
> execution streams that converting an executable back to assembly source is
> almost meaningless -- and getting back to a high-level language is near
> impossible. One would have to be an expert at the assembly for a processor
> to have any chance of understanding the result.
>
>

AF6VN DE G4SDW

But we Radio Hams thrive on such low level technicalities! :-)

73.
Re: AI and decompilation? [message #403667 is a reply to message #403656] Mon, 04 January 2021 12:51
Originally posted by: gareth evans

On 04/01/2021 13:08, Pancho wrote:
> On 04/01/2021 11:00, gareth evans wrote:
>> Thinking back to my first job, nearly 50 years ago now,
>> when I had to dis-assemble DEC's paper tape BASIC
>> interpreter in order to enhance it, I guess that
>> dis-assemblers and decompilers must now be ten-a-penny,
>> especially for programs running under Windows where
>> the structure of Windows programs is well-known with
>> an assumption that C was the source language?
>>
>> But I wonder if Artificial Intelligence could, after
>> being fed with numerous instruction sets, take a
>> block of binary, and analyse its source without
>> any prior knowledge of the instruction set?
>>
>> I am particularly interested in the Binary Blob
>> provided for Raspberry Pi computers, with a view to
>> getting detailed knowledge of the video processors
>> employed therein.
>>
> I think a lot of the problem is defining the question.
>
> What do you want it to do?
>

I don't want it to do anything. I want to play at a low level
with the thing ... large oaks from little acorns grow.
Re: AI and decompilation? [message #403668 is a reply to message #403662] Mon, 04 January 2021 12:52
scott
Martin Gregorie <martin@mydomain.invalid> writes:
> On Mon, 04 Jan 2021 11:05:55 -0500, Dennis Lee Bieber wrote:
>
>> On Mon, 4 Jan 2021 11:00:29 +0000, gareth evans <headstone255@yahoo.com>
>> declaimed the following:
>>
>>> Thinking back to my first job, nearly 50 years ago now,
>>> when I had to dis-assemble DEC's paper tape BASIC interpreter in order
>>> to enhance it, I guess that dis-assemblers and decompilers must now be
>>> ten-a-penny,
>>> especially for programs running under Windows where the structure of
>>> Windows programs is well-known with an assumption that C was the source
>>> language?
>>>
>> Actually, I think the use of disassemblers et al has fallen away.
>> Modern processors have so many peephole optimizations and out-of-order
>> execution streams that converting an executable back to assembly source
>> is almost meaningless -- and getting back to a high-level language is
>> near impossible. One would have to be an expert at the assembly for a
>> processor to have any chance of understanding the result.
>
> The retro-computing guys - those who are fans of the MC6800 and MC6809
> microprocessors, anyway - seem to be getting a rather good
> semi-interactive disassembler up and running.

Security experts have several very powerful disassemblers and decompilers
they use for Intel/AMD/ARM processors.

https://en.wikibooks.org/wiki/X86_Disassembly/Disassemblers_and_Decompilers
Re: AI and decompilation? [message #403680 is a reply to message #403661] Mon, 04 January 2021 14:18
Dan Espen
Dennis Lee Bieber <wlfraed@ix.netcom.com> writes:

> On Mon, 4 Jan 2021 11:00:29 +0000, gareth evans <headstone255@yahoo.com>
> declaimed the following:
>
>> Thinking back to my first job, nearly 50 years ago now,
>> when I had to dis-assemble DEC's paper tape BASIC
>> interpreter in order to enhance it, I guess that
>> dis-assemblers and decompilers must now be ten-a-penny,
>> especially for programs running under Windows where
>> the structure of Windows programs is well-known with
>> an assumption that C was the source language?
>>
> Actually, I think the use of disassemblers et al has fallen away.
> Modern processors have so many peephole optimizations and out-of-order
> execution streams that converting an executable back to assembly source is
> almost meaningless -- and getting back to a high-level language is near
> impossible. One would have to be an expert at the assembly for a processor
> to have any chance of understanding the result.

Well, in my last job I often used disassemblers.
IBM z/OS.
Very useful for understanding IBM code.

I can't see what out of order execution has to do with a disassembler.
You disassemble executables.

Since I understand Assembler, I certainly got meaning out of it
even if the original was an optimized HLL. You can see what services
are being called.

--
Dan Espen
Re: AI and decompilation? [message #403690 is a reply to message #403652] Mon, 04 January 2021 15:11
Originally posted by: Eli the Bearded

In comp.sys.raspberry-pi, gareth evans <headstone255@yahoo.com> wrote:
> But I wonder if Artificial Intelligence could, after
> being fed with numerous instruction sets, take a
> block of binary, and analyse its source without
> any prior knowledge of the instruction set?

I suspect AI could be trained to do that, perhaps better than being
trained to read English. Not sure if anyone has ever tried.

> I am particularly interested in the Binary Blob
> provided for Raspberry Pi computers, with a view to
> getting detailed knowledge of the video processors
> employed therein.

The info-sec people use disassemblers all the time, and don't limit
themselves to binaries compiled from C and intended for Windows. They
try to extract passwords and locate flaws in firmware for all sorts
of internet-connected things. I recall Cybergibbons creating some
tutorials in November or December. It was linked from his twitter
account, but I didn't pay that close attention to where it was. A
quick look at his blog and youtube didn't find them, but he's got a
robust web presence.

Elijah
------
have you searched to see if anyone else has reverse engineered it already?
Re: AI and decompilation? [message #403697 is a reply to message #403667] Mon, 04 January 2021 16:57
Originally posted by: Pancho

On 04/01/2021 17:51, gareth evans wrote:
> On 04/01/2021 13:08, Pancho wrote:
>> On 04/01/2021 11:00, gareth evans wrote:
>>> Thinking back to my first job, nearly 50 years ago now,
>>> when I had to dis-assemble DEC's paper tape BASIC
>>> interpreter in order to enhance it, I guess that
>>> dis-assemblers and decompilers must now be ten-a-penny,
>>> especially for programs running under Windows where
>>> the structure of Windows programs is well-known with
>>> an assumption that C was the source language?
>>>
>>> But I wonder if Artificial Intelligence could, after
>>> being fed with numerous instruction sets, take a
>>> block of binary, and analyse its source without
>>> any prior knowledge of the instruction set?
>>>
>>> I am particularly interested in the Binary Blob
>>> provided for Raspberry Pi computers, with a view to
>>> getting detailed knowledge of the video processors
>>> employed therein.
>>>
>> I think a lot of the problem is defining the question.
>>
>> What do you want it to do?
>>
>
> I don't want it to do anything. I want to play at a low level
> with the thing ... large oaks from little acorns grow.
>

Play with what thing? What is an instruction set, what is the Binary
Blob? Why do you need an AI?

Most compilers leave fingerprints on executables; you don't need an AI to
detect them. I remember decompiling in the early '80s, but with complex
modern code it can be a challenge to reverse engineer a high-level
understanding even when you do have the source code. Take away sensible
variable and function names and you are stuffed.
Re: AI and decompilation? [message #403700 is a reply to message #403697] Mon, 04 January 2021 17:23
Originally posted by: gareth evans

On 04/01/2021 21:57, Pancho wrote:
> On 04/01/2021 17:51, gareth evans wrote:
>> On 04/01/2021 13:08, Pancho wrote:
>>> On 04/01/2021 11:00, gareth evans wrote:
>>>> Thinking back to my first job, nearly 50 years ago now,
>>>> when I had to dis-assemble DEC's paper tape BASIC
>>>> interpreter in order to enhance it, I guess that
>>>> dis-assemblers and decompilers must now be ten-a-penny,
>>>> especially for programs running under Windows where
>>>> the structure of Windows programs is well-known with
>>>> an assumption that C was the source language?
>>>>
>>>> But I wonder if Artificial Intelligence could, after
>>>> being fed with numerous instruction sets, take a
>>>> block of binary, and analyse its source without
>>>> any prior knowledge of the instruction set?
>>>>
>>>> I am particularly interested in the Binary Blob
>>>> provided for Raspberry Pi computers, with a view to
>>>> getting detailed knowledge of the video processors
>>>> employed therein.
>>>>
>>> I think a lot of the problem is defining the question.
>>>
>>> What do you want it to do?
>>>
>>
>> I don't want it to do anything. I want to play at a low level
>> with the thing ... large oaks from little acorns grow.
>>
>
> Play with what thing? What is an instruction set, what is the Binary
> Blob? Why do you need an AI?
>
> Most compilers leave fingerprints on executables you don't need an AI to
> detect them. I remember decompiling in the early 80's but complex modern
> code can often be a challenge to naively reverse engineer a high level
> understanding from even if you do have source code. Take away sensible
> variable and function names and you are stuffed.

Somehow I think that we're not singing from the same hymn sheet.

Sorry.
Re: AI and decompilation? [message #403703 is a reply to message #403697] Mon, 04 January 2021 17:50
Dan Espen
Pancho <Pancho.Dontmaileme@outlook.com> writes:

> On 04/01/2021 17:51, gareth evans wrote:
>> On 04/01/2021 13:08, Pancho wrote:
>>> On 04/01/2021 11:00, gareth evans wrote:
>>>> Thinking back to my first job, nearly 50 years ago now,
>>>> when I had to dis-assemble DEC's paper tape BASIC
>>>> interpreter in order to enhance it, I guess that
>>>> dis-assemblers and decompilers must now be ten-a-penny,
>>>> especially for programs running under Windows where
>>>> the structure of Windows programs is well-known with
>>>> an assumption that C was the source language?
>>>>
>>>> But I wonder if Artificial Intelligence could, after
>>>> being fed with numerous instruction sets, take a
>>>> block of binary, and analyse its source without
>>>> any prior knowledge of the instruction set?
>>>>
>>>> I am particularly interested in the Binary Blob
>>>> provided for Raspberry Pi computers, with a view to
>>>> getting detailed knowledge of the video processors
>>>> employed therein.
>>>>
>>> I think a lot of the problem is defining the question.
>>>
>>> What do you want it to do?
>>>
>> I don't want it to do anything. I want to play at a low level
>> with the thing ... large oaks from little acorns grow.
>>
>
> Play with what thing? What is an instruction set, what is the Binary
> Blob? Why do you need an AI?
>
> Most compilers leave fingerprints on executables you don't need an AI
> to detect them. I remember decompiling in the early 80's but complex
> modern code can often be a challenge to naively reverse engineer a
> high level understanding from even if you do have source code. Take
> away sensible variable and function names and you are stuffed.

I've had more than one experience in putting those meaningful variable
names right back. It's actually pretty easy, a somewhat rote process.
Find the read input instruction. Since you know the layout of the input
record, you now have labels to many of the references to that input
area.

I think you can work out how to proceed.
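
The first step of that rote process can be sketched as follows. Everything here is hypothetical: the record layout, the field names and the buffer address are invented for the example, and a real layout would come from the documented input record format.

```python
from bisect import bisect_right

# Hypothetical input record layout: (offset, length, field name).
RECORD_LAYOUT = [
    (0, 6, "CUSTNO"),
    (6, 20, "NAME"),
    (26, 8, "BALANCE"),
]


def label_for(address: int, buffer_base: int) -> str:
    """Turn an absolute address inside the input buffer into a field label."""
    offset = address - buffer_base
    starts = [start for start, _, _ in RECORD_LAYOUT]
    index = bisect_right(starts, offset) - 1
    if index < 0:
        return f"?{address:#06x}"  # before the start of the buffer
    start, length, name = RECORD_LAYOUT[index]
    if offset >= start + length:
        return f"?{address:#06x}"  # past the end of the record
    delta = offset - start
    return name if delta == 0 else f"{name}+{delta}"


if __name__ == "__main__":
    base = 0x2000  # assumed address of the input area found via the read instruction
    for addr in (0x2000, 0x2006, 0x201C, 0x2030):
        print(f"{addr:#06x} -> {label_for(addr, base)}")
```

Once the operands that point into the input area carry field names, the same trick can be repeated for work areas and output records that those fields get moved into.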


--
Dan Espen
Re: AI and decompilation? [message #403704 is a reply to message #403703] Mon, 04 January 2021 18:00
Originally posted by: Pancho

On 04/01/2021 22:50, Dan Espen wrote:
> Pancho <Pancho.Dontmaileme@outlook.com> writes:
>
>> On 04/01/2021 17:51, gareth evans wrote:
>>> On 04/01/2021 13:08, Pancho wrote:
>>>> On 04/01/2021 11:00, gareth evans wrote:
>>>> > Thinking back to my first job, nearly 50 years ago now,
>>>> > when I had to dis-assemble DEC's paper tape BASIC
>>>> > interpreter in order to enhance it, I guess that
>>>> > dis-assemblers and decompilers must now be ten-a-penny,
>>>> > especially for programs running under Windows where
>>>> > the structure of Windows programs is well-known with
>>>> > an assumption that C was the source language?
>>>> >
>>>> > But I wonder if Artificial Intelligence could, after
>>>> > being fed with numerous instruction sets, take a
>>>> > block of binary, and analyse its source without
>>>> > any prior knowledge of the instruction set?
>>>> >
>>>> > I am particularly interested in the Binary Blob
>>>> > provided for Raspberry Pi computers, with a view to
>>>> > getting detailed knowledge of the video processors
>>>> > employed therein.
>>>> >
>>>> I think a lot of the problem is defining the question.
>>>>
>>>> What do you want it to do?
>>>>
>>> I don't want it to do anything. I want to play at a low level
>>> with the thing ... large oaks from little acorns grow.
>>>
>>
>> Play with what thing? What is an instruction set, what is the Binary
>> Blob? Why do you need an AI?
>>
>> Most compilers leave fingerprints on executables you don't need an AI
>> to detect them. I remember decompiling in the early 80's but complex
>> modern code can often be a challenge to naively reverse engineer a
>> high level understanding from even if you do have source code. Take
>> away sensible variable and function names and you are stuffed.
>
> I've had more than one experience in putting those meaningful variable
> names right back. It's actually pretty easy, a somewhat rote process.
> Find the read input instruction. Since you know the layout of the input
> record, you now have labels to many of the references to that input
> area.
>
> I think you can work out how to proceed.
>
>
Without the source how do you know any meaningful variable names in the
first place?
Re: AI and decompilation? [message #403705 is a reply to message #403661] Mon, 04 January 2021 18:01
Theo Markettos
Dennis Lee Bieber <wlfraed@ix.netcom.com> wrote:
> Actually, I think the use of disassemblers et al has fallen away.
> Modern processors have so many peephole optimizations and out-of-order
> execution streams that converting an executable back to assembly source is
> almost meaningless -- and getting back to a high-level language is near
> impossible. One would have to be an expert at the assembly for a processor
> to have any chance of understanding the result.

Apple essentially do this for their Rosetta 2 x86-to-ARM converter. They
take existing x86 executables, which are likely generated by their Xcode
LLVM compiler. They convert the assembly back into LLVM's intermediate
representation, which is the idealised-assembly representation most of the
compiler stages work on. Then they push that IR through the regular ARM
LLVM backend, including optimiser stages, to produce 64-bit ARM executables.

It's not a language intended for humans to read, but it's high enough for
the compiler stages to work on. Doing it this way avoids having to emulate
any x86 instructions.

Theo
Re: AI and decompilation? [message #403706 is a reply to message #403700] Mon, 04 January 2021 18:09
Martin Gregorie
On Mon, 04 Jan 2021 22:23:14 +0000, gareth evans wrote:

>
> Somehow I think that we're not singing from the same hymn sheet.
>
There is an intermediate disassembler style that sits between a
traditional disassembler and the mythical AI disassembler: that is the
'semi-interactive' type I mentioned. Since I know of at least one of
these that is currently up and running I probably should have explained
it better, so here goes:

What I meant by this is a disassembler that initially generates an
assembly source file but doesn't just save it. Instead it shows that to
the user in an interactive, scrolling display which allows the user to
assign names to branch destinations, call targets and addresses of
variables, while simultaneously storing these in a symbol table, which is
also viewable, editable on screen and can be saved and later reloaded at
the start of a future session.

Most importantly, at any point you can rerun the disassembly, and this
time the disassembler will substitute the names held in the symbol table
for the raw addresses in its output. IOW, after you've added one or more
name/address pairs to the symbol table, rerunning the disassembler will
incorporate these into the new version of the disassembled source.
Working this way is obviously faster and less error-prone than saving the
first pass disassembler output and manually editing it.

For extra points the disassembler should be able to:

- start by reading a predefined symbol set that contains the OS API names
and names of OS public variables.

- be configurable to search for and read in more than one symbol set.

- use a modified version of the symbol table editor to add comments that
will appear as comment blocks in front of a nominated address or after
the address content as a trailing comment.

- generate a disassembled source file that can be assembled without
needing further changes.
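
The rerun-with-symbol-table idea can be sketched in a few lines. This is not the disassembler mentioned above: the instruction set is a toy one invented for the example, and the names `OPCODES` and `disassemble` are assumptions of the sketch.

```python
# Toy instruction set, invented for this sketch (not the 6800/6809):
# opcode -> (mnemonic, takes a 16-bit big-endian address operand)
OPCODES = {
    0x00: ("RTS", False),
    0x01: ("LDA", True),
    0x02: ("STA", True),
    0x03: ("JMP", True),
    0x04: ("JSR", True),
}


def disassemble(code: bytes, base: int, symbols: dict[int, str]) -> list[str]:
    """Render a listing, using names from the symbol table where known."""
    lines = []
    pc = 0
    while pc < len(code):
        address = base + pc
        label = symbols.get(address, "")
        entry = OPCODES.get(code[pc])
        if entry is None or (entry[1] and pc + 3 > len(code)):
            # Unknown or truncated instruction: emit it as a data byte.
            text, size = f"FCB ${code[pc]:02X}", 1
        elif entry[1]:
            target = int.from_bytes(code[pc + 1:pc + 3], "big")
            operand = symbols.get(target, f"${target:04X}")
            text, size = f"{entry[0]} {operand}", 3
        else:
            text, size = entry[0], 1
        lines.append(f"{address:04X} {label:<8} {text}")
        pc += size
    return lines


if __name__ == "__main__":
    code = bytes([0x04, 0x10, 0x06, 0x03, 0x10, 0x00, 0x01, 0x20, 0x00, 0x00])
    # First pass: no names yet, so operands are raw addresses.
    print("\n".join(disassemble(code, 0x1000, {})))
    # Later pass: the user has named a few addresses; rerun with the table.
    symbols = {0x1000: "START", 0x1006: "GETCH", 0x2000: "INBUF"}
    print("\n".join(disassemble(code, 0x1000, symbols)))
```

Because the symbol table is just a mapping from address to name, saving and reloading it between sessions, or preloading a set of OS API names, comes down to merging dictionaries before the rerun.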


--
--
Martin | martin at
Gregorie | gregorie dot org
Re: AI and decompilation? [message #403710 is a reply to message #403680] Mon, 04 January 2021 18:54
Peter Flass
Dan Espen <dan1espen@gmail.com> wrote:
> Dennis Lee Bieber <wlfraed@ix.netcom.com> writes:
>
>> On Mon, 4 Jan 2021 11:00:29 +0000, gareth evans <headstone255@yahoo.com>
>> declaimed the following:
>>
>>> Thinking back to my first job, nearly 50 years ago now,
>>> when I had to dis-assemble DEC's paper tape BASIC
>>> interpreter in order to enhance it, I guess that
>>> dis-assemblers and decompilers must now be ten-a-penny,
>>> especially for programs running under Windows where
>>> the structure of Windows programs is well-known with
>>> an assumption that C was the source language?
>>>
>> Actually, I think the use of disassemblers et al has fallen away.
>> Modern processors have so many peephole optimizations and out-of-order
>> execution streams that converting an executable back to assembly source is
>> almost meaningless -- and getting back to a high-level language is near
>> impossible. One would have to be an expert at the assembly for a processor
>> to have any chance of understanding the result.
>
> Well, in my last job I often used disassemblers.
> IBM z/OS.
> Very useful for understanding IBM code.

I was going to say that disassemblers for IBM seem to work fairly well.
I’ve used them a few times.

>
> I can't see what out of order execution has to do with a disassembler.
> You disassemble executables.
>
> Since I understand Assembler, I certainly got meaning out of it
> even if the original was an optimized HLL. You can see what services
> are being called.
>

I think, for example, that one disassembler might recognize the SVC
number. I think it put the macro name in as a comment (LINK, GETMAIN, etc.)


--
Pete
Re: AI and decompilation? [message #403712 is a reply to message #403704] Mon, 04 January 2021 18:59
Peter Flass
Pancho <Pancho.Dontmaileme@outlook.com> wrote:
> On 04/01/2021 22:50, Dan Espen wrote:
>> Pancho <Pancho.Dontmaileme@outlook.com> writes:
>>
>>> On 04/01/2021 17:51, gareth evans wrote:
>>>> On 04/01/2021 13:08, Pancho wrote:
>>>> > On 04/01/2021 11:00, gareth evans wrote:
>>>> >> Thinking back to my first job, nearly 50 years ago now,
>>>> >> when I had to dis-assemble DEC's paper tape BASIC
>>>> >> interpreter in order to enhance it, I guess that
>>>> >> dis-assemblers and decompilers must now be ten-a-penny,
>>>> >> especially for programs running under Windows where
>>>> >> the structure of Windows programs is well-known with
>>>> >> an assumption that C was the source language?
>>>> >>
>>>> >> But I wonder if Artificial Intelligence could, after
>>>> >> being fed with numerous instruction sets, take a
>>>> >> block of binary, and analyse its source without
>>>> >> any prior knowledge of the instruction set?
>>>> >>
>>>> >> I am particularly interested in the Binary Blob
>>>> >> provided for Raspberry Pi computers, with a view to
>>>> >> getting detailed knowledge of the video processors
>>>> >> employed therein.
>>>> >>
>>>> > I think a lot of the problem is defining the question.
>>>> >
>>>> > What do you want it to do?
>>>> >
>>>> I don't want it to do anything. I want to play at a low level
>>>> with the thing ... large oaks from little acorns grow.
>>>>
>>>
>>> Play with what thing? What is an instruction set, what is the Binary
>>> Blob? Why do you need an AI?
>>>
>>> Most compilers leave fingerprints on executables you don't need an AI
>>> to detect them. I remember decompiling in the early 80's but complex
>>> modern code can often be a challenge to naively reverse engineer a
>>> high level understanding from even if you do have source code. Take
>>> away sensible variable and function names and you are stuffed.
>>
>> I've had more than one experience in putting those meaningful variable
>> names right back. It's actually pretty easy, a somewhat rote process.
>> Find the read input instruction. Since you know the layout of the input
>> record, you now have labels to many of the references to that input
>> area.
>>
>> I think you can work out how to proceed.
>>
>>
> Without the source how do you know any meaningful variable names in the
> first place?
>

I did a fun side project a few years back. The source for one module of
PL/I(F) was chooched on the distribution tape; about the last third was
missing. I disassembled the object module, and was able to recognize
variable names and standard compiler macros. I got my restored version back
to being identical to the original, with fairly readable source as well.

--
Pete
Re: AI and decompilation? [message #403715 is a reply to message #403697] Mon, 04 January 2021 20:38
Originally posted by: J. Clarke

On Mon, 4 Jan 2021 21:57:32 +0000, Pancho
<Pancho.Dontmaileme@outlook.com> wrote:

> On 04/01/2021 17:51, gareth evans wrote:
>> On 04/01/2021 13:08, Pancho wrote:
>>> On 04/01/2021 11:00, gareth evans wrote:
>>>> Thinking back to my first job, nearly 50 years ago now,
>>>> when I had to dis-assemble DEC's paper tape BASIC
>>>> interpreter in order to enhance it, I guess that
>>>> dis-assemblers and decompilers must now be ten-a-penny,
>>>> especially for programs running under Windows where
>>>> the structure of Windows programs is well-known with
>>>> an assumption that C was the source language?
>>>>
>>>> But I wonder if Artificial Intelligence could, after
>>>> being fed with numerous instruction sets, take a
>>>> block of binary, and analyse its source without
>>>> any prior knowledge of the instruction set?
>>>>
>>>> I am particularly interested in the Binary Blob
>>>> provided for Raspberry Pi computers, with a view to
>>>> getting detailed knowledge of the video processors
>>>> employed therein.
>>>>
>>> I think a lot of the problem is defining the question.
>>>
>>> What do you want it to do?
>>>
>>
>> I don't want it to do anything. I want to play at a low level
>> with the thing ... large oaks from little acorns grow.
>>
>
> Play with what thing?

The pieces of the hardware supported by the Blob.

> What is an instruction set,

The list of binary codes that tell the processor what to do.

> what is the Binary Blob?

On the Raspberry Pi it is the non-Open-Source proprietary code that is
provided by the chip manufacturer, including parts of the boot loader
and the 3D drivers among other things.

> Why do you need an AI?

Why not?

> Most compilers leave fingerprints on executables you don't need an AI to
> detect them. I remember decompiling in the early 80's but complex modern
> code can often be a challenge to naively reverse engineer a high level
> understanding from even if you do have source code. Take away sensible
> variable and function names and you are stuffed.

He's talking about something that you can give a pile of object code
from an unknown source (I mean _really_ unknown--it could be for Z/OS
or a VAX or Intel or Alpha or any other architecture, compiled from C
or PL/I or Fortran or pick a language at random), and have it figure
out from there what the code does.
Re: AI and decompilation? [message #403716 is a reply to message #403704] Mon, 04 January 2021 20:42
Anonymous
Originally posted by: J. Clarke

On Mon, 4 Jan 2021 23:00:54 +0000, Pancho
<Pancho.Dontmaileme@outlook.com> wrote:

> On 04/01/2021 22:50, Dan Espen wrote:
>> Pancho <Pancho.Dontmaileme@outlook.com> writes:
>>
>>> On 04/01/2021 17:51, gareth evans wrote:
>>>> On 04/01/2021 13:08, Pancho wrote:
>>>> > On 04/01/2021 11:00, gareth evans wrote:
>>>> >> Thinking back to my first job, nearly 50 years ago now,
>>>> >> when I had to dis-assemble DEC's paper tape BASIC
>>>> >> interpreter in order to enhance it, I guess that
>>>> >> dis-assemblers and decompilers must now be ten-a-penny,
>>>> >> especially for programs running under Windows where
>>>> >> the structure of Windows programs is well-known with
>>>> >> an assumption that C was the source language?
>>>> >>
>>>> >> But I wonder if Artificial Intelligence could, after
>>>> >> being fed with numerous instruction sets, take a
>>>> >> block of binary, and analyse its source without
>>>> >> any prior knowledge of the instruction set?
>>>> >>
>>>> >> I am particularly interested in the Binary Blob
>>>> >> provided for Raspberry Pi computers, with a view to
>>>> >> getting detailed knowledge of the video processors
>>>> >> employed therein.
>>>> >>
>>>> > I think a lot of the problem is defining the question.
>>>> >
>>>> > What do you want it to do?
>>>> >
>>>> I don't want it to do anything. I want to play at a low level
>>>> with the thing ... large oaks from little acorns grow.
>>>>
>>>
>>> Play with what thing? What is an instruction set, what is the Binary
>>> Blob? Why do you need an AI?
>>>
>>> Most compilers leave fingerprints on executables you don't need an AI
>>> to detect them. I remember decompiling in the early 80's but complex
>>> modern code can often be a challenge to naively reverse engineer a
>>> high level understanding from even if you do have source code. Take
>>> away sensible variable and function names and you are stuffed.
>>
>> I've had more than one experience in putting those meaningful variable
>> names right back. It's actually pretty easy, a somewhat rote process.
>> Find the read input instruction. Since you know the layout of the input
>> record, you now have labels to many of the references to that input
>> area.
>>
>> I think you can work out how to proceed.
>>
>>
> Without the source how do you know any meaningful variable names in the
> first place?

You start with the inputs and outputs and work into the algorithms and
eventually maybe you can make sense of it.
Re: AI and decompilation? [message #403717 is a reply to message #403704] Mon, 04 January 2021 20:55
Dan Espen
Pancho <Pancho.Dontmaileme@outlook.com> writes:

> On 04/01/2021 22:50, Dan Espen wrote:
>> Pancho <Pancho.Dontmaileme@outlook.com> writes:
>>
>>> On 04/01/2021 17:51, gareth evans wrote:
>>>> On 04/01/2021 13:08, Pancho wrote:
>>>> > On 04/01/2021 11:00, gareth evans wrote:
>>>> >> Thinking back to my first job, nearly 50 years ago now,
>>>> >> when I had to dis-assemble DEC's paper tape BASIC
>>>> >> interpreter in order to enhance it, I guess that
>>>> >> dis-assemblers and decompilers must now be ten-a-penny,
>>>> >> especially for programs running under Windows where
>>>> >> the structure of Windows programs is well-known with
>>>> >> an assumption that C was the source language?
>>>> >>
>>>> >> But I wonder if Artificial Intelligence could, after
>>>> >> being fed with numerous instruction sets, take a
>>>> >> block of binary, and analyse its source without
>>>> >> any prior knowledge of the instruction set?
>>>> >>
>>>> >> I am particularly interested in the Binary Blob
>>>> >> provided for Raspberry Pi computers, with a view to
>>>> >> getting detailed knowledge of the video processors
>>>> >> employed therein.
>>>> >>
>>>> > I think a lot of the problem is defining the question.
>>>> >
>>>> > What do you want it to do?
>>>> >
>>>> I don't want it to do anything. I want to play at a low level
>>>> with the thing ... large oaks from little acorns grow.
>>>>
>>>
>>> Play with what thing? What is an instruction set, what is the Binary
>>> Blob? Why do you need an AI?
>>>
>>> Most compilers leave fingerprints on executables you don't need an AI
>>> to detect them. I remember decompiling in the early 80's but complex
>>> modern code can often be a challenge to naively reverse engineer a
>>> high level understanding from even if you do have source code. Take
>>> away sensible variable and function names and you are stuffed.
>> I've had more than one experience in putting those meaningful
>> variable
>> names right back. It's actually pretty easy, a somewhat rote process.
>> Find the read input instruction. Since you know the layout of the input
>> record, you now have labels to many of the references to that input
>> area.
>> I think you can work out how to proceed.
>>
> Without the source how do you know any meaningful variable names in
> the first place?

The programs were reading our files.
We already had record layouts for those files.
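
A rough sketch of that labelling step, in C: if the record layout is known, any reference into the input area can be mapped back to a field name by its byte offset. The layout below (CUST_ID, CUST_NAME, BALANCE and their offsets) is invented purely for illustration and is not taken from any real file; the function name label_for_offset is likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One field of a known input record: name, starting byte, length. */
struct field {
    const char *name;
    size_t offset;
    size_t length;
};

/* Hypothetical layout, invented for this example only. */
static const struct field demo_layout[] = {
    { "CUST_ID",    0,  8 },
    { "CUST_NAME",  8, 30 },
    { "BALANCE",   38, 10 },
};

/* Return the name of the field covering byte `offset` of the input area,
   or NULL if the known layout does not cover that byte. */
const char *label_for_offset(const struct field *layout, size_t count,
                             size_t offset)
{
    size_t i;
    assert(layout != NULL);
    for (i = 0; i < count; i++) {
        if (offset >= layout[i].offset &&
            offset < layout[i].offset + layout[i].length)
            return layout[i].name;
    }
    return NULL;
}
```

Every load or store in the disassembly that addresses "input area + 38" can then be relabelled BALANCE, and the names spread outwards from there to whatever the code copies that field into.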

--
Dan Espen
Re: AI and decompilation? [message #403720 is a reply to message #403716] Mon, 04 January 2021 20:59
Dan Espen
J. Clarke <jclarke.873638@gmail.com> writes:

> On Mon, 4 Jan 2021 23:00:54 +0000, Pancho
> <Pancho.Dontmaileme@outlook.com> wrote:
>
>> On 04/01/2021 22:50, Dan Espen wrote:
>>> Pancho <Pancho.Dontmaileme@outlook.com> writes:
>>>
>>>> On 04/01/2021 17:51, gareth evans wrote:
>>>> > On 04/01/2021 13:08, Pancho wrote:
>>>> >> On 04/01/2021 11:00, gareth evans wrote:
>>>> >>> Thinking back to my first job, nearly 50 years ago now,
>>>> >>> when I had to dis-assemble DEC's paper tape BASIC
>>>> >>> interpreter in order to enhance it, I guess that
>>>> >>> dis-assemblers and decompilers must now be ten-a-penny,
>>>> >>> especially for programs running under Windows where
>>>> >>> the structure of Windows programs is well-known with
>>>> >>> an assumption that C was the source language?
>>>> >>>
>>>> >>> But I wonder if Artificial Intelligence could, after
>>>> >>> being fed with numerous instruction sets, take a
>>>> >>> block of binary, and analyse its source without
>>>> >>> any prior knowledge of the instruction set?
>>>> >>>
>>>> >>> I am particularly interested in the Binary Blob
>>>> >>> provided for Raspberry Pi computers, with a view to
>>>> >>> getting detailed knowledge of the video processors
>>>> >>> employed therein.
>>>> >>>
>>>> >> I think a lot of the problem is defining the question.
>>>> >>
>>>> >> What do you want it to do?
>>>> >>
>>>> > I don't want it to do anything. I want to play at a low level
>>>> > with the thing ... large oaks from little acorns grow.
>>>> >
>>>>
>>>> Play with what thing? What is an instruction set, what is the Binary
>>>> Blob? Why do you need an AI?
>>>>
>>>> Most compilers leave fingerprints on executables you don't need an AI
>>>> to detect them. I remember decompiling in the early 80's but complex
>>>> modern code can often be a challenge to naively reverse engineer a
>>>> high level understanding from even if you do have source code. Take
>>>> away sensible variable and function names and you are stuffed.
>>>
>>> I've had more than one experience in putting those meaningful variable
>>> names right back. It's actually pretty easy, a somewhat rote process.
>>> Find the read input instruction. Since you know the layout of the input
>>> record, you now have labels to many of the references to that input
>>> area.
>>>
>>> I think you can work out how to proceed.
>>>
>>>
>> Without the source how do you know any meaningful variable names in the
>> first place?
>
> You start with the inputs and outputs and work into the algorithms and
> eventually maybe you can make sense of it.

Yep.

One place I was working they had a lost source code program
reconstructed from object code and they were complaining no one
could work on it because of the variable and routine names.

Seemed easy enough to me and I fixed it up in a day or 2.

--
Dan Espen
Re: AI and decompilation? [message #403726 is a reply to message #403652] Tue, 05 January 2021 04:07
Arne Luft
gareth evans <headstone255@yahoo.com> writes:
> Thinking back to my first job, nearly 50 years ago now,
> when I had to dis-assemble DEC's paper tape BASIC
> interpreter in order to enhance it, I guess that
> dis-assemblers and decompilers must now be ten-a-penny,
> especially for programs running under Windows where
> the structure of Windows programs is well-known with
> an assumption that C was the source language?
>
> But I wonder if Artificial Intelligence could, after
> being fed with numerous instruction sets, take a
> block of binary, and analyse its source without
> any prior knowledge of the instruction set?
>
> I am particularly interested in the Binary Blob
> provided for Raspberry Pi computers, with a view to
> getting detailed knowledge of the video processors
> employed therein.

Why would you do that instead of reading a reference manual for the
target architecture?

--
https://www.greenend.org.uk/rjk/
Re: AI and decompilation? [message #403728 is a reply to message #403726] Tue, 05 January 2021 04:47
Ahem A Rivet's Shot
On Tue, 05 Jan 2021 09:07:21 +0000
Richard Kettlewell <invalid@invalid.invalid> wrote:

> gareth evans <headstone255@yahoo.com> writes:

>> I am particularly interested in the Binary Blob
>> provided for Raspberry Pi computers, with a view to
>> getting detailed knowledge of the video processors
>> employed therein.
>
> Why would you do that instead of reading a reference manual for the
> target architecture?

The documentation for the GPU on the RPi has not been published, he
seeks to reverse engineer it from the binary code that implements a
published API on it.

--
Steve O'Hara-Smith | Directable Mirror Arrays
C:\>WIN | A better way to focus the sun
The computer obeys and wins. | licences available see
You lose and Bill collects. | http://www.sohara.org/
Re: AI and decompilation? [message #403729 is a reply to message #403668] Tue, 05 January 2021 05:28
The Natural Philosoph
On 04/01/2021 17:52, Scott Lurndal wrote:
> Martin Gregorie <martin@mydomain.invalid> writes:
>> On Mon, 04 Jan 2021 11:05:55 -0500, Dennis Lee Bieber wrote:
>>
>>> On Mon, 4 Jan 2021 11:00:29 +0000, gareth evans <headstone255@yahoo.com>
>>> declaimed the following:
>>>
>>>> Thinking back to my first job, nearly 50 years ago now,
>>>> when I had to dis-assemble DEC's paper tape BASIC interpreter in order
>>>> to enhance it, I guess that dis-assemblers and decompilers must now be
>>>> ten-a-penny,
>>>> especially for programs running under Windows where the structure of
>>>> Windows programs is well-known with an assumption that C was the source
>>>> language?
>>>>
>>> Actually, I think the use of disassemblers et al has fallen away.
>>> Modern processors have so many peephole optimizations and out-of-order
>>> execution streams that converting an executable back to assembly source
>>> is almost meaningless -- and getting back to a high-level language is
>>> near impossible. One would have to be an expert at the assembly for a
>>> processor to have any chance of understanding the result.
>>
>> The retro-computing guys - those who are fans of the MC6800 and MC6809
>> microprocessors, anyway - seem to be getting a rather good
>> semi-interactive disassembler up and running.
>
> Security experts have several very powerful disassemblers and decompilers
> they use for Intel/AMD/ARM processors.
>
> https://en.wikibooks.org/wiki/X86_Disassembly/Disassemblers_and_Decompilers
>
Yes. I am certain that certain compilers and certain languages leave a
fingerprint: always THAT register, used to do THAT job, always that
particular sequence of assembly to mimic that high level construct.
I cut my teeth on microprocessor assembly. Then C. Some things that are
neat in assembler are ugly as sin in C. Take a call table. In assembler,
you set up a range of memory whose contents are the addresses of
subroutines. You load the accumulator with a number, left shift it once,
add it to the content of a register set to point to the base of that
memory block, and use that register as a pointer to the location whose
contents are the address you want to 'call'. Simple, efficient and,
provided you ensure nothing out of bounds is in the accumulator, bomb proof.

Now try that in C: you need an array of pointers to functions, a
simple check on the index you use, and then a statement that calls
the function whose address is in the array of pointers to functions. I
never ever managed to get an 8 bit compiler to actually do that. People
just don't call the contents of an array of pointers to functions.

It's easier by far to set up a switch statement, which takes care of out
of bounds defaults, and ends up producing a chain of if..else if.. else
conditional calls to hardwired functions.
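
For comparison, here is a minimal sketch of both styles in C. The three handlers (op_add, op_sub, op_mul) and the dispatch function names are made up for the example; the only point is that the table version and the switch version behave the same, including for an out-of-range index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handlers; any functions sharing one signature would do. */
static int op_add(int a, int b) { return a + b; }
static int op_sub(int a, int b) { return a - b; }
static int op_mul(int a, int b) { return a * b; }

typedef int (*op_fn)(int, int);

/* The "call table": a block of memory holding subroutine addresses. */
static const op_fn op_table[] = { op_add, op_sub, op_mul };
#define OP_COUNT (sizeof op_table / sizeof op_table[0])

/* Table dispatch: bounds-check the index, then call through the pointer.
   Returns 0 and leaves *result untouched if the index is out of range. */
int dispatch_table(size_t index, int a, int b, int *result)
{
    assert(result != NULL);
    if (index >= OP_COUNT)
        return 0;
    *result = op_table[index](a, b);
    return 1;
}

/* Switch dispatch: same behaviour, the default case catches bad indices. */
int dispatch_switch(size_t index, int a, int b, int *result)
{
    assert(result != NULL);
    switch (index) {
    case 0:  *result = op_add(a, b); return 1;
    case 1:  *result = op_sub(a, b); return 1;
    case 2:  *result = op_mul(a, b); return 1;
    default: return 0;
    }
}
```

What a given compiler actually emits for the switch (a compare chain or its own jump table) depends on the compiler, the target and the optimisation level, so neither form should be read as a guaranteed fingerprint.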

That's how you write it, because it's pretty much as fast on a pipelined
processor, RAM is cheap and comprehensibility beats programming elegance
hands down in the real world.

I've examined a lot of compiled machine code and it's pretty easy to tell
what language it is, and roughly what it was written as. Stack based
variables are a bit of a giveaway, pointing to C or a similar language.
Highly optimising compilers of course automatically obfuscate things, but
that's the fun, isn't it?

I gave up writing assembler for *86 CPUs when the Gnu compiler was
patently doing a better job than I would in assembler, and the ability
to write something long winded and easy to understand and have the
compiler completely rearrange it and turn it into three lines of
incomprehensible assembler, was to be respected.

I think it is up to a limited point entirely possible to make an AI that
could replace machine code with editable and compilable source code.
But there will always be the Problem Of Induction. Many many possible
constructs in source using an infinite number of random variable and
function names, could compile to the same object code. And there is no
way to reinstate the comments either, so it becomes an exercise
ultimately in hand editing and reinstating the comments manually -
almost as big a job as writing from scratch.
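
A small illustration of that point, as a sketch rather than a proof: the two C functions below use different names and different loop styles but compute exactly the same thing, and once names are stripped there is nothing in the behaviour to say which one was the "original". Both function names are invented for the example, and whether a particular compiler emits identical instructions for them is not guaranteed.

```c
#include <assert.h>
#include <stddef.h>

/* Version A: descriptive names, index-based loop. */
long sum_prices(const int *prices, size_t count)
{
    long total = 0;
    size_t i;
    assert(prices != NULL || count == 0);
    for (i = 0; i < count; i++)
        total += prices[i];
    return total;
}

/* Version B: meaningless names, pointer-walking loop. Same behaviour. */
long f1(const int *p, size_t n)
{
    long t = 0;
    assert(p != NULL || n == 0);
    while (n-- > 0)
        t += *p++;
    return t;
}
```

A decompiler can at best hand back something like version B; turning it into version A is the hand-editing job described above.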

I suspect this is how Linux writers write freeware drivers for
proprietary hardware. Disassemble the manufacturer's drivers, and at
least mimic the program flow, if not the actual source code.


--
“I know that most men, including those at ease with problems of the
greatest complexity, can seldom accept even the simplest and most
obvious truth if it be such as would oblige them to admit the falsity of
conclusions which they have delighted in explaining to colleagues, which
they have proudly taught to others, and which they have woven, thread by
thread, into the fabric of their lives.”

― Leo Tolstoy
Re: AI and decompilation? [message #403730 is a reply to message #403697] Tue, 05 January 2021 05:29
The Natural Philosoph
On 04/01/2021 21:57, Pancho wrote:
> Most compilers leave fingerprints on executables you don't need an AI to
> detect them. I remember decompiling in the early 80's but complex modern
> code can often be a challenge to naively reverse engineer a high level
> understanding from even if you do have source code. Take away sensible
> variable and function names and you are stuffed.
+1001


--
"First, find out who are the people you can not criticise. They are your
oppressors."
- George Orwell
Re: AI and decompilation? [message #403731 is a reply to message #403717] Tue, 05 January 2021 05:38
Anonymous
Originally posted by: Pancho

On 05/01/2021 01:55, Dan Espen wrote:
> Pancho <Pancho.Dontmaileme@outlook.com> writes:
>
>> On 04/01/2021 22:50, Dan Espen wrote:
>>> Pancho <Pancho.Dontmaileme@outlook.com> writes:
>>>
>>>> On 04/01/2021 17:51, gareth evans wrote:
>>>> > On 04/01/2021 13:08, Pancho wrote:
>>>> >> On 04/01/2021 11:00, gareth evans wrote:
>>>> >>> Thinking back to my first job, nearly 50 years ago now,
>>>> >>> when I had to dis-assemble DEC's paper tape BASIC
>>>> >>> interpreter in order to enhance it, I guess that
>>>> >>> dis-assemblers and decompilers must now be ten-a-penny,
>>>> >>> especially for programs running under Windows where
>>>> >>> the structure of Windows programs is well-known with
>>>> >>> an assumption that C was the source language?
>>>> >>>
>>>> >>> But I wonder if Artificial Intelligence could, after
>>>> >>> being fed with numerous instruction sets, take a
>>>> >>> block of binary, and analyse its source without
>>>> >>> any prior knowledge of the instruction set?
>>>> >>>
>>>> >>> I am particularly interested in the Binary Blob
>>>> >>> provided for Raspberry Pi computers, with a view to
>>>> >>> getting detailed knowledge of the video processors
>>>> >>> employed therein.
>>>> >>>
>>>> >> I think a lot of the problem is defining the question.
>>>> >>
>>>> >> What do you want it to do?
>>>> >>
>>>> > I don't want it to do anything. I want to play at a low level
>>>> > with the thing ... large oaks from little acorns grow.
>>>> >
>>>>
>>>> Play with what thing? What is an instruction set, what is the Binary
>>>> Blob? Why do you need an AI?
>>>>
>>>> Most compilers leave fingerprints on executables you don't need an AI
>>>> to detect them. I remember decompiling in the early 80's but complex
>>>> modern code can often be a challenge to naively reverse engineer a
>>>> high level understanding from even if you do have source code. Take
>>>> away sensible variable and function names and you are stuffed.
>>> I've had more than one experience in putting those meaningful
>>> variable
>>> names right back. It's actually pretty easy, a somewhat rote process.
>>> Find the read input instruction. Since you know the layout of the input
>>> record, you now have labels to many of the references to that input
>>> area.
>>> I think you can work out how to proceed.
>>>
>> Without the source how do you know any meaningful variable names in
>> the first place?
>
> The programs were reading our files.
> We already had record layouts for those files.
>

Yes, I understand how you can disassemble a simple program. I did it
myself in the 1980s.

However modern programs are much more complex. They are built upon many
levels of indirection, libraries, composition, inheritance, function
pointers, events, etc, etc... We use structure, design patterns and such
like to allow us to recognise complex ideas quickly. That gets lost in
compilation.

I just can't see how I would reverse engineer an understanding of
anything but the most simple disassembly in any reasonable time frame.
Re: AI and decompilation? [message #403732 is a reply to message #403704] Tue, 05 January 2021 05:51
The Natural Philosoph
On 04/01/2021 23:00, Pancho wrote:
> On 04/01/2021 22:50, Dan Espen wrote:
>> Pancho <Pancho.Dontmaileme@outlook.com> writes:
>>
>>> On 04/01/2021 17:51, gareth evans wrote:
>>>> On 04/01/2021 13:08, Pancho wrote:
>>>> > On 04/01/2021 11:00, gareth evans wrote:
>>>> >> Thinking back to my first job, nearly 50 years ago now,
>>>> >> when I had to dis-assemble DEC's paper tape BASIC
>>>> >> interpreter in order to enhance it, I guess that
>>>> >> dis-assemblers and decompilers must now be ten-a-penny,
>>>> >> especially for programs running under Windows where
>>>> >> the structure of Windows programs is well-known with
>>>> >> an assumption that C was the source language?
>>>> >>
>>>> >> But I wonder if Artificial Intelligence could, after
>>>> >> being fed with numerous instruction sets, take a
>>>> >> block of binary, and analyse its source without
>>>> >> any prior knowledge of the instruction set?
>>>> >>
>>>> >> I am particularly interested in the Binary Blob
>>>> >> provided for Raspberry Pi computers, with a view to
>>>> >> getting detailed knowledge of the video processors
>>>> >> employed therein.
>>>> >>
>>>> > I think a lot of the problem is defining the question.
>>>> >
>>>> > What do you want it to do?
>>>> >
>>>> I don't want it to do anything. I want to play at a low level
>>>> with the thing ... large oaks from little acorns grow.
>>>>
>>>
>>> Play with what thing? What is an instruction set, what is the Binary
>>> Blob? Why do you need an AI?
>>>
>>> Most compilers leave fingerprints on executables you don't need an AI
>>> to detect them. I remember decompiling in the early 80's but complex
>>> modern code can often be a challenge to naively reverse engineer a
>>> high level understanding from even if you do have source code. Take
>>> away sensible variable and function names and you are stuffed.
>>
>> I've had more than one experience in putting those meaningful variable
>> names right back.  It's actually pretty easy, a somewhat rote process.
>> Find the read input instruction.  Since you know the layout of the input
>> record, you now have labels to many of the references to that input
>> area.
>>
>> I think you can work out how to proceed.
>>
>>
> Without the source how do you know any meaningful variable names in the
> first place?

Well, you have hints from what the code does. Let's say you have code
that loads data from two stack based memory locations, adds them together
and uses the result to access what is clearly an array - that gives a strong
hint that the original variables can be integers, and the index one is
simply a temporary way to get a value into that array, so you call that
'i' or 'arrayIndex' pro tem...

Then once you have an idea as to what data that array holds, you can
update it and the index to something more meaningful.

The whole process is actually covered in philosophy: It is the problem
of induction. How do you work back from results to causes?

Given that the answer to Life The Universe and Everything was '42', what
in fact was the question? (40+2)? (6x7)?

There are an infinite number of expressions that give that answer, and
an infinite number that don't.

This is where Karl Popper's philosophy of science steps in. Instead of
regarding there to be One True Reason why science works, namely that
scientists are in the business of discovering the Truth, he pointed out
that just because stuff worked (and 6x7 does indeed give 42) that was no
reason to suppose that some other completely different construct might
not work equally as well, and that had indeed happened with relativity
and Newtonian gravity.

The Problem of Induction is that many theories can give the same
predicted result. Sherlock Holmes is a sham. The Dog That Didn't Bark in
the Night didn't bark, allegedly, because it knew the thief. Why? It
might have been abducted by aliens, drugged, actually out hunting
rabbits, in a soundproof box, or the Russians did it using a robot, or
it was just too plumb wore out with old age to care.

The truth is not provable. All we have is stuff that works. Given
running machine code, there are an infinite number of source codes that
might have produced it, and an infinite number that did not.

We aren't there, ultimately, to reproduce *the* exact source, but to
arrive at *an* editable source, that we can use.
Like science, and religion, it doesn't have to be true, to be useful,
and like science, and religion, its ultimate content will be forever
truth-undecidable.

--
"First, find out who are the people you can not criticise. They are your
oppressors."
- George Orwell
Re: AI and decompilation? [message #403733 is a reply to message #403728] Tue, 05 January 2021 06:13
Arne Luft
Ahem A Rivet's Shot <steveo@eircom.net> writes:
> Richard Kettlewell <invalid@invalid.invalid> wrote:
>> gareth evans <headstone255@yahoo.com> writes:
>>> I am particularly interested in the Binary Blob
>>> provided for Raspberry Pi computers, with a view to
>>> getting detailed knowledge of the video processors
>>> employed therein.
>>
>> Why would you do that instead of reading a reference manual for the
>> target architecture?
>
> The documentation for the GPU on the RPi has not been published,
> he seeks to reverse engineer it from the binary code that implements a
> published API on it.

I was under the impression it was a VideoCore IV, which appears to be
sufficiently documented for a GNU toolchain port.

https://docs.broadcom.com/doc/12358545
https://github.com/itszor/vc4-toolchain

--
https://www.greenend.org.uk/rjk/
Re: AI and decompilation? [message #403734 is a reply to message #403652] Tue, 05 January 2021 06:26
Anonymous
Originally posted by: Adrian Caspersz

On 04/01/2021 11:00, gareth evans wrote:
>
> But I wonder if Artificial Intelligence could, after
> being fed with numerous instruction sets, take a
> block of binary, and analyse its source without
> any prior knowledge of the instruction set?

If that became possible, it would not be a far step for an AI machine to
analyse itself or another AI machine. It could make clones and
unwittingly modify them.

Who knows where that could lead, or what mutations could happen? Life?

>
> I am particularly interested in the Binary Blob
> provided for Raspberry Pi computers, with a view to
> getting detailed knowledge of the video processors
> employed therein.

The Chinese would be very interested in you.

I'm sure some of the architecture is provided in layers, some public
like frame buffers and some not like acceleration features. So your
machine code experiments could be done on the former, to learn to walk
first. Or choose another more open graphics chipset if you need more
documentation to get to first base. Perhaps there is one in a low-end mobile
phone?

Here's a manual way of reverse engineering random Chinese hardware.

[016] IT9919 Hacking - part 1 - Reading firmware with flashrom
https://www.youtube.com/watch?v=j7JRosD_ua8

Your AI solution would have to replicate the ability of the human.

--
Adrian C
Re: AI and decompilation? [message #403736 is a reply to message #403703] Tue, 05 January 2021 06:45
Anonymous
Originally posted by: gareth evans

On 04/01/2021 22:50, Dan Espen wrote:
> Pancho <Pancho.Dontmaileme@outlook.com> writes:
>
>> Most compilers leave fingerprints on executables you don't need an AI
>> to detect them. I remember decompiling in the early 80's but complex
>> modern code can often be a challenge to naively reverse engineer a
>> high level understanding from even if you do have source code. Take
>> away sensible variable and function names and you are stuffed.
>
> I've had more than one experience in putting those meaningful variable
> names right back. It's actually pretty easy, a somewhat rote process.
> Find the read input instruction. Since you know the layout of the input
> record, you now have labels to many of the references to that input
> area.
>
> I think you can work out how to proceed.

ISTR that my attack on the executable started by seeking out lines
of code that might be subroutine calls, "JSR PC, address" in the
PDP-11 code. This served to create a number of identifiable and
separate blocks from which to proceed.

Of course, this was much easier as it was a stand-alone paper
tape program with no operating system underneath to muddy the
water.
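
That first pass can be sketched in a few lines of C. This assumes the image is held as 16-bit words loaded at a known byte address, and looks only for the two common encodings of JSR PC: 004767 (PC-relative operand) and 004737 (absolute operand). The function name find_jsr_targets and its parameters are invented for the example. A naive scan like this cannot tell code from data, so every hit is a candidate to be checked by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define JSR_PC_RELATIVE 0004767u  /* JSR PC, addr   (PC-relative operand) */
#define JSR_PC_ABSOLUTE 0004737u  /* JSR PC, @#addr (absolute operand)    */

/* Scan `count` 16-bit words loaded at byte address `base` and record the
   target of anything that looks like JSR PC. Returns the number of
   candidates seen; only the first `max_targets` are stored. */
size_t find_jsr_targets(const uint16_t *words, size_t count, uint16_t base,
                        uint16_t *targets, size_t max_targets)
{
    size_t found = 0, i;
    assert(words != NULL && targets != NULL);
    for (i = 0; i + 1 < count; i++) {
        uint16_t operand = words[i + 1];
        uint16_t target;
        if (words[i] == JSR_PC_RELATIVE) {
            /* The PC already points past the operand word when the
               offset is added, hence (i + 2) words from the base. */
            target = (uint16_t)(base + 2 * (i + 2) + operand);
        } else if (words[i] == JSR_PC_ABSOLUTE) {
            target = operand;
        } else {
            continue;
        }
        if (found < max_targets)
            targets[found] = target;
        found++;
        i++;  /* skip the operand word just consumed */
    }
    return found;
}
```

Sorting the collected targets gives the likely subroutine entry points, which is what splits a flat image into separate blocks to study one at a time.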
Re: AI and decompilation? [message #403737 is a reply to message #403732] Tue, 05 January 2021 06:52
Martin Gregorie
On Tue, 05 Jan 2021 10:51:35 +0000, The Natural Philosopher wrote:

> We aren't there, ultimately, to reproduce *the* exact source, but to
> arrive at *an* editable source, that we can use.
> Like science, and religion, it doesn't have to be true, to be useful,
> and like science, and religion, its ultimate content will be forever
> truth-undecidable.

+1


--
Martin | martin at
Gregorie | gregorie dot org
Re: AI and decompilation? [message #403738 is a reply to message #403715] Tue, 05 January 2021 06:54
Anonymous
Originally posted by: gareth evans

On 05/01/2021 01:38, J. Clarke wrote:
>
> He's talking about something that you can give a pile of object code
> from an unknown source (I mean _really_ unknown--it could be for Z/OS
> or a VAX or Intel or Alpha or any other architecture, compiled from C
> or PL/I or Fortran or pick a language at random, with it figuring from
> there what the code does.
>

Indeed!

I've discussed this before (and probably too often, according to
my biographers and stalkers!) but I'm interested in computers for
themselves, as wonderful complex machines, and not interested in
what you can use them for.

My frustration lies with the Raspberry Pi series that come,
for very little outlay of pennies, with a multi processor
graphics chip which is believed to exceed the capabilities of
the associated ARM processor but about which no detailed
information is forthcoming.
Re: AI and decompilation? [message #403739 is a reply to message #403726] Tue, 05 January 2021 06:56
Anonymous
Originally posted by: gareth evans

On 05/01/2021 09:07, Richard Kettlewell wrote:
> gareth evans <headstone255@yahoo.com> writes:
>> Thinking back to my first job, nearly 50 years ago now,
>> when I had to dis-assemble DEC's paper tape BASIC
>> interpreter in order to enhance it, I guess that
>> dis-assemblers and decompilers must now be ten-a-penny,
>> especially for programs running under Windows where
>> the structure of Windows programs is well-known with
>> an assumption that C was the source language?
>>
>> But I wonder if Artificial Intelligence could, after
>> being fed with numerous instruction sets, take a
>> block of binary, and analyse its source without
>> any prior knowledge of the instruction set?
>>
>> I am particularly interested in the Binary Blob
>> provided for Raspberry Pi computers, with a view to
>> getting detailed knowledge of the video processors
>> employed therein.
>
> Why would you do that instead of reading a reference manual for the
> target architecture?
>

Because no such manuals are available. The Broadcom GPUs are
a closely guarded proprietary secret, kept from hoi polloi.
Re: AI and decompilation? [message #403741 is a reply to message #403732] Tue, 05 January 2021 07:06
Anonymous
Originally posted by: gareth evans

On 05/01/2021 10:51, The Natural Philosopher wrote:
>
> The whole process is actually covered in philosophy: It is the problem
> of induction. How do you work back from results to causes?
>
> Given that the answer to Life The Universe and Everything was '42', what
> in fact was the question? (40+2)? (6x7)?
>
> There are an infinite number of expressions that give that answer, and
> an infinite number that don't.
>
> This is where Karl Popper's philosophy of science steps in. Instead of
> regarding there to be One True Reason why science works, namely that
> scientists are in the business of discovering the Truth, he pointed out
> that just because stuff worked (and 6x7 does indeed give 42) that was no
> reason to suppose that some other completely different construct might
> not work equally well, and that had indeed happened with relativity
> and Newtonian gravity.
>
> The Problem of Induction is that many theories can give the same
> predicted result. Sherlock Holmes is a sham. The Dog That Didn't Bark in
> the Night didn't bark, allegedly, because it knew the thief. Why? It
> might have been abducted by aliens, drugged, actually out hunting
> rabbits, in a soundproof box, or the Russians did it using a robot, or
> it was just too plumb wore out with old age to care.
>
> The truth is not provable. All we have is stuff that works. Given
> running machine code, there are an infinite number of source codes that
> might have produced it, and an infinite number that did not.
>
> We aren't there, ultimately, to reproduce *the* exact source, but to
> arrive at *an* editable source, that we can use.
> Like science and religion, it doesn't have to be true to be useful,
> and like science and religion, its ultimate content will be forever
> truth-undecidable.
>

That's an interesting and thought-provoking aside!
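
To make the quoted point about "an infinite number of source codes"
concrete, here is a minimal C sketch. The function names are made up
for illustration and do not come from any real program. The three
functions are written differently, yet an optimising compiler will
commonly reduce all of them to the same doubling instruction, so a
decompiler looking only at the binary cannot tell which one the author
wrote.

```c
/* Three hypothetical ways an author might have written "double x".
   Unsigned arithmetic is used so that all three are well defined and
   agree for every input (results wrap modulo UINT_MAX + 1). */
unsigned int double_by_multiply(unsigned int x) { return x * 2u; }
unsigned int double_by_add(unsigned int x)      { return x + x; }
unsigned int double_by_shift(unsigned int x)    { return x << 1; }
```

Any of the three is a valid "editable source" for the same code, which
is all a decompiler can promise. Whether a given compiler really emits
identical instructions for them depends on the target and the
optimisation level.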
Re: AI and decompilation? [message #403742 is a reply to message #403733] Tue, 05 January 2021 07:12
Originally posted by: gareth evans

On 05/01/2021 11:13, Richard Kettlewell wrote:
> Ahem A Rivet's Shot <steveo@eircom.net> writes:
>> Richard Kettlewell <invalid@invalid.invalid> wrote:
>>> gareth evans <headstone255@yahoo.com> writes:
>>>> I am particularly interested in the Binary Blob
>>>> provided for Raspberry Pi computers, with a view to
>>>> getting detailed knowledge of the video processors
>>>> employed therein.
>>>
>>> Why would you do that instead of reading a reference manual for the
>>> target architecture?
>>
>> The documentation for the GPU on the RPi has not been published,
>> he seeks to reverse engineer it from the binary code that implements a
>> published API on it.
>
> I was under the impression it was a VideoCore IV, which appears to be
> sufficiently documented for a GNU toolchain port.
>
> https://docs.broadcom.com/doc/12358545
> https://github.com/itszor/vc4-toolchain
>

The first of those does not produce anything.

Does the second describe the GPU in some detail and describe
the instruction set such that I might produce my own binary blob
to do something completely different?

Also, AIUI, a different GPU has been incorporated into the
64-bit RPis.

Anyway, thanks for your input.
Re: AI and decompilation? [message #403743 is a reply to message #403726] Tue, 05 January 2021 07:32
Originally posted by: J. Clarke

On Tue, 05 Jan 2021 09:07:21 +0000, Richard Kettlewell
<invalid@invalid.invalid> wrote:

> gareth evans <headstone255@yahoo.com> writes:
>> Thinking back to my first job, nearly 50 years ago now,
>> when I had to dis-assemble DEC's paper tape BASIC
>> interpreter in order to enhance it, I guess that
>> dis-assemblers and decompilers must now be ten-a-penny,
>> especially for programs running under Windows where
>> the structure of Windows programs is well-known with
>> an assumption that C was the source language?
>>
>> But I wonder if Artificial Intelligence could, after
>> being fed with numerous instruction sets, take a
>> block of binary, and analyse its source without
>> any prior knowledge of the instruction set?
>>
>> I am particularly interested in the Binary Blob
>> provided for Raspberry Pi computers, with a view to
>> getting detailed knowledge of the video processors
>> employed therein.
>
> Why would you do that instead of reading a reference manual for the
> target architecture?

Because there are features not described in the reference manual.
Re: AI and decompilation? [message #403744 is a reply to message #403731] Tue, 05 January 2021 07:46
Originally posted by: Bob Eager

On Tue, 05 Jan 2021 10:38:16 +0000, Pancho wrote:

> I just can't see how I would reverse engineer an understanding of
> anything but the most simple disassembly in any reasonable time frame.

One of my former colleagues did a Ph.D. on it:

https://kar.kent.ac.uk/61349/

--
Using UNIX since v6 (1975)...

Use the BIG mirror service in the UK:
http://www.mirrorservice.org
Re: AI and decompilation? [message #403746 is a reply to message #403729] Tue, 05 January 2021 08:06
Originally posted by: Thomas Koenig

The Natural Philosopher <tnp@invalid.invalid> schrieb:
> The C. Some things that are
> neat in assembler are ugly as sin in C.

One thing that is hard to do with C is to have different entries
to the same function, something like:

bar:
        .cfi_startproc
        ... do something
foo:
        ... do something else

        ret

and then either call foo or bar.
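
For comparison, a minimal sketch of the nearest portable C equivalent,
assuming made-up bodies for the two "do something" steps: bar does its
own work and then hands over to foo, so callers can still enter at
either name. The names and arithmetic are purely illustrative.

```c
/* Second entry point: the "do something else" part on its own. */
int foo(int x)
{
    return x + 1;
}

/* First entry point: "do something", then continue into foo.
   In the assembler version this is a fall-through; in C it has to be
   written as a call, and the compiler decides how to lay it out. */
int bar(int x)
{
    x = x * 2;
    return foo(x);
}
```

A compiler may turn the call into a jump, but standard C gives no way
to insist that bar literally falls through into foo, which is the part
that is hard to express.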
Re: AI and decompilation? [message #403747 is a reply to message #403734] Tue, 05 January 2021 08:07
Originally posted by: Thomas Koenig

Adrian Caspersz <email@here.invalid> schrieb:

> If that became possible, it would not be a far step for an AI machine to
> analyse itself or another AI machine. It could make clones and
> unwittingly modify them.

The solution to the halting problem :-)