Solved

Reverse engineering in C++

Posted on 2003-03-26
20
Medium Priority
1,212 Views
Last Modified: 2007-12-19
Hullo everyone. I just read about reverse engineering on some websites today. Does the term mean that I can get C++ code from an executable? I am really curious how this works, because until today I was thinking of "creating" something like that in the future... but the dream's already been dashed. Any info out there?
Question by:greeek_god
20 Comments
 
LVL 2

Expert Comment

by:skyDaemon
ID: 8214667
Actually decompiling or reverse-engineering someone's code is generally considered illegal and in violation of copyrights, licenses etc.  The only legal application I can think of would be porting between languages.  Let's say I wrote a program in vb and I decided I wanted it in c++.  I could run your decompiler on the executable and probably get something that works, even if it isn't pretty.  If I wrote the original vb app, then I'm obviously authorized to decompile my own code in a different format.

Technically it is possible to do, though I don't have an example of a generic program which actually does it.  As for the c++ code you generate, it wouldn't look like the original, it would probably just represent a standardized set of code which happens to reduce to the same compiled executable.  It would be "similar" to the original though.

0
 

Expert Comment

by:unknownmat
ID: 8214915
skyDaemon,

What do you mean "generally considered illegal" ?  Is it 'legally' illegal ... Has someone been taken to court over it?

I don't know quite how it works with software, but wasn't there a big case about reverse-engineering microchips, and the courts upheld the rights to reverse engineer... Anyway, just a blurb I read quite a while ago, but I would be very skeptical of someone telling me that I was not 'allowed' to de-compile (insofar as that's possible) code...

Would it be similarly 'generally illegal' to look at someone's dis-assembly?

Matt
0
 
LVL 2

Expert Comment

by:skyDaemon
ID: 8215170
Let's be clear. Reverse-engineering is wholly different from decompiling. We are talking about decompiling explicitly here, not reverse-engineering. Decompiling makes no attempt whatsoever to differentiate a product. The product is intended to be a direct copy of the original. In a court context, it is usually incredibly hard to prove that someone actually used your code. Use of a product of this nature should be sufficient to remove that doubt and would likely get you busted if it became known that a product such as this was used in the creation of a commercial product. It is the case of creating commercial products where someone cares enough to actually sue you. If you were to simply look at decompiled code you should be okay. Some license agreements try to restrict that as well, but it would be pretty hard to prove except in unusual circumstances. They likely don't have the ability to restrict that anyway.

As for decompiling: no decompiling rights have ever been upheld. Rights associated with reverse engineering are on the basis of either the product being sufficiently different from the original to call it another product, or the processes used being so generic that you really can't claim them as unique property. Anyone may logically think through a process and attempt to reinvent it. Decompiling is the epitome of the situation where none of these things was done or even attempted.

0

 
LVL 2

Expert Comment

by:bkrahmer
ID: 8215917
To use skyDaemon's example, I would just like to say that if I had a massive amount of code to port from one language to another, it would be easier by probably an order of magnitude or so to just create a compiler that would output the other language from source instead of from an executable.

I have played with a couple of decompilers that will give you c/c++ code for a windows .exe.  The main problems are that the output code does not have meaningful variable and function names, and that it would be hard to figure out all of the weird optimizations that compilers can do to production code.
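To make the naming problem concrete, here is a rough sketch of the kind of difference brian describes. Both functions are made up for illustration: `average_score` stands in for some original source, and `sub_401000` (a name in the style disassemblers tend to generate) is a hand-written guess at what decompiler-style output for it might look like, not the output of any real tool.

```cpp
// Hypothetical original source: names and structure carry the intent.
double average_score(const int* scores, int count) {
    if (count <= 0) return 0.0;
    long total = 0;
    for (int i = 0; i < count; ++i) total += scores[i];
    return static_cast<double>(total) / count;
}

// Hand-written imitation of decompiler-style output for the same logic:
// same behaviour, but every name is gone and the loop shape follows the
// machine code rather than the programmer's original intent.
double sub_401000(const int* a1, int a2) {
    int v1 = 0;
    long v2 = 0;
    if (a2 <= 0) return 0.0;
    do {
        v2 += a1[v1];
        ++v1;
    } while (v1 < a2);
    return (double)v2 / (double)a2;
}
```

Called with the same array and count, the two return the same value; only the first tells a reader what that value is meant to be.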

Quite an interesting topic, but one that doesn't ever get fully solved. There isn't much of a market for such a tool. Good engineers that need to figure out a competitor's voodoo magic are typically trying to solve one small problem, and can read assembly and figure out the problem directly.

brian
0
 
LVL 12

Expert Comment

by:Salte
ID: 8216658
I would say a decompiler or exe2cpp program would be near impossible to write. Even if the program was originally written in C++ it might be hard and if it wasn't written in C++ in the first place it is very difficult.

A disassembler is more feasible, and there are disassemblers out there. Some systems even come with one provided. The .NET SDK has a disassembler.

Alf
0
 
LVL 10

Expert Comment

by:substand
ID: 8216690
I don't know of any decompilers that will decompile to a language other than assembly language (as I've seen assembly ones), although I'm almost sure someone has written one, as it wouldn't be too difficult to do (time consuming is another story)...

However, unless the code is open source, it is illegal under the (I think it's called) Digital Millennium Copyright Act. My understanding is that basically, as long as you come up with something, whether or not you actually copyright it, the intellectual copyright is yours. However, if you do not legally copyright it or send it to yourself in a dated envelope (or some other means to show what date you made it to prove you created it before the next guy), it would be hard if not impossible to prove in court.

It would be fairly easy for any major company to sue you (assuming they found out) for decompiling their code, as most companies will patent any algorithm that is not generic, mostly as a "prize" for their programmers (I've known of even "small" companies that do this, so I assume larger ones do)... All they have to do is show that your algorithm is the same as theirs (even if it's in another language) and that they did it first.

It is illegal even to decrypt a file that you were not given the key to, much less decompile code.

It is not "generally illegal," it is totally illegal unless the code/program is distributed under some sort of open source license.

I hope this helps you out. Sorry I feel so strongly about it. I worked for a company not too long ago that couldn't pay the lawyers' fees to defend ourselves in court. Someone we wrote a program (somewhat of a VC) for stole some code I wrote (and we had a contract saying we retained the ownership of the code - only we couldn't defend it with a lawyer) and I was supposed to get royalties of 15% of the sales value. The code has made a couple million dollars so far, so I feel strongly about this issue. That was worth 300k to me.

0
 
LVL 8

Expert Comment

by:fl0yd
ID: 8217132
substand: "as it wouldn't be too difficult to do"

Nothing could be farther from the truth. It is plain _impossible_ to get the c++ source code from an executable. End of story.

If you want to do reverse engineering, get a solid understanding of assembly language for your target platform, an advanced disassembler [IDA Pro is probably among the best there are], a good debugger [SoftICE is great, especially if you want to trace into kernel modules] and be prepared to pour some major effort into it. Nothing will do that work for you. Debug information compiled into the executable would help, but you are rather unlikely to ever find it there.

.f
0
 

Expert Comment

by:andy4
ID: 8217295
It is not possible to create a true C++ decompiler, because exe's do not have enough information in them - a whole lot of what is written in the high level language is lost during compilation and linking, and you will never ever get it back to source, because it just isn't there. And then there is the problem of matching statements in assembly language to the statements that are available in high level languages.
If someone decided to spend a great deal of time on it just for the sake of doing it, he would get his share of satisfaction at least with Borland's BCB, Delphi and MFC applications, because they leave way too much useless information in the final files, e.g. unit names, form names, GUI element names.

A disassembler however is a whole different story. It is possible to disassemble any Windows exe and a bunch of other files. An excellent disassembler is IDA from datarescue.com. If you need to take a peek at the algorithms used in a hostile program then this is the tool for you, but you need to be able to read assembly code, which takes a while to get used to, at least it did for me.

Now and then probably many of us have dreamed about writing a decompiler for some language or another ;)) Ilfak at datarescue has spent years writing that decompiler. Many young people believe that it's a snap to create a disassembler and even a decompiler, but then again one can believe anything if he knows little about it.

True decompilers however are possible for languages that compile to bytecode, e.g. Java and MS .NET. I have played around with the insides of a Java decompiler myself with excellent results; the most fun was reverting the tricks created by obfuscators such as Zelix KlassMaster...
0
 
LVL 8

Expert Comment

by:fl0yd
ID: 8217405
In addition to andy4's post, IDA Pro is able to decompile Java classes :) If memory serves me correctly, there should also be a Visual Basic decompiler around, but as you said, C++ decompilers are impossible to do, and even disassemblers are harder than one would expect. Many of them will fail to correctly disassemble this code:

    mov eax, ebx
    jp @F
    DB 05h
@@:
    cmp eax, 1
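Why a stray data byte is enough to derail a disassembler can be sketched with a toy decoder. Everything below is an illustration under stated assumptions: `kCode` is a hand-assembled 32-bit x86 encoding of the snippet above padded with two `nop` bytes, `decode_one` and `sweep_from` are hypothetical helpers that recognise only the handful of opcodes involved, and the jump is assumed to be taken at run time so that the `DB 05h` byte is never executed.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hand-assembled bytes for the snippet, plus two nops as padding:
//   8B C3      mov eax, ebx
//   7A 01      jp  +1       (target is offset 5, just past the data byte)
//   05         DB 05h       (data, but also the opcode of "add eax, imm32")
//   83 F8 01   cmp eax, 1
//   90 90      nop, nop
const std::vector<std::uint8_t> kCode = {0x8B, 0xC3, 0x7A, 0x01, 0x05,
                                         0x83, 0xF8, 0x01, 0x90, 0x90};

// Toy decoder: returns a mnemonic and stores the instruction length in len.
// It only knows the encodings listed above; anything else becomes "db".
std::string decode_one(const std::vector<std::uint8_t>& code, std::size_t pos,
                       std::size_t& len) {
    const std::size_t left = code.size() - pos;
    const std::uint8_t op = code[pos];
    if (op == 0x8B && left >= 2 && code[pos + 1] == 0xC3) { len = 2; return "mov eax, ebx"; }
    if (op == 0x7A && left >= 2) { len = 2; return "jp"; }
    if (op == 0x05 && left >= 5) { len = 5; return "add eax, imm32"; }
    if (op == 0x83 && left >= 3 && code[pos + 1] == 0xF8) { len = 3; return "cmp eax, imm8"; }
    if (op == 0x90) { len = 1; return "nop"; }
    len = 1;
    return "db";
}

// Linear sweep: decode one instruction after another starting at `start`,
// without ever asking where the jumps actually lead.
std::vector<std::string> sweep_from(const std::vector<std::uint8_t>& code,
                                    std::size_t start) {
    std::vector<std::string> out;
    for (std::size_t pos = start; pos < code.size();) {
        std::size_t len = 0;
        out.push_back(decode_one(code, pos, len));
        pos += len;
    }
    return out;
}
```

`sweep_from(kCode, 0)` treats the data byte as the start of an `add eax, imm32` and swallows the three bytes of the real `cmp` as part of its operand, so the `cmp` never shows up in the listing. `sweep_from(kCode, 5)`, which starts at the jump target, recovers it. A real disassembler has to work out which of the two readings the processor will actually follow.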

.f
0
 
LVL 10

Expert Comment

by:substand
ID: 8217465
floyd: how do you figure that a c++ decompiler is impossible to do?

I admit, you won't get the exact source code of the file you are decompiling.  in fact, the decompiler wouldn't know which language the source was written in in the first place..

however, i still stand by my statement that it's not hard to do.  all you do is write a "backwards" compiler, one that takes assembly language as input and "compiles" it to c++.  

why is that impossible?  

saying that it is impossible to write a program like that is the same as saying it is impossible to write a program that converts c++ to asm, which every c++ compiler does.  

again, why is that impossible?  

0
 
LVL 8

Expert Comment

by:fl0yd
ID: 8217577
A funny thing would be to write a proper de-optimizer for your de-compiler. Even more fun awaits on the road to recovering class definitions, not to mention public, private and protected. I'm also very curious how you would tackle templates, and even more curious how you would get back to the code when all that is left of it is a compile-time constant. Just a short, fully functional C++ program will illustrate why it is _IMPOSSIBLE_:

template<unsigned int n>
struct FactT {
    enum { Value = n * FactT<n - 1>::Value };
};
template<>
struct FactT<1> {
    enum { Value = 1 };
};
template<>
struct FactT<0> {
    enum { Value = 1 };
};

#define Factorial(n) FactT<n>::Value

int main( int argc, char* argv[] ) {
    return Factorial( 5 );
}

Have fun writing a de-compiler for that :P
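The point of the example can be checked directly. The sketch below restates the templates and adds a `static_assert` (which needs C++11, newer than this thread) to show that the whole recursion is evaluated by the compiler. The two functions `with_templates` and `with_literal` are names invented here; the claim that they compile to the same instructions assumes a compiler that folds the constant, which is typical but not something the language guarantees.

```cpp
// The templates from the post above, minus the redundant FactT<1> case.
template <unsigned int n>
struct FactT {
    enum { Value = n * FactT<n - 1>::Value };
};
template <>
struct FactT<0> {
    enum { Value = 1 };
};

// Evaluated entirely by the compiler: by the time any machine code is
// emitted, the template recursion has collapsed into the number 120.
static_assert(FactT<5>::Value == 120, "5! is computed at compile time");

// Two very different sources. Assuming the constant is folded, both
// amount to "return 120", and nothing in the executable says which
// one the programmer actually wrote.
int with_templates() { return FactT<5>::Value; }
int with_literal() { return 120; }
```

A decompiler looking at either function could at best produce `return 120;`. The template, the specialization and the macro leave no trace to recover.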

.f
0
 
LVL 8

Expert Comment

by:fl0yd
ID: 8217666
Saying that writing a de-compiler is possible because you can write a compiler is like saying: Since I can throw a ball up into the air and it will fall down, landing on a sandy beach, creating a bit of heat due to friction, the opposite would be possible as well, i.e. the ball sitting on the beach, all of a sudden decides to gather heat energy from its environment, then converting it into kinetic energy and miraculously jumping off of the ground. I haven't witnessed that before, but maybe I was just not looking when it did happen. Maybe writing a decompiler is a lot easier than I thought, I will try to run my compiler's code in the opposite direction and see what I get. With your argument it should be a valid approach.

.f
0
 
LVL 12

Accepted Solution

by:
Salte earned 80 total points
ID: 8217681
substand,

sigh, the best way to get you to understand that it is impossible is to tell you to try to write one yourself.

If you really look into the problem you will see that it is indeed very difficult.

It might work for a very limited set of executables provided you make a zillion assumptions and all of those assumptions must turn out to be correct for those executables.

In practice you can't make a zillion assumptions without having at least one of them break and so we say it is impossible.

You are welcome to try to prove me wrong but I am afraid the more you try the more you will realize that it is impossible.

Let's start out simple, you have some code like this (assuming pentium here):

mov eax,val[ebx]
add eax,val2[ecx]
mov val3[ebp],eax

This code fragment seems easy enough, but what are those val[ebx] thingies? In a high level language you have two types of constructs that can lead to offset + register addressing. Either the register contains an index value and the 'offset' is really the memory address of the start of an array, or the register contains a pointer to some location in memory and the offset is simply the offset of a member within the struct or class that the register is pointing to.

In assembly they are both the same.
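A small sketch of the two source-level constructs that collapse into the same addressing form. `Record`, `table` and the four functions are hypothetical names made up for this illustration, and the address arithmetic assumes a flat address space where converting a pointer to `std::uintptr_t` yields its byte address, which holds on mainstream platforms but is implementation-defined.

```cpp
#include <cstddef>
#include <cstdint>

struct Record {
    std::int32_t id;
    std::int32_t score;  // sits at a fixed byte offset inside Record
};

std::int32_t table[16];  // hypothetical global array

// Array case: the constant is the address of table, the variable part
// is the index (scaled to bytes).
std::int32_t read_element(std::size_t index) { return table[index]; }

// Struct case: the variable part is the pointer, the constant is the
// position of the member within the struct.
std::int32_t read_member(const Record* r) { return r->score; }

// Spelled out as address arithmetic, both are "constant + variable".
std::uintptr_t element_address(std::size_t index) {
    return reinterpret_cast<std::uintptr_t>(&table[0]) + index * sizeof(std::int32_t);
}
std::uintptr_t member_address(const Record* r) {
    return reinterpret_cast<std::uintptr_t>(r) + offsetof(Record, score);
}
```

In both reads the machine loads four bytes from a constant plus a register. Which part was an array base and which a struct pointer is something only the source says.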

Now, you might guess that a 'small offset' value is a struct access while a 'large offset' value is the base of an array but where do you set your limits for guessing and what about the grey area in between?

Worse, this is very simple access, it is straight class access to a simple variable or straight array access to a global array in memory. There are a zillion more ways to access data, such as accessing arrays in structs or accessing a struct member of an element of an array of structs etc.

Yes, you can provide some good guesses in some of these situations and it can lead to possible C++ code. For one thing you will not be able to guarantee that the C++ code generates exactly the same executable, since that would depend on the particular C++ compiler and the options given to it, but you might try to claim that in a sense the program "is the same", whatever that means. However, you would only be able to do so in those situations where your guesses turned out to be correct. Places where you guess wrong would quickly lead to bizarre code.

Another problem would be that the C++ code you end up with would - in general - be completely unreadable, and actually even an assembly listing would be more readable than the C++ code you would generate from your generator. So the purpose of it all would really be very questionable. A disassembler would generate more readable code than your C++ code and it would also guarantee that the assembled code will be identical with the .exe that was taken as input.

So it would be a lot of work and only work on 2% of the .exe files out there, and not at all on any of a size where it might be useful (it would only work on very small files where you don't really need it). In addition, in the few cases where it does work it produces less readable code than a disassembler would, and it would be impossible to guarantee that if you compile that code with a compiler you will get back the original .exe.

In short, it is impossible.

Alf
0
 
LVL 12

Expert Comment

by:Salte
ID: 8217713
fl0yd,

remember to first de-link your executable with an anti linker tool. What? You don't have it? Well, I am sure substand has one inside one of his magical sleeves :-)

You could try to invoke the linker in reverse and giving the options before the command etc and hope for the best...and then again maybe you shouldn't.

Alf
0
 
LVL 8

Expert Comment

by:Exceter
ID: 8219103
>> You could try to invoke the linker in reverse and giving the options before the command etc...

The implications of this principle are boundless. The internet is no longer safe... I am going to go reverse engineer a message encrypted with a 1024 bit key by running the public key through PGP in reverse...

Exceter
0
 
LVL 12

Expert Comment

by:Salte
ID: 8219377
>> The implications of this principle are boundless. The internet is no longer safe... I am going to go reverse engineer a message encrypted with a 1024 bit key by running the public key through PGP in reverse...


One problem with that Exceter is that "PGP" is the same when spelled in reverse.... so it might not work ;-)

Alf
0
 
LVL 8

Expert Comment

by:fl0yd
ID: 8219404
Omg, you are right! Darn, why haven't any of those crypto-gurus thought about that before?

Anyway, thanks to substand, I have found a way to repair that car I ran into a wall just the other day. Thanks, I was just about to give up and have it picked up by the guy from the local junk yard.

Salte:

I tried your suggestions but none of those really seemed to work out as expected. I'll keep trying, there has got to be a way to get the source code for my favourite DOS game that refuses to run on my windows machine.
Once I'm through with that I'll have my old pure-asm-hacks run through the de-compiler and hope for it to create a great system of class-hierarchies. I'll keep you informed how things are progressing.

.f
0
 
LVL 8

Expert Comment

by:Exceter
ID: 8219412
Ooo... good point. :-)
0
 
LVL 12

Expert Comment

by:Salte
ID: 8219558
Perhaps I should put on my serious mode again,

I hope substand realizes by now that it is quite impossible and that his argument as to why it should work isn't quite acceptable. Just because you have a compiler that can translate from language L to machine code doesn't mean that you can have another translator that translates from machine code to L.

For one thing we aren't talking about mathematical one-to-one mappings here, we're more talking about a many-to-many type of mapping, where many different source programs can map to the same or similar exe and the same program can map to many different exes by using different versions of compilers that all can understand the language L and translate it to machine code. So, in mathematical terms, since the mapping is not one to one but rather many to many there isn't any inverse mapping and so a decompiler isn't necessarily possible - and in fact even though it may be possible in theory it is quite impossible in practice due to the reasons I mentioned earlier.

Also, I hope that the original question for this thread is also answered by now: there is no C++ decompiler, and making one would be very hard even if you let it be made so that it can only decompile a small subset of the .exe files out there - and typically the small subset would only be small .exe files which you can understand quickly enough by simply reading the hex code. Anything beyond that would be near impossible for the decompiler to decompile in a manner that would make sense to a human reader; the 'C++ code' output from it would be just as hard - if not harder - to read as the original machine code found in the .exe file.
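The many-to-one half of that mapping is easy to show in source form. The three functions below are invented for this illustration and compute the same sum in three differently shaped ways. Whether a given compiler reduces them to the same or nearly the same machine code depends on the compiler and its options, so treat that part as a plausible assumption rather than a guarantee.

```cpp
// Three different sources for one computation: the sum 1 + 2 + ... + n
// (wrapping around on unsigned overflow in all three).

unsigned sum_for(unsigned n) {
    unsigned total = 0;
    for (unsigned i = 0; i < n; ++i) total += i + 1;
    return total;
}

unsigned sum_while(unsigned n) {
    unsigned total = 0;
    while (n > 0) {
        total += n;
        --n;
    }
    return total;
}

unsigned sum_recursive(unsigned n) {
    return n == 0 ? 0 : n + sum_recursive(n - 1);
}
```

If two of these end up as the same instructions, a decompiler has nothing to go on when choosing between them, and the same holds for every other source that happens to produce those instructions.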

It is also an issue about legality as trying to decompile your favorite game so you can make an alternative implementation would be very illegal.

In saying so I consider this case closed, and although it has been fun trying to explore the possibilities of a 'what if' world where this reverse logic worked, I believe both substand and greeek_god have got the message and we don't have to pound it in further.

Alf
0
 
LVL 8

Expert Comment

by:Exceter
ID: 8219894
>> I believe both substand and greek_god has got the message and we don't have to pound it in further.

Agreed.
0
