Read a C++ executable/binary file and convert it into C++ code

Hi,

Many times while working, I get into a situation where there is a severe problem in the C++ app I am supporting, and I need to investigate the root cause of the problem and a possible fix for it. However, I don't have access to the C++ source code, as it is under the control of a different team. I only have the executable (I know it is a C++ executable because they have told me so) and the log file generated by it. Most of the time it's not possible to find the problem just by looking at the log file. The app generally runs on Linux, but for testing purposes we also run a version of it on Windows. I understand C++ quite well. Are there any tools I can use to read/open this binary executable and understand what exactly the source code is (functions, variables, threads, flow, runtime data state, etc.)?

Thank you.
James Bond, Software Professional, Asked:

käµfm³d 👽 Commented:
I think you'd be better off trying to get a copy of the source code (or maybe the debug symbols) from the other team.

n2fc Commented:
The executable has been translated to the machine language of the target machine. It may also reference system calls to the target Operating System.

To debug the object code would require intimate knowledge of both of these, i.e. the target machine language & operating system...

In addition, some code may reside within DLLs, further complicating the issue...

You can try running the code within a run-time debugger, but your best shot is still going to be careful analysis of the logfile the original author saw fit to generate!

fawtrey1 Commented:
Reverse engineering has been done since programming came along. That's why I like Perl, PHP and Python. You see the final code. Compiled code, which goes from source to P-Code and then to binary machine code, is almost impossible to reverse. Going back to assembly would be possible with some success, but it would not be perfect or easy.

If you could convert it back to C++, it would not have the meaningful labels needed to follow the code flow. Not impossible, but not easy. The result may also be buggy on its own.

 You need to talk to and work with the developers so that they bring you into the loop on resolving the problems. You need to identify the problem, not rewrite the code, is that right? Have them add more log information, for example a trace mode, so that you can follow the program flow and find the issue.

Good Luck

James Bond, Software Professional, Author Commented:
"That's why I like Perl, PHP and Python." - can you please elaborate on this?

phoffric Commented:
>> where there is a severe problem in the C++ app I am supporting.
    Exactly what is your role and how do you support the C++ app?

>> And I need to investigate the root cause of the problem and a possible fix for it. However, I dont have access to C++ source code as it is in control of different team.
     In this case, no one is expecting you to be able to fix the source code since you say you do not have access to it.

     I'll assume that your role is a tester who reports problems to the development team, since as you mentioned, they are the only ones with access to source code. In this case, here is my interpretation of the words you are using:
investigate - try to find as many scenarios as possible that cause the problem to occur.
root cause - from these scenarios, try to identify a minimal set of operational steps, including a minimal set of known inputs to the app, that reproduces the problem. You may find more than one minimal set, each triggering the problem from a completely different cause.
possible fix - since you didn't seem to know about reverse engineering, which was mentioned in another post, no one should be expecting you to do that. Although you cannot "fix" the problem, you may be able to find a work-around that can be put into the standard operating procedures (SOP) until a development "fix" can be made.

   With these thoughts, you should talk to your boss and clarify exactly what is expected of you. Get an example scenario of how you are expected to complete your task - you need to know exactly what constitutes success.

   You may find that you are supposed to get a debug version of the app along with access to source code so that you can use a debugger. Or, if you are not permitted to have access to source code, the development team could still give you a build that you can run under Valgrind, which may help identify severe problems.
   http://valgrind.org/
   http://valgrind.org/docs/manual/quick-start.html

pepr Commented:
>> "That's why I like Perl, PHP and Python." - can you please elaborate on this?

As fawtrey1 wrote a few sentences later:

>> ... is almost impossible to reverse.

To simplify it further, it will probably be easier to rewrite it from scratch on your own. The reason is that the optimizer of the C++ compiler may reorganize the binary code so much that you will not be able to recover the original intention.

Moreover, reverse engineering is often forbidden by the license agreement.

fawtrey1 Commented:
Perl, PHP and Python execute the code as they read the source on the fly. You always have the source and can make changes. It is compiled into memory at run time. I would like to see a Visual Basic program that would run source on the fly. I have a ton of Visual Basic code.

James Bond, Software Professional, Author Commented:
@phoffric
I heard about reverse engineering in @fawtrey1's post. But he mentioned it's not worth using with C++ code, so I didn't pursue it. What do you think? From a C++ a.out, can we go back to C++ code using reverse engineering, and if yes, are there any standard tools or processes available?

@fawtrey1
I still don't understand what you are trying to convey. Are you saying that code written in high-level languages like C, C++ and Java is impossible to reverse, while with Perl and Python it is possible, or vice versa? Sorry about that.

phoffric Commented:
>> I heard about reverse engineering
   I have never heard of anyone needing to do this in business as the cost is prohibitive when there are developers available. If they are not available, then I do not understand your organization at all.

   Exactly what is your role and how do you support the C++ app?

pepr Commented:
Python and similar languages compile the sources to a higher-level bytecode that still reflects the input source to some degree. Moreover, in Python you can even access the classes at runtime, as they are represented as internal objects, and all variable names are available as strings. There are also usually tools for decompiling the bytecode. One of the main reasons for compilation here is to avoid re-interpreting the lines of source code every time they run (as is done with, say, shell scripts), i.e. to gain speed.

C++ produces code that is close to the hardware (pure binary for the target processor). The processor instructions do not reflect the original object-oriented design, for example. Think about templates (or macros). They are defined once and can be used many times. This usually means they are expanded textually (macros) or instantiated and compiled into the binary at each point of use, often inline. In an optimized binary there is often no trace left that a piece of code was once a templated function, for example.

In other words, the binary that comes from C++ loses a lot of the information that was contained in the source files during compilation and linking. The optimization can be done in several phases, where the later phases change even the binary code from the earlier phases, and it can remove traces of the C++ that would otherwise be partly visible in call sequences and possibly in how the data are organized. The ultimate goal of a C++ compiler is to produce a binary that is close to the best assembler program you could write by hand. That differs a lot from how the ideas look when you read the C++ source files of the program.

James Bond, Software Professional, Author Commented:
@phoffric
My role is to develop and maintain C++ apps and provide production support. Many of our apps are integrated with other C++ apps which are written and owned by other teams. I wanted to know if something like I asked in this post is possible and if yes I would like to add it to my skillset. In that way I could have served my org more effectively. The intention of asking this question was to get legitimate information and not education. Reading the other posts, I have understood that it's not viable to reverse engineer a C++ binary.

James Bond, Software Professional, Author Commented:
@pepr
Thank you for your explanation. Regarding speed, you mentioned that Python compiles source code to avoid repeated interpretation in order to gain speed. But you also said it compiles into higher-level bytecode, which I guess is far from the machine code generated by a C++ compiler. I have not used Python much. Are you saying Python is faster than C++/Java?

phoffric Commented:
>> I wanted to know if something like i asked in this post is possible ... reverse engineer c++ binary
    Although I have not done it myself, it is possible. You can always tediously convert machine code into assembly code and then try to convert that into some roughly meaningful C++ code. Here is the first Google entry on reverse engineering a C++ binary:
    http://blog.flip-edesign.com/_rst/Reverse_Engineering_a_Compiled_C++_Binary_Part_1.html

>> if yes I would like to add it to my skillset
   Always nice to add something that interests you to your skillset.

>> In that way i could have served my org more effectively.
   Reverse engineering C++ code is certainly the least effective mechanism in helping your organization if your organization has teams that have the source code and do their own developing.

   If some libraries are written by third parties, then either they have bugs or you need support from them in using their API more effectively. If a forum specific to the third-party library exists, then that may be the best place to get help. I would expect that any attempt by you to reverse engineer a binary would require management approval for two reasons: 1) it is the most inefficient way to solve a problem, and 2) it may be illegal.

James Bond, Software Professional, Author Commented:
@phoffric
I agree with both of your reasons and by now fully understand them. I intend to use this way of identifying problems only in exceptional situations, such as a lack of support from other app owners when my manager really wants this done no matter how long it takes. Thank you for your inputs.

phoffric Commented:
>> no matter how long it takes
I wouldn't go that far. If a manager has a deadline, then the deadline will dictate "how long it takes". Bottom line usually is that if you or your manager does not think that the deadline can be met, then expert(s) experienced with reverse engineering of binaries will be called in to meet the deadline.

pepr Commented:
>> Are you saying python is faster than c++/java?

No. I wanted to say that the goal of every C or C++ compiler is to produce machine code instead of bytecode. Moreover, a good C++ compiler allows heavy optimization of the generated machine code. As machines are faster, more precise, and better at tedious work than people, the resulting optimizations may be impossible to do manually, and the result may be impossible to understand from a human point of view.

For Python and Java, the goals of the languages are different. For Java, portability of the .jar is one of the main goals; for Python, one of the goals was a good, performance-efficient, higher-level "interpreter" as an alternative to shell and similar languages. Both Java and Python want to make programming easier. The trade-off is the speed of execution. Because of these other goals, optimization at the machine-code level was not a goal of the compilation. As a result, the generated code better reflects the ideas written in the original source code, i.e. reverse engineering would be a more straightforward transformation. As another example, .NET code (C#) even reflects the object-oriented features.

Python and Java cannot be as fast as C++ in principle as they at least always use more levels of indirection when working with variables. The real-time behaviour is also more difficult to guess in the languages that use garbage collection. The more complex a program is, the less suitable Python and Java are for the project when compared with C++.

käµfm³d 👽 Commented:
>> Python and Java cannot be as fast as C++ in principle as they at least always use more levels of indirection when working with variables.
In Jeffrey Richter's book, CLR via C#, 4th ed., the author professes that even though C# code is compiled into an intermediary code (like Java), it can run faster (in some cases) than C/C++ due to its "just-in-time" compilation. Not sure if the same can be said for Java, but I thought I'd mention it since C# is similar to Java in many respects.

phoffric Commented:
Java also has "just-in-time" compilation, which converts the interpreted byte code into native machine code after the first pass of byte-code interpretation. And, of course, very well-written Java code may outperform poorly written C/C++ code that makes, for example, poor choices of algorithms.

pepr Commented:
>> ... even though C# code is compiled into an intermediary code (like Java), it can run faster (in some cases) than C/C++ due to its "just-in-time" compilation.

I have also read that, but I have not seen any hard evidence. The truth is that C#, because of the just-in-time compiler, is more portable to processors with a better instruction set. This, in my opinion, can be the only source of the "faster than C++" claim. That is because one usually compiles a C++ program for some common denominator of processors (x86 or amd64).

From another point of view, what can be written in Java/C# can also be written in C++. On the other hand, what can be written in C++ cannot always be written in Java/C#. Think about the inevitable indirection when working with objects in Java (i.e. one more hop when dereferencing).

This is about plain code. Another point of view should consider libraries. The comparison should compare also efficiency of standard libraries.

In my opinion, Java programs are sometimes OOOP examples (Object Over-Oriented Programming ;)

cup Commented:
Back to original problem.

In the long run, it is probably better to talk to the owner of the code and ask them if you can have a copy of the source.  You'll just waste a lot of time playing politics with this sort of thing.  One ploy is to go up as many levels of management as you can and say that you can't do the job if no source code for that build is provided.

I've been in this situation before, where there is a demarcation of who owns which bit and thou shalt not touch someone else's code. You're not touching it, just having a look. What could possibly be wrong with that? You could shame them: what is there to hide? We're all working for the same company, the code isn't top secret, so is it so badly written that you are too embarrassed to show it to me? To defend themselves, they'll normally say something like "Of course not, we'll gladly show it to you if you promise that you won't change it". Once you've gotten over that hurdle, there will not be a problem looking at the code.

  Once you find the bug, don't modify it: feed it back to the team which owns the source.