
Process restarting using the core file

Posted on 2003-03-30
Last Modified: 2010-04-21
Hi,

I had a process which dumped core, not because of any "fault" (segmentation fault, illegal instruction etc.) but because I "told" it to do so by sending the SIGQUIT signal.
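A minimal sketch of how such a core is produced, in case someone wants to reproduce the setup. SIGQUIT's default action is to terminate the process and dump core, so sending it to a running child (the same as `kill -QUIT <pid>` in a shell) has the effect described. The helper name `quit_child` is hypothetical. Whether a core file is actually written depends on the core-size limit (`ulimit -c`, i.e. RLIMIT_CORE) and, on Linux, on /proc/sys/kernel/core_pattern, so the sketch only reports what `WCOREDUMP` says instead of assuming a core exists.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks an idle child, sends it SIGQUIT and reaps it.
   Returns the signal that terminated the child, or -1 on error.
   *dumped is set to 1 if the kernel reports that a core was written. */
int quit_child(int *dumped)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();            /* stand-in for a long-running computation */
    }

    /* Default disposition of SIGQUIT: terminate and (if allowed) dump core. */
    if (kill(pid, SIGQUIT) == -1)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;

    *dumped = WIFSIGNALED(status) && WCOREDUMP(status);
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```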

Now, I'm trying to "restart" the process from the state it was in when the core was dumped.

I try to do this as follows:
1) There is a process (say P) which reads in register values, memory dumps, etc. from the core file.
2) P does a fork(); the child does a PTRACE_TRACEME before exec()ing the original executable.
3) P then stops the child at every instruction (PTRACE_SINGLESTEP) and waits for the child to reach the address of main() [P checks the eip register of the child after every instruction].
4) Now that the child is stopped at main(), P modifies the address space of the child with values that were there in the core file. This includes the stack (which is also present in the core file).
5) P also sets the registers of the child to the values found in the core file.
6) P does a PTRACE_DETACH.
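The steps above can be sketched roughly as follows. This is a minimal, Linux x86/x86_64-only illustration under several assumptions, not the poster's actual code: instead of exec()ing a separate executable, the child stays in the same binary and stops itself with raise(SIGSTOP), so the parent knows the address of the hypothetical function `resume_point` (standing in for main()); a single hypothetical global `restored_value` stands in for the memory contents read from the core (step 4); and step 5 just writes back the registers it read, where a real restorer would load them from the core file. Reading the core itself (step 1) is left out. `restart_demo` is also a hypothetical name, and the sketch needs ptrace to be permitted in the environment.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(__x86_64__)
#define INSN_PTR(r) ((unsigned long)(r).rip)
#elif defined(__i386__)
#define INSN_PTR(r) ((unsigned long)(r).eip)
#else
#error "this sketch assumes x86 or x86_64 (struct user_regs_struct)"
#endif

/* Hypothetical stand-in for memory contents that would come from the core. */
volatile long restored_value = 0;

/* Hypothetical stand-in for main(): the point where P takes over. */
__attribute__((noinline)) long resume_point(void)
{
    return restored_value;
}

/* Returns the child's exit status (the value it found in restored_value),
   or -1 on error. */
int restart_demo(long value_from_core)
{
    int status;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Step 2: ask to be traced. The original exec()s the program here;
           this sketch stops itself instead so resume_point's address is known. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(127);
        raise(SIGSTOP);
        _exit((int)resume_point());
    }

    if (waitpid(pid, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1;

    /* Step 3: single-step until the instruction pointer reaches resume_point. */
    struct user_regs_struct regs;
    for (long steps = 0;; steps++) {
        if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) == -1)
            goto fail;
        if (INSN_PTR(regs) == (unsigned long)&resume_point)
            break;
        if (steps > 10000000L || ptrace(PTRACE_SINGLESTEP, pid, NULL, NULL) == -1)
            goto fail;
        if (waitpid(pid, &status, 0) == -1 || !WIFSTOPPED(status))
            return -1;
    }

    /* Step 4: overwrite the child's memory; POKEDATA writes one word. */
    if (ptrace(PTRACE_POKEDATA, pid, (void *)&restored_value,
               (void *)value_from_core) == -1)
        goto fail;

    /* Step 5: a real restorer would copy the registers saved in the core
       here; this sketch writes back the values it just read. */
    if (ptrace(PTRACE_SETREGS, pid, NULL, &regs) == -1)
        goto fail;

    /* Step 6: let the child run on its own and collect its exit status. */
    if (ptrace(PTRACE_DETACH, pid, NULL, NULL) == -1)
        goto fail;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;

fail:
    kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    return -1;
}
```

Without the poke the child would exit with status 0, so restart_demo(42) returning 42 shows that the value written in step 4 is what the child sees after the detach.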

In theory I believe this should have been sufficient to get the process to restart in the state it was in when core was dumped.

Unfortunately, when I try this the child executes many instructions but eventually encounters a segmentation fault and dumps core (this time it wasn't intentional!!). It seems that this fault is encountered the first time the child executes an instruction in its "own" memory area (address 0x804ba84). This address seems to be invalid.
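One way to check whether an address such as 0x804ba84 is really invalid in the restarted child is to look it up in that process's memory map while it is still stopped under ptrace. A minimal Linux-specific sketch, assuming /proc is mounted; `addr_is_mapped` is a hypothetical helper name:

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: reports whether addr lies inside a mapping of process
   pid (pid <= 0 means the calling process), according to /proc/<pid>/maps.
   Returns 1 if mapped, 0 if not, -1 if the file cannot be read.
   If perms is non-NULL and the address is mapped, the permission field
   (e.g. "r-xp") is copied into it. */
int addr_is_mapped(pid_t pid, unsigned long addr, char perms[5])
{
    char path[64];
    if (pid > 0)
        snprintf(path, sizeof path, "/proc/%d/maps", (int)pid);
    else
        snprintf(path, sizeof path, "/proc/self/maps");

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    char line[4096];
    int found = 0;
    while (!found && fgets(line, sizeof line, f) != NULL) {
        unsigned long lo, hi;
        char p[5];
        /* Each line starts with "start-end perms ...", addresses in hex. */
        if (sscanf(line, "%lx-%lx %4s", &lo, &hi, p) == 3 &&
            addr >= lo && addr < hi) {
            found = 1;
            if (perms != NULL)
                memcpy(perms, p, sizeof p);
        }
    }
    fclose(f);
    return found;
}
```

P could call addr_is_mapped(child_pid, 0x804ba84UL, perms) just before the PTRACE_DETACH; a result of 0, or a permission field without `x`, would account for the fault.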

Does anyone have any ideas about possible flaws in the method stated above? Or what could be causing the child to go to an invalid instruction??

I searched for information on restarting from the core but most people seem to suggest that you can't restart using the core file, just use the core to examine faults etc. There is a Solaris utility called undump() but that seems to restart from main() and thus only restores values of global variables.

A basic question: considering that the core file DOES contain the stack, WHAT prevents us from restarting the process from the exact point where it was when the core was dumped?

I apologize for any vagueness in the description above, but any help will be greatly appreciated.

Sincerely,

-- Asim

Question by:asim_shankar

Expert Comment

by:ahoffmann
ID: 8237176
did you try to do it with dbx?

Author Comment

by:asim_shankar
ID: 8237237
Unfortunately, I'm not very familiar with dbx. Is it possible to continue a process checkpointed by the core using dbx?? If so, could you please tell me how.

However, I'm trying to do this (continue a process checkpointed by the core file) programmatically. So if it can be done with dbx, I'd like to know HOW dbx does it.

Author Comment

by:asim_shankar
ID: 8237317
Oh, I'd like to clarify that I understand that there are issues with the "meaning" of restarting a process (what happens to open files/sockets etc.) - many of which are nicely put in http://sources.redhat.com/ml/gdb/2001-12/msg00130.html and are probably reasons why debuggers cannot restart a process from the state in the core file.

However, I'm looking to restart only simple compute processes. A process like:

#include <stdio.h>

int main(void)
{
   int i = 0;
   while (1)
      printf("%d\n", i++);
}

IMHO, the core file should contain sufficient information, and with the approach outlined above I should be able to make it work. Opinions?

Expert Comment

by:bryanh
ID: 8263139
I think even a program that simple has lots of state that you are not restoring (and Linux doesn't give you a way to restore).  The GNU C library is fairly complex.

And maybe you just have a bug in your code that reloads registers.  Maybe you're not restoring one, or restoring one that shouldn't be restored.  Segment registers or such.

This shouldn't be too hard to debug.  Unless you're doing something weird, address 0x804ba84 ought to contain instructions.  In the dump of the 2nd crash, is there a 0x804ba84?  Sounds like an address space problem to me -- your program exists in 0x804ba84 in one address space, and you are trying to branch to 0x804ba84 in another one.
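Whether 0x804ba84 appears in the second dump can be checked directly: a Linux core file is an ELF file whose PT_LOAD program headers describe the memory regions the process had. A minimal sketch using the standard <elf.h> definitions; the helper name `elf_load_covers` is hypothetical, and only files in the host's byte order are handled. Note that p_filesz can be smaller than p_memsz (or zero) when a region's contents were not written to the core, so "covered" here means the region was mapped, not necessarily that its bytes are in the file.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if a PT_LOAD segment of the ELF file at path
   (for example a core file) covers vaddr, 0 if none does, -1 on error.
   Handles 32- and 64-bit files; assumes the file's byte order matches the host. */
int elf_load_covers(const char *path, unsigned long long vaddr)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    int result = -1, is64 = 0;
    unsigned char ident[EI_NIDENT];
    unsigned long long phoff = 0;
    unsigned phnum = 0, phentsize = 0;

    /* Read the ELF header to locate the program header table. */
    if (fread(ident, 1, EI_NIDENT, f) == EI_NIDENT &&
        memcmp(ident, ELFMAG, SELFMAG) == 0) {
        rewind(f);
        if (ident[EI_CLASS] == ELFCLASS64) {
            Elf64_Ehdr eh;
            if (fread(&eh, sizeof eh, 1, f) == 1) {
                is64 = 1;
                phoff = eh.e_phoff; phnum = eh.e_phnum; phentsize = eh.e_phentsize;
                result = 0;
            }
        } else if (ident[EI_CLASS] == ELFCLASS32) {
            Elf32_Ehdr eh;
            if (fread(&eh, sizeof eh, 1, f) == 1) {
                phoff = eh.e_phoff; phnum = eh.e_phnum; phentsize = eh.e_phentsize;
                result = 0;
            }
        }
    }

    /* Each PT_LOAD entry describes p_memsz bytes of memory at p_vaddr. */
    for (unsigned i = 0; result == 0 && i < phnum; i++) {
        unsigned long long type, start, size;
        if (fseek(f, (long)(phoff + (unsigned long long)i * phentsize), SEEK_SET) != 0) {
            result = -1;
            break;
        }
        if (is64) {
            Elf64_Phdr ph;
            if (fread(&ph, sizeof ph, 1, f) != 1) { result = -1; break; }
            type = ph.p_type; start = ph.p_vaddr; size = ph.p_memsz;
        } else {
            Elf32_Phdr ph;
            if (fread(&ph, sizeof ph, 1, f) != 1) { result = -1; break; }
            type = ph.p_type; start = ph.p_vaddr; size = ph.p_memsz;
        }
        if (type == PT_LOAD && vaddr >= start && vaddr - start < size)
            result = 1;
    }
    fclose(f);
    return result;
}
```

For example, elf_load_covers("core", 0x804ba84ULL) returning 0 for the second dump (the file name "core" is only an assumption) would support the address-space explanation.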

Author Comment

by:asim_shankar
ID: 8264108
Well, I was able to debug my program, and it was quite a stupid mistake. So the state of things now is that I CAN restart a process (like the one mentioned above) without any trouble. So I guess there IS enough information in the core file to restart a process.

However, there is one problem with the method outlined above. I stop the child at main(), so any memory pages that were allocated during the original execution aren't part of the address space now. The core file contains the contents of these pages, but since they are not part of the child's address space yet, I can't write to them.

Is there a way to add pages to the child's address space from the parent?

Accepted Solution

by:bryanh
ID: 8270849
I think what we know now is that there is enough information in the core file to restart a certain kind of process that failed in a certain way.  I still think the scope of processes/failures where this works is quite narrow.

>Is there a way to add pages to the child's address space
>from the parent?

I rather doubt it.  That would be a ptrace kind of thing, and of course ptrace doesn't have that function.