Solved

opendir after writing a file causes segmentation fault

Posted on 2006-06-29
Last Modified: 2012-06-21
I have a C program that creates files in a directory. At a later time, I try to open the directory to get the file names using opendir().  When I make the call a segmentation fault occurs.  If files already exist in the directory, prior to creating any new files, opendir() works correctly.  If there are no files in the directory prior to writing files, opendir() works correctly.  It only fails when I create files in the directory, then perform opendir() on the directory.

The files are created rw-r--r-- and can be read without a problem with vi.  I've verified that I always closedir() after a successful opendir().  I've verified that I call close() after the successful creat().  No errors are being generated from any of the calls.

The routine that creates the files is performed by a detached thread, while the routine that reads the directory is the boss thread.  The debug log shows that the creating thread has completed the write and exited long before the read occurs.

Can anyone shed some light on what is causing the Segmentation fault?  My project is on hold until I can resolve this issue.
Question by:GWIC100

Expert Comment

by:manish_regmi
Does the segfault occur in your code or inside opendir() (in libc)?
Can you post a code snippet so that we can help?

regards
Manish Regmi

Expert Comment

by:Nopius
> I've verified that I always closedir() after a successful opendir().
I assume you finish all readdir() calls before closing the DIR *?
Also, please double-check that you never call closedir() twice on the same DIR *; otherwise you will see a core dump in libc's free().

> The files are created rw-r--r-- and can be read without a problem with vi.
That is unlikely to be the cause of your problem.

> The routine that creates the files is performed by a detached thread, while the routine that reads the directory is the boss thread.
Are you using the "*_r"-suffixed versions of the functions (and linking with libc_r)?
Otherwise those functions are not reentrant, and your code is not thread-safe. You should either use a semaphore so that all work from opendir() to closedir() is done by one thread at a time (no overlapping opendir() is permitted), or use the reentrant functions as suggested above.
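
A minimal sketch of the "one thread at a time" approach, assuming a plain pthread mutex guards the directory. The helper name count_entries_locked() is hypothetical and not taken from the poster's program; it only shows the whole opendir()/readdir()/closedir() sequence running under a single blocking lock:

```c
#include <dirent.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical helper: counts the entries in `path`, excluding "." and "..".
 * The whole opendir()..closedir() sequence runs while `lock` is held.
 * Returns the count, or -1 if the directory could not be opened. */
int count_entries_locked(const char *path, pthread_mutex_t *lock)
{
    int count = -1;
    DIR *dir;
    struct dirent *entry;

    pthread_mutex_lock(lock);              /* blocks until the lock is free */
    dir = opendir(path);
    if (dir != NULL) {
        count = 0;
        while ((entry = readdir(dir)) != NULL) {
            if (strcmp(entry->d_name, ".") != 0 &&
                strcmp(entry->d_name, "..") != 0)
                count++;
        }
        closedir(dir);                     /* exactly once per successful opendir() */
    }
    pthread_mutex_unlock(lock);
    return count;
}
```

Because the DIR * never leaves the function, no other thread can call readdir() or closedir() on the same stream.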

Author Comment

by:GWIC100
As far as I can determine, there are no *_r functions for the open/read/closedir() functions on my system (RH9). To compensate, I use mutex locks to control access to the directory during both the read and the write processes.

Worker Code:

strcpy(thisFile,SHM->responseQueue);
strcat(thisFile,fileName);

dbug("worker","Locking response queue",SHM->responseQueue);
while (pthread_mutex_trylock(&SHM->ResDirLock)) setTimer(SHM->lockWait);
dbug("worker","Queue Locked",SHM->responseQueue);
/* creat() returns -1 on failure; 0 is a valid descriptor, so test >= 0 */
if ((RSP = creat(thisFile,0666)) >= 0) {
  dbug("worker","File openned",thisFile);
  /* write contents */
  rc = close(RSP);
  if (rc) dbug("worker","Failed to write file", thisFile);
  else dbug("worker","File successfully written", thisFile);
}
pthread_mutex_unlock(&SHM->ResDirLock);
dbug("worker","Queue Unlocked",SHM->responseQueue);


Boss Code:

QDIR=NULL;
dbug("boss","Locking response directory",SHM->responseQueue);
while(pthread_mutex_trylock(&SHM->ResDirLock)) setTimer(SHM->lockWait);
dbug("boss","Queue Locked",SHM->responseQueue);

QDIR = opendir(SHM->responseQueue);
dbug("boss","Checking opendir()","Status");
if (QDIR) {
  dbug("boss","Queue openned",SHM->responseQueue);
  while ((QENTRY = readdir(QDIR))) {
    /* skip the "." and ".." entries */
    if (strcmp(QENTRY->d_name,".") && strcmp(QENTRY->d_name,"..")) {
      /* process entry */
    }
  }
}
pthread_mutex_unlock(&SHM->ResDirLock);
dbug("boss","Queue Unlocked",SHM->responseQueue);

The debug log output:

***** worker() *****: Begin
  worker->Locking response queue: /tmp/sisd/response/
  worker->Queue Locked: /tmp/sisd/response/
  worker->File openned: /tmp/sisd/response/12324
  worker->File successfully written: /tmp/sisd/response/12324
  worker->Queue Unlocked: /tmp/sisd/response/
***** worker() *****: End
  .
  .
  .
***** ftpServices()*****: Begin
  ftpServices->Locking response directory: /tmp/sisd/response/
  ftpServices->Queue Locked: /tmp/sisd/response/

Author Comment

by:GWIC100
Some more info:

I reconfigured to capture the core dump, then loaded it into gdb.  The segfault occurred in the malloc_consolidate() function of libc.  Presuming that I might have allocated memory without deallocating it, I searched my code and found that every malloc() call is followed by a matching free() call. Therefore, all allocated memory is freed before the worker exits.  The boss still has memory allocated in linked lists that it will not free until the program terminates.
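
A crash inside malloc_consolidate() usually points to a heap that was corrupted earlier (a write past the end of a malloc'd block, a double free, or use after free) rather than to a leak, so the faulting call is often far from the real bug. A rough sketch of two common ways to look for the corrupting write, assuming a glibc that honors MALLOC_CHECK_ and that Valgrind is installed; ./boss is a placeholder for the real program name:

```shell
# ./boss is a placeholder for the actual program binary.
# glibc's MALLOC_CHECK_ makes malloc()/free() run extra consistency checks;
# the value 3 prints a diagnostic and aborts when corruption is detected.
MALLOC_CHECK_=3 ./boss

# Valgrind's memcheck tool reports invalid writes and double frees
# with a stack trace at the point where they happen.
valgrind --tool=memcheck ./boss
```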

Expert Comment

by:Nopius
strcpy(thisFile,SHM->responseQueue);
strcat(thisFile,fileName);

Is thisFile a large enough buffer to append to?
It's better to use strncpy() and strncat() instead.

For more details, wait until Monday; I'll look at your code then.
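
One way to make the buffer-size question explicit is to build the path with snprintf() and check for truncation. This is only a sketch under that assumption (it swaps in snprintf() for the strncpy()/strncat() pair); build_path() is a hypothetical helper, not part of the posted code:

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: writes dir followed by name into dst.
 * Returns 0 on success, or -1 if the result does not fit in dstsz bytes
 * (dst is then truncated but still NUL-terminated when dstsz > 0). */
int build_path(char *dst, size_t dstsz, const char *dir, const char *name)
{
    int n = snprintf(dst, dstsz, "%s%s", dir, name);

    if (n < 0 || (size_t)n >= dstsz)
        return -1;
    return 0;
}
```

A caller would use it as build_path(thisFile, sizeof thisFile, SHM->responseQueue, fileName) and skip the creat() when it returns -1.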

Author Comment

by:GWIC100
Yes, thisFile is dimensioned char *[128].  If you look at the debug output you'll notice that the length of its contents is less than half that.  Additionally, the strncpy/strncat functions would still need sufficient storage in thisFile to hold the total length of the path.

Expert Comment

by:Nopius
I don't see a closedir() call in the boss code. It should go where marked:
while(pthread_mutex_trylock(&SHM->ResDirLock)) setTimer(SHM->lockWait);

if (QDIR) {
  dbug("boss","Queue openned",SHM->responseQueue);
  while ((QENTRY = readdir(QDIR))) {
    if (strcmp(QENTRY->d_name,".") && strcmp(QENTRY->d_name,"..")) {
      /* process entry */
    }
  }
  // *** HERE ***
  closedir(QDIR);
}
pthread_mutex_unlock(&SHM->ResDirLock);

Author Comment

by:GWIC100
It was there; I just didn't type it in.  It should read:

while(pthread_mutex_trylock(&SHM->ResDirLock)) setTimer(SHM->lockWait);

if (QDIR) {
  dbug("boss","Queue openned",SHM->responseQueue);
  while ((QENTRY = readdir(QDIR))) {
    if (strcmp(QENTRY->d_name,".") && strcmp(QENTRY->d_name,"..")) {
      /* process entry */
    }
  }
  closedir(QDIR);
}
pthread_mutex_unlock(&SHM->ResDirLock);

Expert Comment

by:Nopius
Nice code. It looks workable. My only question: do you use the 'exec' or 'system' functions inside the boss code in the 'process entry' part?

According to the documentation:
A successful call to any of the exec functions will close any directory streams that are open in the calling process.  See exec(2).

Author Comment

by:GWIC100
No.  I read the directory and store its contents in a structured linked list, then close the directory so that I'm working from the list.  This allows me to track the status of the worker processing each entry.  When the worker has completed its write, it locks the list, updates the status of the entry to tell the boss it is ready for more work, then unlocks the list.  The boss is the only one that can add or delete entries on the list.  Each time a status is changed or entries are added or deleted, I lock the list with a mutex so the parties don't step on each other.

From the log, it appears that everything works correctly until the worker creates a file in the responseQueue.  Even though it locks the directory for the write, then releases the lock afterwards, it seems as though it is somehow still associated with the resource after it exits, causing a corruption of shared memory with the boss.  I just don't know how to identify what is happening.  DDD doesn't help when threads switch, so I'm basically blind beyond the debug statements that I've inserted.  If I just had a clue as to where I should be looking, I might be able to salvage this code.  If I can't locate it by EOB today, I'm switching to a different methodology to get the project up.

Expert Comment

by:Infinity08
>> Yes, thisFile is dimensioned char *[128]
I hope you didn't mean :

char* thisFile[128];

???

Author Comment

by:GWIC100
No, I realized after I hit enter that I had mistyped the definition.  It should be  char[128].

Author Comment

by:GWIC100
After extensive debugging, it appears that using pthreads in combination with malloc/calloc-ing memory from both the boss and worker threads is incompatible.  I removed the entire directory-scanning process from the code and submitted only the file name to the child process to handle.  File scanning was handled in a separate process. When I tried to malloc/calloc memory in either the child or the parent thread for any reason, I still got spurious core dumps in libc's malloc_consolidate().  I don't have time to identify the root cause and have moved on to a new solution in a different language.

This question will be closed.

Accepted Solution

by:ee_ai_construct
PAQ / Refund
ee ai construct, community support moderator
