GWIC100 (United States of America) asked:
opendir after writing a file causes segmentation fault

I have a C program that creates files in a directory. At a later time, I try to open the directory to get the file names using opendir(). When I make the call, a segmentation fault occurs. If files already exist in the directory, prior to creating any new files, opendir() works correctly. If there are no files in the directory prior to writing files, opendir() works correctly. It only fails when I create files in the directory, then perform opendir() on the directory.

The files are created rw-r--r-- and can be read without a problem with vi.  I've verified that I always closedir() after a successful opendir().  I've verified that I call close() after the successful creat().  No errors are being generated from any of the calls.

The routine that creates the files is performed by a detached thread, while the routine that reads the directory is the boss thread.  The debug log shows that the creating thread has completed the write and exited long before the read occurs.

Can anyone shed some light on what is causing the Segmentation fault?  My project is on hold until I can resolve this issue.
manish_regmi:

Does the segfault occur in your code or inside opendir() (in libc)?
Can you post a code snippet so we can help?

Regards,
Manish Regmi
Arty K:
> I've verified that I always closedir() after a successful opendir().
Of course, you have completed all readdir() calls before closing the DIR *?
Also, please double-check that you have not called closedir() twice; a double closedir() will give you a core dump in libc's free().

> The files are created rw-r--r-- and can be read without a problem with vi.
That is hardly the cause of your problem.

> The routine that creates the files is performed by a detached thread, while the routine that reads the directory is the boss thread.
Are you using the "*_r"-suffixed versions of the functions (and also linking with libc_r)?
Otherwise those functions are not reentrant, and your code becomes thread-unsafe. You should either use a semaphore so that all work from opendir() to closedir() is done by one thread at a time (no overlapping opendir() calls), or use the reentrant functions as I suggested.
GWIC100 (Asker):

As far as I can determine, there are no *_r functions for opendir()/readdir()/closedir() on my system (RH9). To compensate, I use mutex locks to control access to the directory during both the read and the write processes.

Worker Code:

strcpy(thisFile,SHM->responseQueue);
strcat(thisFile,fileName);

dbug("worker","Locking response queue",SHM->responseQueue);
while (pthread_mutex_trylock(&SHM->ResDirLock)) setTimer(SHM->lockWait);
dbug("worker","Queue Locked",SHM->responseQueue);
if((RSP = creat(thisFile,0666)) >= 0) {
  dbug("worker","File openned",thisFile);
  /* write contents */
  rc = close(RSP);
  if (rc) dbug("worker","Failed to write file", thisFile);
  else dbug("worker","File successfully written", thisFile);
}
pthread_mutex_unlock(&SHM->ResDirLock);
dbug("worker","Queue Unlocked",SHM->responseQueue);


Boss Code:

QDIR=NULL;
dbug("boss","Locking response directory",SHM->responseQueue);
while(pthread_mutex_trylock(&SHM->ResDirLock)) setTimer(SHM->lockWait);
dbug("boss","Queue Locked",SHM->responseQueue);

QDIR = opendir(SHM->responseQueue);
dbug("boss","Checking opendir()","Status");
if (QDIR) {
  dbug("boss","Queue openned",SHM->responseQueue);
  while ((QENTRY = readdir(QDIR))) {
    if (strcmp(QENTRY->d_name,".") && strcmp(QENTRY->d_name,"..")) {
      /* process entry */
    }
  }
}
pthread_mutex_unlock(&SHM->ResDirLock);
dbug("boss","Queue Unlocked",SHM->responseQueue);

The debug log output:

***** worker() *****: Begin
  worker->Locking response queue: /tmp/sisd/response/
  worker->Queue Locked: /tmp/sisd/response/
  worker->File openned: /tmp/sisd/response/12324
  worker->File successfully written: /tmp/sisd/response/12324
  worker->Queue Unlocked: /tmp/sisd/response/
***** worker() *****: End
  .
  .
  .
***** ftpServices()*****: Begin
  ftpServices->Locking response directory: /tmp/sisd/response/
  ftpServices->Queue Locked: /tmp/sisd/response/
GWIC100 (Asker):

Some more info -

Some more info: I reconfigured to capture the core dump, then loaded it into gdb. The segfault occurred in the malloc_consolidate() function of libc. Presuming that I may have allocated memory without deallocating it, I searched my code and found every malloc() call matched by a free() call. Therefore, all allocated memory is freed before the worker exits. The boss still holds memory in linked lists that it will not free until the program terminates.
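For what it's worth, a crash inside malloc_consolidate() usually points to heap bookkeeping that was corrupted earlier (for example, a write past the end of an allocation), not to a leak; the crash site can be far from the offending write. Tools that watch every heap access can catch the corruption at its source. A hypothetical invocation ('sisd' stands in for the real binary name, which isn't given in the thread):

```shell
# Run the program under valgrind; it reports the out-of-bounds write
# at the point where it happens, not where malloc later crashes.
valgrind --tool=memcheck ./sisd

# glibc-only alternative: extra consistency checks inside malloc/free
# that abort closer to the corrupting write.
MALLOC_CHECK_=3 ./sisd
```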
> strcpy(thisFile,SHM->responseQueue);
> strcat(thisFile,fileName);

Is thisFile a large enough buffer to append to?
It's better to use strncpy/strncat instead.

For more details, wait till Monday; I'll look at your code then.
GWIC100 (Asker):

Yes, thisFile is dimensioned char *[128]. If you look at the debug output, you'll notice that the length of its contents is less than half that. Additionally, the strncpy/strncat functions would still need thisFile to have sufficient storage for the total length of the path.
I don't see a closedir() before
while(pthread_mutex_trylock(&SHM->ResDirLock)) setTimer(SHM->lockWait);

if (QDIR) {
  dbug("boss","Queue openned",SHM->responseQueue);
  while ((QENTRY = readdir(QDIR))) {
    if (strcmp(QENTRY->d_name,".") && strcmp(QENTRY->d_name,"..")) {
      /* process entry */
    }
  }
  // *** HERE ***
  closedir(QDIR);
}
pthread_mutex_unlock(&SHM->ResDirLock);
GWIC100 (Asker):

It was there, I just didn't type it in.  Should read:

while(pthread_mutex_trylock(&SHM->ResDirLock)) setTimer(SHM->lockWait);

if (QDIR) {
  dbug("boss","Queue openned",SHM->responseQueue);
  while ((QENTRY = readdir(QDIR))) {
    if (strcmp(QENTRY->d_name,".") && strcmp(QENTRY->d_name,"..")) {
      /* process entry */
    }
  }
  closedir(QDIR);
}
pthread_mutex_unlock(&SHM->ResDirLock);
Nice code; it seems workable. The only question I have: do you use exec or system functions inside the boss code, in the 'process entry' part?

According to the documentation:
A successful call to any of the exec functions will close any directory streams that are open in the calling process. See exec(2).
GWIC100 (Asker):

No. I read the directory and store its contents in a structured linked list, then close the directory, so I'm working from the list. This allows me to track the status of the worker processing each entry. When the worker has completed its write, it locks the list, updates the status of the entry to tell the boss it's ready for more work, then unlocks the list. The boss is the only one that can add or delete entries in the list. Each time status changes on the list, or entries are added or deleted, I lock it with a mutex so the parties don't step on each other.

From the log, it appears that everything works correctly until the worker creates a file in the responseQueue. Even though it locks the directory for the write, then releases the lock afterward, it seems as though it is somehow still associated with the resource after it exits, causing a corruption of the memory shared with the boss. I just don't know how to identify what is happening. DDD doesn't help when threads switch, so I'm basically blind beyond the debug statements that I've inserted. If I just had a clue as to where I should be looking, I might be able to salvage this code. If I can't locate it by EOB today, I'm switching to a different methodology to get the project up.
>> Yes, thisFile is dimensioned char *[128]
I hope you didn't mean :

char* thisFile[128];

???
GWIC100 (Asker):

No, I realized after I hit Enter that I had mistyped the definition. It should be char thisFile[128].
GWIC100 (Asker):

After extensive debugging, it appears that using pthreads in combination with malloc/calloc-ing memory from both the boss and worker threads is incompatible on my setup. I removed the entire directory-scanning process from this code, submitting only the file name to the child process to handle; file scanning was moved to a separate process. Whenever I tried to malloc/calloc memory in either the child or the parent thread, for any reason, I still got spurious core dumps in libc's malloc_consolidate(). I don't have time to identify the root cause and have moved on to a new solution in a different language.

This question will be closed.
ASKER CERTIFIED SOLUTION: ee_ai_construct (United States of America)