problem with readdir and lstat on Red Hat 7.2

I am wondering if anyone has had a similar problem, and if what I think is going wrong makes sense.

One of the programs I maintain searches a given directory for symbolic links to files that I need to process. I have a function that scans the directory using 'readdir', and I then do an 'lstat' to get some necessary information about the link. I was not having a problem until I moved to Red Hat 7.2 with kernel 2.4.9-31 (I had been using 6.2 with kernel 2.2).

Here is some sample code:

.
.
.
DIR *dfd;
struct dirent *dp;
struct stat fileStats;

/* 'sDir' is the name of the directory */
if ( (dfd = opendir(sDir)) == NULL )
    return -1;

for ( dp = readdir(dfd); dp != NULL; dp = readdir(dfd) ) {

    .
    .
    .
    /* The file to be checked is put into 'sFile' */
    if ( lstat(sFile, &fileStats) != 0 ) {
        /*
           Now, every once in a while, lstat fails,
           and errno is set to ENOENT.
        */

/* end sample code */

So, every once in a while, lstat fails and the error is that there is no such file/directory.

I have never had this problem before, but this is what I think is happening: there is another program that runs in parallel with mine and places the links in the directory I am scanning. It seems that there may be a synchronization problem, with the OS giving me the name of a link before the OS has created the inode. I was wondering if calling sync on the link would help, but I haven't tried it yet (that would be in the code of the other program).

Has anyone had a similar problem? Could my explanation be a possibility?

Thank you,

Marc

MFCRich

This suggests to me that sometime between the 'readdir' call and the 'lstat' call the directory entry is removed. I doubt sync calls would affect this.

Also, I believe that readdir may buffer several entries on one call and then return pointers to the buffered entries on subsequent calls.
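Whatever the underlying cause turns out to be, the scanning side can be written so that an entry returned by readdir but missing at lstat time is skipped rather than treated as a fatal error. The following is only a minimal sketch of that idea, not the asker's actual code: the function name 'count_symlinks' and the 'vanished' counter are hypothetical, and it assumes that a skipped entry will be picked up on a later scan.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: counts the symbolic links in dir_path.
 * An entry that readdir() returned but lstat() reports as ENOENT is
 * skipped and tallied in *vanished instead of aborting the scan.
 * Returns the number of links found, or -1 on any other error.
 */
int count_symlinks(const char *dir_path, int *vanished)
{
    assert(dir_path != NULL && vanished != NULL);

    DIR *dfd = opendir(dir_path);
    if (dfd == NULL)
        return -1;

    struct dirent *dp;
    struct stat st;
    char path[PATH_MAX];
    int links = 0;
    *vanished = 0;

    while ((dp = readdir(dfd)) != NULL) {
        if (strcmp(dp->d_name, ".") == 0 || strcmp(dp->d_name, "..") == 0)
            continue;

        /* Build "dir/name"; skip names that would not fit. */
        int n = snprintf(path, sizeof path, "%s/%s", dir_path, dp->d_name);
        if (n < 0 || (size_t)n >= sizeof path)
            continue;

        if (lstat(path, &st) != 0) {
            if (errno == ENOENT) {
                /* Name was listed but is not there now: note it, move on. */
                (*vanished)++;
                continue;
            }
            closedir(dfd);
            return -1;
        }

        if (S_ISLNK(st.st_mode))
            links++;
    }

    closedir(dfd);
    return links;
}
```

A caller could log a non-zero 'vanished' count to see how often the mismatch actually occurs before deciding whether any locking is worth the effort.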

marcjb

ASKER

I should have added that if I run my program later on, it will find these files, so it does not seem that the file is deleted between the readdir and lstat.

ahoffmann

Is this on an NFS-mounted directory?

marcjb

ASKER

No, ext3.

ahoffmann

Can you write a timestamp when each process accesses the directory?

marcjb

ASKER

Maybe. The other program is not under my control. Also, I have a feeling that if this is a sync problem, there is a good chance that this is all happening in less than a second.

ASKER CERTIFIED SOLUTION

bryanh

This solution is only available to members.


marcjb

ASKER

Thank you for your insight, bryanh. Your evaluation is similar to a theory that one of my co-workers had, and leads me to believe that this is most likely the issue.

In reference to the sync I mentioned, I was referring to the POSIX function 'fsync'. fsync copies all in-core parts of a file to disk, and while its description makes no mention of directories or directory entries, I thought it might be worth a shot.

As it is, the overhead of doing the locking myself just isn't worth it. This is a very intermittent problem that (I believe) is only likely to occur in our testing, and not under the conditions that our system normally runs under.

Thank you again for your help,

Marc

bryanh

>fsync copies all in-core parts of a file to disk

Indeed, but there's no equivalent for symbolic links. You'll find that fsync() takes a file descriptor as its argument. Since you can't open() a symbolic link, you can't get a file descriptor so as to use fsync() on it.

marcjb

ASKER

You can use 'open' on a symbolic link as long as 'O_NOFOLLOW' is not specified. Also, you can use 'fileno' (non-standard) to get a file descriptor. I have used fsync on links before without a problem, but this may be one of the cases where the behavior is technically undefined but is implemented to work anyway.
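One way to check which object such a descriptor actually refers to is to compare the inode seen through the descriptor with the inode of the link itself. On Linux, 'open' without 'O_NOFOLLOW' resolves the link, so the descriptor (and any 'fsync' on it) applies to the target file rather than to the link. The sketch below is illustrative only; the helper name 'open_follows_link' is hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: opens path without O_NOFOLLOW and reports what
 * the resulting descriptor refers to.
 * Returns 1 if the descriptor refers to a different object than the
 * directory entry itself (i.e. the link was followed to its target),
 * 0 if it refers to the same object, and -1 on error.
 */
int open_follows_link(const char *path)
{
    assert(path != NULL);

    struct stat entry_st, fd_st;

    /* lstat() describes the directory entry itself, link or not. */
    if (lstat(path, &entry_st) != 0)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* fstat() describes whatever the descriptor was opened on. */
    int rc = fstat(fd, &fd_st);
    close(fd);
    if (rc != 0)
        return -1;

    int same = (fd_st.st_dev == entry_st.st_dev &&
                fd_st.st_ino == entry_st.st_ino);
    return same ? 0 : 1;
}
```

For a symbolic link that points at a regular file this returns 1, which means an 'fsync' on that descriptor flushes the target file's data and says nothing about the directory entry for the link.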