Solved

problem with readdir and lstat on Red Hat 7.2

Posted on 2002-04-05
Last Modified: 2012-05-04
I am wondering if anyone has had a similar problem, and if what I think is going wrong makes sense.

One of the programs I maintain searches a given directory for symbolic links to files that I need to process.  I have a function that scans the directory using 'readdir', and I then call 'lstat' to get some necessary information about each link.  I was not having a problem until I moved to Red Hat 7.2 with kernel 2.4.9-31 (I had been using Red Hat 6.2 with a 2.2 kernel).


Here is some sample code:

.
.
.
    DIR *dfd;
    struct dirent *dp;
    struct stat fileStats;

    /* 'sDir' is the name of the directory */
    if ( (dfd = opendir(sDir)) == NULL )
        return -1;

    for ( dp = readdir(dfd); dp != NULL; dp = readdir(dfd) ) {

.
.
.
    /* The file to be checked is put into 'sFile' */
        if ( lstat(sFile, &fileStats) != 0 ) {
            /*
              Now, every once in a while, lstat fails,
              and errno is set to ENOENT.
            */

/* end sample code */

So, every once in a while, lstat fails and the error is that there is no such file/directory.
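For reference, here is a minimal, self-contained sketch of the kind of scan loop described above. It is not the original program: the helper names `lstat_retry` and `count_symlinks`, the retry count, and the 1 ms back-off are assumptions made for illustration. It retries `lstat` a few times when it reports ENOENT, which is one cheap way to ride out a transient mismatch between `readdir` and `lstat`.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: lstat() with a few short retries on ENOENT.
   Returns 0 on success, -1 with errno set otherwise. */
static int lstat_retry(const char *path, struct stat *st, int tries)
{
    struct timespec delay = { 0, 1000000L };   /* assumed back-off: 1 ms */
    int i;

    for (i = 0; i < tries; i++) {
        if (lstat(path, st) == 0)
            return 0;
        if (errno != ENOENT)
            return -1;              /* a different error: give up at once */
        nanosleep(&delay, NULL);
    }
    errno = ENOENT;                 /* still missing after all retries */
    return -1;
}

/* Count the symbolic links in sDir. Returns -1 if the directory cannot
   be opened. Entries that cannot be lstat'ed are reported and skipped. */
static int count_symlinks(const char *sDir)
{
    DIR *dfd;
    struct dirent *dp;
    struct stat fileStats;
    char sFile[PATH_MAX];
    int nLinks = 0;

    if ((dfd = opendir(sDir)) == NULL)
        return -1;

    while ((dp = readdir(dfd)) != NULL) {
        int n;

        if (strcmp(dp->d_name, ".") == 0 || strcmp(dp->d_name, "..") == 0)
            continue;

        /* Build the full path of the entry to be checked. */
        n = snprintf(sFile, sizeof sFile, "%s/%s", sDir, dp->d_name);
        if (n < 0 || (size_t)n >= sizeof sFile)
            continue;               /* path too long: skip this entry */

        if (lstat_retry(sFile, &fileStats, 3) != 0) {
            fprintf(stderr, "lstat(%s): %s\n", sFile, strerror(errno));
            continue;
        }
        if (S_ISLNK(fileStats.st_mode))
            nLinks++;
    }
    closedir(dfd);
    return nLinks;
}
```

Calling `count_symlinks(sDir)` returns the number of symbolic links found. A retry like this only hides the symptom; it does not explain why the entry was briefly invisible to `lstat`.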

I have never had this problem before, but this is what I think is happening:  There is another program that runs in parallel with mine and places the links in the directory I am scanning.  It seems that there may be a synchronization problem, with the OS giving me the name of a link before the OS has created the inode.  I was wondering if calling sync on the link would help, but I haven't tried it yet (that would be in the code of the other program).

Has anyone had a similar problem?  Could my explanation be a possibility?

Thank you,

Marc
Question by:marcjb
10 Comments
 
LVL 4

Expert Comment

by:MFCRich
This suggests to me that sometime between the 'readdir' call and the 'lstat' call the directory entry is removed. I doubt sync calls would affect this.

Also I believe that readdir may buffer several entries on one call and then return pointers to the buffered entries on subsequent calls.
 
LVL 3

Author Comment

by:marcjb
I should have added that if I run my program later on, it will find these files, so it does not seem that the file is deleted between the readdir and lstat.

 
LVL 51

Expert Comment

by:ahoffmann
Is this on an NFS-mounted directory?
 
LVL 3

Author Comment

by:marcjb
No, it is ext3.
 
LVL 51

Expert Comment

by:ahoffmann
Can you write a timestamp when each process accesses the directory?

 
LVL 3

Author Comment

by:marcjb
Maybe.  The other program is not under my control.  Also, I have a feeling that if this is a sync problem, there is a good chance that this is all happening in less than a second.
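If the race really does play out in under a second, whole-second timestamps will not separate the events. A rough sketch of microsecond-resolution logging follows; the helper names `format_timestamp` and `log_event` and the log format are assumptions made for illustration, and it only helps on whichever side of the race can be modified.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/time.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: write "YYYY-MM-DD HH:MM:SS.uuuuuu" into buf.
   Returns 0 on success, -1 on failure or if buf is too small. */
static int format_timestamp(char *buf, size_t len)
{
    struct timeval tv;
    struct tm tmv;
    time_t secs;
    char date[32];
    int n;

    if (gettimeofday(&tv, NULL) != 0)
        return -1;
    secs = tv.tv_sec;
    if (localtime_r(&secs, &tmv) == NULL)
        return -1;
    if (strftime(date, sizeof date, "%Y-%m-%d %H:%M:%S", &tmv) == 0)
        return -1;

    /* Append the microsecond part, zero-padded to six digits. */
    n = snprintf(buf, len, "%s.%06ld", date, (long)tv.tv_usec);
    if (n < 0 || (size_t)n >= len)
        return -1;
    return 0;
}

/* Hypothetical helper: log one directory event to stderr with the
   process ID, so output from two processes can be interleaved later. */
static void log_event(const char *what, const char *path)
{
    char ts[64];

    if (format_timestamp(ts, sizeof ts) == 0)
        fprintf(stderr, "%s pid=%ld %s %s\n", ts, (long)getpid(), what, path);
}
```

For example, calling `log_event("readdir", sFile)` right after `readdir` and `log_event("lstat-ENOENT", sFile)` when `lstat` fails would show how far apart the two calls are.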
 
LVL 5

Accepted Solution

by:
bryanh earned 100 total points
I don't have an answer, but I can rule out your theory and maybe give you some insight.  

First of all, Linux has a tradition of using the same word to mean many things, and "inode" is an example.  It means 1) a block in a filesystem that represents a file, and 2) a block in memory that represents an image of a file.  It's the same word because a long, long time ago in Unix, (2) was nothing but a cached copy of (1).

So I believe you're suggesting that the directory entry gets created before the inode(1) gets created.  Inodes(1) always exist.  They may not always contain useful information, but it's impossible not to find one, given an inode number.  A directory entry consists of a name and an inode number.

But a bug in the creation of inodes(2) could cause this behavior.  There's a copy of some of the directory structure cached in memory.  It's composed of dentries.  Each dentry points to an inode(2).

When you create a filesystem object (e.g. a symlink), you first look up the name to see if it exists already.  If it doesn't, you create a "negative dentry," which is an entry in the directory cache signifying that a directory entry by that name does _not_ exist.  A negative dentry doesn't point to an inode(2), obviously.  Then you proceed to create the symlink, and a directory entry for it, and an inode(2) for it, and make the dentry point to the inode(2), so it is no longer negative.

So it's conceivable that another process could see the directory entry (readdir reads the actual directory -- not the directory cache), but stat (which does use the directory cache) would find the still-negative dentry and declare the filesystem object non-existent.

Linux is supposed to stop this from happening by holding a directory lock while looking up the name, finding it unused, creating the filesystem object, and reflecting its existence in the cache.  But maybe it's failing to do that.


I don't know of any operation in Linux that fits the description "call sync on the link," but I don't think any kind of syncing to disk would affect anything if the problem is in the directory cache.

The only workaround I could suggest is adding your own locking between the guy creating the symlink and the guy looking for it -- after the symlink() system call returns, I bet the readdir() and stat() results are consistent.
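One way to sketch that workaround is an advisory lock on a lock file that both programs agree on, taken with `flock()`. Everything here is an assumption made for illustration: the helper names `dir_lock`, `dir_unlock` and `locked_symlink`, and the existence of a shared lock-file path. It also only works if the program creating the links can be changed to cooperate.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/file.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: open (creating if needed) the agreed lock file
   and apply a flock() operation such as LOCK_EX or LOCK_SH.
   Returns the lock descriptor, or -1 on failure. */
static int dir_lock(const char *lockPath, int operation)
{
    int fd = open(lockPath, O_RDWR | O_CREAT, 0666);

    if (fd < 0)
        return -1;
    if (flock(fd, operation) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Release the lock taken by dir_lock() and close its descriptor. */
static void dir_unlock(int fd)
{
    flock(fd, LOCK_UN);
    close(fd);
}

/* Writer side: create the symlink while holding the exclusive lock,
   so a reader holding the shared lock never overlaps the creation.
   Returns the result of symlink(), or -1 if the lock was not taken. */
static int locked_symlink(const char *lockPath, const char *target,
                          const char *linkPath)
{
    int fd = dir_lock(lockPath, LOCK_EX);
    int rc;

    if (fd < 0)
        return -1;
    rc = symlink(target, linkPath);
    dir_unlock(fd);
    return rc;
}
```

The scanning side would call `dir_lock(lockPath, LOCK_SH)` before `opendir` and `dir_unlock` after `closedir`. Because `flock()` locks are advisory, a program that does not take the lock is not held back by it.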
 
LVL 3

Author Comment

by:marcjb
Thank you for your insight, bryanh.  Your evaluation is similar to a theory that one of my co-workers had, and leads me to believe that this is most likely the issue.

In reference to the sync I mentioned, I was referring to the POSIX function 'fsync'.  fsync copies all in-core parts of a file to disk, and while it makes no mention of directories or directory entries, I thought it might be worth a shot.

As it is, the overhead of doing the locking myself just isn't worth it.  This is a very intermittent problem that (I believe) is only likely to occur in our testing, and not under the conditions that our system normally runs under.

Thank you again for your help,

Marc
 
LVL 5

Expert Comment

by:bryanh
>fsync copies all in-core parts of a file to disk

Indeed, but there's no equivalent for symbolic links.  You'll find that fsync() takes a file descriptor as its argument.  Since you can't open() a symbolic link, you can't get a file descriptor so as to use fsync() on it.
 
LVL 3

Author Comment

by:marcjb
you can use 'open' on a symbolic link as long as 'O_NOFOLLOW' is not specified.  Also, you can use 'fileno' (non-standard) to get a file descriptor.  I have used fsync on links before without a problem, but this may be one of the cases where the behavior is technically undefined, but they implement it to work anyway.
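A small check can show which object such a descriptor refers to. The sketch below (the helper names `open_follows_link` and `open_nofollow_errno` are made up for illustration) compares the device and inode numbers reported by `fstat` on the opened descriptor with those reported by `lstat` on the path. When `open` is called on a symbolic link without `O_NOFOLLOW`, it follows the link, so the descriptor refers to the target file and an `fsync` on it acts on the target, not on the link itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if open(path) without O_NOFOLLOW gives
   a descriptor for a different object than the one lstat(path) sees
   (i.e. the link was followed), 0 if they are the same object,
   and -1 on error. */
static int open_follows_link(const char *path)
{
    struct stat viaFd, viaLstat;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &viaFd) != 0 || lstat(path, &viaLstat) != 0) {
        close(fd);
        return -1;
    }
    close(fd);

    /* Same device and inode number means the same filesystem object. */
    return !(viaFd.st_dev == viaLstat.st_dev &&
             viaFd.st_ino == viaLstat.st_ino);
}

/* Hypothetical helper: try open() with O_NOFOLLOW and return the errno
   it failed with, or 0 if the open succeeded. */
static int open_nofollow_errno(const char *path)
{
    int fd = open(path, O_RDONLY | O_NOFOLLOW);

    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}
```

On Linux, `open_nofollow_errno` on a symbolic link is expected to return ELOOP; other systems may report a different error.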