Solved

problem with readdir and lstat on Red Hat 7.2

Posted on 2002-04-05
Last Modified: 2012-05-04
I am wondering if anyone has had a similar problem, and if what I think is going wrong makes sense.

One of the programs I maintain searches a given directory for symbolic links to files that I need to process.  I have a function that scans the directory using 'readdir', and I then call 'lstat' to get some necessary information about each link.  I was not having a problem until I moved to Red Hat 7.2 with kernel 2.4.9-31 (I had been using 6.2, kernel 2.2).


Here is some sample code:

.
.
.
    DIR *dfd;
    struct dirent *dp;
    struct stat fileStats;

    /* 'sDir' is the name of the directory */
    if ( (dfd = opendir(sDir)) == NULL )
        return -1;

    for ( dp = readdir(dfd); dp != NULL; dp = readdir(dfd) ) {

.
.
.
        /* The file to be checked is put into 'sFile' */
        if ( lstat(sFile, &fileStats) != 0 ) {
            /*
              Now, every once in a while, lstat fails,
              and errno is set to ENOENT.
            */

/* end sample code */

So, every once in a while, lstat fails and the error is that there is no such file/directory.

I have never had this problem before, but this is what I think is happening:  There is another program running in parallel with mine that places the link in the directory I am scanning.  It seems that there may be a synchronization problem, with the OS giving me the name of a link before it has created the inode.  I was wondering if calling sync on the link would help, but I haven't tried it yet (that would be in the code of the other program).

Has anyone had a similar problem?  Could my explanation be a possibility?

Thank you,

Marc
Question by:marcjb
10 Comments
 
LVL 4

Expert Comment

by:MFCRich
ID: 6921418
This suggests to me that sometime between the 'readdir' call and the 'lstat' call the directory entry is removed. I doubt sync calls would affect this.

Also I believe that readdir may buffer several entries on one call and then return pointers to the buffered entries on subsequent calls.
 
LVL 3

Author Comment

by:marcjb
ID: 6921535
I should have added that if I run my program later on, it will find these files, so it does not seem that the file is deleted between the readdir and lstat.

 
LVL 51

Expert Comment

by:ahoffmann
ID: 6921747
Is this on an NFS-mounted directory?
 
LVL 3

Author Comment

by:marcjb
ID: 6921778
No.  ext3.
 
LVL 51

Expert Comment

by:ahoffmann
ID: 6921793
Can you write a timestamp when each process accesses the directory?
 
LVL 3

Author Comment

by:marcjb
ID: 6921805
Maybe.  The other program is not under my control.  Also, I have a feeling that if this is a sync problem, there is a good chance that this is all happening in less than a second.
 
LVL 5

Accepted Solution

by:
bryanh earned 100 total points
ID: 6935837
I don't have an answer, but I can rule out your theory and maybe give you some insight.  

First of all, Linux has a tradition of using the same word to mean many things, and "inode" is an example.  It means 1) a block in a filesystem that represents a file, and 2) a block in memory that represents an image of a file.  It's the same word because a long, long, time ago in Unix, (2) was nothing but a cached copy of (1).

So I believe you're suggesting that the directory entry gets created before the inode(1) gets created.  Inodes(1) always exist.  They may not always contain useful information, but it's impossible not to find one, given an inode number.  A directory entry consists of a name and an inode number.

But a bug in the creation of inodes(2) could cause this behavior.  There's a copy of some of the directory structure cached in memory.  It's composed of dentries.  Each dentry points to an inode(2).

When you create a filesystem object (e.g. a symlink), you first look up the name to see if it exists already.  If it doesn't, you create a "negative dentry," which is an entry in the directory cache signifying that a directory entry by that name does _not_ exist.  A negative dentry doesn't point to an inode(2), obviously.  Then you proceed to create the symlink, and a directory entry for it, and an inode(2) for it and make the dentry point to the inode(2), so it is no longer negative.

So it's conceivable that another process could see the directory entry (readdir reads the actual directory -- not the directory cache), but stat (which does use the directory cache) would find the still-negative dentry and declare the filesystem object non-existent.

Linux is supposed to stop this from happening by holding a directory lock while looking up the name, finding it unused, creating the filesystem object, and reflecting its existence in the cache.  But maybe it's failing to do that.


I don't know of any operation in Linux that fits the description "call sync on the link," but I don't think any kind of syncing to disk would affect anything if the problem is in the directory cache.

The only workaround I could suggest is adding your own locking between the guy creating the symlink and the guy looking for it -- after the symlink() system call returns, I bet the readdir() and stat() results are consistent.
 
LVL 3

Author Comment

by:marcjb
ID: 6936638
Thank you for your insight, bryanh.  Your evaluation is similar to a theory that one of my co-workers had, and leads me to believe that this is most likely the issue.

In reference to the sync I mentioned, I was referring to the POSIX function 'fsync'.  fsync copies all in-core parts of a file to disk, and while it makes no mention of directories or directory entries, I thought it might be worth a shot.

As it is, the overhead of doing the locking myself just isn't worth it.  This is a very intermittent problem that (I believe) is only likely to occur in our testing, and not under the conditions in which our system normally runs.

Thank you again for your help,

Marc
 
LVL 5

Expert Comment

by:bryanh
ID: 6938906
>fsync copies all in-core parts of a file to disk

Indeed, but there's no equivalent for symbolic links.  You'll find that fsync() takes a file descriptor as its argument.  Since you can't open() a symbolic link, you can't get a file descriptor so as to use fsync() on it.
 
LVL 3

Author Comment

by:marcjb
ID: 6941862
You can use 'open' on a symbolic link as long as 'O_NOFOLLOW' is not specified.  Also, you can use 'fileno' (non-standard) to get a file descriptor.  I have used fsync on links before without a problem, but this may be one of the cases where the behavior is technically undefined, but they implement it to work anyway.
