Atomic locking over NFS with link(2), stat(2)

Posted on 1998-09-07
Last Modified: 2013-12-26
Because of the potential for the loss of RPC reply packets on NFS, the O_EXCL option of open(2) does not work reliably. For this reason, one can/should open (O_CREAT|O_EXCL) a [temporary] file, link(2) this to the name of the file we really wanted to create [discarding the return value of the link(2) call], and then stat the newer name (the name we really wanted to create). If the hard link count is 2, the process has worked. Otherwise the process has failed.

What I'm not so clear on is what the backout strategy for failure is that allows us to retry later. If the strategy fails and the link count is >2, presumably one should unlink() the new name. Is this right? But what if the link count is only 1? Do I unlink anything then? What actions do I take if the stat(2) fails? What are the pros/cons of using lstat/fstat here instead of stat? Which of these is correct?

(This lockfile strategy doesn't have to be compatible with any MTA or anything; I just need to atomically create a lockfile whose name is fixed).
Question by:JYoungman

Expert Comment

ID: 1293103
As far as I know, advisory locks are recommended over NFS.

You can also try setting the file access bits to 6664. For regular files, the setuid and setgid bits have an impact on the locking mechanism.


Expert Comment

ID: 1293104
From the man pages on HPUX:

man lockf(2)...

Only advisory record locking is implemented for NFS files.


Author Comment

ID: 1293105
Advisory record locks are completely inappropriate for this form of locking. The lockfile is just that -- a lockfile, whose existence signals the "locked" state of some other file. Hence record locks on the lockfile are not useful at all. In addition, not all systems provide even advisory locks over NFS. Setting the setuid bit to request mandatory locking adds less still.

If my question was unclear, I apologise (please say so if it was!).


Expert Comment

ID: 1293106
The man pages for open on HPUX and Solaris do not state any restriction on using the O_EXCL flag! AIX 4.1 states:
The O_EXCL flag is not fully supported for Network File Systems (NFS). The NFS protocol does not guarantee the designed function of the O_EXCL flag.

I assume you are sure of the support for the O_EXCL flag on your UNIX (which is it, anyway?).

I am not clear on your logic. I assume you want to create the [temporary] file in the local file system and link a new file on the NFS file system to it. But hard links are not allowed across file systems, right?

Author Comment

ID: 1293107
There is no "your UNIX". The idea is to write *portable* programs. The program I work on works on at least twelve different varieties of Unix and at least one OS that isn't Unix at all.

As for O_EXCL, it is NOT POSSIBLE to make it work correctly and reliably over NFS because NFS is stateless and has no RPC call for "open" at all; the create RPC call still isn't enough since the O_EXCL flag can't be passed on to the server.

Hence the link(2) scheme I outline above.


Expert Comment

ID: 1293108
Why discard the return value of link? Surely if link fails, you can be certain you haven't got the lock. Similarly, if you can't lstat the new file, you should assume you haven't got a lock. If lstat succeeds though, check the thing isn't a symlink, and check the link count. If it is a symlink, or the link count isn't 2, you haven't got the lock, and must assume something else is messing with your locking mechanism.

I'm probably missing something, as my knowledge of low-level NFS is zero. As an alternative locking mechanism, at least on systems that support symlinks, how about using symlink(lock_file, lock_file) to grab a lock?

Author Comment

ID: 1293109
It's imperative to discard the return value of link() because this is being done over NFS. One can get an RPC failure and an error value returned from link(), even if the actual filesystem operation on the remote server succeeded. The RPC reply packet(s) may get lost or dropped, for example, leading to a failure report when in fact the hard link did get made. That is why the check is made with stat(2) afterward -- if the link count is 2, the link(2) operation must have succeeded on the server even if the reply had been lost. If the stat(2) RPC fails, then the link may or may not have been made; you can't tell.

The thing with symlink() is that it's no more likely to succeed than link(), and you can't check the result by looking at the link count on the target.


Expert Comment

ID: 1293110
OK, now I sort of understand a bit more. I think it's unsafe to assume anything about the lock file if the link count isn't two and you can't ensure that it is linked to the original file (stat/lstat on the tmp file and the lock, then comparing st_dev and st_ino, might do this). If other processes are trying to create the lock using the same mechanism, and all you're doing is checking the link count, there are race points which I can't see a way of overcoming.
So how about encoding a lock key such as host:pid:time within a symlink, and then readlink(2) it back? If it matches, you got the lock.
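
The st_dev/st_ino comparison suggested above could look roughly like the following. This is only a sketch: the helper name `same_file` and its return convention are made up for illustration, and it uses lstat() so that a symlink sitting at the lock name is not followed.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper. Returns 1 if both names refer to the same file,
 * 0 if they do not, and -1 if either lstat() call fails (in which case
 * nothing can be concluded). */
int same_file(const char *tmp_name, const char *lock_name)
{
    struct stat a, b;

    if (lstat(tmp_name, &a) == -1 || lstat(lock_name, &b) == -1)
        return -1;

    /* lstat() does not follow symlinks, so a symlink planted at
     * lock_name has its own inode and will not match. */
    return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```

A result of 1 means the lock name is a hard link to the caller's temporary file. A result of -1 leaves the question open, just as a failed stat(2) does in the link-count check.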

Author Comment

ID: 1293111
Now we're rolling :-)
System call return values occur after the = sign.

Host A                            Host B
open("A:100", O_CREAT|O_EXCL)=0  
                                   symlink("B:100", "")=-1
                                   (RPC reply lost but operation
                                   succeeded on server)
symlink("A:100", "")=-1

So at this point both parties will do a readlink(2) on "" and determine that it actually points to B:100.  A knows that wasn't its file and so determines that it has failed to acquire the lock.

OK, I understand how that works.  It seems to me that the same approach with hard links has all the same advantages as well as working on Unix systems which don't support symbolic links.
Another problem is that the hostname of machine A may itself be longer than the filename length limit of the NFS server (or for the case of locking a local file, be longer than its own filename length limit).   Hence I think I'll use a hard link.
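
The filename length concern can at least be checked up front with pathconf(3) and _PC_NAME_MAX. The sketch below assumes a hypothetical helper called `name_fits`; whether the value reported for an NFS mount reflects the server's real limit depends on the client implementation, so treat the answer as a hint rather than a guarantee.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper. Returns 1 if `name` fits within the filename
 * length limit reported for the filesystem holding `dir`, 0 if it is
 * too long, and -1 if the limit is indeterminate or cannot be queried. */
int name_fits(const char *dir, const char *name)
{
    long limit = pathconf(dir, _PC_NAME_MAX);

    if (limit == -1)
        return -1;  /* error, or no fixed limit; errno tells them apart */
    return strlen(name) <= (size_t) limit;
}
```

A caller building a temporary name out of the hostname and pid could run the candidate name through such a check and shorten it before trying to create the file.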

Now that we're on the same page, what should lockers do if they determine that the readlink() or stat() call has returned the wrong pathname/failed (as appropriate for symbolic and hard links respectively)?  What is the correct backout strategy?


Expert Comment

ID: 1293112
There is no need to create any temporary files. The lock key is stored directly in the symlink, e.g.
 sprintf(lock_key, "%s:%d.%ld", this_host, (int) getpid(), (long) time(0));
 symlink(lock_key, lock_file);  /* return value ignored: the reply may be lost */
 /* leave room for the terminating NUL, which readlink does not add */
 if ((len = readlink(lock_file, buf, sizeof(buf) - 1)) != -1) {
  buf[len] = '\0';
  if (! strcmp(lock_key, buf)) {
   /* lock obtained, do the stuff */
  }
 }
So this doesn't hit the old 14-character filename limits.

As to what happens if stat/readlink fails, I'm unsure. I must stress NFS is not my gig, but I would be surprised if the kernel doesn't attempt to re-satisfy a "read" operation if it doesn't get a response within a certain time. And I assume that the call will block until the kernel does get a response or a critical amount of time has passed. I remember from way back being thoroughly hacked off when anyone shut down any NFS server I had mounted filesystems from, because there was a good chance my machine would hang until the NFS box was back to life.

The problem I have with the link mechanism is that the link count will not indicate who obtained the lock, i.e.
 process A                       process B
 link(tmp_fileA, lock_file)
                                 link(tmp_fileB, lock_file)  /* this link fails */
                                 stat(lock_file, &buf)       /* this succeeds */
                                 buf.st_nlink == 2
 stat(lock_file, &buf)  /* this also succeeds */
 buf.st_nlink == 2

So who has the lock? If you resort to comparing the inode of tmp_file against the inode of lock_file, you'll get closer to resolving the problem, but 32-bit rather than 16-bit inode numbers are a recent innovation.

Author Comment

ID: 1293113
You'd stat your *own* temporary file, not the lockfile itself.
But this doesn't solve the problem of a clash of temporary file names; I'm not sure that's solvable.   Well, you've earned the points.
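
Putting the pieces from this thread together, the hard-link scheme with the stat on one's own temporary file might be sketched as below. The function name `try_link_lock` and its return convention are invented for illustration, the caller is assumed to supply a temporary name that is unique and on the same filesystem as the lock, and cleanup of the temporary file (the backout question) is deliberately left out because the thread does not settle it.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper. Returns 1 if the lock was taken, 0 if it was
 * not, and -1 if the outcome cannot be determined. */
int try_link_lock(const char *tmp_name, const char *lock_name)
{
    struct stat st;
    int rc;
    int fd = open(tmp_name, O_WRONLY | O_CREAT | O_EXCL, 0644);

    if (fd == -1)
        return -1;      /* no temporary file, so link() is not attempted */
    close(fd);

    /* Result deliberately ignored: the RPC reply may be lost even
     * though the server made the link. */
    rc = link(tmp_name, lock_name);
    (void) rc;

    /* Only our own link() call can raise this file's count to 2. */
    if (stat(tmp_name, &st) == -1)
        return -1;      /* the link may or may not exist */
    return st.st_nlink == 2;
}
```

Because the link count checked is that of the caller's own file, a competing process that loses the race sees a count of 1 on its temporary file, which avoids the ambiguity in the two-process trace above.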


Accepted Solution

ecw earned 100 total points
ID: 1293114
Well, I don't think I have earned the points; it's an interesting question, which I'd like to get to the bottom of. I'm only answering so you can close the question. I'd be very interested if you come up with a perfect solution; if I come across an alternative, I'll let you know.

