Solved

Atomic locking over NFS with link(2), stat(2)

Posted on 1998-09-07
Last Modified: 2013-12-26
Because RPC reply packets can be lost on NFS, the O_EXCL option of open(2) does not work reliably there.  For this reason, one can/should open (O_CREAT|O_EXCL) a [temporary] file, link(2) it to the name of the file we really wanted to open [discarding the return value of the link(2) call], and then stat the newer name (the name we really wanted to create).  If the hard link count is 2, the process has worked.  Otherwise the process has failed.

What I'm not so clear on is which backout strategy for failure allows us to retry later.  If the strategy fails and the link count is >2, presumably one should unlink() the new name.   Is this right?  But what if the link count is only 1?  Do I unlink anything then?   What actions do I take if the stat(2) fails?   What are the pros/cons of using lstat/fstat here instead of stat?  Which of these is correct?

(This lockfile strategy doesn't have to be compatible with any MTA or anything; I just need to atomically create a lockfile whose name is fixed).
Question by:JYoungman
12 Comments
 
LVL 3

Expert Comment

by:elfie
ID: 1293103
As far as I know, advisory locks are recommended over NFS.

You can also try setting the file access bits to 6664. For normal files, the setuid and setgid bits affect the locking mechanism.


 
LVL 3

Expert Comment

by:elfie
ID: 1293104
From the man pages on HPUX:

man lockf(2)...

Only advisory record locking is implemented for NFS files.


 
LVL 2

Author Comment

by:JYoungman
ID: 1293105
Advisory record locks are completely inappropriate for this form of locking.  The lockfile is just that -- a lockfile, whose existence signals the "locked" state of some other file.  Hence record locks on the lockfile are not useful at all.   In addition, not all systems provide even advisory locks over NFS.  Setting the setuid bit to request mandatory locking adds less still.

If my question was unclear, I apologise (please say so if it was!).


 
LVL 2

Expert Comment

by:seedy
ID: 1293106
The man pages for open on HPUX and Solaris do not state any restriction on using the O_EXCL flag!  AIX 4.1 states:
The O_EXCL flag is not fully supported for Network File Systems (NFS). The NFS protocol does not guarantee the designed function of the O_EXCL flag.

I assume you are sure of the support for the O_EXCL flag on your UNIX (which is it, anyway?).

I am not clear on your logic.  I assume you want to create the [temporary] file in the local file system and link a new file on the NFS file system to it.  But hard links are not allowed across file systems, right?
 
LVL 2

Author Comment

by:JYoungman
ID: 1293107
There is no "your UNIX".  The idea is to write *portable* programs.  The program I work on runs on at least twelve different varieties of Unix and at least one OS that isn't Unix at all.

As for O_EXCL, it is NOT POSSIBLE to make it work correctly and reliably over NFS because NFS is stateless and has no RPC call for "open" at all; the create RPC call still isn't enough since the O_EXCL flag can't be passed on to the server.

Hence the link(2) scheme I outline above.

 
LVL 5

Expert Comment

by:ecw
ID: 1293108
Why discard the return value of link?  Surely if link fails, you can be certain you haven't got the lock.  Similarly, if you can't lstat the new file, you should assume you haven't got a lock.  If lstat succeeds, though, check the thing isn't a symlink, and check the link count.  If it is a symlink, or the link count isn't 2, you haven't got the lock, and must assume something else is messing with your locking mechanism.

I'm probably missing something, my knowledge of low-level NFS is zero, though as an alternative locking mechanism, at least on systems that support symlinks, how about using symlink(lock_file, lock_file) to grab a lock?
 
LVL 2

Author Comment

by:JYoungman
ID: 1293109
It's imperative to discard the return value of link() because this is being done over NFS.   One can get an RPC failure and an error value returned from link(), even if the actual filesystem operation on the remote server succeeded.  The RPC reply packet(s) may get lost or dropped, for example, leading to a failure report when in fact the hard link did get made.  That is why the check is made with stat(2) afterward -- if the link count is 2, the link(2) operation must have succeeded on the server even if the reply had been lost.  If the stat(2) RPC fails, then the link may or may not have worked...you can't tell.

The thing with symlink() is that it's no more likely to succeed than link(), and you can't check the result by looking at the link count on the target.

 
LVL 5

Expert Comment

by:ecw
ID: 1293110
OK, now I sort of understand a bit more.  I think it's unsafe to assume anything about the lock file if the link count isn't two and you can't ensure that it is linked to the original file (stat/lstat on the tmp file and the lock, then comparing st_dev and st_ino, might do this).  If other processes are trying to create the lock using the same mechanism, and all you're doing is checking the link count, there are race points which I can't see a way of overcoming.
So how about encoding a lock key within a symlink, such as host:pid:time, and then using readlink(2) to read it back?  If it matches, you got the lock.
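
The st_dev/st_ino comparison mentioned above could be sketched roughly as follows. This is only an illustration, not code from the thread: the helper name same_file is hypothetical, and it says nothing about NFS attribute caching, which may affect what lstat reports on a real client.

```c
#include <sys/stat.h>

/* Hypothetical helper (not from the thread): returns 1 if both paths
 * name the same file (same device and inode), 0 if they differ, and
 * -1 if either lstat() call fails. */
int same_file(const char *path_a, const char *path_b)
{
    struct stat a, b;

    if (lstat(path_a, &a) == -1 || lstat(path_b, &b) == -1)
        return -1;              /* cannot tell; the caller must decide */

    return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```

A locker would call same_file(tmp_file, lock_file) after its link attempt and treat anything other than 1 as "lock not proven".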
 
LVL 2

Author Comment

by:JYoungman
ID: 1293111
Now we're rolling :-)
System call return values appear after the = sign.

Host A                             Host B
open("A:100", O_CREAT|O_EXCL)=0
                                   open("B:100", O_CREAT|O_EXCL)=0
                                   symlink("B:100", "z.foo")=-1
                                   (RPC reply lost, but the operation
                                   succeeded on the server)
symlink("A:100", "z.foo")=-1
(EEXIST)

So at this point both parties will do a readlink(2) on "z.foo" and determine that it actually points to B:100.  A knows that wasn't its file and so determines that it has failed to acquire the lock.

OK, I understand how that works.  It seems to me that the same approach with hard links has all the same advantages as well as working on Unix systems which don't support symbolic links.
Another problem is that the hostname of machine A may itself be longer than the filename length limit of the NFS server (or for the case of locking a local file, be longer than its own filename length limit).   Hence I think I'll use a hard link.

Now that we're on the same page, what should lockers do if they determine that the readlink() or stat() call has returned the wrong pathname/failed (as appropriate for symbolic and hard links respectively)?  What is the correct backout strategy?

 
LVL 5

Expert Comment

by:ecw
ID: 1293112
There is no need to create any temporary files.  The lock key is stored directly in the symlink, e.g.
 /* lock_key and buf are char arrays; len is an ssize_t */
 sprintf(lock_key, "%s:%d.%ld", this_host, (int)getpid(), (long)time(0));
 symlink(lock_key, lock_file);   /* result ignored; readlink decides */
 if ((len = readlink(lock_file, buf, sizeof(buf) - 1)) != -1) {
  buf[len] = '\0';   /* readlink does not NUL-terminate */
  if (! strcmp(lock_key, buf)) {
   /* lock obtained, do the stuff */
   unlink(lock_file);
  }
 }
So, this doesn't hit the old 14-character filename limits.

As to what happens if stat/readlink fails, I'm unsure.  I must stress NFS is not my gig, but I would be surprised if the kernel doesn't attempt to re-satisfy a "read" operation if it doesn't get a response within a certain time.  And I assume that the call will block until the kernel does get a response or a critical amount of time has passed.  I remember from way back being thoroughly hacked off when anyone shut down any NFS server I had mounted filesystems from, because there was a good chance my machine would hang until the NFS box was back to life.

The problem I have with the link mechanism is that the link count will not indicate who obtained the lock, i.e.
 process A                       process B
 link(tmp_fileA, lock_file)
                                 link(tmp_fileB, lock_file)  /* this link fails */
                                 stat(lock_file, &buf)       /* this succeeds */
                                 buf.st_nlink == 2
 stat(lock_file, &buf)  /* this also succeeds */
 buf.st_nlink == 2

So who has the lock?  If you resort to comparing the inode of tmp_file against the inode of lock_file, you'll get closer to resolving the problem, but 32-bit rather than 16-bit inode numbers are a recent innovation.
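
The interleaving above can be replayed in a single process on a local filesystem to see the link counts involved. This is only a sketch: the helper names (create_excl, nlink_of, replay_contention) are hypothetical, and a sequential local replay does not model lost RPC replies or NFS caching.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create an empty file, failing if it already exists. */
static int create_excl(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd == -1)
        return -1;
    return close(fd);
}

/* Hypothetical helper: link count of path, or -1 if stat() fails. */
static long nlink_of(const char *path)
{
    struct stat st;
    return stat(path, &st) == -1 ? -1L : (long)st.st_nlink;
}

/* Replays the trace: A links first, then B's link fails because
 * lock_file already exists. counts[] receives the link counts of
 * lock_file, tmp_a and tmp_b, in that order. Returns 0, or -1 if the
 * temporary files could not be created. */
int replay_contention(const char *tmp_a, const char *tmp_b,
                      const char *lock_file, long counts[3])
{
    int rc;

    if (create_excl(tmp_a) == -1 || create_excl(tmp_b) == -1)
        return -1;

    rc = link(tmp_a, lock_file);    /* A: succeeds */
    (void)rc;
    rc = link(tmp_b, lock_file);    /* B: fails, lock_file exists */
    (void)rc;

    counts[0] = nlink_of(lock_file);
    counts[1] = nlink_of(tmp_a);
    counts[2] = nlink_of(tmp_b);
    return 0;
}
```

On a local filesystem, lock_file and tmp_a both report a link count of 2 while tmp_b reports 1, so the count on lock_file alone does not say which contender won.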
 
LVL 2

Author Comment

by:JYoungman
ID: 1293113
You'd stat your *own* temporary file, not the lockfile itself.
But this doesn't solve the problem of a clash of temporary file names; I'm not sure that's solvable.   Well, you've earned the points.
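
A rough sketch of the scheme with that change (stat your own temporary file rather than the lockfile) might look like this. The names try_link_lock and lock_result are hypothetical. The backout shown, removing only our own temporary file when its link count is still 1, is an assumption rather than something settled in this thread. It also does nothing about clashing temporary file names.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

enum lock_result { LOCK_ACQUIRED, LOCK_FAILED, LOCK_UNKNOWN };

/* Hypothetical sketch. tmp_path must be unique to this contender and
 * must live in the same directory (same filesystem) as lock_path. */
enum lock_result try_link_lock(const char *tmp_path, const char *lock_path)
{
    struct stat st;
    int rc;
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_EXCL, 0644);

    if (fd == -1)
        return LOCK_FAILED;         /* temp name clash or other error */
    close(fd);

    rc = link(tmp_path, lock_path); /* result ignored: the reply may be lost */
    (void)rc;

    if (stat(tmp_path, &st) == -1)
        return LOCK_UNKNOWN;        /* link may or may not have been made */

    if (st.st_nlink == 2)
        return LOCK_ACQUIRED;       /* our file now also has the lock name */

    /* Assumption: a count of 1 means our link was not made, so removing
     * only our own temp file cannot disturb another holder's lock. */
    unlink(tmp_path);
    return LOCK_FAILED;
}
```

A caller would retry later on LOCK_FAILED. What to do on LOCK_UNKNOWN, and whether the holder should unlink both names on release, is left open here just as it is in the discussion.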

 
LVL 5

Accepted Solution

by:
ecw earned 100 total points
ID: 1293114
Well, I don't think I have earned the points; it's an interesting question, which I'd like to get to the bottom of.  I'm only answering so you can close the question.  I'd be very interested if you come up with a perfect solution; if I come across an alternative, I'll let you know.

Ed.