Solved

atomic locking over NFS with link(2),stat(2)

Posted on 1998-09-07
Last Modified: 2013-12-26
Because of the potential for the loss of RPC reply packets on NFS, the O_EXCL option of open does not work.  For this reason, one can/should open (O_CREAT|O_EXCL) a [temporary] file, link(2) this to the name of the file we really wanted to open [discarding the return value of the link(2) call], and then stat the newer name (the name we really wanted to create).  If the hard link count is 2, the process has worked.  Otherwise the process has failed.

What I'm not so clear on is what backout strategy on failure allows us to retry later.  If the strategy fails and the link count is >2, presumably one should unlink() the new name.   Is this right?  But what if the link count is only 1?  Do I unlink anything then?   What actions do I take if the stat(2) fails?   What are the pros/cons of using lstat/fstat here instead of stat?  Which of these is correct?

(This lockfile strategy doesn't have to be compatible with any MTA or anything; I just need to atomically create a lockfile whose name is fixed).
Question by:JYoungman
 
LVL 3

Expert Comment

by:elfie
As far as I know, advisory locks are recommended over NFS.

You can also try setting the file access bits to 6664. For normal files, the setuid and setgid bits affect the locking mechanism.


 
LVL 3

Expert Comment

by:elfie
From the man pages on HPUX:

man lockf(2)...

Only advisory record locking is implemented for NFS files.


 
LVL 2

Author Comment

by:JYoungman
Advisory record locks are completely inappropriate for this form of locking.  The lockfile is just that -- a lockfile, whose existence signals the "locked" state of some other file.  Hence record locks on the lockfile are not useful at all.   In addition, not all systems provide even advisory locks over NFS.  Setting the setuid bit to request mandatory locking adds less still.

If my question was unclear, I apologise (please say so if it was!).

 
LVL 2

Expert Comment

by:seedy
The man pages for open on HPUX and Solaris do not state any restriction on using the O_EXCL flag!  AIX 4.1 states:
The O_EXCL flag is not fully supported for Network File Systems (NFS). The NFS protocol does not guarantee the designed function of the O_EXCL flag.

I assume you are sure of the support for the O_EXCL flag on your UNIX (which is it, anyway?).

I am not clear on your logic.  I assume you want to create the [temporary] file in the local file system and link it to a new file on the NFS file system.  But hard links are not allowed across file systems, right?
 
LVL 2

Author Comment

by:JYoungman
There is no "your UNIX".  The idea is to write *portable* programs.  The program I work on works on at least twelve different varieties of Unix and at least one OS that isn't Unix at all.

As for O_EXCL, it is NOT POSSIBLE to make it work correctly and reliably over NFS because NFS is stateless and has no RPC call for "open" at all; the create RPC call still isn't enough since the O_EXCL flag can't be passed on to the server.

Hence the link(2) scheme I outline above.

 
LVL 5

Expert Comment

by:ecw
Why discard the return value of link?  Surely if link fails, you can be certain you haven't got the lock.  Similarly, if you can't lstat the new file, you should assume you haven't got a lock.  If lstat succeeds, though, check that the thing isn't a symlink, and check the link count.  If it is a symlink, or the link count isn't 2, you haven't got the lock, and must assume something else is messing with your locking mechanism.

I'm probably missing something, since my knowledge of low-level NFS is zero.  As an alternative locking mechanism, at least on systems that support symlinks, how about using symlink(lock_file, lock_file) to grab a lock?
 
LVL 2

Author Comment

by:JYoungman
It's imperative to discard the return value of link() because this is being done over NFS.   One can get an RPC failure and an error value returned from link(), even if the actual filesystem operation on the remote server succeeded.  The RPC reply packet(s) may get lost or dropped, for example, leading to a failure report when in fact the hard link did get made.  That is why the check is made with stat(2) afterward -- if the link count is 2, the link(2) operation must have succeeded on the server even if the reply had been lost.  If the stat(2) RPC fails, then the link may or may not have worked...you can't tell.

The thing with symlink() is that it's no more likely to succeed than link(), and you can't check the result by looking at the link count on the target.

 
LVL 5

Expert Comment

by:ecw
OK, now I sort of understand a bit more.  I think it's unsafe to assume anything about the lock file if the link count isn't two and you can't ensure that it is linked to the original file (running stat/lstat on the tmp file and the lock and comparing st_dev and st_ino might do this).  If other processes are trying to create the lock using the same mechanism, and all you're doing is checking the link count, there are race points which I can't see a way of overcoming.
So how about encoding a lock key such as host:pid:time within a symlink, and then readlink(2) it back?  If it matches, you got the lock.
 
LVL 2

Author Comment

by:JYoungman
Now we're rolling :-)
System call return values appear after the = sign.

Host A                             Host B
open("A:100", O_CREAT|O_EXCL)=0
                                   open("B:100", O_CREAT|O_EXCL)=0
                                   symlink("B:100", "z.foo")=-1
                                   (RPC reply lost but operation
                                   succeeded on server)
symlink("A:100", "z.foo")=-1
(EEXIST)

So at this point both parties will do a readlink(2) on "z.foo" and determine that it actually points to B:100.  A knows that wasn't its file and so determines that it has failed to acquire the lock.

OK, I understand how that works.  It seems to me that the same approach with hard links has all the same advantages as well as working on Unix systems which don't support symbolic links.
Another problem is that the hostname of machine A may itself be longer than the filename length limit of the NFS server (or for the case of locking a local file, be longer than its own filename length limit).   Hence I think I'll use a hard link.

Now that we're on the same page, what should lockers do if they determine that the readlink() or stat() call has returned the wrong pathname/failed (as appropriate for symbolic and hard links respectively)?  What is the correct backout strategy?

 
LVL 5

Expert Comment

by:ecw
There is no need to create any temporary files.  The lock-key is stored directly in the symlink, e.g.
 sprintf(lock_key, "%s:%-d.%ld", this_host, (int)getpid(), (long)time(0));
 symlink(lock_key, lock_file);    /* result ignored; readlink decides */
 /* leave room for the terminator: readlink does not add one */
 if ((len = readlink(lock_file, buf, sizeof(buf) - 1)) != -1) {
  buf[len] = '\0';
  if (! strcmp(lock_key, buf)) {
   /* lock obtained, do the stuff */
   unlink(lock_file);
  }
 }
So this doesn't hit the old 14-character filename limits.
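
The snippet above leaves this_host and the size of lock_key unspecified. One possible way to build the key with bounded buffers is sketched below; the function name, the 256-byte host buffer, and the use of gethostname(2) are assumptions for illustration, not part of the original suggestion.

```c
#define _POSIX_C_SOURCE 200809L  /* expose gethostname(2) under strict -std= modes */
#include <stdio.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Writes "host:pid.time" into key. Returns 0 on success, or -1 if the
 * host name cannot be read or the key does not fit in key_size bytes.
 */
int make_lock_key(char *key, size_t key_size)
{
    char host[256];   /* assumed to be large enough for the host name */
    int n;

    if (gethostname(host, sizeof host) == -1)
        return -1;
    host[sizeof host - 1] = '\0';   /* truncation may leave it unterminated */

    n = snprintf(key, key_size, "%s:%ld.%ld",
                 host, (long)getpid(), (long)time(NULL));

    return (n < 0 || (size_t)n >= key_size) ? -1 : 0;
}
```

Refusing a truncated key matters here: two truncated keys could compare equal even though they came from different processes.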

As to what happens if stat/readlink fails, I'm unsure.  I must stress NFS is not my gig, but I would be surprised if the kernel doesn't attempt to re-satisfy a "read" operation if it doesn't get a response within a certain time.  And I assume that the call will block until the kernel does get a response or a critical amount of time has passed.  I remember from way back being thoroughly hacked off when anyone shut down any NFS server I had mounted filesystems from, because there was a good chance my machine would hang until the NFS box was back to life.

The problem I have with the link mechanism is that the link count will not indicate who obtained the lock, i.e.
 process A                       process B
 link(tmp_fileA, lock_file)
                                 link(tmp_fileB, lock_file)  /* this link fails */
                                 stat(lock_file, &buf)       /* this succeeds */
                                 buf.st_nlink == 2
 stat(lock_file, &buf)           /* this also succeeds */
 buf.st_nlink == 2

So who has the lock?  If you resort to comparing the inode of tmp_file against the inode of lock_file, you'll get closer to resolving the problem, but 32-bit rather than 16-bit inode numbers are a recent innovation.
 
LVL 2

Author Comment

by:JYoungman
You'd stat your *own* temporary file, not the lockfile itself.
But this doesn't solve the problem of a clash of temporary file names; I'm not sure that's solvable.   Well, you've earned the points.

 
LVL 5

Accepted Solution

by:
ecw earned 100 total points
Well, I don't think I have earned the points.  It's an interesting question, which I'd like to get to the bottom of.  I'm only answering so you can close the question.  I'd be very interested if you come up with a perfect solution; if I come across an alternative, I'll let you know.

Ed.