Need help on Unix script

Hi,

I need to design an interface where we process a set of input files and then load them into database tables. After loading, we need to wait until we get the next input file from another interface; until then we should not proceed with our task. I need help with how this can be achieved.

Process flow
Step 1: Load the input files from \usr\interface\input\ (dat1.csv, dat2.csv, dat3.csv, dat4.csv).
Step 2: After loading, wait for the input file from the other interface: \usr\interface\remote\input\list.csv
Step 3: Once the file from step 2 is found (FTP'd into the remote folder), load list.csv into the database tables.
Step 4: Compare the data loaded in step 1 and step 3.

I need help with step 2: how to make the Unix process wait indefinitely until a file is FTP'd into the remote directory.

Any help or guidance is really appreciated.

1) Use the sleep command to suspend the process for x seconds, then check for the file. If it is present, process it; otherwise sleep again in a loop.
2) Schedule the process with crontab to execute every x minutes. When the process starts, check for the file. If it is present, process it; otherwise exit.
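
The first option could look roughly like the sketch below. It assumes a POSIX shell; the function name `wait_for_file` and the 30-second default interval are made up for illustration, and the path in the example is the one from the question written with forward slashes.

```shell
#!/bin/sh
# wait_for_file PATH [INTERVAL_SECONDS]
# Blocks until PATH exists as a regular file, checking every INTERVAL seconds.
# The function name and the default interval are illustrative assumptions.
wait_for_file() {
    file=$1
    interval=${2:-30}
    while [ ! -f "$file" ]; do
        sleep "$interval"
    done
}

# Example use (commented out so that sourcing this file has no side effects):
# wait_for_file /usr/interface/remote/input/list.csv 60
# echo "list.csv has arrived, continuing with step 3"
```

For the crontab option the loop goes away: the script would test `[ -f "$file" ]` once and exit if the file is not there yet. Either way, this only tells you the file exists, not that it is complete.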

One thing to look out for: when checking for the existence of a file arriving in a directory, there's a potential race condition between when the file is created and when its contents are completely available. There are various ways to avoid the problem.

1) If the file is structured so that its integrity can be independently verified, say by a checksum field appearing at the end or other internal signifiers, then your processing task can just test for this and go back to waiting if the file is not intact.

2) Some FTP servers allow you to trigger an action when the transfer completes. One simple action that could be triggered is a name change of the transferred file or a link created in the same or another directory -- these would be reliable indicators your program could use to know when it is safe to start processing a file.

3) If files are transferred in sequence, you can send a tiny sentinel file after the main file transfer is complete. The existence of the sentinel file signifies that the contents of the main file are complete while the contents of the sentinel file don't matter.
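
As a rough sketch of option 3, the receiver can poll for the sentinel instead of the data file. The `.done` suffix and the function name `wait_for_sentinel` are assumptions for illustration; the actual sentinel name would have to be agreed with whoever sends the file.

```shell
#!/bin/sh
# wait_for_sentinel DATA_FILE [INTERVAL_SECONDS]
# Waits until DATA_FILE.done exists, then confirms DATA_FILE itself is present.
# The ".done" suffix is an assumed convention, not something from the thread.
wait_for_sentinel() {
    data=$1
    interval=${2:-30}
    while [ ! -e "$data.done" ]; do
        sleep "$interval"
    done
    # The sentinel was sent after the data file, so the data should be complete.
    # Returns non-zero if the sentinel arrived but the data file is missing.
    [ -f "$data" ]
}
```

The sentinel's contents are never read; only its existence matters, which is why a zero-byte file is enough.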
If you can read the size of the remote file, one option for checking completeness is to read the size every few seconds - while it is changing, keep waiting - once it settles to a single value, you can assume it is complete.

This assumes that the file is being written as quickly as the system can (e.g. it is being copied from one place to another).  If it is being generated dynamically by another process, and that process can pause for a while before it generates more output, that checking system will not work so well.
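
A minimal sketch of that size-settling check, assuming the file has already appeared locally. The function names, the 5-second interval and the "three unchanged readings" threshold are all illustrative choices, not anything from this thread.

```shell
#!/bin/sh
# file_size PATH: prints the size of PATH in bytes.
# wc -c is used because it is portable; stat options differ between systems.
file_size() {
    wc -c < "$1" | tr -d ' '
}

# wait_until_stable PATH [INTERVAL_SECONDS] [CHECKS]
# Returns 0 once the size of PATH has stayed the same for CHECKS consecutive
# readings taken INTERVAL seconds apart. Returns 1 if PATH does not exist.
wait_until_stable() {
    file=$1
    interval=${2:-5}
    needed=${3:-3}
    [ -f "$file" ] || return 1
    last=$(file_size "$file")
    same=0
    while [ "$same" -lt "$needed" ]; do
        sleep "$interval"
        size=$(file_size "$file")
        if [ "$size" = "$last" ]; then
            same=$((same + 1))
        else
            # Still growing (or changed): start counting again.
            same=0
            last=$size
        fi
    done
    return 0
}
```

A stable size only suggests that writing has stopped. It cannot tell a finished transfer from one that died part way through.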

Hi Sam,

I assume the file will be "in use" until the FTP finishes, so you could probably use a command like "fuser" or "lsof" to confirm when the file is no longer being FTP'd, and is therefore ready for processing.
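
Something like this might do for the "in use" test. It is only a sketch: the function name `file_in_use` is made up, it assumes `fuser` is installed, and `fuser` can generally only see processes you have permission to inspect, so it may need to run as root or as the FTP user.

```shell
#!/bin/sh
# file_in_use PATH
# Returns 0 if fuser reports at least one process with PATH open,
# 1 if it reports none, and 2 if fuser is not available on this system.
file_in_use() {
    command -v fuser >/dev/null 2>&1 || return 2
    # fuser writes the PIDs to stdout and the file name to stderr, so a
    # non-empty stdout means some process still has the file open.
    [ -n "$(fuser "$1" 2>/dev/null)" ]
}

# Example use (commented out so that sourcing this file has no side effects):
# while file_in_use /usr/interface/remote/input/list.csv; do sleep 10; done
```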
One of the reasons I avoid suggesting things like checking whether the file is busy or growing or stuff like that is that these methods won't distinguish between successful file transfers and ones that failed.
Fair point, jmcg.

If your options 2 or 3 are available to sam (e.g. if he has the ability to change the way things are FTP'd), then great, but if not, then "file in use" testing could be useful, but to avoid the issue you just raised, he could use your option 1 afterwards.  But using your option 1 alone could result in:
a) An unnecessary amount of resource used for testing, especially if the FTP'd file is huge.  (But I guess this depends on how the checksum is implemented.)
b) Waiting forever if an FTP completes, but the file has a bad checksum for some reason.
Sam did mention waiting an infinite time as a possibility, didn't he?

People designing protocols for data returning from a satellite have to think about how to make that process robust when there's no feasible way for the recipient to say, "huh? I think I missed that. Can you send it again?" Someone receiving data with no capability of affecting the sender is in much the same position. Bad, fragile, uncorrectable data - what can you do about it? I think it's a bad idea to act on it as if it were correct data. Doing nothing at all or raising an error condition are better.


Sam hasn't been back here to comment on anything we've said. In looking at the question, I almost wonder if we should have been pointing him to the sleep(1) manpage. Perhaps not to sleep for an infinite time, but to periodically check to see whether the awaited event has occurred.
Hi jmcg,

> "Sam did mention waiting an infinite time as a possibility, didn't he?"
Yes, but that was "to wait for an infinite time until a file is ftpied", not to wait for an infinite time after the FTP had terminated.  As I mentioned, using your option 1 alone would result in the script waiting forever even if an FTP completes (including dying part way through), where the file has a bad checksum for whatever reason (which would be the case if the FTP died part way through).

> "I think it's a bad idea to act on it as if it were correct data. Doing nothing at all or raising an error condition are better."
True, which is why I've suggested that "'file in use' testing could be useful, but to avoid the issue you just raised, he could use your option 1 afterwards."

Forgive me if I've misunderstood what you're saying.
sam_2012 (Author) commented:
Thanks a lot for the inputs. I feel that, to check the correctness of the file, we can compare the checksum of the file on the remote side with the checksum of the local copy. If they are the same, then do the processing. What are your thoughts on this? Here again, I assume that the file is completely transferred.
sam_2012 (Author) commented:
The checks mentioned by tel2 can be done on the file after the wait time. I mean, after the wait time, check for the file and, if it is present, do the checks mentioned by tel2. Will that be fine?
If you have control over both ends of the ftp, and can control the format of the file being sent, then a simple check would be to include a file size and checksum value in the first chunk of the file. Reading those would be pretty cheap and there's no point spending processing power on the checksumming if the file size is not right. Perhaps I should not have mentioned a checksum being at the end of the file; that got us off track from the idea of a file structured so that its integrity could be determined.

If the file does not progress or arrives with an incorrect checksum, are you able to tell the sender to try again? If the process is periodic in nature anyway, and a bad transfer can be ignored in favor of a later one, you only need to raise an alarm after some number of transfers have failed to arrive correctly.

If we were talking about programs rather than shell scripts, we could bring up the idea of a network socket connection between the two ends. That has its own issues to consider, but would get rid of the FTP middleman. If you were writing a high-frequency trading app, for instance, you might want the receiver to start acting on arriving data before the end data shows up.

If your script is going to access the other end of the connection to do a checksum, it should first be doing a size comparison, as tel2 originally suggested. The size comparison lets you know when the transfer is presumably complete and the checksum gives the additional assurance that the file has arrived intact -- but it will be extremely rare that the checksums differ.
Hi jmcg,

> "If your script is going to access the other end of the connection to do a checksum..."

Why would it need to access the other end?  I thought checksums could be done without needing to see any other data.  Just perform the checksum algorithm on the file, and make sure the result equals the checksum value in the file.  Am I wrong?
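
Here is a rough sketch of a purely local check. Instead of a checksum embedded in the file, it assumes the sender also delivers a small companion file, `list.csv.cksum`, holding the output of `cksum < list.csv` taken at the source. That convention and the function name `verify_transfer` are assumptions for illustration. The cheap size comparison runs first, and the CRC is only calculated if the size matches.

```shell
#!/bin/sh
# verify_transfer DATA_FILE
# Compares the size and CRC of DATA_FILE with the values stored in
# DATA_FILE.cksum (assumed to contain the sender's "cksum < file" output,
# i.e. "<crc> <size>").
# Returns 0 on a match, 1 on a mismatch, 2 if either file is missing or
# the companion file cannot be read.
verify_transfer() {
    data=$1
    [ -f "$data" ] && [ -f "$data.cksum" ] || return 2
    read -r want_crc want_size rest < "$data.cksum" || return 2
    # Cheap test first: skip the CRC pass if the size is already wrong.
    have_size=$(wc -c < "$data" | tr -d ' ')
    [ "$have_size" = "$want_size" ] || return 1
    have_crc=$(cksum < "$data" | cut -d ' ' -f 1)
    [ "$have_crc" = "$want_crc" ]
}
```

No access to the other end is needed at check time; the expected values travel with the file.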

> " should first be doing a size comparison, as tel2 originally suggested."

That was simon who suggested that.  (But I'm willing to take full credit for it.)
I suggested the "file in use" checking, because I thought it would be simpler.  What is wrong with doing a "file in use" check, followed by a checksum check after the file is no longer in use, as I have suggested, jmcg?
I was just going by what Sam was asking:
> "use the checksum of the file in remote as well as in local"

Communication is hard.

We're running a shell script. Getting the size of a file locally is on the order of one system call. Checking file-in-use requires a lot more work, but still local. Checking the file size at the remote location is even more involved, while calculating a checksum on the file locally may be expensive enough that you only want to do it if there's a good chance it will be a match. Running a remote checksum command at the other end is even more elaborate.

But it's all down in the noise in these days where we think nothing of using Google to look up a phone number, define a word, or watch Mother's Day doodles.
sam_2012 (Author) commented:
Thank you, guys. It was a good discussion. We have agreed that when we check for the file's existence, it should be the completely transferred file, and if there is any error we will use the send mail utility to notify the ops team so they can send it again.
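
The notification step could be sketched like this. It assumes `mailx` is the mail utility in question and that it is configured on the host; the address `ops-team@example.com` and the function name `notify_ops` are placeholders, not details from this thread.

```shell
#!/bin/sh
# Placeholder settings: both values are assumptions and can be overridden.
OPS_ADDRESS=${OPS_ADDRESS:-ops-team@example.com}
MAILER=${MAILER:-mailx}

# notify_ops SUBJECT BODY
# Sends BODY to the ops team with the given SUBJECT line.
notify_ops() {
    subject=$1
    body=$2
    printf '%s\n' "$body" | "$MAILER" -s "$subject" "$OPS_ADDRESS"
}

# Example use (commented out so that sourcing this file sends no mail):
# notify_ops "list.csv failed verification" "Please resend list.csv."
```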