Solved

pipes and buffering

Posted on 1997-05-20
1,955 Views
Last Modified: 2010-05-18
Is there a way to turn off buffering from a process that is read through a pipe (other than using $| = 1 or FILE->autoflush(1))?
I use tail to monitor some files that are being appended to, and the data seems to be buffered... I am not getting the most recent data, and I need it.
So it seems the data gets buffered somewhere between the process being executed and the pipe I am reading from. (Maybe the shell? I am using bash on Linux.)
Any insight would be appreciated.
Thanks.
Question by:melick
8 Comments
 
LVL 2

Expert Comment

by:mkornell
ID: 1204071
Which program is creating the file you are reading?  That program is likely buffering its writes, and unless you have access to the source code to fix that, you're out of luck.

using:
open (FILE, "/bin/tail -f $log_file |") || die ("FUBAR! = $!");

should get you unbuffered input from the point where tail picks it up.  If the data is buffered before being written to $log_file, you have to fix that on the writing side.
 
LVL 5

Expert Comment

by:julio011597
ID: 1204072
I won't submit this as an answer since mkornell has already given the right one.

As for insight, here are a couple of points:

- you can control buffering on the writing side only;
- on the reading side, you can only choose whether to read or not, and when you do read, you can only hope there's something waiting for you.
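
To illustrate the writing-side point, here is a minimal Perl sketch (the log file name 'serial.log' is hypothetical, not from the question) of turning autoflush on for a log handle so each line reaches the file immediately:

```perl
use strict;
use warnings;

# Hypothetical log file name; with autoflush on, each print is
# flushed to the file at once instead of sitting in the stdio buffer.
open(my $log, '>>', 'serial.log') or die "can't open log: $!";

my $old = select($log);   # make $log the default output handle
$| = 1;                   # turn on autoflush for $log
select($old);             # restore the previous default handle

print $log "new line of data\n";   # written through immediately
close($log);
```

A tail -f reader on the other end then sees each line as soon as the writer prints it.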

HTH, julio
 

Author Comment

by:melick
ID: 1204073
Thanks (both of you) for your help, but I don't think I have quite got it yet.
First, the program writing to the logs is a serial logging program that simply takes data from some Cyclades boxes and puts it into the log files every time there is a return.  In fact, I only care about the data once it is in the log files.
The tail works fine for a single file, but doesn't seem to work for multiple files (really weird). It seems to read the last file without buffering, but doesn't behave the same for the other files given as arguments to tail.
Any ideas?
Thanks in advance,
mike.

 
LVL 5

Accepted Solution

by:
julio011597 earned 150 total points
ID: 1204074
This _cannot_ work!

I mean, 'tail' is intended to work on _one_ file at a time; if you give it more than one file name, it is as if you gave it only the last one: that is the way tail works.

So your problem has nothing to do with buffering; you only get the last file's lines because the other files are simply ignored.

To do the job, you have to start a tail for each log file, and open a pipe to each tail.
Then set up a way to poll those pipes for incoming data:
select(2) may be a good starting point - select() can tell you whether there's something to read (or write) on a stream.

So, the basic concept is:

1. open a reading pipe for each tail;
2. select() on the pipes until there's something new to read;
3. read incoming data from the pipe pointed out by select();
4. goto step 2.
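
The scheme above can be sketched in Perl with the standard IO::Select module (the log names A and B follow the question; the rest is an illustrative sketch, not tested against the asker's setup):

```perl
use strict;
use warnings;
use IO::Select;

# One tail per log file, one pipe per tail.
my @logs = ('A', 'B');
my $sel  = IO::Select->new;
my %name;                        # filehandle -> log name

for my $log (@logs) {
    open(my $pipe, '-|', "tail -f -n 0 $log")
        or die "can't start tail on $log: $!";
    $sel->add($pipe);
    $name{$pipe} = $log;
}

while (1) {
    # can_read() blocks until at least one pipe has data ready,
    # then returns only the ready handles.
    for my $pipe ($sel->can_read) {
        my $line = <$pipe>;
        next unless defined $line;   # that tail has exited
        print "[$name{$pipe}] $line";
    }
}
```

Note the loop runs forever, like tail -f itself; each iteration only touches pipes that actually have data, so no single slow log can stall the others.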

If your job is somewhat simpler (e.g. each time a new line is appended to one log, the same happens to all the log files), you don't even need select(); you could just:

1. open a reading pipe for each tail;
2. do a blocking read on the first pipe (as you probably do now);
3. when that read returns, read from all the other pipes, since there must be something new on each of them;
4. goto step 2.

This way is a bit less safe, since it relies on the process writing to the logs always behaving well.

HTH, julio

Author Comment

by:melick
ID: 1204075
I'm not sure that I agree (maybe I need more convincing).  According to the man page for the GNU version of tail, more than one file may be given as an argument to the tail command.  In fact, it even states that the output will contain headers indicating which file each piece of output is from (unless turned off with the -q or --quiet option).  I do use this method with multiple files in one tail, and I receive data from all of them, but there is the buffering problem (or what I have labelled a buffering problem), so I am not receiving the data as it comes in.

I agree with the rest of the answer; in fact, I already use the select command, so my program is non-blocking.

Any more ideas?

Thanks in advance, mike.

BTW, the first way is closer to the way I am doing the work right now...just the multiple files issue...
 
LVL 5

Expert Comment

by:julio011597
ID: 1204076
Right, GNU tail lets you do it (I was talking about standard tail), so my answer is meaningless.

Still, I do not fully understand what you mean by "I am not receiving the data as it is coming in".

If you have this problem inside your program only - i.e. running the tails from a shell works fine - I guess there's something wrong in your code.

If the problem occurs when you run the tails from a shell too, then there could be something wrong on the writing side.

I'm afraid any further help needs a deeper description of what happens and when, and the code being shown.

Cheers, julio
 

Author Comment

by:melick
ID: 1204077
Ok, the data becomes important to me once it goes into the logs, after the serial logging program is done with it. (It wouldn't matter to me whether the logging program buffered the data or not.)

The data is not available to me through the pipe (which has a tail command in it) when it is put into the logs. (This statement is false if I am only looking at one file, but true if I am looking at multiple files.)

i.e. this problem does not exist when I run the tails from a shell...

I'll try to give a deeper description...

file A and B are log files...so I open the logs with something like

$| = 1;
select (PIPE); $| = 1;  # Maybe wrong syntax, I don't have the
                        # code here
open (PIPE,"tail -f -n 0 A B|")
# tail forever, read 0 lines to begin, on file A and B

(In my real code I do a select to see if a read will block, but
I have tried it without the select, and the data still seems buffered.)

while ( 1 )
{
  $input = <PIPE>;
  print "$input";
}

# Just an aside...do I have to actually do an open on a
# filehandle before I can select it, and turn off the buffering?
# I just thought of that...never tried it after I opened it...

And with that, data seems to be held up somewhere if I put it
into file A, but seems to get through almost immediately if I
put it into file B.

BTW, data from file A does eventually get through (that's why I
thought buffering in the first place.)

Any new ideas?
Thanks for the help, mike
 
LVL 5

Expert Comment

by:julio011597
ID: 1204078
Well, I must say that I'm not a Perl expert (I usually program in C), but I think there are at least a couple of problems with your code:

1. you should open before selecting; otherwise your PIPE filehandle is meaningless when you select it;

2. you should select STDOUT again before printing, or specify a filehandle for printing, because print() without a filehandle prints to the last selected filehandle.

So, I'd try this:

--//--
$| = 1;                           # autoflush STDOUT

open(PIPE, "tail -f -n 0 A B |") || die("can't start tail: $!");
select(PIPE); $| = 1;             # autoflush PIPE, now that it exists
select(STDOUT);                   # make print default to STDOUT again

while(1) {
  $input = <PIPE>;
  print "$input";
}
--//--

Hope this helps a bit, julio
