read file line by line

Can I read a file line by line and exclude lines that are greater than a certain size (3 MB?)?
The file is very large, about 26 GB.
Vlearns Asked:
 
Gerwin Jansen, EE MVE, Topic Advisor Commented:
Can you try:

awk '{ if (length < 3000000) print $0 }' file.txt > output.txt
 
dpearson Commented:
Yes, files are normally read line by line.

You can try something like this, which reads a file line by line and skips any really long lines (this is in Java since you cross-posted there):

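import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

int limit = 1000 * 3000; // about 3 MB; note that length() counts chars, not bytes
// try-with-resources closes the reader even if an exception is thrown
try (BufferedReader input = new BufferedReader(new FileReader(inputFile))) {
    String line;
    while ((line = input.readLine()) != null) {
        if (line.length() > limit)
            continue; // skip lines over the limit
        // Do something with the lines you want
    }
}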

Hope that helps,

Doug
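
For anyone who wants a complete program rather than a fragment, the loop above can be fleshed out roughly as follows. This is only a sketch under some assumptions: the class name LineLengthFilter and the method filter are made up for illustration, and ISO-8859-1 is chosen so that one char corresponds to one byte and odd bytes never cause a decoding error. Note that readLine() still holds each line in memory before its length is checked, and that it strips the line terminator, so CRLF input comes out with LF endings.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, named here only for illustration.
class LineLengthFilter {

    /** Copies lines of at most maxChars characters from in to out; returns the number of lines skipped. */
    static long filter(Path in, Path out, int maxChars) throws IOException {
        long skipped = 0;
        // ISO-8859-1 maps every byte to exactly one char, so length() matches the byte count
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.ISO_8859_1);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.ISO_8859_1)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.length() > maxChars) {
                    skipped++;
                    continue; // too long, leave it out of the output
                }
                writer.write(line);
                writer.write('\n');
            }
        }
        return skipped;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java LineLengthFilter <infile> <outfile>");
            return;
        }
        long skipped = filter(Paths.get(args[0]), Paths.get(args[1]), 3_000_000);
        System.err.println("Skipped " + skipped + " long line(s)");
    }
}
```

Run it as: java LineLengthFilter infile.txt outfile.txt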
 
tel2 Commented:
Hi Vlearns,

Or if you prefer Perl, this seems to work:
    perl -ne 'print if length() < 3000000' infile.txt >outfile.txt
 
CEHJ Commented:
can i read a file line by line and exclude lines that are greater than a certain size?
Depends on what you mean by 'exclude'. Ordinarily, each of those very long lines has to be accumulated in memory before the routine can know its length, and that of course applies to the awk example posted as well. To optimise that, you can scan the whole file once and index the positions of the line feeds. On a second pass, take only the pairs of offsets that are close enough together and process the lines between them.
In other words, on the first pass only one character (byte?) is held in memory at a time (other than system buffers, of course).
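
A rough sketch of that two-pass idea, working on raw bytes. The names TwoPassLineFilter, findLongLines and copyExcept are hypothetical. As a variation on what is described above, the first pass records only the byte ranges of the over-long lines (assumed to be few) rather than every line feed, so the index stays small even for a 26 GB file. The second pass copies everything outside those ranges. Line length here is counted in bytes and excludes the trailing line feed.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class, named here only for illustration.
class TwoPassLineFilter {

    /** Pass 1: return {start, end} byte offsets (end exclusive, newline included) of each over-long line. */
    static List<long[]> findLongLines(Path in, long maxBytes) throws IOException {
        List<long[]> skip = new ArrayList<>();
        try (InputStream s = new BufferedInputStream(Files.newInputStream(in), 1 << 16)) {
            long pos = 0;   // offset just past the byte most recently read
            long start = 0; // offset where the current line begins
            int b;
            while ((b = s.read()) != -1) {
                pos++;
                if (b == '\n') {
                    if (pos - start - 1 > maxBytes) skip.add(new long[] {start, pos});
                    start = pos;
                }
            }
            // Last line when the file does not end with a line feed
            if (pos - start > maxBytes) skip.add(new long[] {start, pos});
        }
        return skip;
    }

    /** Pass 2: copy every byte of in to out except the ranges found in pass 1. */
    static void copyExcept(Path in, Path out, List<long[]> skip) throws IOException {
        try (FileChannel src = FileChannel.open(in, StandardOpenOption.READ);
             FileChannel dst = FileChannel.open(out, StandardOpenOption.CREATE,
                     StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE)) {
            long from = 0;
            for (long[] range : skip) {
                copy(src, dst, from, range[0]);
                from = range[1];
            }
            copy(src, dst, from, src.size());
        }
    }

    // transferTo may move fewer bytes than requested, so loop until the range is done
    private static void copy(FileChannel src, FileChannel dst, long from, long to) throws IOException {
        while (from < to) {
            from += src.transferTo(from, to - from, dst);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java TwoPassLineFilter <infile> <outfile>");
            return;
        }
        Path in = Paths.get(args[0]);
        copyExcept(in, Paths.get(args[1]), findLongLines(in, 3_000_000L));
    }
}
```

The trade-off is that the file is read twice, which for a file this size may well cost more than the memory saved. The single-pass awk, Perl and Java answers are simpler as long as one long line fits in memory.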
 
tel2 Commented:
Hi CEHJ,
Ordinarily, you will be accumulating those very long lines in order for the routine to know their length, which of course includes the awk example posted.
Depends on what you mean by 'accumulating'.  Do you mean more than one of those long lines will be in memory at a time?
If so, why will they be?
Or if not, what's the problem with the awk/Perl solutions above?

Thanks.
tel2
 
CEHJ Commented:
Do you mean more than one of those long lines will be in memory at a time?
No - but some of those lines I certainly wouldn't like sitting in my editor, even if they were alone ;)

greater than a certain size?(3mb?)
Question has a verified solution.