Vlearns
asked on
read file line by line
Can I read a file line by line and exclude lines that are greater than a certain size (3 MB?)
The file is huge, about 26 GB.
can i read a file line by line and exclude lines that are greater than a certain size?
Depends on what you mean by 'exclude'. Ordinarily, you will be accumulating those very long lines in memory just so the routine can know their length, and that of course includes the awk example posted. To optimise that, you can scan the whole file once and index the positions of the line feeds. On a second pass, take only the pairs of consecutive offsets that are close enough together (i.e. within your size limit) and process just those lines.
IOW, on the first pass, only one character (byte?) is held in memory at one time (other than system buffers, of course).
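A rough sketch of that two-pass idea in Java, in case it helps. The class and method names (TwoPassLineFilter, indexShortLines, processLines) are made up for illustration, and it assumes '\n' line endings, UTF-8 text and a limit counted in bytes. Note that the index itself takes memory in proportion to the number of lines kept, so this only pays off when the file is not made up of billions of tiny lines.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not from any library.
class TwoPassLineFilter {

    // Pass 1: stream the file byte by byte and record {start, length}
    // for every line whose length in bytes (without '\n') is <= limit.
    static List<long[]> indexShortLines(Path file, int limit) throws IOException {
        List<long[]> spans = new ArrayList<>();
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            long pos = 0;        // offset of the byte being examined
            long lineStart = 0;  // offset where the current line began
            int b;
            while ((b = in.read()) != -1) {
                if (b == '\n') {
                    if (pos - lineStart <= limit) {
                        spans.add(new long[] {lineStart, pos - lineStart});
                    }
                    lineStart = pos + 1;
                }
                pos++;
            }
            // Final line when the file does not end with a line feed.
            if (pos > lineStart && pos - lineStart <= limit) {
                spans.add(new long[] {lineStart, pos - lineStart});
            }
        }
        return spans;
    }

    // Pass 2: seek to each recorded span and read only those bytes,
    // handing one line at a time to the caller. A trailing '\r' from
    // Windows line endings is not stripped here.
    static void processLines(Path file, List<long[]> spans, Consumer<String> handler)
            throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            for (long[] span : spans) {
                byte[] buf = new byte[(int) span[1]]; // safe: length <= limit (an int)
                raf.seek(span[0]);
                raf.readFully(buf);
                handler.accept(new String(buf, StandardCharsets.UTF_8));
            }
        }
    }
}
```

You would call indexShortLines(path, 3_000_000) and then pass the result to processLines along with whatever you want done per line. The oversized lines are never assembled in memory, only counted.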
Hi CEHJ,
Ordinarily, you will be accumulating those very long lines in order for the routine to know their length, which of course includes the awk example posted.
Depends on what you mean by 'accumulating'. Do you mean more than one of those long lines will be in memory at a time?
If so, why will they be?
Or if not, what's the problem with the awk/Perl solutions above?
Thanks.
tel2
Do you mean more than one of those long lines will be in memory at a time?
No - but some of those lines I certainly wouldn't like sitting in my editor, even if they were alone ;)
greater than a certain size?(3mb?)
You can try something like this, which reads a file line by line and skips any really long lines (this is in Java since you cross-posted there):
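import java.io.BufferedReader;
import java.io.FileReader;

// inputFile is the path of the file to read
try (BufferedReader input = new BufferedReader(new FileReader(inputFile))) {
    int limit = 3 * 1000 * 1000; // roughly 3 MB, counted in chars rather than bytes
    String line;
    while ((line = input.readLine()) != null) {
        // readLine() has already loaded the whole line by this point
        if (line.length() > limit) {
            continue; // skip lines over the limit
        }
        // Do something with the lines you want
    }
} // try-with-resources closes the reader, even on an exception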
Hope that helps,
Doug