Vlearns

read file line by line

Can I read a file line by line and exclude lines that are greater than a certain size (3 MB?)?
The file is very large, about 26 GB.
dpearson

Yes, files are normally read line by line.

You can try something like this, which reads a file line by line and skips any really long lines (this is in Java since you cross posted there):

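import java.io.BufferedReader;
import java.io.FileReader;

int limit = 1000 * 3000; // about 3 MB; note that length() counts chars, not bytes

// try-with-resources closes the reader even if an exception is thrown
try (BufferedReader input = new BufferedReader(new FileReader(inputFile))) {
    String line;
    while ((line = input.readLine()) != null) {
        if (line.length() > limit) {
            continue; // skip the long line (readLine has still loaded it into memory)
        }
        // Do something with the lines you want
    }
}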

Hope that helps,

Doug
ASKER CERTIFIED SOLUTION
Gerwin Jansen
This solution is only available to members.
can I read a file line by line and exclude lines that are greater than a certain size?
Depends on what you mean by 'exclude'. Ordinarily, you will be accumulating those very long lines in order for the routine to know their length, which of course includes the awk example posted. To optimise that, you can scan the whole file and record the positions of the line feeds. On a second pass, take only the pairs of offsets that are close enough together for comfort and process those lines.
In other words, on the first pass only one character (byte?) is held in memory at a time (other than system buffers, of course).
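The two-pass idea described above could be sketched roughly as follows. This is only an illustration under some assumptions: lines end in '\n', the content is UTF-8, and the names TwoPassLineFilter, indexShortLines and readSpans are made up for this example. For a 26 GB file the list of offsets can itself grow large, so in practice it might need to be written to disk instead of kept in a List.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper class, not from the thread.
public class TwoPassLineFilter {

    /** Pass 1: record {start, length} for every line of at most maxBytes bytes ('\n' excluded). */
    static List<long[]> indexShortLines(Path file, long maxBytes) throws IOException {
        List<long[]> spans = new ArrayList<>();
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            long pos = 0;       // offset of the byte about to be read
            long lineStart = 0; // offset where the current line began
            int b;
            while ((b = in.read()) != -1) { // only one byte is held at a time here
                if (b == '\n') {
                    long len = pos - lineStart;
                    if (len <= maxBytes) {
                        spans.add(new long[] {lineStart, len});
                    }
                    lineStart = pos + 1;
                }
                pos++;
            }
            // A final line that has no trailing line feed
            if (pos > lineStart && pos - lineStart <= maxBytes) {
                spans.add(new long[] {lineStart, pos - lineStart});
            }
        }
        return spans;
    }

    /** Pass 2: seek to each recorded span and hand just those bytes to the sink. */
    static void readSpans(Path file, List<long[]> spans, Consumer<String> sink) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            for (long[] span : spans) {
                byte[] buf = new byte[(int) span[1]]; // assumes maxBytes fits in an int
                raf.seek(span[0]);
                raf.readFully(buf);
                sink.accept(new String(buf, StandardCharsets.UTF_8));
            }
        }
    }
}
```

With a 3 MB limit this would be called as indexShortLines(path, 3_000_000L) followed by readSpans(path, spans, line -> ...). The over-long lines are never materialised as strings; the cost is reading the file twice.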
Hi CEHJ,
Ordinarily, you will be accumulating those very long lines in order for the routine to know their length, which of course includes the awk example posted.
Depends on what you mean by 'accumulating'.  Do you mean more than one of those long lines will be in memory at a time?
If so, why will they be?
Or if not, what's the problem with the awk/Perl solutions above?

Thanks.
tel2
Do you mean more than one of those long lines will be in memory at a time?
No - but some of those lines I certainly wouldn't like sitting in my editor, even if they were alone ;)
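If even a single over-long line in memory is the concern, a one-pass alternative is to read bytes and stop buffering as soon as a line passes the limit. The following is a minimal sketch of that idea, not something posted in the thread; it assumes '\n' line endings and UTF-8 content, and the names BoundedLineReader and forEachShortLine are invented for the example.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

// Hypothetical helper class, not from the thread.
public class BoundedLineReader {

    /** Passes every line of at most maxBytes bytes to sink; longer lines are discarded. */
    static void forEachShortLine(InputStream raw, int maxBytes, Consumer<String> sink)
            throws IOException {
        InputStream in = new BufferedInputStream(raw);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        boolean tooLong = false; // current line has already exceeded maxBytes
        boolean sawByte = false; // current line has at least one byte
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                if (!tooLong) {
                    sink.accept(new String(buf.toByteArray(), StandardCharsets.UTF_8));
                }
                buf.reset();
                tooLong = false;
                sawByte = false;
            } else {
                sawByte = true;
                if (!tooLong) {
                    if (buf.size() < maxBytes) {
                        buf.write(b);
                    } else {
                        tooLong = true; // stop buffering; the rest of the line is skipped
                        buf.reset();
                    }
                }
            }
        }
        // A final line that has no trailing line feed
        if (sawByte && !tooLong) {
            sink.accept(new String(buf.toByteArray(), StandardCharsets.UTF_8));
        }
    }
}
```

The buffer never holds more than maxBytes bytes of any one line, so memory use stays bounded by the limit regardless of how long the skipped lines are.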

greater than a certain size?(3mb?)