
read file line by line

Can I read a file line by line and exclude lines that are larger than a certain size (3 MB?)?
The file is very large, about 26 GB.

dpearson

Yes, files are normally read line by line.

You can try something like this, which reads a file line by line and skips any really long lines (this is in Java since you cross-posted there):

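// Needs: java.io.BufferedReader, java.io.FileReader, java.io.IOException
int limit = 3 * 1000 * 1000; // about 3 MB, counted in characters rather than bytes

// try-with-resources closes the reader even if an exception is thrown
try (BufferedReader input = new BufferedReader(new FileReader(inputFile))) {
    String line;
    while ((line = input.readLine()) != null) {
        if (line.length() > limit) {
            continue; // skip lines longer than the limit
        }
        // Do something with the lines you want
    }
}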

Hope that helps,

Doug

ASKER CERTIFIED SOLUTION

Gerwin Jansen


SOLUTION

tel2


CEHJ

Can I read a file line by line and exclude lines that are greater than a certain size?

Depends on what you mean by 'exclude'. Ordinarily, the routine has to accumulate each very long line in memory just to find out its length, and that of course includes the awk example posted. To avoid that, you can scan the whole file once and index the positions of the line feeds. On a second pass, take only the pairs of offsets that are close enough together for comfort and process just those lines.
IOW, on the first pass, only one character (byte?) is held in memory at a time (other than system buffers, of course).
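
As a rough illustration of the two-pass idea above, here is a minimal Java sketch. The names (TwoPassLineFilter, indexLineFeeds, forEachShortLine) are made up for this example, and it assumes '\n' line endings and UTF-8 text, neither of which is stated in the question. The first pass records only the byte offsets of the line feeds; the second pass seeks to and reads only the lines whose length is within the limit.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper class, for illustration only.
final class TwoPassLineFilter {

    // Pass 1: collect the byte offset of every '\n' without storing any line.
    static List<Long> indexLineFeeds(Path file) throws IOException {
        List<Long> offsets = new ArrayList<>();
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            long pos = 0;
            int b;
            while ((b = in.read()) != -1) {
                if (b == '\n') {
                    offsets.add(pos);
                }
                pos++;
            }
        }
        return offsets;
    }

    // Pass 2: read only the lines that are at most maxBytes bytes long.
    // Assumes UTF-8; a trailing '\r' from CRLF files is not stripped.
    static void forEachShortLine(Path file, int maxBytes, Consumer<String> action)
            throws IOException {
        List<Long> ends = indexLineFeeds(file);
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long size = raf.length();
            // Treat end-of-file as a line end if the last line has no '\n'.
            if (size > 0 && (ends.isEmpty() || ends.get(ends.size() - 1) != size - 1)) {
                ends.add(size);
            }
            long start = 0;
            for (long end : ends) {
                long len = end - start;
                if (len <= maxBytes) {
                    byte[] buf = new byte[(int) len];
                    raf.seek(start);
                    raf.readFully(buf);
                    action.accept(new String(buf, StandardCharsets.UTF_8));
                }
                start = end + 1; // skip past the line feed
            }
        }
    }
}
```

A call might look like TwoPassLineFilter.forEachShortLine(path, 3_000_000, line -> process(line)), where process is whatever you do with the lines you keep. One caveat: the offset list itself takes memory, so for a 26 GB file with a great many short lines you may prefer to write the offsets to disk or drop the index altogether.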

tel2

Hi CEHJ,

Ordinarily, you will be accumulating those very long lines in order for the routine to know their length, which of course includes the awk example posted.

Depends on what you mean by 'accumulating'. Do you mean more than one of those long lines will be in memory at a time?
If so, why will they be?
Or if not, what's the problem with the awk/Perl solutions above?

Thanks.
tel2

CEHJ

Do you mean more than one of those long lines will be in memory at a time?

No - but some of those lines I certainly wouldn't like sitting in my editor, even if they were alone ;)
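
For anyone worried about even one oversized line being held in memory, the check can also be done in a single pass with a bounded buffer. This is only a sketch under assumptions: BoundedLineReader and forEachShortLine are hypothetical names, and '\n' line endings and UTF-8 text are assumed. It buffers at most maxBytes bytes of any line; once a line exceeds that, the buffer is dropped and the rest of the line is skipped byte by byte.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;

// Hypothetical helper class, for illustration only.
final class BoundedLineReader {

    // Passes every line of at most maxBytes bytes to action.
    // Never holds more than maxBytes bytes of a single line in memory.
    static void forEachShortLine(Path file, int maxBytes, Consumer<String> action)
            throws IOException {
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            boolean tooLong = false;  // current line already exceeded the limit
            boolean sawData = false;  // bytes seen since the last line feed
            int b;
            while ((b = in.read()) != -1) {
                if (b == '\n') {
                    if (!tooLong) {
                        action.accept(new String(buf.toByteArray(), StandardCharsets.UTF_8));
                    }
                    buf.reset();
                    tooLong = false;
                    sawData = false;
                } else {
                    sawData = true;
                    if (!tooLong) {
                        if (buf.size() < maxBytes) {
                            buf.write(b);
                        } else {
                            tooLong = true; // over the limit: discard what was buffered
                            buf.reset();
                        }
                    }
                }
            }
            // Last line without a trailing line feed.
            if (sawData && !tooLong) {
                action.accept(new String(buf.toByteArray(), StandardCharsets.UTF_8));
            }
        }
    }
}
```

The trade-off against the two-pass version is that the file is read only once and no offset index is needed, at the cost of copying up to maxBytes bytes of each long line before it is rejected.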

greater than a certain size?(3mb?)