read file line by line

can I read a file line by line and exclude lines that are greater than a certain size? (3 MB?)
The file is very huge, ~26 GB.
Vlearns Asked:


dpearson Commented:
Yes, files are normally read line by line.

You can try something like this, which reads a file line by line and skips any really long lines (this is in Java since you cross-posted there):

int limit = 3 * 1000 * 1000; // ~3 million characters (~3 MB for single-byte text)

// try-with-resources closes the reader even if an exception is thrown
try (BufferedReader input = new BufferedReader(new FileReader(inputFile))) {
    String line;
    while ((line = input.readLine()) != null) {
        if (line.length() > limit)
            continue; // skip the really long lines
        // Do something with the lines you want
    }
}

Hope that helps,

Doug
Gerwin Jansen, EE MVE, Topic Advisor, Commented:
Can you try:

awk '{ if (length < 3000000) print $0 }' file.txt > output.txt

Experts Exchange Solution
tel2 Commented:
Hi Vlearns,

Or if you prefer Perl, this seems to work:
    perl -ne 'print if length() < 3000000' infile.txt >outfile.txt

CEHJ Commented:
can I read a file line by line and exclude lines that are greater than a certain size?
Depends on what you mean by 'exclude'. Ordinarily, you will be accumulating each of those very long lines in memory in order for the routine to know its length, and that of course includes the awk example posted. To optimise that, you can scan the whole file once and index the positions of the line feeds. On a second pass, take only the offsets whose lines are short enough and process those.
IOW, on the first pass, only one character (byte?) is held in memory at a time (other than system buffers, of course).
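A rough sketch of that two-pass idea (class and method names are made up for illustration, and this hasn't been run against a 26 GB file): pass 1 reads one byte at a time and records the start offset and length of each line that is under the limit; pass 2 seeks straight to those offsets and reads only the short lines.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

public class TwoPassFilter {
    // Pass 1: scan once, holding one byte at a time, recording the
    // {start offset, length} of every line that is short enough.
    static List<long[]> indexShortLines(File file, long limit) throws IOException {
        List<long[]> result = new ArrayList<>();
        try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
            long offset = 0, lineStart = 0;
            int b;
            while ((b = in.read()) != -1) {
                offset++;
                if (b == '\n') {
                    long len = offset - lineStart - 1; // exclude the '\n' itself
                    if (len <= limit)
                        result.add(new long[]{lineStart, len});
                    lineStart = offset;
                }
            }
            // A final line with no trailing newline
            if (offset > lineStart && offset - lineStart <= limit)
                result.add(new long[]{lineStart, offset - lineStart});
        }
        return result;
    }

    // Pass 2: seek directly to one recorded line and read just those bytes,
    // so the multi-MB lines are never pulled into memory at all.
    static String readLineAt(File file, long start, long len) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek(start);
            byte[] buf = new byte[(int) len];
            raf.readFully(buf);
            return new String(buf, "UTF-8");
        }
    }
}
```

Note the offsets and lengths are counted in bytes, so this assumes a single-byte-per-character (or UTF-8) file; opening a new RandomAccessFile per line is just for the sketch, you'd keep one open in real code.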
tel2 Commented:
Hi CEHJ,
Ordinarily, you will be accumulating those very long lines in order for the routine to know their length, which of course includes the awk example posted.
Depends on what you mean by 'accumulating'. Do you mean more than one of those long lines will be in memory at a time?
If so, why will they be?
Or if not, what's the problem with the awk/Perl solutions above?

Thanks.
tel2
CEHJ Commented:
Do you mean more than one of those long lines will be in memory at a time?
No - but some of those lines I certainly wouldn't like sitting in my editor, even if they were alone ;)

greater than a certain size? (3 MB?)