pvinodp asked on

Search for a pattern in a large file using Java

The following code is from one of my functions:
 
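     import java.io.File;
     import java.util.Scanner;
     import java.util.regex.Pattern;

     // ... inside the method (which must declare or catch FileNotFoundException):
     Scanner sc = new Scanner(new File("/exports/nos_issues/9518/aaa_.log"));
     String str = "10:37:10.719 [net.jradius.freeradius.FreeRadiusProcessor(29)] DEBUG net.jradius.log.Log4JRadiusLogger - >>> packets";
     // [ ] ( ) and . are regex metacharacters; Pattern.quote makes the string match literally
     Pattern ptr = Pattern.compile(Pattern.quote(str));
     long toto = 0;
     try {
         // a horizon of 0 means "no bound": Scanner may buffer all of the input while searching
         while (sc.findWithinHorizon(ptr, 0) != null)
             toto++;
     } finally {
         sc.close();
     }
     System.out.println(toto);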
     

I get the following error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
      at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:39)
      at java.nio.CharBuffer.allocate(CharBuffer.java:312)
      at java.util.Scanner.makeSpace(Scanner.java:816)
      at java.util.Scanner.readInput(Scanner.java:771)
      at java.util.Scanner.findWithinHorizon(Scanner.java:1659)
      at Cl.main(Cl.java:38)

In class Cl, line 38 points to sc.close();

Last Comment
CEHJ

8/22/2022 - Mon
CEHJ

This is confusing - you're looking for what is clearly a timestamped logfile entry. So first of all, why would there be any more than one occurrence?
ASKER
pvinodp

The string pattern is just an example; it can be any string.
CEHJ

OK. The file is a text file containing lines?
ASKER
pvinodp

Yes.
What I am trying to do is divide the file into segments and then search each segment.

I am aware of the risk of cutting my pattern when dividing the file into segments.
CEHJ

OK. The file is a text file containing lines?
yes.
In that case the file makes sense in terms of lines, so I'm wondering why you're trying to span multiple lines?
ASKER
pvinodp

I want to create a job to search each segment, and in the future allocate each job to be executed on a separate system.
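A minimal local sketch of the "one job per segment" idea, assuming each segment is already available as a list of lines and the pattern is a literal word. The class SegmentJobs and its methods are hypothetical; a fixed thread pool (ExecutorService) stands in for the separate systems, and the per-segment counts are summed at the end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: one job per segment, run on a local thread pool.
// A "segment" here is simply a list of lines.
final class SegmentJobs {

    // Counts non-overlapping occurrences of a literal word in one segment.
    static long countInSegment(List<String> lines, String word) {
        if (word.isEmpty()) throw new IllegalArgumentException("empty word");
        long count = 0;
        for (String line : lines) {
            int from = 0;
            while ((from = line.indexOf(word, from)) != -1) {
                count++;
                from += word.length();
            }
        }
        return count;
    }

    // Submits one job per segment and sums the per-segment counts.
    static long countAll(List<List<String>> segments, String word, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (List<String> segment : segments) {
                Callable<Long> job = () -> countInSegment(segment, word);
                results.add(pool.submit(job));
            }
            long total = 0;
            for (Future<Long> f : results) {
                total += f.get(); // blocks until that job has finished
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a segment job failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Running the jobs on other machines would need extra infrastructure that this sketch does not show; only the split-count-sum structure carries over.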
CEHJ

Segments don't actually have anything to do with a pattern spanning multiple lines. You can think of a segment of a text file as simply a file with fewer lines. The problem is the same (even if the problem space is smaller).
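To make "a segment is just a file with fewer lines" concrete, here is a minimal sketch (hypothetical class LineSegments) that picks roughly equal cut points in a byte buffer and then moves each cut forward to just after the next '\n', so no line is split between two segments. It assumes '\n' line terminators (which also covers "\r\n"); in UTF-8 the byte 0x0A never occurs inside a multi-byte character, so cutting after it is safe.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: choose k roughly equal byte ranges whose boundaries
// fall just after a '\n', so no line is split between two segments.
final class LineSegments {

    // Returns k + 1 offsets; segment i is the byte range [cuts[i], cuts[i + 1]).
    static int[] cuts(ByteBuffer text, int k) {
        if (k < 1) throw new IllegalArgumentException("k must be >= 1");
        int size = text.limit();
        int[] cuts = new int[k + 1];
        cuts[k] = size;
        for (int i = 1; i < k; i++) {
            // nominal cut, but never before the previous cut
            int pos = Math.max((int) ((long) size * i / k), cuts[i - 1]);
            // move forward until the preceding byte is '\n' (or the end is reached)
            while (pos > 0 && pos < size && text.get(pos - 1) != '\n') {
                pos++;
            }
            cuts[i] = pos;
        }
        return cuts;
    }
}
```

For a real file, one option is to pass the read-only MappedByteBuffer returned by FileChannel.map(FileChannel.MapMode.READ_ONLY, 0, size); a single mapping is limited to Integer.MAX_VALUE bytes, so a 1 GB log fits.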
ASKER
pvinodp

But I don't intend to do a line-by-line search.
I want to divide the whole file into many portions and then search them.
CEHJ

I want to divide the whole file into many portions and then search
How?
CEHJ

Take the following file. Divide it equally in two and have a worker thread search each segment for the phrase 'THE CAT SAT ON THE MAT'. Neither will find it.

jSx62KM9qB20iMmM1WdlKsM0tKuRbCwEmsyLcZPJ4dSvOHmDovpdoqxe21RpbafryY1
PBga3epRxM3v25usrWwirvVHiNZ28PnguDEFuZo4bKavq2R0T64vi4hnPIUUXMKagtnyKSNUQNyx
URwwKDGcTjfKtcTzw7uHaWSmmZeen51ZuOzmzNe76LnNumzLCkfyIOm9GYA0VsbGD47zkzoku033HSzCtrHHrs0XDBaOHQWL7hXKLfLLLqlpIGYH0kDctda3 lH2XAYfg0J
b THE CAT SAT ON THE MAT
EDGdSmXv18OTJEyeqPFXBCW7ATHsl66SGaFNNYgC5UvtSDPPr4KwDNRYVQDzWsCkzuPKuQ
5y7URHNjEO8eZ1siQUraAvpdZF1WM
ScBL4zBKQwZsrXD
IU
ie7xzxW14C6hV9olbQLHvuO7ZOU3Iva3fa0JqW9UCuK1fpRMeRBCUrQEffbnXDwMP
hzrrG8TJ
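The experiment can be reproduced in a few lines. This sketch (hypothetical class SplitDemo) cuts a string at its midpoint and searches each half on its own; when the cut falls inside the phrase, neither half contains it even though the whole text does.

```java
// Hypothetical sketch of the experiment: split the text at its midpoint and
// search each half independently, as two workers would.
final class SplitDemo {

    static boolean foundByEitherHalf(String text, String phrase) {
        int mid = text.length() / 2;
        boolean first = text.substring(0, mid).contains(phrase);
        boolean second = text.substring(mid).contains(phrase);
        return first || second;
    }
}
```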


ASKER
pvinodp

I think that is because the string is divided across the two parts.
ASKER
pvinodp

I didn't understand your question:
"How?"
CEHJ


I think that is because the string is divided across the two parts.
 
Yes, so

a. How are you going to divide it?
b. Are you going to be looking for patterns such as "MAT.*EDGdSmXv18OT"? And if so, why?
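On point (b): in Java regular expressions "." does not match line terminators unless the Pattern.DOTALL flag is set, so a pattern like "MAT.*EDGdSmXv18OT" only matches across a line break with that flag. A small sketch (hypothetical class DotAllDemo) comparing the two:

```java
import java.util.regex.Pattern;

// Hypothetical sketch: compare the same regex with and without DOTALL.
final class DotAllDemo {

    static boolean finds(String text, String regex, boolean dotAll) {
        Pattern p = dotAll
                ? Pattern.compile(regex, Pattern.DOTALL) // '.' also matches '\n'
                : Pattern.compile(regex);                // '.' stops at line terminators
        return p.matcher(text).find();
    }
}
```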
ASKER
pvinodp

In my case the pattern is going to appear many times, and I just need the number of occurrences.
As it is a log file from a hardware device, the number can be huge.
I might get a 1 GB file, and if I split it into 5 segments, I risk losing 10 occurrences.
But in my case the count can go to 100+ in a single segment.
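Boundary losses like these can be avoided without giving up segments. A minimal sketch, assuming a literal (non-regex) pattern of length m and a String standing in for the file: each segment is searched together with m - 1 extra characters borrowed from the next segment, and only matches that start inside the segment's own range are counted, so a match cut by a boundary is counted exactly once. The class OverlapCount is hypothetical. It counts every start position (overlapping matches included), which for a word such as "packets" gives the same result as an ordinary count.

```java
// Hypothetical sketch, assuming a literal pattern and a String standing in
// for the file: segment s covers [from, to), but its search window runs to
// to + m - 1 (m = pattern length) so a match cut by the boundary is still seen.
final class OverlapCount {

    // Counts matches of word that START in [from, to); matches may end past to.
    static long countStartsIn(String text, String word, int from, int to) {
        int limit = Math.min(text.length(), to + word.length() - 1);
        String window = text.substring(from, limit);
        long count = 0;
        int i = window.indexOf(word);
        while (i != -1 && from + i < to) {
            count++;
            i = window.indexOf(word, i + 1); // i + 1: overlapping matches count too
        }
        return count;
    }

    // Splits the text into k equal ranges and sums the per-range counts.
    static long countSegmented(String text, String word, int k) {
        if (word.isEmpty() || k < 1) throw new IllegalArgumentException();
        long total = 0;
        for (int s = 0; s < k; s++) {
            int from = (int) ((long) text.length() * s / k);
            int to = (int) ((long) text.length() * (s + 1) / k);
            total += countStartsIn(text, word, from, to);
        }
        return total;
    }
}
```

With real file segments the same idea means reading m - 1 bytes past each segment's end; the total then matches a single whole-file count for any number of segments.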
CEHJ

You still haven't told me if you're looking for patterns that span lines, and if so, why.
ASKER
pvinodp

My pattern is a single word and it cannot span lines.
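Since the word cannot span lines, one possible approach (a sketch, independent of the accepted solution) is to read the file line by line: memory use is then bounded by the longest line instead of the file size, which avoids the Scanner buffering problem. The class LineCounter is hypothetical; it quotes the word with Pattern.quote so characters such as "[" or "." are matched literally, and it assumes the log is UTF-8 (adjust the charset otherwise).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: count a literal word line by line so that only one
// line at a time is held in memory.
final class LineCounter {

    // Counts non-overlapping occurrences in every line; the caller owns the reader.
    static long count(BufferedReader reader, String word) {
        Pattern p = Pattern.compile(Pattern.quote(word)); // match the word literally
        return reader.lines()                             // lazily reads one line at a time
                .mapToLong(line -> {
                    Matcher m = p.matcher(line);
                    long n = 0;
                    while (m.find()) n++;
                    return n;
                })
                .sum();
    }

    // Opens the file (assumed UTF-8), counts, and closes it again.
    static long count(Path file, String word) {
        try (BufferedReader r = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return count(r, word);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Usage would be along the lines of LineCounter.count(Path.of("/exports/nos_issues/9518/aaa_.log"), "packets").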
ASKER CERTIFIED SOLUTION
CEHJ

ASKER
pvinodp

I think reading line by line gives some improvement over using Scanner.
ASKER
pvinodp

Thanks for your input
CEHJ

:)