pvinodp

Search for a pattern in a large file using Java

The following is my code, taken from one function.
 
    Scanner sc = new Scanner(new File("/exports/nos_issues/9518/aaa_.log"));
    String str = "10:37:10.719 [net.jradius.freeradius.FreeRadiusProcessor(29)] DEBUG net.jradius.log.Log4JRadiusLogger - >>> packets";
    Pattern ptr = Pattern.compile(str);
    long toto = 0;
    try {
        while (sc.findWithinHorizon(ptr, 0) != null)
            toto++;
    } finally {
        sc.close();
    }

    System.out.println(toto);
     

I get the following error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
      at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:39)
      at java.nio.CharBuffer.allocate(CharBuffer.java:312)
      at java.util.Scanner.makeSpace(Scanner.java:816)
      at java.util.Scanner.readInput(Scanner.java:771)
      at java.util.Scanner.findWithinHorizon(Scanner.java:1659)
      at Cl.main(Cl.java:38)

In class Cl, line 38 points to sc.close();
CEHJ

This is confusing - you're looking for what is clearly a timestamped logfile entry. So first of all, why would there be any more than one occurrence?
pvinodp

ASKER

The string pattern is just an example. It can be any string.
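
Because the search text can be any string, it may help to note that Pattern.compile treats its argument as a regular expression, so characters such as "[", "(" and "." in the example string are not matched literally. The following is a minimal sketch of one way to search for arbitrary text literally, using Pattern.quote. The class name LiteralPatternDemo and the helper literal are made up for illustration and are not from the original code.

```java
import java.util.regex.Pattern;

class LiteralPatternDemo {
    // Hypothetical helper: compiles `text` so every character is matched literally.
    static Pattern literal(String text) {
        return Pattern.compile(Pattern.quote(text));
    }

    public static void main(String[] args) {
        String str = "10:37:10.719 [net.jradius.freeradius.FreeRadiusProcessor(29)] "
                + "DEBUG net.jradius.log.Log4JRadiusLogger - >>> packets";

        // Unquoted: "[...]" is read as a character class (one character),
        // so the regex does not match its own text.
        System.out.println(Pattern.compile(str).matcher(str).find()); // false

        // Quoted: the text is matched character for character.
        System.out.println(literal(str).matcher(str).find()); // true
    }
}
```

Pattern.compile(text, Pattern.LITERAL) would be an equivalent choice.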
CEHJ

OK. The file is a text file containing lines?
pvinodp

ASKER

Yes.
What I am trying to do is divide the file into segments and then search each segment.

I am aware of the risk of cutting my pattern in two while dividing the file into segments.
CEHJ

"OK. The file is a text file containing lines?"
"yes."
In that case, the file makes sense in terms of lines, so I'm wondering why you're trying to span multiple lines?
pvinodp

ASKER

I want to create a job to search each segment, and in future allocate the jobs to be executed on separate systems.
CEHJ

Segments don't actually have anything to do with a pattern spanning multiple lines. You can think of a segment of a text file as simply a file with fewer lines. The problem is the same (even if the problem space is smaller).
pvinodp

ASKER

But I do not intend to do a line-by-line search.
I want to divide the whole file into many portions and then search.
CEHJ

"I want to divide the whole file into many portions and then search"
How?
CEHJ

Take the following file. Divide it equally into two and have one worker thread search each segment for the phrase 'THE CAT SAT ON THE MAT'. Neither will find it.

jSx62KM9qB20iMmM1WdlKsM0tKuRbCwEmsyLcZPJ4dSvOHmDovpdoqxe21RpbafryY1
PBga3epRxM3v25usrWwirvVHiNZ28PnguDEFuZo4bKavq2R0T64vi4hnPIUUXMKagtnyKSNUQNyx
URwwKDGcTjfKtcTzw7uHaWSmmZeen51ZuOzmzNe76LnNumzLCkfyIOm9GYA0VsbGD47zkzoku033HSzCtrHHrs0XDBaOHQWL7hXKLfLLLqlpIGYH0kDctda3 lH2XAYfg0J
b THE CAT SAT ON THE MAT
EDGdSmXv18OTJEyeqPFXBCW7ATHsl66SGaFNNYgC5UvtSDPPr4KwDNRYVQDzWsCkzuPKuQ
5y7URHNjEO8eZ1siQUraAvpdZF1WM
ScBL4zBKQwZsrXD
IU
ie7xzxW14C6hV9olbQLHvuO7ZOU3Iva3fa0JqW9UCuK1fpRMeRBCUrQEffbnXDwMP
hzrrG8TJ


pvinodp

ASKER

I think that is because the string is divided across the two parts.
pvinodp

ASKER

I didn't understand your question.
HOW?
CEHJ


"I think that is because the string is divided across the two parts."
 
Yes, so

a. how are you going to divide it?
b. are you going to be looking for patterns such as "MAT.*EDGdSmXv18OT"? And if so - why?
pvinodp

ASKER

In my case the pattern is going to appear many times, and I just need the number of occurrences.
And as it is a log file from hardware, the number can be huge.
I might get a file of size 1 GB, and if I make 5 segments of it, I am at risk of losing 10 occurrences.
But in my case the count can go to 100+ in a single segment.
CEHJ

You still haven't told me if you're looking for patterns that span lines, and if so, why.
pvinodp

ASKER

My pattern is a single word and it cannot span lines.
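
If the pattern cannot span lines, one way to divide the file without cutting a match in two is to let each segment own only the lines that start inside its byte range. The following is a rough sketch of that idea, not the accepted solution from this thread. The class SegmentCounter and its methods are hypothetical names. It assumes an ASCII search word, since RandomAccessFile.readLine maps each byte to one character, and it favours clarity over speed because RandomAccessFile reads are unbuffered.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SegmentCounter {
    // Counts matches in every line whose first byte lies in [start, end).
    // A line that begins before `end` is read to its end, even past `end`.
    static long countInSegment(String path, long start, long end, Pattern p)
            throws IOException {
        long count = 0;
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            if (start > 0) {
                // Step back one byte and consume up to the next line break, so we
                // land on the first line that starts at or after `start`.
                raf.seek(start - 1);
                raf.readLine();
            }
            while (raf.getFilePointer() < end) {
                String line = raf.readLine();
                if (line == null) break; // end of file
                Matcher m = p.matcher(line);
                while (m.find()) count++;
            }
        }
        return count;
    }

    // Splits the file into `segments` byte ranges and sums the per-segment counts.
    // Each countInSegment call is independent, so it could be run as a separate job.
    static long countAll(String path, String word, int segments) throws IOException {
        Pattern p = Pattern.compile(Pattern.quote(word));
        long length;
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            length = raf.length();
        }
        long total = 0;
        for (int i = 0; i < segments; i++) {
            long start = length * i / segments;
            long end = length * (i + 1) / segments;
            total += countInSegment(path, start, end, p);
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java SegmentCounter <file> <word> <segments>
        System.out.println(countAll(args[0], args[1], Integer.parseInt(args[2])));
    }
}
```

With this ownership rule every line is searched by exactly one segment, so the total should not depend on how many segments are used.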
ASKER CERTIFIED SOLUTION
CEHJ
pvinodp

ASKER

I think reading line by line gives some improvement over using Scanner.
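
For reference, the Scanner documentation says that a horizon of 0 makes findWithinHorizon search without bound and that it may then buffer all of the input, which is consistent with the OutOfMemoryError in the question. The following is a minimal sketch of a line-by-line count that keeps only one line in memory at a time. The class name LineCounter and the method countOccurrences are made up for illustration, and the file is assumed to be UTF-8 text.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineCounter {
    // Counts literal occurrences of `needle`, reading one line at a time.
    static long countOccurrences(Path file, String needle) throws IOException {
        Pattern p = Pattern.compile(Pattern.quote(needle));
        long count = 0;
        // Assumption: the log is UTF-8; pass a different charset if it is not.
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                Matcher m = p.matcher(line);
                while (m.find()) count++; // a line may contain several matches
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java LineCounter <file> <text to count>
        System.out.println(countOccurrences(Paths.get(args[0]), args[1]));
    }
}
```

Memory use here is bounded by the longest line rather than by the file size, provided the file really does contain line breaks.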
pvinodp

ASKER

Thanks for your input
CEHJ

:)