Solved

index and search a HUGE text file w/Java

Posted on 1997-11-20
1
229 Views
Last Modified: 2006-11-17
I have a huge flat (text) file with each row being a record.  This thing is about 300 MILLION records and is a textfile in a UNIX environment.

I want to be able to index this file and then search for a particular string of info...all done by Java.  

We can do the same with a Sybase database, but we want to see the performance using a Java-based search.

Anyone know any Java apps that will do this?  How about non-java apps for simply indexing and searching a file?

We want to keep the methods as close to the methods used to index the WWW as possible.


Much thanks to anyone who can help!

daniel_garrison@hotmail.com
0
Comment
Question by:dgarrison
1 Comment
 
LVL 6

Accepted Solution

by:
jpk041897 earned 50 total points
ID: 1231016
Theres a couple of option you could use, either via a hash table or using JDBC to index on a Data Base.

Java is cetanly not designed as a number crunching language so 300 million records will probably not be eve supported by Java's hashing structure.

Your performance, using Java, will certanly be slower than Sybase, since Sybase uses C+ tree search algorithms on the indexes. (That's fourth root of the number of records times average disk seek time) to obtain the average seek time for one index.

Using JNI, you couold make calls to a C based Indexing database such as MIX softwares C Database toolchest or equivalent products and obtain values close to Sybase.

You could also port such a library, fairly painlesly, from C to Java and obtain slightly slower performace (provided you use a good native code Java Compiler such as Asymetric's Supercede).

Although any of these solutions would probably represent a lot more work than its worth if you allready have a DB app. that does the work for you.

My suggestion would be to use Java and JDBC, either as an applet or as an CGI app. To create the front end and use your Sybase DB as the back end for optimal performance.

Now if what you are looking for is a way to improve on the access time of the seeks, there was an article in Dr. Dobbs some years ago that contained a disk hashing algorithm in Pascal tha could access records in a file up to 1 GBytes in 2 disk accesses and 3 access for anything larger. Such an approach would yield no significant diferences between C++ or Java since procesing time is kept to a minimum. Of course, porting from Pascal to Java is a bit more painful than from C++ to Java.

If your interested in such an approach, I could dig up the article and send you the reference.


0

Featured Post

Highfive Gives IT Their Time Back

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

For customizing the look of your lightweight component and making it look opaque like it was made of plastic.  This tip assumes your component to be of rectangular shape and completely opaque.   (CODE)
INTRODUCTION Working with files is a moderately common task in Java.  For most projects hard coding the file names, using parameters in configuration files, or using command-line arguments is sufficient.   However, when your application has vi…
Viewers learn about the third conditional statement “else if” and use it in an example program. Then additional information about conditional statements is provided, covering the topic thoroughly. Viewers learn about the third conditional statement …
Viewers will learn about the regular for loop in Java and how to use it. Definition: Break the for loop down into 3 parts: Syntax when using for loops: Example using a for loop:

707 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

10 Experts available now in Live!

Get 1:1 Help Now