Solved

index and search a HUGE text file w/Java

Posted on 1997-11-20
Last Modified: 2006-11-17
I have a huge flat text file in a UNIX environment, with each row being one record. It holds about 300 MILLION records.

I want to be able to index this file and then search it for a particular string of info, all done in Java.

We can do the same with a Sybase database, but we want to see the performance using a Java-based search.

Does anyone know of any Java apps that will do this? How about non-Java apps for simply indexing and searching a file?

We want to keep the methods as close to the methods used to index the WWW as possible.


Much thanks to anyone who can help!

daniel_garrison@hotmail.com
Question by:dgarrison
1 Comment
 

Accepted Solution

by:
jpk041897 earned 50 total points
ID: 1231016
There are a couple of options you could use: either a hash table, or JDBC with the index kept in a database.

Java is certainly not designed as a number-crunching language, so 300 million records will probably not even be supported by Java's in-memory hashing structures (Hashtable).
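For a file small enough that one map entry per record fits in memory, the hash table option could look roughly like the sketch below. It maps each record's key to the byte offset of its line, then seeks straight to that offset on lookup. The class name OffsetIndex and the assumption that the key is the text before the first comma are made up for illustration; the original post does not describe the record layout. At 300 million records the map itself would need many gigabytes of heap.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: in-memory index of key -> byte offset of the record's line.
class OffsetIndex {
    private final Map<String, Long> offsets = new HashMap<>();
    private final String path;

    // Scans the file once and remembers where each line starts.
    OffsetIndex(String path) throws IOException {
        this.path = path;
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long pos = raf.getFilePointer();
            String line;
            while ((line = raf.readLine()) != null) {
                offsets.put(keyOf(line), pos); // a duplicate key keeps the last offset
                pos = raf.getFilePointer();
            }
        }
    }

    // Assumption: the key is everything before the first comma.
    static String keyOf(String line) {
        int comma = line.indexOf(',');
        return comma < 0 ? line : line.substring(0, comma);
    }

    // Returns the full record line for the key, or null if the key is unknown.
    String lookup(String key) throws IOException {
        Long pos = offsets.get(key);
        if (pos == null) {
            return null;
        }
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(pos);
            return raf.readLine();
        }
    }
}
```

RandomAccessFile.readLine is unbuffered and treats each byte as one character, so this is only meant to show the idea, not to be fast.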

Your performance using Java will certainly be slower than Sybase, since Sybase uses B+ tree search algorithms on its indexes. (The average seek time for one index lookup is roughly the number of tree levels, which grows with the logarithm of the number of records, times the average disk seek time.)
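As a rough check on what a tree index costs per lookup: a search reads about one index page per level, and the number of levels is the smallest h for which fanout^h reaches the record count. The sketch below computes that; the fanout of 100 keys per page used in the note is an assumption for illustration, not a Sybase figure.

```java
// Hypothetical helper: estimates how many levels a tree index needs.
class IndexHeight {
    // Smallest number of levels such that fanout^levels >= records.
    static int levels(long records, int fanout) {
        int levels = 0;
        long reachable = 1; // records addressable with the levels counted so far
        while (reachable < records) {
            reachable *= fanout;
            levels++;
        }
        return levels;
    }
}
```

With an assumed fanout of 100, IndexHeight.levels(300_000_000L, 100) gives 5, so one lookup would cost on the order of five page reads, fewer if the upper levels are cached.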

Using JNI, you could make calls to a C-based indexing database such as Mix Software's C Database Toolchest or an equivalent product and obtain performance close to Sybase.

You could also port such a library, fairly painlessly, from C to Java and obtain slightly slower performance (provided you use a good native-code Java compiler such as Asymetrix's SuperCede).

However, any of these solutions would probably be a lot more work than it's worth if you already have a DB app that does the work for you.

My suggestion would be to use Java and JDBC, either as an applet or as a CGI app, to create the front end, and use your Sybase DB as the back end for optimal performance.
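A minimal sketch of the JDBC side of that suggestion follows. The table name records and the columns record_key and record_text are hypothetical, since the post does not describe the schema. The search itself runs inside the database, which uses its own index; Java only sends the query and reads the rows.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: look up records by key through JDBC.
class RecordSearch {
    // Assumed table and column names; substitute the real schema.
    static final String SQL =
        "SELECT record_text FROM records WHERE record_key = ?";

    // Returns every record whose key matches; the database index does the search.
    static List<String> find(Connection conn, String key) throws SQLException {
        List<String> rows = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            ps.setString(1, key);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    rows.add(rs.getString(1));
                }
            }
        }
        return rows;
    }
}
```

The Connection would come from DriverManager.getConnection(url, user, password), with the URL format defined by whichever JDBC driver is used for Sybase.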

Now, if what you are looking for is a way to improve the access time of the seeks, there was an article in Dr. Dobb's some years ago that contained a disk hashing algorithm in Pascal that could access records in a file of up to 1 GB in 2 disk accesses, and in 3 accesses for anything larger. Such an approach would yield no significant difference between C++ and Java, since processing time is kept to a minimum and disk access dominates. Of course, porting from Pascal to Java is a bit more painful than porting from C++ to Java.
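To give a feel for why disk hashing needs so few accesses, here is a generic sketch of an on-disk hash table with fixed-size slots and linear probing. It is not the algorithm from the Dr. Dobb's article, which the post does not identify; the class name DiskHash, the numeric keys, and the 16-byte slot layout are all assumptions for illustration. Each probe is one seek plus a 16-byte read, so with a lightly loaded table most lookups finish in one or two accesses.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical sketch: hash file mapping a numeric key to a record's byte offset.
class DiskHash implements AutoCloseable {
    private static final int SLOT_BYTES = 16; // 8-byte key + 8-byte (offset + 1)
    private final RandomAccessFile raf;
    private final long slots;

    DiskHash(String path, long slots) throws IOException {
        this.raf = new RandomAccessFile(path, "rw");
        this.slots = slots;
        if (raf.length() == 0) {
            // Zero-fill explicitly; a stored value of 0 marks an empty slot.
            byte[] empty = new byte[SLOT_BYTES];
            for (long i = 0; i < slots; i++) {
                raf.write(empty);
            }
        }
    }

    // Scrambles the key and maps it to its home slot.
    private long home(long key) {
        long mixed = key * 0x9E3779B97F4A7C15L;
        mixed ^= mixed >>> 32;
        return Math.floorMod(mixed, slots);
    }

    // Stores or overwrites the offset for a key, probing linearly on collision.
    void put(long key, long offset) throws IOException {
        long start = home(key);
        for (long i = 0; i < slots; i++) {
            long pos = ((start + i) % slots) * SLOT_BYTES;
            raf.seek(pos);
            long k = raf.readLong();
            long v = raf.readLong();
            if (v == 0 || k == key) {
                raf.seek(pos);
                raf.writeLong(key);
                raf.writeLong(offset + 1);
                return;
            }
        }
        throw new IllegalStateException("hash file is full");
    }

    // Returns the stored offset, or -1 if the key is absent.
    long get(long key) throws IOException {
        long start = home(key);
        for (long i = 0; i < slots; i++) {
            raf.seek(((start + i) % slots) * SLOT_BYTES);
            long k = raf.readLong();
            long v = raf.readLong();
            if (v == 0) {
                return -1; // reached an empty slot: key was never stored
            }
            if (k == key) {
                return v - 1;
            }
        }
        return -1;
    }

    @Override
    public void close() throws IOException {
        raf.close();
    }
}
```

Because nearly all the time goes into the seeks, the language doing the arithmetic matters little, which is the point made above.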

If you're interested in such an approach, I could dig up the article and send you the reference.

