Search Engine Implementation

How I can implement a serch engine in Java. What is the basis of a search ?
LVL 2
muraliramAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

mbormannCommented:
I dont know anything ,but I got this code from somewhere ,see if u can understand it


A Java Applet Search Engine
by Tim Kientzle


Listing One
public class DBBTree {
  protected RandomAccessFile file;
  protected File fileName;
  private int pageSize;  // page size
  private boolean msbFirst; // True = MSB byte order, False = LSB byte order
  /** The constructor takes a File name or File object. */
  public DBBTree(String name) throws IOException {
    this(new File(name));
  }
  public DBBTree(java.io.File name) throws IOException {
    fileName = name;
    try {
      file = new RandomAccessFile(name,"r");
    } catch (SecurityException e) {
      throw new IOException("Wrong permissions for "+name+": "+e);
    } catch (IOException e) {
      throw new IOException("file open failed: " + e);
    }
    /* Read the metadata.  I only read the first three values. */
    pageSize = 256; // Big enough to get all of the metadata
    byte [] metaData = readRawPage(0); // Page 0 holds metadata
    // Read magic number and determine endianness of DB file
    msbFirst = true;  // Try MSB format first...
    int magic = bytesToInt(metaData,0);
    if(magic != 0x053162) { // Failed? Try LSB...
      msbFirst = false;
      magic = bytesToInt(metaData,0);
      System.out.println("Opening LSB format database");
    } else
      System.out.println("Opening MSB format database");
    // Read version number and abort if not a DB file.
    int version = bytesToInt(metaData,4);
    if((magic != 0x053162) || (version != 3)) // BTREE Magic number & version
      throw new IOException("Not a DB file (magic: 0x"
                    + Integer.toHexString(magic) + ", version: " 
                    + Integer.toString(version) + ")");
    // Set actual page size
    pageSize = bytesToInt(metaData,8);
    setCacheSize(8);    // Set the cache to 8 pages as default
    primeCache();       // Pre-read first 8 pages into cache
  }
  /** Search for a key.  A 'key' here is a byte array. Returns the data
   * associated with the key or null if the key wasn't found. */
  public byte [] search(byte [] key) throws IOException {
    return readPage(1).search(key);    // Search always begins on page 1
  }
  /** Read a raw page, return an array of bytes containing that page. */
  protected byte []  readRawPage(int pageno) throws IOException {
    try{
      byte data[] = new byte[pageSize];
      file.seek(pageno*pageSize);
      file.read(data);
      return data;
    } catch (IOException e) {
      throw new IOException("readRawPage("+pageno+") failed: " + e);
    }
  }
  /** The cache is kept in LRU (least-recently-used) order; that is, whenever
   * a page is accessed, it gets moved to the front of the list, and pages
   * are lost when they fall off the end of the list. Happily, this code is
   * simpler than the corresponding C version, thanks to the GC. */
  int cachePageNumbers[];
  byte cachePages[][];
  /** Set the cache size.  By default, it's set to 8 pages. */
  public void setCacheSize(int size) {
    int[] newCachePageNumbers = new int[size];
    if(cachePageNumbers != null)
      for(int i=0;i<size && i<cachePageNumbers.length;i++)
    newCachePageNumbers[i] = cachePageNumbers[i];
    else
      for(int i=0;i<size;i++)
    newCachePageNumbers[i] = 0;
    cachePageNumbers = newCachePageNumbers;
    byte[][] newCachePages = new byte[size] [];
    if(cachePages != null)
      for(int i=0;i<size && i<cachePages.length;i++)
    newCachePages[i] = cachePages[i];
    cachePages = newCachePages;
  }
  /** Fill the cache with the first pages in the database. This can improve
   * perceived performance noticably. In particular, this pulls in page 1,
   * which is the top page of the B-Tree, and is searched on every lookup. */
  public void primeCache() throws IOException {
    int i = 1;
    while(cachePageNumbers[cachePageNumbers.length-1] == 0) {
      readPage(i);
      i++;
    }
  }
  /* Because I'm only caching a few pages, it's perfectly reasonable to use a
   * simple array and just move elements down in the array to maintain LRU
   * order. For a large cache, something more sophisticated would be
   * appropriate. */
  DBBTreePage readPage(int pageno) throws IOException {
    // Try to find page in cache, return it if found
    for(int i=0; i< cachePageNumbers.length; i++) {
      if (cachePageNumbers[i] == pageno) {
    byte page[] = cachePages[i];
    if(i>0) {
      // Move this page to front of cache
      System.arraycopy(cachePageNumbers,0, cachePageNumbers,1,i);
      System.arraycopy(cachePages,0, cachePages,1,i);
      cachePageNumbers[0] = pageno;
      cachePages[0] = page;
    }
    // Return found page
    return new DBBTreePage(this,page);
      }
    }
    // Page wasn't in cache, read it from disk
    byte page[] = readRawPage(pageno);
    // Insert new page at front of cache.
    System.arraycopy(cachePageNumbers,0,
             cachePageNumbers,1,cachePageNumbers.length-1);
    System.arraycopy(cachePages,0, cachePages,1,cachePages.length-1);
    cachePageNumbers[0] = pageno;
    cachePages[0] = page;
    return new DBBTreePage(this,page);
  }
  /** Convert four bytes from a byte array into an integer, using LSB or MSB
   * data order, as appropriate.
   * @param barray Array of bytes
   * @param offset Offset in barray to read integer
   * @return integer formed from indicated bytes */
  int bytesToInt(byte barray[],int offset) {
    if(msbFirst)
      return 16777216 * (((int)barray[offset  ])&255) +
                65536 * (((int)barray[offset+1])&255) +
                  256 * (((int)barray[offset+2])&255) +
                        (((int)barray[offset+3])&255);
    else
      return            (((int)barray[offset  ])&255) +
                  256 * (((int)barray[offset+1])&255) +
                65536 * (((int)barray[offset+2])&255) +
             16777216 * (((int)barray[offset+3])&255);
  }
  /** Convert two bytes from a byte array into a short.
   * @param barray Array of bytes
   * @param offset Offset in barray to read short
   * @return short formed from indicated bytes */
  short bytesToShort(byte barray[],int offset) {
    if(msbFirst)
      return (short)(256 * (((short)barray[offset  ])&255) +
                           (((short)barray[offset+1])&255));
    else
      return (short)(      (((short)barray[offset  ])&255) +
                     256 * (((short)barray[offset+1])&255));
  }
}


Listing Two
  // There doesn't seem to be any one way to open a file that
  // works on all platforms, so this function tries several...
  DBBTree openDatabase(String filename) {
    StringBuffer work;
    DBBTree db;
    // First, try seeing if there's such a thing as a 'current directory'
    try {
      db = new DBBTree(new File(filename));
      System.out.println("    Opened " + filename);
      return db;
    } catch (IOException e) {
      System.out.println("    Couldn't open " + filename);
      System.out.println("    IOException: " + e);
    }
    // Manipulate the document base URL to build a more complete pathname.
    try {
      filename = new java.net.URL(getDocumentBase(),filename).getFile();
    } catch (java.net.MalformedURLException e) {
      // If you can't build a more complete pathname, you're sunk
      return null;
    }
    // Try filename part of file: URL
    try {
      db = new DBBTree(filename);
      System.out.println("    Opened " + filename);
      return db;
    } catch (IOException e) {
      System.out.println("    Couldn't open " + filename);
      System.out.println("    IOException: " + e);
    }
    // Try changing the / to the native separator
    // Note: some platforms lie about the native separator... <sigh>
    work = new StringBuffer(filename);
    try {
      char separator = File.separatorChar;
      System.out.println("    file.separator = " + separator);
      for(int i=0;i<work.length();i++) // Change / to native separator
    if(work.charAt(i) == '/') work.setCharAt(i,separator);
      db = new DBBTree(work.toString());
      System.out.println("    Opened " + work);
      return db;
    } catch (IOException e) {
      System.out.println("    Couldn't open " + work);
      System.out.println("    IOException: " + e);
    }
    // Strip first character ...
    filename = filename.substring(1);
    try {
      db = new DBBTree(filename);
      System.out.println("    Opened " + filename);
      return db;
    } catch (IOException e) {
      System.out.println("    Couldn't open " + filename);
      System.out.println("    IOException: " + e);
    }
    // Everything failed... <sigh>
    return null;
  }
0
mbormannCommented:
http://www.ddj.com/articles/1999/9902/9902b/9902b.htm

http://www.ddj.com/ftp/1999/1999_02/jsearch.txt

I doubt that anybody would give it to u for free,so am locking this here.Also then maybe I could even start my own business and mint money rather than ...

:-)
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Introducing Cloud Class® training courses

Tech changes fast. You can learn faster. That’s why we’re bringing professional training courses to Experts Exchange. With a subscription, you can access all the Cloud Class® courses to expand your education, prep for certifications, and get top-notch instructions.

Ravindra76Commented:


WOW. It's also a biggest mystery hunting mefrom long time
0
bava_anandCommented:
Are u going to implement search engine
or search option for your site .

Search engine is capable of searching content from more than one webserver.

where as search option only search the
content in your server only or archives
u specifies.

In any case u need to understand the
concept before implementing in any of the language.

1)Formatting the html file:
First you have to collect all the keyword in the html file and put it in the meta tag of that html page

2)Uploading to the database:
Read meta tag in each html file and
transfer to the database accoring to the
category it comes.

3)Providing Interface:
Porivide a user interface so that user
can enter the keyword,author name or title . After the user press submit , retreive the data from the database and
display in the format that most search
engine usefully follows.

If u need any more clarification, specify me the exact problem.

0
muraliramAuthor Commented:
Thanks for your answer mbormann's answer.

Thanks bava anand for clarifying my idea on the implementation of a  search engine.
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Java

From novice to tech pro — start learning today.