Solved

Fastest way to search in a 27 GB database ?

Posted on 2013-06-01
20
379 Views
Last Modified: 2013-06-03
Hi experts,

Will it take too long to search for a specific text in a 27 GB database full of text with the almost the fastest processors on a server, several times per second ? Can be there any other method to make the text search more fast ? Thankyou
0
Comment
Question by:AlexMert
  • 6
  • 5
  • 5
  • +1
20 Comments
 
LVL 58

Expert Comment

by:Gary
ID: 39212838
What kind of database?
What kind of text?
What kind of fields?
And so on...
0
 
LVL 57

Expert Comment

by:giltjr
ID: 39212843
CPU does not matter as much as the amount of RAM, speed of the RAM, and what the I/O capabilities of the disk subsystem is.

So what CPU do you have, how much RAM, Speed of RAM and is is single, dual, or triple channel, and what is the disk setup?
0
 
LVL 27

Accepted Solution

by:
d-glitch earned 500 total points
ID: 39212859
If you are doing many searches on a stable (or slowly changing) database you should create an index (and maintain it).
0
 

Author Comment

by:AlexMert
ID: 39212864
Well right now I don't have anything, just an idea that I want to do, and I want to know if I can make searching for an ASCII text, very fast in a 27 GB database full of ASCII text

So @d-glitch, creating a index, will increase the speed of the search ? Sounds interesting
0
 
LVL 27

Expert Comment

by:d-glitch
ID: 39212869
Indexed searches will speed things up by factors of 10^4 or more.
0
 
LVL 58

Expert Comment

by:Gary
ID: 39212881
Indexed searches will speed things up by factors of 10^4 or more.

Not necessarily so. If you are searching multiple text fields and using wildcards like %my text% then indexing will have no impact.
0
 
LVL 57

Expert Comment

by:giltjr
ID: 39212972
I agree with GarC123.  It is true that if what you are looking for is index, it will speed up.

However,  if you doing a wild card search or a search across multiple fields, then a index does not help at all because you aren't looking at the index.

Think of it this way.  You want to look up the word "the" in a dictionary.  You can look at the top each page as a "index" because it shows the 1st and last word on each page.  That make it real easy and fast to find the page with the word "the" on it.  That is way faster than looking at each word on each page.

However, if you want to find every definition that has the word "the" in it you have to look at each definition in the whole dictionary.  The "index" does not help at all.
0
 

Author Comment

by:AlexMert
ID: 39213023
No, I just need to look for a fixed specific text in that database. With indexing , how many searches could I make per minute for that word, if the datebase would be 27 GB and a high speed RAm and processor ? What changes would I have ?
0
 
LVL 58

Expert Comment

by:Gary
ID: 39213055
I just need to look for a fixed specific text
This my sentence full of words
If you want to search for sentence or my sentence then an index is useless

Without saying what kind of fields, blob, text etc and the type of queries and the type of server (is this a dedicated database server) then all this is just generic advice.
But anyway you've accepted a generic answer.
0
 

Author Comment

by:AlexMert
ID: 39213083
@GaryC123, as I mentioned in an early post, I don't have anything right now, not even that database, I just want to start a project and I want to know what are my oportunities and possibilities. I can do anything like buying only server dedicated for database and all things dedicated for this, just to can make the searching as fast as possible. But I need to know what to make...
0
How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

 
LVL 58

Expert Comment

by:Gary
ID: 39213090
Maybe you should open a new question.
Ask a vague question you are only going to get vague and probably inaccurate answers.
What is it you are planning on doing?
Over and out.
0
 

Author Comment

by:AlexMert
ID: 39213153
I just want to know if its possible to make searching in a very big database like 27+GB, for a specific text, fast enough to make several searches per second
0
 
LVL 58

Expert Comment

by:Gary
ID: 39213231
Size of the database is irrelevant -  27 Gb might sound big to you but it isn't really - databases can be in the multiple terabytes in size.
0
 
LVL 27

Expert Comment

by:d-glitch
ID: 39213240
Maybe.  What are the day and what are the searches?

Assume your data is 270M 100 byte sentences.  You make a hash table for your data.
One part of your index is a hashed list of 1M words and the sentances that contain them.
Another part is the most common 100K two word phrases and their sentances.
And maybe you do three word phrases too.
Todo a search, start with the least common word in the search string. What sentences does it appear in?  How many?  You can check them all, or look at the 2nd least common word and look for overlap.  You may be able to do all the work in your small, highly optimized index.
Better, real-world examples will get you better, real-world answers.
0
 

Author Comment

by:AlexMert
ID: 39213334
Ok Thanks alot for all the help, I will optimise as d-glitch said the index and also Gary gived me very much optimism :)
0
 
LVL 57

Expert Comment

by:giltjr
ID: 39213490
Is the data within this "data base" fixed?  

Or are you adding and removing "records"?

What does a "record" consist of?

What will be your key?
0
 

Author Comment

by:AlexMert
ID: 39214495
The data is fixed. I don't know what are records, but I won't add absolutely anything or removing anything from the database. Also there will be no key :) thanks
0
 
LVL 57

Expert Comment

by:giltjr
ID: 39214700
You do realize that the index is nothing but a small file that contains the keys and pointers to the record.

Without a key, you can't have a index.  In fact without a key/index you don't have a data base, you have a flat file.  Which means each search is going to scan the while 27GB file every/each time.

Which gets back to my suggestion of getting a computer with the fastest RAM and CPU you can get, creating a in-memory file system, loading the file into that and go for it.
0
 
LVL 27

Expert Comment

by:d-glitch
ID: 39216300
>>  You do realize that the index is nothing but a small file that contains the keys and
       pointers to the record.

The sort of index I have suggested would be carefully designed and highly optimized to meet the users' performance specifications.  It won't and can't exist until the users deliver those specifications.

>> Size of the database is irrelevant

Really hard to imagine anyone believing that.

======================================================================

Since we have essentially zero information on the database and searches, I have assumed the following:
     The database consists of 270M 100 byte sentences, numbered sequentially.
     There are 2.7B words (avg word len => 9).
     There are 1M unique words.

The index would be a hashed list of every word that appears in the  database.  The data associated with each word would include the number of every sentence that contains the word.  

This index contains 2.7B sentence ID numbers (one for each word in the database).
The sentence ID number takes 28 bits, so this table is comparable in size to the original database.  The enormous advantage is that it is searchable.

Suppose you need to search the database for the following sentence:
       The rings of Uranus were discovered during a syzygy in 1977.

You look up each word in your hashed index.  Two words stand out as uncommon:  Uranus (500K occurrences) and syzygy (2K occurrences).  Proper hashing lets you find data very fast (approaching one try on average).

Your search effort is reduced from 270M sentence comparisons to a maximum of 2K.
But sentence comparisons may still be to computationally intensive.

You could also look for overlaps of Uranus and syzygy.  Are any of the 2K syzygy sentences also on the list of 500K Uranus sentences?  This would reduce your effort to 2K index entry comparisons.  If there are any overlaps, you will still have to do some sentence comparisons.  If not, you are done.
0
 
LVL 57

Expert Comment

by:giltjr
ID: 39217369
d-glitch,

I believe most of the responders here understand what index's can do, however I don't think  AlexMert really understands (no offense AlexMert).  

As you notice one of his responses is "Also there will be no key", thus my response of letting him know what an index really is.

Without really knowing any more  about what the data is we could all be guessing (and making assumptions) about what can and can't be done.
0

Featured Post

What Should I Do With This Threat Intelligence?

Are you wondering if you actually need threat intelligence? The answer is yes. We explain the basics for creating useful threat intelligence.

Join & Write a Comment

Prime numbers are natural numbers greater than 1 that have only two divisors (the number itself and 1). By “divisible” we mean dividend % divisor = 0 (% indicates MODULAR. It gives the reminder of a division operation). We’ll follow multiple approac…
When it comes to showing a 404 error page to your visitors, you do not want that generic page to show, and you especially do not want your hosting provider’s ad error page to show either. In this article, I will show you how to enable the custom 40…
Sending a Secure fax is easy with eFax Corporate (http://www.enterprise.efax.com). First, Just open a new email message.  In the To field, type your recipient's fax number @efaxsend.com. You can even send a secure international fax — just include t…
You have products, that come in variants and want to set different prices for them? Watch this micro tutorial that describes how to configure prices for Magento super attributes. Assigning simple products to configurable: We assigned simple products…

708 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

13 Experts available now in Live!

Get 1:1 Help Now