How to create a search engine?

Hello, this question might be a little odd, but I'm curious about how to create a search engine like Google. Not on Google's scale, of course, but something similar and standard. What tools do I have to study to make my own search engine? I know this can be done, so please don't say it is impossible. I just need to know what I must learn and what plan I should follow for this project.

thank you
david875 Asked:
 
Dave Baldwin (Fixer of Problems) Commented:
The key pieces of a search engine are a web crawler to download pages, a program to extract the information you're looking for from them and put it into a database, and a website to accept queries and return results from the database. You will probably use several languages, such as C/C++ and PHP/HTML/JavaScript, and you will need to understand how to use and organize your database. You will also need the computer resources to do these things. The 'big' search engines like Google and Bing have tens of thousands of computers in their own data centers doing these things.
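
To make the "extract, store, query" pieces concrete, here is a minimal sketch in Python. It is only an illustration, not how any real search engine is built: the in-memory dictionary stands in for the database, and the page texts, URLs, and function names (`tokenize`, `build_index`, `search`) are all hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word in the query, sorted."""
    words = tokenize(query)
    if not words:
        return []
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)

# Hypothetical pages standing in for what a crawler would download.
PAGES = {
    "http://example.com/a": "How to build a web crawler in C",
    "http://example.com/b": "Storing crawler output in a database",
    "http://example.com/c": "Cooking recipes for the weekend",
}

INDEX = build_index(PAGES)
print(search(INDEX, "crawler database"))
```

This word-to-pages structure is usually called an inverted index. A real system would keep it in a database and add ranking, but the lookup idea is the same.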
 
tbsgadi Commented:
Have a look at this similar question:

http://www.nairaland.com/nigeria/topic-8148.0.html

Gary
 
david875 (Author) Commented:
Do you have any documents on how Google's system or algorithm works?
 
david875 (Author) Commented:
Do you have any idea what database Google uses? What about the crawler? Does Google store every page that anyone submits? Is that how Google shows its answers? I heard that the Google search engine is based on the Perl language. Any idea?
 
tbsgadi Commented:
 
david875 (Author) Commented:
I think I've seen this. Do you have any idea what database Google uses? What about the crawler? Does Google store every page that anyone submits? Is that how Google shows its answers? I heard that the Google search engine is based on the Perl language. Any idea?
 
Ted Bouskill (Senior Software Developer) Commented:
Google uses their own proprietary database http://labs.google.com/papers/bigtable.html

Hadoop is capable of being the basis for a search database: http://hadoop.apache.org/index.html

Before there was Google there used to be 'spiders' or 'bots' for crawling the web: http://en.wikipedia.org/wiki/Web_crawler
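
The core step of such a spider is pulling the links out of a downloaded page so they can be visited next. A rough sketch using only Python's standard library follows; the `LinkExtractor` class, the `extract_links` helper, and the sample HTML are made up for illustration, and a real crawler would also have to fetch pages, respect robots.txt, and avoid revisiting URLs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect the absolute URL of every <a href="..."> in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))

def extract_links(base_url, html):
    """Parse the HTML and return the links found, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links

# Hypothetical page content; no network access is needed for this sketch.
SAMPLE_HTML = '<p><a href="/about">About</a> <a href="http://example.org/x">X</a></p>'
print(extract_links("http://example.com/index.html", SAMPLE_HTML))
```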

ALL the hard working code at Google is in C/C++ (I have contacts that work at Google)

They use Perl or other technology for maintenance work or running their websites. Look at this site: http://shootout.alioth.debian.org/u32/benchmark.php?test=all&lang=perl&lang2=gpp  C is substantially faster than Perl. Let's say for the sake of argument that C is ten times faster (which is conservative); then on a single computer a C program could execute in 1/10 the time. That may not be a big deal, however, it also means it uses 1/10 the server resources. As you start to scale, a Perl program would likely require ten times as many servers to handle the same load as one server running C.

No search engine would store the entire page.  Can you imagine how much space would be required to do so?  They store a snippet of the text plus keywords and that's it.
 
efn Commented:
Actually, Google stores snapshots of many (though not all) of the pages they crawl in a cache.  When you see a "Cached" link in Google search results, that goes to Google's stored copy, not the original page.

http://www.googleguide.com/cached_pages.html
 
david875 (Author) Commented:
@tedbilly: thank you for taking the time to answer my question, it's really kind of you. When you type a keyword to search for something, the search button contacts the database, which is BigTable, right? Then it retrieves the information based on the keyword you typed and shows it to you. If I understood correctly, BigTable is a database created by Google, and it was written in C/C++ by the Google founders, right? What does BigTable contain? Does it store keywords in order of popularity? And if so, does BigTable contact another database to find the websites related to the keyword? So I think there is more than one database. I just need clarification to get a good understanding of how this works. I really thank you from my heart for every word you type to explain this to me. God bless you guys :)
 
Ted Bouskill (Senior Software Developer) Commented:
Google uses other systems in addition to BigTable, and they guard their proprietary secrets very carefully. You have to remember that their system is over a decade old and has had hundreds of full-time developers working on it. Even if you are 20 years old and work full time for 80 years, you won't be able to match the hours they have put into that system.

BigTable and Hadoop are essentially like a wildcard grep distributed database. Don't think of them like a typical single database running on a single server, because they have been designed to run on LOTS of hardware.

All of Google's infrastructure was actually designed to run on huge arrays of servers, old and new. Initially the founders couldn't afford new hardware, so they designed all their systems to be distributed over multiple inexpensive servers that run their own proprietary operating system.

The white paper on the link I provided has more detail.
 
david875 (Author) Commented:
Thanks @tedbilly for the answer, but I think Google started from scratch with a rubbish server, so how did they become so successful? Was it because there were no search engines at that time? How did they get so famous? Have their algorithms been effective and successful since then?

How does the search engine find the links? Is it because users submitted their websites to the search engine, or something else?
 
Ted Bouskill (Senior Software Developer) Commented:
Did you read the Google history link in comment http:#34209030 by tbsgadi?  They actually describe how Google first designed the system.

There were other search engines at the time Google started (also in the article), and they received funding from the National Science Foundation while in school. They also benefited from starting up during the dot-com bubble of the late 90's.
http://en.wikipedia.org/wiki/History_of_Google#Financing_and_initial_public_offering

I would never say it's impossible to start another search engine company; however, it would be far more difficult than when Google started. For one, Google is now a brand and is even used as a verb for searching. People will now say "Why don't you Google it?" When Google started their company, they didn't have to compete against a brand like that.
 
david875 (Author) Commented:
Yes, indeed, it's not impossible to start a new search engine. It will require hard work: studying every piece of the search engine and mastering programming languages, and it is only feasible if you find a sponsor to help you, as Bing or any other new search engine had. One last question: did Google submit or add all URLs and sites to their database?
 
Ted Bouskill (Senior Software Developer) Commented:
If you read the history, Google finds sites by following URLs on a starter site, then ranks based on the links pointing back to the first site. They can only find new sites by following links on sites they've already found. PLUS, if a site's rank is low enough, they won't even keep it in their database.

So no, Google does not have all the possible sites. No search engine does.
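
As a toy illustration of that discovery process, the sketch below walks a made-up link graph starting from a seed page, counts inbound links among the pages it found, and drops the ones below a threshold. Everything here is an assumption for the example: the graph, the page names, the threshold, and the use of a plain inbound-link count, which is far simpler than whatever ranking Google actually uses.

```python
from collections import deque

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "seed": ["a", "b"],
    "a": ["c"],
    "b": ["c", "d"],
    "c": ["a"],
    "d": [],
    "island": ["a"],  # no discovered page links here, so it is never found
}

def discover(links, seed):
    """Breadth-first walk: return every page reachable from the seed."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

def inbound_counts(links, pages):
    """Count links pointing at each discovered page from other discovered pages."""
    counts = {page: 0 for page in pages}
    for source in pages:
        for target in links.get(source, []):
            if target in counts:
                counts[target] += 1
    return counts

def keep_ranked(counts, min_links):
    """Drop pages below the threshold; sort the rest by count, then by name."""
    kept = [page for page, n in counts.items() if n >= min_links]
    return sorted(kept, key=lambda page: (-counts[page], page))

FOUND = discover(LINKS, "seed")
COUNTS = inbound_counts(LINKS, FOUND)
print(keep_ranked(COUNTS, min_links=2))
```

Note that "island" never shows up in the results even though it exists, because nothing reachable from the seed links to it.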
Question has a verified solution.
