Solved

how to create a search engine?

Posted on 2010-11-22
Last Modified: 2013-12-25
Hello, this question might be a little odd, but I'm curious about how to create a search engine like Google. Of course not exactly like Google, but something similar and standard. What tools do I have to study to make my own search engine? I know this can be done, so please don't say it's impossible; I just need to know what things I must learn and what plan I should follow to build this project.

Thank you
Question by:david875
17 Comments
 
LVL 82

Accepted Solution

by:Dave Baldwin
Dave Baldwin earned 63 total points
The key pieces of a search engine are a web crawler to download pages, a program to extract the information from them that you're looking for and put it into a database, and a website to accept queries and return results from the database.   You will probably use several languages like C/C++ and PHP/HTML/javascript and you will need to understand how to use and organize your database.  You will also need the computer resources to do these things.  The 'big' search engines like Google and Bing have tens of thousands of computers in their own data centers doing these things.
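A minimal, single-machine sketch of the three pieces described above (Python 3.9+). It skips the network entirely: the hypothetical PAGES dict stands in for pages a crawler has already downloaded, and a plain Python dict stands in for the database. The tag stripping is a crude regex, not a real HTML parser.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for pages a crawler has already downloaded (URL -> HTML).
PAGES = {
    "http://example.com/a": "<html><body>Fast search engines use an inverted index.</body></html>",
    "http://example.com/b": "<html><body>A web crawler downloads pages for the index.</body></html>",
    "http://example.com/c": "<html><body>Cooking recipes and kitchen tips.</body></html>",
}

def extract_terms(html: str) -> list[str]:
    """Crudely strip tags, then split the text into lowercase words."""
    text = re.sub(r"<[^>]+>", " ", html)
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of URLs that contain it (an inverted index)."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, html in pages.items():
        for term in extract_terms(html):
            index[term].add(url)
    return index

def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return the sorted URLs that contain every query term (AND semantics)."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return []
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return sorted(results)

if __name__ == "__main__":
    idx = build_index(PAGES)
    print(search(idx, "index"))        # ['http://example.com/a', 'http://example.com/b']
    print(search(idx, "web crawler"))  # ['http://example.com/b']
```

The query side only reads the prebuilt index, which is why real engines can answer queries quickly even though crawling and indexing take far longer.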
 
LVL 46

Assisted Solution

by:tbsgadi
tbsgadi earned 126 total points
Have a look at this similar question:

http://www.nairaland.com/nigeria/topic-8148.0.html

Gary
 

Author Comment

by:david875
Do you have any documents on how Google's system or algorithm works?
 
LVL 46

Expert Comment

by:tbsgadi
 

Author Comment

by:david875
Do you have any idea what database Google uses? What about the crawl? Does Google store every page that anyone has submitted? Is that how Google shows the answers? I heard that the Google search engine is based on the Perl language. Any idea?
 
LVL 46

Assisted Solution

by:tbsgadi
tbsgadi earned 126 total points
 

Author Comment

by:david875
I think I've seen this. Do you have any idea what database Google uses? What about the crawl? Does Google store every page that anyone has submitted? Is that how Google shows the answers? I heard that the Google search engine is based on the Perl language. Any idea?
 
LVL 51

Assisted Solution

by:tedbilly
tedbilly earned 249 total points
Google uses its own proprietary database, BigTable: http://labs.google.com/papers/bigtable.html

Hadoop is capable of being the basis for a search database: http://hadoop.apache.org/index.html

Even before there was Google, there were 'spiders' or 'bots' crawling the web: http://en.wikipedia.org/wiki/Web_crawler
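A crawler is essentially a breadth-first traversal of the link graph: start from a seed page, queue every link found, and skip pages already seen. A minimal sketch, assuming a hypothetical in-memory LINKS dict in place of real HTTP fetching and HTML link parsing (a real crawler also needs politeness delays, robots.txt handling, and URL normalization):

```python
from collections import deque

# Hypothetical link graph standing in for the web: page -> links found on it.
# A real crawler would fetch each URL over HTTP and parse out its <a href> links.
LINKS = {
    "seed.example": ["a.example", "b.example"],
    "a.example": ["b.example", "c.example"],
    "b.example": ["seed.example"],
    "c.example": [],
    "orphan.example": ["a.example"],  # nothing links here, so it is never found
}

def crawl(seed: str, max_pages: int = 100) -> list[str]:
    """Breadth-first crawl from a seed URL, visiting each page at most once."""
    frontier = deque([seed])
    seen = {seed}
    visited: list[str] = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in LINKS.get(url, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited

if __name__ == "__main__":
    print(crawl("seed.example"))
    # ['seed.example', 'a.example', 'b.example', 'c.example']
```

Note that orphan.example is never visited: a crawler can only discover pages that something it already knows links to.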

ALL the hard-working code at Google is in C/C++ (I have contacts who work at Google).

They use Perl or other technologies for maintenance work or for running their websites. Look at this site: http://shootout.alioth.debian.org/u32/benchmark.php?test=all&lang=perl&lang2=gpp
C is substantially faster than Perl. Let's say, for the sake of argument, that C is ten times faster (which is conservative); then on a single computer a C program could execute in 1/10 the time. That may not be a big deal; however, it also means it uses 1/10 the server resources. As you start to scale, a Perl program would likely require ten servers for every one server needed to handle the same load in C.
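The ten-times argument is simple capacity arithmetic: servers needed is the load divided by what one server can handle, rounded up. A sketch with made-up numbers (the 1,000 requests per second per server and the 10x slowdown are assumptions for illustration, not measurements):

```python
import math

def servers_needed(requests_per_sec: float, per_server_capacity: float) -> int:
    """Smallest whole number of servers that can absorb the given load."""
    return math.ceil(requests_per_sec / per_server_capacity)

# Hypothetical figures: one C server handles 1,000 req/s; assume Perl is 10x slower.
C_CAPACITY = 1000.0
PERL_CAPACITY = C_CAPACITY / 10

load = 50_000.0
print(servers_needed(load, C_CAPACITY))     # 50
print(servers_needed(load, PERL_CAPACITY))  # 500
```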

No search engine would store the entire page.  Can you imagine how much space would be required to do so?  They store a snippet of the text plus keywords and that's it.
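As an illustration of what keeping "a snippet of the text plus keywords" could look like, here is a minimal sketch; the stop-word list, snippet length, and keyword count are arbitrary assumptions, not any engine's real settings. (efn's comment below notes that Google does also cache full copies of many pages.)

```python
import re
from collections import Counter

# Hypothetical stop-word list; real engines use much larger ones.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "for", "on"}

def summarize(text: str, snippet_len: int = 60, top_k: int = 3) -> dict:
    """Keep a short leading snippet plus the most frequent non-stop-words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return {
        "snippet": text[:snippet_len],
        "keywords": [w for w, _ in counts.most_common(top_k)],
    }

if __name__ == "__main__":
    print(summarize("Search engines rank pages. Search engines crawl. Search.",
                    snippet_len=14, top_k=2))
    # {'snippet': 'Search engines', 'keywords': ['search', 'engines']}
```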
 
LVL 15

Assisted Solution

by:efn
efn earned 62 total points
Actually, Google stores snapshots of many (though not all) of the pages they crawl in a cache.  When you see a "Cached" link in Google search results, that goes to Google's stored copy, not the original page.

http://www.googleguide.com/cached_pages.html
 

Author Comment

by:david875
@tedbilly: thank you for taking the time to answer my question, it's really kind of you. When you type a keyword to search for something, the button goes and contacts the database, which is BigTable, right? Then it brings back the information based on the keyword you typed and shows it to you. If I understood correctly, BigTable is a database created by Google and written in C/C++ by the Google founders, right? What does BigTable contain? Does it store keywords in order of popularity? And if so, does BigTable contact another database to find the websites related to the keyword? So I think there is more than one database; I just need clarification to get a good understanding of how this works. I really thank you from my heart for every word you type to explain this to me. God bless you guys :)
 
LVL 51

Assisted Solution

by:tedbilly
tedbilly earned 249 total points
Google uses other systems in addition to BigTable, and they guard their proprietary secrets very carefully. You have to remember that their system is over a decade old and has had hundreds of full-time developers working on it. Even if you are 20 years old and work full time for 80 years, you won't be able to match the hours they have put into that system.

BigTable and Hadoop are essentially like a distributed, wildcard-grep database. Don't think of them as a typical single database running on a single server, because they have been designed to run on LOTS of hardware.
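Hadoop spreads work over many machines using the MapReduce model. The sketch below simulates that model in a single Python process to build an inverted index; it is not Hadoop's API, and the DOCS data and function names are hypothetical. In a real cluster the map calls run on different machines and the framework performs the shuffle step over the network.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical documents; in Hadoop these would be split across many machines.
DOCS = {
    "doc1": "big table stores rows",
    "doc2": "hadoop stores files",
    "doc3": "big clusters of cheap servers",
}

def map_phase(doc_id: str, text: str) -> list[tuple[str, str]]:
    """Emit a (term, doc_id) pair for every word in one document."""
    return [(word, doc_id) for word in text.lower().split()]

def shuffle(pairs) -> dict[str, list[str]]:
    """Group values by key, as the framework does between map and reduce."""
    groups: dict[str, list[str]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(term: str, doc_ids: list[str]) -> tuple[str, list[str]]:
    """Collapse one term's postings into a sorted, de-duplicated list."""
    return term, sorted(set(doc_ids))

def mapreduce_index(docs: dict[str, str]) -> dict[str, list[str]]:
    # Each map call is independent, so a cluster could run them in parallel.
    mapped = chain.from_iterable(map_phase(d, t) for d, t in docs.items())
    return dict(reduce_phase(term, ids) for term, ids in shuffle(mapped).items())

if __name__ == "__main__":
    idx = mapreduce_index(DOCS)
    print(idx["big"])     # ['doc1', 'doc3']
    print(idx["stores"])  # ['doc1', 'doc2']
```

Because no map call depends on another, adding machines adds throughput, which is the property that lets this kind of system run on many inexpensive servers.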

All of Google's infrastructure was actually designed to run on huge arrays of servers, old and new. Initially the founders couldn't afford new hardware, so they designed all their systems to be distributed over multiple inexpensive servers that run their own proprietary operating system.

The white paper on the link I provided has more detail.
 

Author Comment

by:david875
Thanks @tedbilly for the answer, but I think Google started from scratch with poor servers. How did they become so successful like that? Is it because there were no other search engines at that time? How did they get so famous? Did their algorithms take effect and succeed right from the start?

How does the search engine find the links? Is it because users submit their websites to the search engine, or something else?
 
LVL 51

Assisted Solution

by:tedbilly
tedbilly earned 249 total points
Did you read the Google history link in comment http:#34209030 by tbsgadi?  They actually describe how Google first designed the system.

There were other search engines at the time Google started (this is also in the article), and they received funding from the National Science Foundation while in school. They also benefited from starting up during the dot-com bubble of the late '90s.
http://en.wikipedia.org/wiki/History_of_Google#Financing_and_initial_public_offering

I would never say it's impossible to start another search engine company; however, it would be far more difficult than when Google started. For one, Google is now a brand and is even used as a verb for searching. People will now say "Why don't you Google it?" When Google started their company, they didn't have to compete against a brand like that.
 

Author Comment

by:david875
Yes, indeed it's not impossible to start a new search engine. It will require hard work to study every piece of a search engine and to master programming languages, and realistically it's only possible if you find a sponsor to back it, as with Bing or any other new search engine. One last question: did Google submit or add all URLs and sites to their database themselves?
 
LVL 51

Assisted Solution

by:tedbilly
tedbilly earned 249 total points
If you read the history, you'll see that Google finds sites by following the URLs on a starter site, then ranks them based on the links pointing back to each site. They can only find new sites through links on sites they've already found, PLUS if a site's rank is low enough they won't even keep it in their database.

So no, Google does not have all the possible sites. No search engine does.
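The "links pointing back" idea is what the published PageRank algorithm formalizes: a page scores higher when well-ranked pages link to it. A minimal power-iteration sketch, assuming a tiny hypothetical link graph, the commonly cited damping factor of 0.85, and an arbitrary cutoff to mimic not keeping low-ranked pages. This is the textbook formulation, not Google's production ranking, which is not public.

```python
# Hypothetical link graph: page -> pages it links to.
LINK_GRAPH = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],  # nothing links to "d", so its rank stays at the minimum
}

def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Power-iteration PageRank; the returned ranks sum to 1."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small base share (the "random jump" term).
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outgoing links): spread its rank evenly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

def prune(rank: dict[str, float], threshold: float) -> dict[str, float]:
    """Drop pages whose rank falls below a (hypothetical) cutoff."""
    return {p: r for p, r in rank.items() if r >= threshold}

if __name__ == "__main__":
    ranks = pagerank(LINK_GRAPH)
    for page in sorted(ranks, key=ranks.get, reverse=True):
        print(page, round(ranks[page], 4))
    print(sorted(prune(ranks, 0.05)))  # ['a', 'b', 'c']
```

Page "c" ends up highest because three pages link to it, while "d", which nobody links to, falls below the cutoff and is dropped.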