Solved

How to create a search engine?

Posted on 2010-11-22
17
752 Views
Last Modified: 2013-12-25
Hello, this question might be a little odd, but I'm curious about how to create a search engine like Google. Of course not on Google's scale, but something similar and standard. What tools do I have to study to make my own search engine? I know this can be done, so please don't say it is impossible. I just need to know what things I must learn and what plan I should follow to build this project.

Thank you
Question by:david875
17 Comments
 
LVL 83

Accepted Solution

by:
Dave Baldwin earned 63 total points
ID: 34203140
The key pieces of a search engine are a web crawler to download pages, a program to extract the information you're looking for from them and put it into a database, and a website to accept queries and return results from the database. You will probably use several languages like C/C++ and PHP/HTML/JavaScript, and you will need to understand how to use and organize your database. You will also need the computer resources to do these things. The 'big' search engines like Google and Bing have tens of thousands of computers in their own data centers doing these things.
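As a rough illustration of the extract, store, and query stages described above, here is a minimal Python sketch. It is not how any real search engine is built: the `PAGES` dictionary is a hypothetical stand-in for pages a crawler has already downloaded, an in-memory dictionary stands in for the database, and the names `tokenize`, `build_index`, and `search` are made up for this example.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for pages a crawler has already downloaded.
PAGES = {
    "http://example.com/a": "Search engines crawl the web and index pages.",
    "http://example.com/b": "A database stores the index for fast queries.",
    "http://example.com/c": "Crawl, index, query: the three stages.",
}


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index: word -> set of URLs containing that word."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the sorted URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return sorted(result)


INDEX = build_index(PAGES)
print(search(INDEX, "crawl index"))
```

A real system would replace the dictionary with a persistent, distributed store and add ranking, but the word-to-pages mapping is the core idea behind answering a keyword query quickly.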
0
 
LVL 46

Assisted Solution

by:tbsgadi
tbsgadi earned 126 total points
ID: 34203326
Have a look at this similar question:

http://www.nairaland.com/nigeria/topic-8148.0.html

Gary
0
 

Author Comment

by:david875
ID: 34204135
Do you have documents on how Google's system or algorithm works?
0

 
LVL 46

Expert Comment

by:tbsgadi
ID: 34204270
0
 

Author Comment

by:david875
ID: 34208502
Do you have any idea what database Google uses? What about the crawler? Does Google store every page that anyone has submitted? Is this how Google shows the answers? I heard that the Google search engine is based on the Perl language. Any idea?
0
 
LVL 46

Assisted Solution

by:tbsgadi
tbsgadi earned 126 total points
ID: 34209030
0
 

Author Comment

by:david875
ID: 34209273
I think I've seen this. Do you have any idea what database Google uses? What about the crawler? Does Google store every page that anyone has submitted? Is this how Google shows the answers? I heard that the Google search engine is based on the Perl language. Any idea?
0
 
LVL 51

Assisted Solution

by:Ted Bouskill
Ted Bouskill earned 249 total points
ID: 34210584
Google uses their own proprietary database http://labs.google.com/papers/bigtable.html

Hadoop is capable of being the basis for a search database: http://hadoop.apache.org/index.html

Before there was Google there used to be 'spiders' or 'bots' for crawling the web: http://en.wikipedia.org/wiki/Web_crawler

ALL the hard working code at Google is in C/C++ (I have contacts that work at Google)

They use Perl or other technology for maintenance work or running their websites. Look at this site: http://shootout.alioth.debian.org/u32/benchmark.php?test=all&lang=perl&lang2=gpp  C is substantially faster than Perl. Let's say for the sake of argument that C is ten times faster (which is conservative); then on a single computer a C program could execute in 1/10 the time. That may not be a big deal; however, it also means it uses 1/10 the server resources. As you start to scale, a Perl program would likely require ten times as many servers to handle the same capacity as one server running C.
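To make the scaling argument concrete, here is a small Python calculation. The load and per-server capacity figures are invented for illustration, and the tenfold speed ratio is the "for the sake of argument" assumption from the comment above, not a measured result.

```python
import math


def servers_needed(requests_per_second, per_server_capacity):
    """Number of servers required to handle the load, rounded up."""
    return math.ceil(requests_per_second / per_server_capacity)


# All numbers below are assumptions chosen only to illustrate the ratio.
LOAD = 50_000                      # requests per second to serve
C_CAPACITY = 1_000                 # requests per second one server handles in C
PERL_CAPACITY = C_CAPACITY // 10   # assumed ten times slower

C_SERVERS = servers_needed(LOAD, C_CAPACITY)
PERL_SERVERS = servers_needed(LOAD, PERL_CAPACITY)
print(C_SERVERS, PERL_SERVERS)
```

Under these assumptions the slower implementation needs ten times the hardware for the same load, which is the point being made about cost at scale.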

No search engine would store the entire page.  Can you imagine how much space would be required to do so?  They store a snippet of the text plus keywords and that's it.
0
 
LVL 15

Assisted Solution

by:efn
efn earned 62 total points
ID: 34210868
Actually, Google stores snapshots of many (though not all) of the pages they crawl in a cache.  When you see a "Cached" link in Google search results, that goes to Google's stored copy, not the original page.

http://www.googleguide.com/cached_pages.html
0
 

Author Comment

by:david875
ID: 34213436
@tedbilly: thank you for taking the time to answer my question, it's really kind of you. When you type a keyword to search for something, the button contacts the database, which is BigTable, right? Then it fetches the information based on the keyword you typed and shows it to you. If I understood correctly, BigTable is a database created by Google, and it was written in C/C++ by the Google founders, right? What does BigTable contain? Does it store keywords in order of popularity? And if so, does BigTable contact another database to find the websites related to the keyword? If so, I think there is more than one database. I just need clarification to get a good understanding of how this works. I really thank you from my heart for every word you type to explain this to me. God bless you guys :)
0
 
LVL 51

Assisted Solution

by:Ted Bouskill
Ted Bouskill earned 249 total points
ID: 34215261
Google uses BigTable in addition to other systems, and they guard their proprietary secrets very carefully. You have to remember that their system is over a decade old and has had hundreds of full-time developers working on it. Even if you are 20 years old and work for 80 years full time, you won't be able to match the hours they have put into that system.

BigTable and Hadoop are essentially like a wildcard-grep distributed database. Don't think of them as a typical single database running on a single server, because they have been designed to run on LOTS of hardware.

All of Google's infrastructure was actually designed to run on huge arrays of servers, old and new. Initially the founders couldn't afford new hardware, so they designed all their systems to be distributed over multiple inexpensive servers that run their own proprietary operating system.
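One simple way to picture data being spread over many inexpensive servers is hash-based sharding, sketched below in Python. This is a deliberately simplified stand-in: the Bigtable paper describes partitioning sorted rows into ranges, not hashing, and the `ShardedStore` class and `shard_for` function are hypothetical names used only for this example, with plain dictionaries playing the role of servers.

```python
import hashlib


def shard_for(key, num_shards):
    """Pick a shard for a key using a hash that is stable across runs."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy key-value store where each dict stands in for one server."""

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


STORE = ShardedStore(num_shards=4)
STORE.put("search", ["http://example.com/a"])
STORE.put("engine", ["http://example.com/b"])
print(STORE.get("search"))
```

Because every key maps to exactly one shard, a lookup only has to contact one machine, and capacity grows by adding machines rather than buying a bigger one.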

The white paper on the link I provided has more detail.
0
 

Author Comment

by:david875
ID: 34216836
Thanks @tedbilly for the answer, but I think Google started from scratch with a rubbish server, so how did they become so successful? Is it because there was no search engine at that time? How did they get this famous? Did their algorithms take effect and succeed from the start?

How does the search engine find the links? Is it because users submit their websites to the search engine, or something else?
0
 
LVL 51

Assisted Solution

by:Ted Bouskill
Ted Bouskill earned 249 total points
ID: 34220616
Did you read the Google history link in comment http:#34209030 by tbsgadi?  They actually describe how Google first designed the system.

There were other search engines at the time Google started (also in the article), and the founders received funding from the National Science Foundation while in school. They also benefited from starting up during the dotCom bubble of the late 90's.
http://en.wikipedia.org/wiki/History_of_Google#Financing_and_initial_public_offering

I would never say it's impossible to start another search engine company; however, it would be far more difficult than when Google started. For one, Google is now a brand and is even used as a verb for searching. People will now say "Why don't you Google it?" When Google started their company they didn't have to compete against a brand like that.
0
 

Author Comment

by:david875
ID: 34223381
Yes, indeed it's not impossible to start a new search engine. It will require hard work to study every piece of the search engine and to master programming languages, and it is only feasible if you find a sponsor to help you, as with Bing or any other new search engine. One last question: did Google submit or add all URLs and sites into their database?
0
 
LVL 51

Assisted Solution

by:Ted Bouskill
Ted Bouskill earned 249 total points
ID: 34224050
If you read the history, Google finds sites using URLs on a starter site, then ranks based on the links pointing back to the first site. They can only find new sites by following links on sites they've already found. PLUS, if a site's rank is low enough, they won't even keep it in their database.

So no, Google does not have all the possible sites. No search engine does.
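The link-following discovery described above can be sketched as a breadth-first walk over a link graph. The Python below uses a tiny, made-up in-memory graph (`LINKS`) instead of real HTTP requests, and counting inbound links is only a crude stand-in for Google's actual ranking, which is not reproduced here.

```python
from collections import Counter, deque

# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "seed": ["a", "b", "c"],
    "a": ["c"],
    "b": ["c"],
    "c": ["a"],
    "orphan": ["a"],  # no crawled page links here, so it is never found
}


def crawl(seed, links):
    """Follow links breadth-first from the seed, counting inbound links."""
    seen = {seed}
    queue = deque([seed])
    inbound = Counter()
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            inbound[target] += 1
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen, inbound


FOUND, INBOUND = crawl("seed", LINKS)
print(sorted(FOUND))
print(INBOUND.most_common())
```

The `orphan` page exists in the graph but is never discovered, because no page reachable from the seed links to it. That is the reason no crawler-based search engine can claim to have every site.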
0


Question has a verified solution.
