Solved

how to create a search engine?

Posted on 2010-11-22
Last Modified: 2013-12-25
Hello, this question might be a little odd, but I'm curious how to create a search engine like Google. Of course not on Google's scale, but something similar and standard. What tools do I have to study to make my own search engine? I know this can be done, so please don't say it is impossible. I just need to know what I must learn and what plan I should follow for this project.

thank you
Question by:david875
17 Comments
 
LVL 84

Accepted Solution

by:
Dave Baldwin earned 252 total points
ID: 34203140
The key pieces of a search engine are a web crawler to download pages, a program to extract the information you're looking for from those pages and put it into a database, and a website to accept queries and return results from the database. You will probably use several languages, such as C/C++ and PHP/HTML/JavaScript, and you will need to understand how to use and organize your database. You will also need the computer resources to do these things. The 'big' search engines like Google and Bing have tens of thousands of computers in their own data centers doing this work.
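To make the extract-and-query pieces concrete, here is a minimal sketch in Python. It assumes the pages have already been downloaded (the `PAGES` dictionary and its example.com URLs are made up for illustration) and uses an in-memory inverted index in place of a real database. It is a toy, not how any production search engine is built.

```python
import re
from collections import defaultdict

# Hypothetical "crawled" pages: URL -> page text.
# A real crawler would download these from the web.
PAGES = {
    "http://example.com/a": "Search engines crawl the web and build an index.",
    "http://example.com/b": "A database stores the index so queries are fast.",
    "http://example.com/c": "Crawl politely and respect robots.txt.",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index: each word maps to the set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the sorted URLs that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)


INDEX = build_index(PAGES)
print(search(INDEX, "crawl index"))
```

The website part would simply call something like `search` with the text the user typed and render the returned URLs. Ranking the results is a separate problem that this sketch ignores.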
 
LVL 46

Assisted Solution

by:tbsgadi
tbsgadi earned 504 total points
ID: 34203326
Have a look at this similar question:

http://www.nairaland.com/nigeria/topic-8148.0.html

Gary
 

Author Comment

by:david875
ID: 34204135
Do you have any documents on how Google's system or algorithm works?

 
LVL 46

Expert Comment

by:tbsgadi
ID: 34204270
 

Author Comment

by:david875
ID: 34208502
Do you have any idea what database Google uses? What about the crawl? Does Google store every page that anyone submits? Is that how Google shows the answers? I heard that Google's search engine is based on the Perl language. Any idea?
 
LVL 46

Assisted Solution

by:tbsgadi
tbsgadi earned 504 total points
ID: 34209030
 

Author Comment

by:david875
ID: 34209273
I think I've seen this. Do you have any idea what database Google uses? What about the crawl? Does Google store every page that anyone submits? Is that how Google shows the answers? I heard that Google's search engine is based on the Perl language. Any idea?
 
LVL 51

Assisted Solution

by:Ted Bouskill
Ted Bouskill earned 996 total points
ID: 34210584
Google uses their own proprietary database http://labs.google.com/papers/bigtable.html

Hadoop is capable of being the basis for a search database: http://hadoop.apache.org/index.html

Before there was Google there used to be 'spiders' or 'bots' for crawling the web: http://en.wikipedia.org/wiki/Web_crawler

ALL the hard working code at Google is in C/C++ (I have contacts that work at Google)

They use Perl or other technology for maintenance work or running their websites. Look at this site: http://shootout.alioth.debian.org/u32/benchmark.php?test=all&lang=perl&lang2=gpp  C is substantially faster than Perl. Let's say for the sake of argument that C is ten times faster (which is conservative). Then on a single computer a C program could execute in 1/10 the time. That may not be a big deal, but it also means the program uses 1/10 the server resources. As you start to scale, a Perl program would likely require ten times as many servers to handle the same capacity as one server running C.

No search engine would store the entire page.  Can you imagine how much space would be required to do so?  They store a snippet of the text plus keywords and that's it.
 
LVL 15

Assisted Solution

by:efn
efn earned 248 total points
ID: 34210868
Actually, Google stores snapshots of many (though not all) of the pages they crawl in a cache.  When you see a "Cached" link in Google search results, that goes to Google's stored copy, not the original page.

http://www.googleguide.com/cached_pages.html
 

Author Comment

by:david875
ID: 34213436
@tedbilly: Thank you for taking the time to answer my question; it's really kind of you. When you type a keyword to search for something, the button contacts the database, which is BigTable, right? Then it fetches the information based on the keyword you typed and shows it to you. If I understood correctly, BigTable is a database created by Google, and it was written in C/C++ by the Google founders, right? What does BigTable contain? Does it store keywords in order of popularity? And if so, does BigTable contact another database to find the websites related to the keyword? So I think there is more than one database. I just need clarification to get a good understanding of how this works. I really thank you from my heart for every word you type to explain this to me. God bless you guys :)
 
LVL 51

Assisted Solution

by:Ted Bouskill
Ted Bouskill earned 996 total points
ID: 34215261
Google uses BigTable in addition to other systems, and they guard their proprietary secrets very carefully. You have to remember that their system is over a decade old and has had hundreds of full-time developers working on it. Even if you are 20 years old and work full time for 80 years, you won't be able to match the hours they have put into that system.

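BigTable and Hadoop are distributed systems rather than conventional databases. The BigTable white paper describes it as a sparse, distributed, persistent, multi-dimensional sorted map, and Hadoop is an open-source framework for distributed storage and batch processing. Don't think of them like a typical single database running on a single server, because they have been designed to run on LOTS of hardware.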

All of Google's infrastructure was actually designed to run on huge arrays of servers, old and new. Initially the founders couldn't afford new hardware, so they designed all their systems to be distributed over multiple inexpensive servers running their own customized version of Linux.

The white paper on the link I provided has more detail.
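As a rough illustration of the data model the white paper describes, a map from (row key, column, timestamp) to a value with rows kept in sorted order, here is a toy single-process sketch in Python. The class name `TinySortedTable`, its methods, and the sample rows are all hypothetical; nothing here is Google's API, and the distribution across many servers, which is the hard part, is left out entirely.

```python
import bisect


class TinySortedTable:
    """Toy stand-in for a sorted (row, column, timestamp) -> value map."""

    def __init__(self):
        self._rows = {}         # row key -> {column -> [(timestamp, value), ...]}
        self._sorted_keys = []  # row keys kept in lexicographic order

    def put(self, row, column, timestamp, value):
        """Store one version of a cell."""
        if row not in self._rows:
            self._rows[row] = {}
            bisect.insort(self._sorted_keys, row)
        versions = self._rows[row].setdefault(column, [])
        versions.append((timestamp, value))
        # Keep the newest version first.
        versions.sort(key=lambda v: v[0], reverse=True)

    def get(self, row, column):
        """Return the newest value stored for a cell, or None if absent."""
        versions = self._rows.get(row, {}).get(column)
        return versions[0][1] if versions else None

    def scan(self, prefix):
        """Return the row keys that start with prefix, in sorted order."""
        start = bisect.bisect_left(self._sorted_keys, prefix)
        matches = []
        for key in self._sorted_keys[start:]:
            if not key.startswith(prefix):
                break
            matches.append(key)
        return matches


# Row keys are reversed host names, so pages from one domain sort together.
table = TinySortedTable()
table.put("com.example.www", "contents:", 1, "<html>old</html>")
table.put("com.example.www", "contents:", 2, "<html>new</html>")
table.put("com.example.blog", "contents:", 1, "<html>blog</html>")
table.put("org.example", "contents:", 1, "<html>org</html>")
print(table.get("com.example.www", "contents:"))
print(table.scan("com.example"))
```

Because the rows are sorted, a scan over one key prefix touches a contiguous range. That property is what lets a real system split the table into ranges and spread them over many machines.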
 

Author Comment

by:david875
ID: 34216836
Thanks @tedbilly for the answer. But Google started from scratch, I think, and they had a rubbish server, so how did they become so successful? Was it because there was no search engine at that time? How did they get so famous? Did their algorithms take effect and succeed right from the start?

How does the search engine find links? Is it because users submit their websites to the search engine, or something else?
 
LVL 51

Assisted Solution

by:Ted Bouskill
Ted Bouskill earned 996 total points
ID: 34220616
Did you read the Google history link in comment http:#34209030 by tbsgadi?  They actually describe how Google first designed the system.

There were other search engines at the time Google started (also in the article), and the founders received funding from the National Science Foundation while in school. They also benefited from starting up during the dot-com bubble of the late '90s.
http://en.wikipedia.org/wiki/History_of_Google#Financing_and_initial_public_offering

I would never say it's impossible to start another search engine company; however, it would be far more difficult than when Google started. For one, Google is now a brand and is even used as a verb for searching. People will now say "Why don't you Google it?" When Google started their company, they didn't have to compete against a brand like that.
 

Author Comment

by:david875
ID: 34223381
Yes, indeed it's not impossible to start a new search engine. It will require hard work: studying every piece of the search engine and mastering programming languages. And it is only feasible if you find a sponsor to help you, as with Bing or any other new search engine. One last question: did Google submit or add all URLs and sites to their database?
 
LVL 51

Assisted Solution

by:Ted Bouskill
Ted Bouskill earned 996 total points
ID: 34224050
If you read the history, Google finds sites by following the URLs on a starter site, then ranks pages based on the links pointing to them. They can only find new sites through links on sites they've already found, PLUS if a site's rank is low enough they won't even keep it in their database.

So no, Google does not have all the possible sites. No search engine does.
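The two ideas in this answer, discovering pages by following links from a starting page and ranking pages by the links that point to them, can be sketched in a few lines of Python. The link graph `LINKS` below is invented for illustration, and `rank` is a simplified PageRank-style iteration with an assumed damping factor of 0.85, not Google's actual algorithm.

```python
from collections import deque

# Hypothetical link graph: page -> pages it links to.
# Nothing links to "island", so a crawl starting at "seed" never finds it.
LINKS = {
    "seed": ["a", "b"],
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["seed"],
    "island": ["seed"],
}


def crawl(links, seed):
    """Breadth-first discovery: visit the seed, then every page reachable by links."""
    seen = [seed]
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.append(target)
                queue.append(target)
    return seen


def rank(links, pages, damping=0.85, iterations=100):
    """Simplified PageRank: each page shares its score among the pages it links to."""
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            # Only count links to pages the crawl actually discovered.
            targets = [t for t in links.get(page, []) if t in scores]
            if not targets:
                continue
            share = damping * scores[page] / len(targets)
            for target in targets:
                new_scores[target] += share
        scores = new_scores
    return scores


discovered = crawl(LINKS, "seed")
scores = rank(LINKS, discovered)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy graph the page "c" ends up with the highest score because both "a" and "b" link to it, while "island" is never ranked at all because no discovered page links to it.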
Question has a verified solution.