Solved

How to create a meta search engine?

Posted on 2010-09-11
Last Modified: 2013-12-13
A meta search engine passes queries through several search engines, such as Google and Yahoo, but I want to know how the programming behind it works.
Is it possible to build one in PHP?
Do Google and the other search engines grant the right to use their search results in a meta search engine?


Question by:savsoft
 
LVL 30

Expert Comment

by:Marco Gasi
ID: 33652395
Yes, they do. You can start by reading these pages for Google and Yahoo:

http://code.google.com/intl/it-IT/apis/ajax/
http://developer.yahoo.com/everything.html

You will also need to learn about cURL: http://php.net/curl

What you want to do is not trivial, and it cannot be done with a few lines of code. Good luck.
 

Author Comment

by:savsoft
ID: 33652421
Thank you.
OK, I will read these.
 
LVL 108

Accepted Solution

by:
Ray Paseur earned 500 total points
ID: 33653149
The general design pattern would be to call each search engine API and save the results, perhaps in an array with one response element for each search engine.  You might want to include Bing, in addition to Google, Yahoo, and the lesser engines.
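
As a rough sketch of that pattern (not a definitive implementation), the script below collects one response element per engine and then merges duplicate URLs. The function names `meta_search()` and `merge_results()`, the engine names, and the stub data are all hypothetical; in a real script each callable would wrap a request to that engine's actual API.

```php
<?php
// Hypothetical sketch: each "engine" is a callable that takes a query string
// and returns a list of results (arrays with 'url' and 'title' keys).

// Call every engine and keep one response element per engine.
function meta_search($query, array $engines)
{
    $responses = array();
    foreach ($engines as $name => $fetch) {
        $results = call_user_func($fetch, $query);
        // A failed engine contributes an empty list instead of aborting the search.
        $responses[$name] = is_array($results) ? $results : array();
    }
    return $responses;
}

// Combine the per-engine lists, remembering which engines returned each URL.
function merge_results(array $responses)
{
    $merged = array();
    foreach ($responses as $name => $results) {
        foreach ($results as $row) {
            // Treat URLs that differ only by case or a trailing slash as the same page.
            $key = strtolower(rtrim($row['url'], '/'));
            if (!isset($merged[$key])) {
                $merged[$key] = array(
                    'url'     => $row['url'],
                    'title'   => $row['title'],
                    'engines' => array(),
                );
            }
            $merged[$key]['engines'][] = $name;
        }
    }
    return array_values($merged);
}

// Stub engines standing in for real API calls (made-up data, for illustration only).
$engines = array(
    'engine_a' => function ($q) {
        return array(array('url' => 'http://example.com/', 'title' => 'Example'));
    },
    'engine_b' => function ($q) {
        return array(
            array('url' => 'http://example.com', 'title' => 'Example'),
            array('url' => 'http://example.org/page', 'title' => 'Other'),
        );
    },
    'engine_c' => function ($q) { return FALSE; }, // simulates a failed request
);

$responses = meta_search('php curl', $engines);
$merged    = merge_results($responses);
```

A page returned by more than one engine ends up with several names in its `engines` list, which could serve as a simple ranking signal.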

You would probably take your search terms from a URL argument (in the PHP script this appears in $_GET).  You might want to have a database to store your results for some period of time, so you are not so dependent on the external sites.
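
A minimal sketch of those two ideas follows. It assumes the search terms arrive in a URL argument named `q` (a hypothetical name), and it swaps the database for a simple file cache in the system temp directory so that the example stays self-contained; the same lookup-then-store logic would apply to a database table. `get_query()`, `cache_path()`, and `cached_search()` are illustrative names.

```php
<?php
// Read and normalize the search terms from the URL, e.g. search.php?q=php+curl
function get_query(array $get)
{
    $q = isset($get['q']) ? trim((string) $get['q']) : '';
    // Collapse runs of whitespace so equivalent queries share one cache entry.
    return preg_replace('/\s+/', ' ', $q);
}

// One cache file per (case-insensitive) query string.
function cache_path($query, $dir)
{
    return $dir . DIRECTORY_SEPARATOR . 'metasearch_' . md5(strtolower($query)) . '.json';
}

// Return cached results if they are younger than $ttl seconds; otherwise
// call $fetch (any callable that returns an array of results) and store them.
function cached_search($query, $fetch, $ttl = 3600, $dir = NULL)
{
    if ($dir === NULL) {
        $dir = sys_get_temp_dir();
    }
    $file = cache_path($query, $dir);

    clearstatcache();
    if (is_file($file) && (time() - filemtime($file)) < $ttl) {
        $data = json_decode(file_get_contents($file), TRUE);
        if (is_array($data)) {
            return $data;
        }
    }

    $data = call_user_func($fetch, $query);
    if (is_array($data)) {
        file_put_contents($file, json_encode($data), LOCK_EX);
    }
    return $data;
}

// Usage: nothing happens when no search terms were supplied.
$query = get_query($_GET);
if ($query !== '') {
    $results = cached_search($query, function ($q) {
        return array(); // call the search engine APIs here
    });
}
```

The `$ttl` value controls how long you rely on stored results before asking the search engines again.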

These links might be helpful:
http://lmgtfy.com?q=google+search+api
http://developer.yahoo.com/search/boss/
http://msdn.microsoft.com/en-us/library/dd251056.aspx

In case you find that learning CURL is a daunting task (I did), here is a little script that will make a CURL request.  Put the URL you want to retrieve into the $url assignment in the usage example at the end of the script.

Good luck with your project, ~Ray


<?php // RAY_temp_curl_example.php
error_reporting(E_ALL);

function my_curl($url, $timeout=2, $error_report=FALSE)
{
    $curl = curl_init();

    // HEADERS FROM FIREFOX - APPEARS TO BE A BROWSER REFERRED BY GOOGLE
    $header = array();
    $header[] = "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    $header[] = "Pragma: "; // BROWSERS USUALLY LEAVE BLANK

    // SET THE CURL OPTIONS - SEE http://php.net/manual/en/function.curl-setopt.php
    curl_setopt($curl, CURLOPT_URL,            $url);
    curl_setopt($curl, CURLOPT_USERAGENT,      'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6');
    curl_setopt($curl, CURLOPT_HTTPHEADER,     $header);
    curl_setopt($curl, CURLOPT_REFERER,        'http://www.google.com');
    curl_setopt($curl, CURLOPT_ENCODING,       'gzip,deflate');
    curl_setopt($curl, CURLOPT_AUTOREFERER,    TRUE);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($curl, CURLOPT_TIMEOUT,        $timeout);

    // RUN THE CURL REQUEST AND GET THE RESULTS
    $htm = curl_exec($curl);

    // ON FAILURE HANDLE ERROR MESSAGE
    if ($htm === FALSE)
    {
        if ($error_report)
        {
            $err = curl_errno($curl);
            $inf = curl_getinfo($curl);
            echo "CURL FAIL: $url TIMEOUT=$timeout, CURL_ERRNO=$err";
            var_dump($inf);
        }
        curl_close($curl);
        return FALSE;
    }

    // ON SUCCESS RETURN XML / HTML STRING
    curl_close($curl);
    return $htm;
}




// USAGE EXAMPLE - PUT YOUR FAVORITE URL HERE
$url = "http://finance.yahoo.com/d/quotes.csv?s=lulu&f=snl1c1ohgvt1";
$htm = my_curl($url);
if ($htm === FALSE) die("NO $url"); // AN EMPTY RESPONSE BODY IS NOT A FAILURE


// SHOW WHAT WE GOT
echo "<pre>";
echo htmlentities($htm);
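
When the URL carries a user's search terms, the terms need to be URL-encoded before the request is made. A small sketch, assuming a made-up endpoint (`http://api.example.com/search`) and made-up parameter names (`q`, `count`); each real search API documents its own endpoint and parameters. `build_search_url()` is an illustrative helper, not part of any library.

```php
<?php
// Build a request URL from an endpoint and an array of query parameters.
function build_search_url($endpoint, array $params)
{
    // Append with '&' if the endpoint already carries a query string.
    $sep = (strpos($endpoint, '?') === FALSE) ? '?' : '&';
    // http_build_query() URL-encodes every key and value; the explicit '&'
    // avoids depending on the arg_separator.output ini setting.
    return $endpoint . $sep . http_build_query($params, '', '&');
}

// Hypothetical endpoint and parameter names, for illustration only.
$url = build_search_url(
    'http://api.example.com/search',
    array('q' => 'php & curl', 'count' => 10)
);
```

The resulting `$url` can be passed to a request function such as `my_curl()` above.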



 

Author Comment

by:savsoft
ID: 33653355
Thank you Ray and margusG.

I am reading the pages you referred me to and finding them very useful.
Actually, I want to start my own search engine that shows better search results than the existing search engines. But I know that I can't crawl the whole web like Google and Bing, so I chose meta search technology.
Could you please suggest more if you have any better ideas?
I have a dedicated server with 500 GB of disk space and 4 GB of RAM.
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 33653833
Not to discourage you, but what you're working on puts you in direct competition against Google, Yahoo, and Bing -- all of them are spending millions of dollars each month trying to get better search results.  They constantly study each other, and they have unlimited access to the top scientists and engineers.

I think you're in fine shape just using their search results through their APIs.  Just be careful of the terms of service - you may need to pay them if you use their data for commercial purposes.
 

Author Comment

by:savsoft
ID: 33654966
Is there any service where we can pay for their data usage?
I have also found an Amazon web information service:
http://aws.amazon.com
There we can use information about any website.
Can I use this information for my search engine? It charges $0.00015 per request. Alexa is also powered by it.


 

Author Comment

by:savsoft
ID: 33655472
As I understand search technology, all existing search engines have a web spider (web crawler) program that starts visiting websites from some initial URLs (known as seeds) and stores every page in a database for later indexing or page ranking. The crawler then detects all hyperlinks and adds them to its seed list. It needs very large storage for this information; in effect it downloads the whole web. I think it repeats this whole process at least once every 10 days.
If that is true, then I don't think it is an advanced method, because it requires a large amount of data transfer.

A new search engine technology needs to be developed.
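
For illustration, the crawl loop described above (start from seeds, store each page, queue newly found links) could be sketched as follows. This is a simplified sketch under stated assumptions, not how any real search engine works: `extract_links()` and `crawl()` are hypothetical names, relative links are skipped for brevity, and there is no politeness delay or robots.txt handling. The `$fetch` argument is any callable that returns the HTML for a URL or FALSE on failure.

```php
<?php
// Extract absolute http(s) links from an HTML string.
function extract_links($html)
{
    $links = array();
    if ($html === '') {
        return $links;
    }
    // Real-world HTML is rarely valid; collect parser warnings silently.
    $prev = libxml_use_internal_errors(TRUE);
    $doc  = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        if (preg_match('#^https?://#i', $href)) {
            $links[] = $href;
        }
    }
    return $links;
}

// Breadth-first crawl starting from the seed URLs.
function crawl(array $seeds, $fetch, $max_pages = 100)
{
    $queue = $seeds;                        // URLs waiting to be visited
    $seen  = array_fill_keys($seeds, TRUE); // URLs already queued or visited
    $pages = array();                       // url => stored HTML

    while ($queue && count($pages) < $max_pages) {
        $url  = array_shift($queue);
        $html = call_user_func($fetch, $url);
        if (!is_string($html)) {
            continue; // fetch failed, skip this URL
        }
        $pages[$url] = $html;

        // Newly discovered links join the queue exactly once.
        foreach (extract_links($html) as $link) {
            if (!isset($seen[$link])) {
                $seen[$link] = TRUE;
                $queue[]     = $link;
            }
        }
    }
    return $pages;
}
```

With the script from the accepted solution loaded, the call might look like `crawl(array('http://example.com/'), 'my_curl', 50)`. The `$max_pages` limit is what keeps the storage and data transfer bounded on a single server.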