Solved

how to create meta search engine ?

Posted on 2010-09-11
Last Modified: 2013-12-13
Meta search engines pass queries through many search engines, such as Google and Yahoo, but I want to know how their programming works.
Is it possible to build one in PHP?
Do Google and the other search engines give permission to use their search engines in a meta search engine?


Question by:savsoft
7 Comments
 
LVL 31

Expert Comment

by:Marco Gasi
ID: 33652395
Yes, they do. You can start by reading these pages for Google and Yahoo:

http://code.google.com/intl/it-IT/apis/ajax/
http://developer.yahoo.com/everything.html

You will also need to learn about cURL: http://php.net/curl

What you want to do is not trivial; it cannot be done in a few lines of code. Good luck.
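As a rough illustration of the first step, each engine's API is called with the search terms encoded into a request URL. This is a minimal sketch only: the endpoints (api.example.com, search.example.org), the parameter names and the appid key are hypothetical placeholders, and the real values must come from each provider's API documentation.

```php
<?php
// Build one request URL per engine from a table of endpoint descriptions.
// The endpoints, parameter names and API key below are HYPOTHETICAL placeholders.
function build_search_urls(string $terms, array $endpoints): array
{
    $urls = [];
    foreach ($endpoints as $engine => $ep) {
        // The search terms go under the engine's own parameter name;
        // optional extra parameters (keys, result counts) are appended after them.
        $params = [$ep['param'] => $terms] + ($ep['extra'] ?? []);
        // http_build_query() URL-encodes every value (spaces become "+")
        $urls[$engine] = $ep['base'] . '?' . http_build_query($params);
    }
    return $urls;
}

$endpoints = [
    'engine_a' => ['base' => 'https://api.example.com/search', 'param' => 'q'],
    'engine_b' => ['base' => 'https://search.example.org/v1', 'param' => 'query',
                   'extra' => ['appid' => 'YOUR_KEY', 'count' => 10]],
];
print_r(build_search_urls('php meta search', $endpoints));
```

Each resulting URL would then be fetched with cURL.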

Author Comment

by:savsoft
ID: 33652421
Thank you.
OK, I will read this.
LVL 110

Accepted Solution

by:
Ray Paseur earned 500 total points
ID: 33653149
The general design pattern would be to call each search engine API and save the results, perhaps in an array with one response element for each search engine.  You might want to include Bing, in addition to Google, Yahoo, and the lesser engines.
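The pattern just described (call each engine, keep one array element per engine) might look like the sketch below. It assumes each engine is wrapped in a hypothetical "fetcher" callable that returns a list of ['url' => ..., 'title' => ...] arrays; the merge step is just one simple way to combine the lists (drop duplicate URLs, rank results that several engines agree on first), not the only one.

```php
<?php
// Fan-out: query every engine and keep one array element per engine.
// $fetchers maps an engine name to a HYPOTHETICAL callable wrapping that engine's API.
function meta_search(string $terms, array $fetchers): array
{
    $results = [];
    foreach ($fetchers as $engine => $fetch) {
        try {
            $results[$engine] = $fetch($terms);
        } catch (Throwable $e) {
            // A failing engine should not break the whole results page
            $results[$engine] = [];
        }
    }
    return $results;
}

// Combine the per-engine lists: one entry per URL, remembering which engines returned it.
function merge_results(array $byEngine): array
{
    $merged = [];
    foreach ($byEngine as $engine => $items) {
        foreach ($items as $item) {
            $key = rtrim($item['url'], '/');   // treat "http://x.com" and "http://x.com/" as one
            if (!isset($merged[$key])) {
                $merged[$key] = $item + ['engines' => []];
            }
            $merged[$key]['engines'][] = $engine;
        }
    }
    $list = array_values($merged);
    // Results returned by more engines come first
    usort($list, function ($a, $b) {
        return count($b['engines']) <=> count($a['engines']);
    });
    return $list;
}
```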

You would probably take your search terms from a URL argument (in the PHP script this appears in $_GET). You might want to have a database to store your results for some period of time, so you are not so dependent on the foreign sites.
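A minimal sketch of both ideas, assuming the terms arrive as ?q=... and using a plain PHP array as a stand-in for the database table (the function names, the q parameter and the key/stored/data layout are illustrative assumptions; in a real site the array would be replaced by a table with the same columns):

```php
<?php
// Read the search terms from the query string, e.g. search.php?q=php+curl
function search_terms_from_query(array $query): ?string
{
    $terms = (isset($query['q']) && is_string($query['q'])) ? trim($query['q']) : '';
    return $terms === '' ? null : $terms;
}

// Time-limited cache: reuse stored results for $ttl seconds before asking the engines again.
// $cache is a plain array standing in for a database table (key => stored time + data).
function cached_search(string $terms, callable $fetch, array &$cache, int $ttl, ?int $now = null): array
{
    $now = $now ?? time();
    $key = md5(strtolower($terms));            // same key for "PHP" and "php"
    if (isset($cache[$key]) && $now - $cache[$key]['stored'] < $ttl) {
        return $cache[$key]['data'];           // still fresh: no request to the foreign sites
    }
    $data = $fetch($terms);                    // e.g. the per-engine fan-out
    $cache[$key] = ['stored' => $now, 'data' => $data];
    return $data;
}
```

In the page itself this would be called as search_terms_from_query($_GET), followed by cached_search() only when the result is not null.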

These links might be helpful:
http://lmgtfy.com?q=google+search+api
http://developer.yahoo.com/search/boss/
http://msdn.microsoft.com/en-us/library/dd251056.aspx

In case you find that learning cURL is a daunting task (I did), here is a little script that makes a cURL request. Put the URL you want to retrieve into the $url variable on line 54 of the script (the usage example near the end).

Good luck with your project, ~Ray


<?php // RAY_temp_curl_example.php
error_reporting(E_ALL);

function my_curl($url, $timeout=2, $error_report=FALSE)
{
    $curl = curl_init();

    // HEADERS FROM FIREFOX - APPEARS TO BE A BROWSER REFERRED BY GOOGLE
    $header   = array();
    $header[] = "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    $header[] = "Pragma: "; // BROWSERS USUALLY LEAVE BLANK

    // SET THE CURL OPTIONS - SEE http://php.net/manual/en/function.curl-setopt.php
    curl_setopt($curl, CURLOPT_URL,            $url);
    curl_setopt($curl, CURLOPT_USERAGENT,      'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6');
    curl_setopt($curl, CURLOPT_HTTPHEADER,     $header);
    curl_setopt($curl, CURLOPT_REFERER,        'http://www.google.com');
    curl_setopt($curl, CURLOPT_ENCODING,       'gzip,deflate');
    curl_setopt($curl, CURLOPT_AUTOREFERER,    TRUE);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($curl, CURLOPT_TIMEOUT,        $timeout); // SECONDS ALLOWED FOR THE WHOLE REQUEST

    // RUN THE CURL REQUEST AND GET THE RESULTS
    $htm = curl_exec($curl);

    // ON FAILURE HANDLE ERROR MESSAGE
    if ($htm === FALSE)
    {
        if ($error_report)
        {
            $err = curl_errno($curl);
            $msg = curl_error($curl);
            $inf = curl_getinfo($curl);
            echo "CURL FAIL: $url TIMEOUT=$timeout, CURL_ERRNO=$err ($msg)" . PHP_EOL;
            var_dump($inf);
        }
        curl_close($curl);
        return FALSE;
    }

    // ON SUCCESS RETURN XML / HTML STRING
    curl_close($curl);
    return $htm;
}


// USAGE EXAMPLE - PUT YOUR FAVORITE URL HERE
$url = "http://finance.yahoo.com/d/quotes.csv?s=lulu&f=snl1c1ohgvt1";
$htm = my_curl($url);
// USE === SO AN EMPTY OR "0" RESPONSE IS NOT MISTAKEN FOR A FAILURE
if ($htm === FALSE) die("NO $url");

// SHOW WHAT WE GOT
echo "<pre>";
echo htmlentities($htm);
echo "</pre>";


Author Comment

by:savsoft
ID: 33653355
Thank you, Ray and margusG.

I am reading the pages you referred me to and have found them very useful.
Actually, I want to start my own search engine that shows better search results than the existing search engines. But I know that I can't crawl the whole web like Google and Bing, so I chose meta search technology.
Could you suggest anything more, if you have a better idea?
I have a dedicated server with 500 GB of disk space and 4 GB of RAM.

LVL 110

Expert Comment

by:Ray Paseur
ID: 33653833
Not to discourage you, but what you're working on puts you in direct competition against Google, Yahoo, and Bing -- all of them are spending millions of dollars each month trying to get better search results.  They constantly study each other, and they have unlimited access to the top scientists and engineers.

I think you're in fine shape just using their search results through their APIs.  Just be careful of the terms of service - you may need to pay them if you use their data for commercial purposes.

Author Comment

by:savsoft
ID: 33654966
Is there any service where we can pay for using their data?
I have also found an Amazon web information service:
http://aws.amazon.com
There we can use all the information about any website.
Can I use this information for my search engine? It charges $0.00015 per request, and Alexa is also powered by it.
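For a sense of scale at the quoted $0.00015 per request, assume a hypothetical load of 10,000 requests per day over a 30-day month:

$$30 \times 10{,}000 \times \$0.00015 = 300{,}000 \times \$0.00015 = \$45$$

At the same rate, one million requests would cost $150.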



Author Comment

by:savsoft
ID: 33655472
As I understand search technology, every existing search engine has a web spider (web crawler) program that starts visiting websites from some initial URLs (known as seeds) and stores every page in its database for later indexing and PageRank calculation. The crawler then detects all hyperlinks on each page and adds them to its seed list. This needs a very large amount of storage space; it is a kind of download of the whole web. I think it repeats this whole process at least once every 10 days.
If that is true, then I don't think it is an advanced method: it causes a large amount of data transfer.

There is a need to develop a new search engine technology...
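The crawl loop described above (start from seeds, store each page, extract its links, queue the unseen ones) is essentially a breadth-first traversal. A minimal sketch, assuming a hypothetical $fetch callable that returns a page's HTML or null; a regular expression stands in for real HTML parsing, so only absolute http(s) links in href="..." are followed:

```php
<?php
// Breadth-first crawl from seed URLs. $fetch is a HYPOTHETICAL callable
// (url => HTML string, or null on failure); in practice it could wrap a cURL request.
function crawl(array $seeds, callable $fetch, int $maxPages = 100): array
{
    $queue = $seeds;                          // URLs waiting to be visited
    $seen  = array_fill_keys($seeds, true);   // URLs already queued, to avoid revisits
    $store = [];                              // url => HTML: the crawler's "database"
    while ($queue && count($store) < $maxPages) {
        $url  = array_shift($queue);
        $html = $fetch($url);
        if ($html === null) {
            continue;                         // unreachable page: skip it
        }
        $store[$url] = $html;
        // Simplification: only absolute http(s) links written as href="..." are found;
        // a real crawler would parse the HTML, resolve relative URLs and honour robots.txt.
        preg_match_all('/href="(https?:\/\/[^"]+)"/i', $html, $m);
        foreach ($m[1] as $link) {
            if (!isset($seen[$link])) {
                $seen[$link] = true;
                $queue[] = $link;
            }
        }
    }
    return $store;
}
```

The $maxPages limit is what keeps a sketch like this from trying to download "the whole web"; the storage and bandwidth concerns raised above grow with that limit.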