Welcome to Experts Exchange

Solved

How to create a meta search engine?

Posted on 2010-09-11
Last Modified: 2013-12-13
A meta search engine passes queries through many search engines such as Google and Yahoo, but I want to know how its programming works.
Is it possible to build one in PHP?
Do Google and the other search engines grant the right to use their search engines for a meta search engine?


Question by:savsoft
7 Comments
 
LVL 31

Expert Comment

by:Marco Gasi
ID: 33652395
Yes, they do. You can start by reading these pages for Google and Yahoo:

http://code.google.com/intl/it-IT/apis/ajax/
http://developer.yahoo.com/everything.html

You will also need to learn about cURL: http://php.net/curl

What you want to do is not trivial; it cannot be done in a few lines of code. Good luck.
 

Author Comment

by:savsoft
ID: 33652421
Thank you.
OK, I will read these.
 
LVL 109

Accepted Solution

by: Ray Paseur (earned 500 total points)
ID: 33653149
The general design pattern would be to call each search engine API and save the results, perhaps in an array with one response element for each search engine.  You might want to include Bing, in addition to Google, Yahoo, and the lesser engines.
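A minimal sketch of that pattern in PHP, assuming each engine can be queried with a plain GET request whose URL contains the search terms. The function name `meta_search()`, the `{q}` placeholder and the example.com endpoints are hypothetical; real search APIs need their own endpoints, keys and terms of service. The fetcher is passed in as a callable so any HTTP function can be plugged in.

```php
<?php
// Hypothetical helper: query every engine and keep one entry per engine.
// $engines maps an engine name to a URL template containing "{q}" (assumed placeholder).
// $fetch is any callable that takes a URL and returns the response string or FALSE.
function meta_search(string $query, array $engines, callable $fetch): array
{
    $results = array();
    foreach ($engines as $name => $template) {
        // Insert the URL-encoded search terms into this engine's template
        $url = str_replace('{q}', rawurlencode($query), $template);
        $body = $fetch($url);
        // NULL marks an engine that failed, so the others can still be shown
        $results[$name] = ($body === false) ? null : $body;
    }
    return $results;
}

// Example configuration; these endpoints are placeholders, not real search APIs
$engines = array(
    'engine_a' => 'https://api.example.com/search?q={q}',
    'engine_b' => 'https://search.example.org/v1?query={q}&format=json',
);
```

With the `my_curl()` function from the script in this answer, the call would be `meta_search($q, $engines, 'my_curl')`. Each engine's raw response is kept as-is; parsing it into titles and links depends on the format that engine returns.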

You would probably take your search terms from a URL argument (in the PHP script this appears in $_GET). You might want to have a database to store your results for some period of time, so you are not so dependent on the foreign sites.
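A sketch of both ideas under stated assumptions: the query arrives as `?q=...`, the results are stored as a PHP array, and flat JSON files in the system temp directory stand in for a real database. The parameter name `q`, the function names, the file naming scheme and the one-hour lifetime are all hypothetical choices.

```php
<?php
// Hypothetical settings: seconds a cached result stays fresh
const CACHE_TTL = 3600;

// Read the search terms; returns '' when "q" is missing or not a string
function search_terms(array $get): string
{
    return (isset($get['q']) && is_string($get['q'])) ? trim($get['q']) : '';
}

// Hash the query so any input maps to a safe file name
function cache_file(string $query): string
{
    return sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'metasearch_' . md5($query) . '.json';
}

// Return cached results, or NULL when the entry is missing, stale or unreadable
function cache_get(string $query, int $ttl = CACHE_TTL): ?array
{
    $file = cache_file($query);
    if (!is_file($file) || time() - filemtime($file) > $ttl) {
        return null;
    }
    $data = json_decode(file_get_contents($file), true);
    return is_array($data) ? $data : null;
}

// Store results; LOCK_EX keeps two requests from writing the same file at once
function cache_put(string $query, array $results): void
{
    file_put_contents(cache_file($query), json_encode($results), LOCK_EX);
}
```

A request handler would call `search_terms($_GET)`, try `cache_get()` first, and only contact the search engines (then `cache_put()` the answer) on a miss. Swapping the file functions for database queries would not change that flow.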

These links might be helpful:
http://lmgtfy.com?q=google+search+api
http://developer.yahoo.com/search/boss/
http://msdn.microsoft.com/en-us/library/dd251056.aspx

In case you find that learning cURL is a daunting task (I did), here is a little script that will make a cURL request. Put the URL you want to retrieve into the $url variable in the usage example near the end of the script.

Good luck with your project, ~Ray


<?php // RAY_temp_curl_example.php
error_reporting(E_ALL);

function my_curl($url, $timeout=2, $error_report=FALSE)
{
    $curl = curl_init();

    // HEADERS FROM FIREFOX - APPEARS TO BE A BROWSER REFERRED BY GOOGLE
    $header   = array();
    $header[] = "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: max-age=0";
    $header[] = "Connection: keep-alive";
    $header[] = "Keep-Alive: 300";
    $header[] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7";
    $header[] = "Accept-Language: en-us,en;q=0.5";
    $header[] = "Pragma: "; // BROWSERS USUALLY LEAVE BLANK

    // SET THE CURL OPTIONS - SEE http://php.net/manual/en/function.curl-setopt.php
    curl_setopt($curl, CURLOPT_URL,            $url);
    curl_setopt($curl, CURLOPT_USERAGENT,      'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6');
    curl_setopt($curl, CURLOPT_HTTPHEADER,     $header);
    curl_setopt($curl, CURLOPT_REFERER,        'http://www.google.com');
    curl_setopt($curl, CURLOPT_ENCODING,       'gzip,deflate');
    curl_setopt($curl, CURLOPT_AUTOREFERER,    TRUE);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt($curl, CURLOPT_TIMEOUT,        $timeout);

    // RUN THE CURL REQUEST AND GET THE RESULTS
    $htm = curl_exec($curl);

    // ON FAILURE HANDLE ERROR MESSAGE
    if ($htm === FALSE)
    {
        if ($error_report)
        {
            $err = curl_errno($curl);
            $msg = curl_error($curl);
            $inf = curl_getinfo($curl);
            echo "CURL FAIL: $url TIMEOUT=$timeout, CURL_ERRNO=$err ($msg)";
            var_dump($inf);
        }
        curl_close($curl);
        return FALSE;
    }

    // ON SUCCESS RETURN XML / HTML STRING
    curl_close($curl);
    return $htm;
}




// USAGE EXAMPLE - PUT YOUR FAVORITE URL HERE
$url = "http://finance.yahoo.com/d/quotes.csv?s=lulu&f=snl1c1ohgvt1";
$htm = my_curl($url);
if (!$htm) die("NO $url");


// SHOW WHAT WE GOT
echo "<pre>";
echo htmlentities($htm);
echo "</pre>";
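Calling `my_curl()` once per engine means the waits add up. A hedged sketch of a parallel alternative using PHP's `curl_multi_*` functions, so the total wait is roughly that of the slowest engine. The name `multi_curl()` is hypothetical, and it applies only a reduced subset of `my_curl()`'s options (no custom headers, user agent or referer).

```php
<?php
// Hypothetical helper: fetch several URLs in parallel with curl_multi_*.
// Returns an array with the same keys as $urls; each value is the response body or FALSE.
function multi_curl(array $urls, int $timeout = 2): array
{
    $results = array();
    if (empty($urls)) {
        return $results;
    }
    $mh = curl_multi_init();
    $handles = array();
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        // Reduced subset of the options used in my_curl()
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until none is still running
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh, 1.0); // wait for activity instead of busy-looping
        }
    } while ($running && $status === CURLM_OK);

    // Read each transfer's result code; CURLE_OK means it succeeded
    $codes = array();
    while (($info = curl_multi_info_read($mh)) !== false) {
        $key = array_search($info['handle'], $handles, true);
        $codes[$key] = $info['result'];
    }

    foreach ($handles as $key => $ch) {
        $ok = isset($codes[$key]) && $codes[$key] === CURLE_OK;
        $results[$key] = $ok ? curl_multi_getcontent($ch) : false;
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);
    return $results;
}
```

Failed transfers come back as FALSE, matching what `my_curl()` returns on failure, so the rest of the script can treat both helpers the same way.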


Author Comment

by:savsoft
ID: 33653355
Thank you, Ray and Marco.

I am reading the pages you referred me to and have found them very useful.
Actually, I want to start my own search engine that shows better search results than the existing search engines. But I know that I can't crawl the whole web like Google and Bing, so I chose meta search technology.
Can you suggest anything more if you have a better idea?
I have a dedicated server with 500 GB of disk space and 4 GB of RAM.
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 33653833
Not to discourage you, but what you're working on puts you in direct competition against Google, Yahoo, and Bing -- all of them are spending millions of dollars each month trying to get better search results.  They constantly study each other, and they have unlimited access to the top scientists and engineers.

I think you're in fine shape just using their search results through their APIs. Just be careful about the terms of service: you may need to pay them if you use their data for commercial purposes.
 

Author Comment

by:savsoft
ID: 33654966
Is there any service where we can pay for their data usage?
I have also found an Amazon Web Information Service:
http://aws.amazon.com
It provides information about any website.
Can I use this information for my search engine? It charges $0.00015 per request, and Alexa is also powered by it.
 

Author Comment

by:savsoft
ID: 33655472
As I understand search technology, every existing search engine has a web spider (web crawler) program that starts visiting websites from some initial URLs (known as seeds) and stores each site in its database for later indexing and PageRank calculation. The crawler then detects all hyperlinks and adds them to its seeds list. This needs a very large amount of storage; it is essentially a download of the whole web. I think the whole process is repeated at least once every 10 days.
If that is true, then I don't think it is an advanced method, because it causes a lot of data transfer.
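The crawl loop described here can be sketched as a breadth-first search over a queue of URLs. This is a simplified illustration, not how any particular search engine works: the fetcher is passed in as a callable, only absolute http(s) links are followed (relative links would need resolving against the page URL), there is no robots.txt handling or politeness delay, and the page cap and function names are hypothetical.

```php
<?php
// Hypothetical helper: collect absolute http(s) links from an HTML page
function extract_links(string $html): array
{
    $links = array();
    if (trim($html) === '') {
        return $links; // DOMDocument::loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid; silence parser warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href'));
        if (preg_match('#^https?://#i', $href)) {
            $links[] = $href;
        }
    }
    return $links;
}

// Hypothetical crawler: start from the seeds, download pages, queue unseen links
function crawl(array $seeds, callable $fetch, int $maxPages = 100): array
{
    $queue = $seeds;                       // frontier: URLs waiting to be visited
    $seen  = array_fill_keys($seeds, true); // every URL ever queued
    $pages = array();                      // URL => HTML, the "downloaded web"
    while ($queue && count($pages) < $maxPages) {
        $url = array_shift($queue);        // FIFO order gives a breadth-first crawl
        $html = $fetch($url);
        if ($html === false) {
            continue;                      // skip pages that could not be fetched
        }
        $pages[$url] = $html;
        foreach (extract_links($html) as $link) {
            if (!isset($seen[$link])) {
                $seen[$link] = true;
                $queue[] = $link;
            }
        }
    }
    return $pages;
}
```

Keeping every page in `$pages` mirrors the storage cost described above; a crawler of any real size would write pages to disk or a database as it goes rather than hold them in memory.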

A new kind of search engine technology needs to be developed.
