Solved

Connection to Yahoo

Posted on 2003-04-01
Last Modified: 2010-03-31

How can I connect to Yahoo, send a query, and get the answers back? Is there a way to avoid parsing the HTML, either by using an existing API or with a little trick?


thanks in advance.
Question by:Caillou
LVL 3

Expert Comment

by:msterjev
ID: 8248202
Use a query URL such as:

http://search.yahoo.com/bin/search?p=Java

and fetch it with HttpURLConnection.

What do you mean by avoiding parsing the HTML? What do you want to retrieve?
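
A minimal sketch of that suggestion, assuming the search URL above still answers a plain GET and that the reply can be read as UTF-8 (neither is guaranteed). The class name YahooQuery and the helpers buildSearchUrl and fetch are made up for illustration:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

// Hypothetical helper class, not part of any Yahoo API.
final class YahooQuery {
    // Search URL taken from the comment above.
    static final String BASE = "http://search.yahoo.com/bin/search?p=";

    // Builds the query URL; the search string must be URL-encoded.
    static String buildSearchUrl(String query) {
        try {
            return BASE + URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Fetches the raw HTML of the given address as one string.
    static String fetch(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setRequestMethod("GET");
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status: " + status);
            }
            // Assumption: the page is UTF-8; a real client would check the Content-Type header.
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), "UTF-8"));
            StringBuilder html = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                html.append(line).append('\n');
            }
            reader.close();
            return html.toString();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String query = args.length > 0 ? args[0] : "Java";
        System.out.println(fetch(buildSearchUrl(query)));
    }
}
```

This only retrieves the page; whatever you want out of it still has to be pulled from the returned HTML.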
LVL 4

Accepted Solution

by:ShannonE (earned 200 total points)
ID: 8248225
Hey man,
The HTML from commercial sites is usually WAY too complicated to parse (that's why XML is so popular ;). You could try Java regular expressions, but chances are your regular expressions for Yahoo won't work for any other search site, and even Yahoo might change the structure of its HTML a few months down the road. You can still get it done if you skip parsing the document and just show it to the user: fetch the results page, then open it in the default browser from your Java program.

Here's the code:
//-------------------------
//file BrowserControl.java
//-------------------------
import java.io.*;

public class BrowserControl
{
    /**
     * Display a file in the system browser.  If you want to display a
     * file, you must include the absolute path name.
     *
     * @param url the file's url (the url must start with either "http://"
     * or "file://").
     */
    public static void displayURL(String url)
    {
        boolean windows = isWindowsPlatform();
        String cmd = null;
        try
        {
            if (windows)
            {
                // cmd = 'rundll32 url.dll,FileProtocolHandler http://...'
                cmd = WIN_PATH + " " + WIN_FLAG + " " + url;
                Process p = Runtime.getRuntime().exec(cmd);
            }
            else
            {
                // Under Unix, Netscape has to be running for the "-remote"
                // command to work.  So, we try sending the command and
                // check for an exit value.  If the exit command is 0,
                // it worked, otherwise we need to start the browser.
                // cmd = 'netscape -remote openURL(http://www.javaworld.com)'
                cmd = UNIX_PATH + " " + UNIX_FLAG + "(" + url + ")";
                Process p = Runtime.getRuntime().exec(cmd);
                try
                {
                    // wait for exit code -- if it's 0, command worked,
                    // otherwise we need to start the browser up.
                    int exitCode = p.waitFor();
                    if (exitCode != 0)
                    {
                        // Command failed, start up the browser
                        // cmd = 'netscape http://www.javaworld.com'
                        cmd = UNIX_PATH + " "  + url;
                        p = Runtime.getRuntime().exec(cmd);
                    }
                }
                catch(InterruptedException x)
                {
                    System.err.println("Error bringing up browser, cmd='" +
                                       cmd + "'");
                    System.err.println("Caught: " + x);
                }
            }
        }
        catch(IOException x)
        {
            // couldn't exec browser
            System.err.println("Could not invoke browser, command=" + cmd);
            System.err.println("Caught: " + x);
        }
    }
    /**
     * Try to determine whether this application is running under Windows
     * or some other platform by examining the "os.name" property.
     *
     * @return true if this application is running under a Windows OS
     */
    public static boolean isWindowsPlatform()
    {
        String os = System.getProperty("os.name");
        return os != null && os.startsWith(WIN_ID);
    }
    /**
     * Simple example.
     */
    public static void main(String[] args)
    {
        displayURL("http://www.javaworld.com");
    }
    // Used to identify the windows platform.
    private static final String WIN_ID = "Windows";
    // The default system browser under windows.
    private static final String WIN_PATH = "rundll32";
    // The flag to display a url.
    private static final String WIN_FLAG = "url.dll,FileProtocolHandler";
    // The default browser under unix.
    private static final String UNIX_PATH = "netscape";
    // The flag to display a url.
    private static final String UNIX_FLAG = "-remote openURL";
}


//-------------------------------------
//File Yahoo.java
//-------------------------------------
import java.net.*;
import java.io.*;

public class Yahoo {

    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println("Usage: java Yahoo <search string>");
            return;
        }
        try {
            // make connection; the search string must be URL-encoded
            URL url = new URL("http://search.yahoo.com/bin/search?p=" +
                    URLEncoder.encode(args[0], "UTF-8"));
            URLConnection connection = url.openConnection();
            connection.setDoInput(true);
            InputStream in = connection.getInputStream();

            // read reply, keeping the line breaks that readLine() strips
            StringBuffer b = new StringBuffer();
            BufferedReader r = new BufferedReader(new InputStreamReader(in));
            String line;
            while ((line = r.readLine()) != null) {
                b.append(line).append('\n');
            }
            r.close();

            String s = b.toString();
            System.out.println("Entire string:\n" + s);

            // save the page to disk
            File results = new File("results.html");
            PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter(results)));
            out.print(s);
            out.close();

            // displayURL is static and needs a file:// url with an absolute path
            BrowserControl.displayURL("file://" + results.getAbsolutePath());
        }
        catch (Exception e) { e.printStackTrace(); }
    }
}

To run a search, compile both files and type java Yahoo <search string>. Only the first command-line argument is used as the search string, so put multi-word queries in quotes; this is easy to change.
Good luck!
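
On JDK 6 and later, java.awt.Desktop can stand in for the rundll32/netscape commands that BrowserControl shells out to. The sketch below is an alternative, not the code from the answer above; the class name OpenResults and its two helpers are made up for illustration, and it assumes results.html has already been written to the working directory:

```java
import java.awt.Desktop;
import java.io.File;
import java.io.IOException;
import java.net.URI;

// Hypothetical helper that opens a saved page with java.awt.Desktop (JDK 6+).
final class OpenResults {

    // Turns a possibly relative path into an absolute file: URI.
    static URI toBrowsableUri(String path) {
        return new File(path).getAbsoluteFile().toURI();
    }

    // Returns false instead of failing when no desktop browser is available
    // (for example on a headless server).
    static boolean open(URI uri) throws IOException {
        if (!Desktop.isDesktopSupported()) {
            return false;
        }
        Desktop desktop = Desktop.getDesktop();
        if (!desktop.isSupported(Desktop.Action.BROWSE)) {
            return false;
        }
        desktop.browse(uri);
        return true;
    }

    public static void main(String[] args) throws IOException {
        URI uri = toBrowsableUri(args.length > 0 ? args[0] : "results.html");
        if (!open(uri)) {
            System.err.println("No desktop browser available; open manually: " + uri);
        }
    }
}
```

The path is made absolute before the browser sees it, because a bare "results.html" gives the browser nothing to resolve it against.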

Author Comment

by:Caillou
ID: 8253822
Great code, Shannon,


I'm really impressed!

Actually I do have to parse the HTML a little, because I need to analyse the URLs returned by the query. I've managed to do so with AltaVista, but it took me some time; that's what I meant by avoiding parsing the HTML.

Anyway thanks a lot man.
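
For the URL analysis mentioned in this comment, a regular expression over the fetched page is one possible starting point. This is a rough sketch, assuming the links you care about are absolute http/https URLs inside quoted href attributes; the class name LinkExtractor and the sample markup are made up, and the pattern is not tuned to the real markup of Yahoo or AltaVista:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls absolute links out of an HTML string.
final class LinkExtractor {
    // Matches href="http://..." or href='https://...', ignoring case.
    private static final Pattern HREF = Pattern.compile(
            "href\\s*=\\s*[\"'](https?://[^\"'\\s>]+)[\"']",
            Pattern.CASE_INSENSITIVE);

    // Returns every absolute link in document order; relative links are skipped.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<String>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        String sample = "<a href=\"http://example.com/a\">A</a> "
                + "<A HREF='http://example.org/b?x=1'>B</A> "
                + "<a href=\"/relative\">C</a>";
        for (String link : extractLinks(sample)) {
            System.out.println(link);
        }
    }
}
```

As ShannonE warns, a pattern like this breaks as soon as the site changes its markup, and it also picks up navigation and ad links, so the output usually needs filtering per site.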

Author Comment

by:Caillou
ID: 8253830
nothing to add... just excellent ;)