Solved

Exception: no protocol ...

Posted on 2004-10-10
Last Modified: 2012-06-21
Hey everyone,
I've made a program that lets the user know whether the links on a web page have changed (new ones added or old ones removed).

Here's the code:

//:::::::::::::::::::::::::::::::::::::::  DomainChangeCheck.java :::::::::::::::::::::::::::::::::::::::::\\
import java.io.*;
import java.util.*;
import java.net.*;
import javax.swing.text.html.*;
import javax.swing.text.*;
import java.text.*;

public class DomainChangeCheck {
      
      static String sDomain = "";
      static String sPage   = "";
      static int http       = 80;
      
      public static void main ( String [] args ) {
            
            DomainChangeCheck dcc = new DomainChangeCheck ();
            
            // Verify that an argument has been passed:
            if ( args.length == 0 ) {
                  System.out.println("Usage: java DomainChangeCheck www.domain.com /page.htm");
                  System.exit( 1 );
            }
            
            String s1 = "",
                     s2 = "";
            
            sDomain = args[0];
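            // The page argument is optional, so check that it was supplied before reading it:
            if ( args.length > 1 && args[1].length() > 0 ) sPage = args[1];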
            
            // Verify domain existence:
            boolean exist = false;
          try {
              InetAddress addr = InetAddress.getByName( sDomain );
   
              // This constructor will block until the connection succeeds
              Socket socket = new Socket(addr, http);
              exist = true;
              socket.close();
          } catch (Exception e) { /* lookup or connect failed: exist stays false */ }
          
          // Exit if domain does not exist.
          if ( /*domain does not*/ !exist ) {
                System.out.println( "Domain: " + sDomain + " cannot be found." );
                System.exit( 1 );
          }
            
            boolean bFirstTime = true;
            
      for (;;) {
            
            s2 = s1;
          
          // Retrieve all links in HTML document:
          String [] links = dcc.getLinks( "http://" + sDomain + sPage );
          
          if ( links.length == 0) {
                System.out.println( "Error: Could not find any links in specified document" );
                System.exit( 1 );
          }
          
          s1 = "";
          
          // Check to see if any changes to the links on the page have been made:
          for ( int i=0; i<links.length; i++ ) {
                s1 += links[i];
          }
          
          if ( !(dcc.doDiff( s1, s2 )) ) {
                if ( bFirstTime ) {
                      System.out.println( "\nStarted Successfully..." );
                      bFirstTime = false;
                } else {
                      System.out.println( "\nA change to the page has been made." );
                }
          }
    }
          
    }
      
      private boolean doDiff ( String s1, String s2 ) {
            // Return true if the two strings are the same, false otherwise.
            return s1.equals( s2 );
      }
      
    public static String[] getLinks(String uriStr) {
        List<String> result = new ArrayList<String>();
   
        try {
            // Create a reader on the HTML content
            URL url = new URI(uriStr).toURL();
            URLConnection conn = url.openConnection();
            Reader rd = new InputStreamReader(conn.getInputStream());
                
            // Parse the HTML
            EditorKit kit = new HTMLEditorKit();
            HTMLDocument doc = (HTMLDocument)kit.createDefaultDocument();
            doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
            kit.read(rd, doc, 0);
                
            // Find all the A elements in the HTML document
            HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.A);
            while (it.isValid()) {
                SimpleAttributeSet s = (SimpleAttributeSet)it.getAttributes();
                      
                String link = (String)s.getAttribute(HTML.Attribute.HREF);
                if (link != null) {
                    // Add the link to the result list
                    result.add(link);
                }
                it.next();
            }
        } catch (Exception e) {
              System.out.println( "Exception: " + e.getMessage() + "\n" );
              e.printStackTrace();
        }
          
        // Return all found links
        return (String[])result.toArray(new String[result.size()]);
    }
      
}
//:::::::::::::::::::::::::::::::::::::::  DomainChangeCheck.java :::::::::::::::::::::::::::::::::::::::::\\
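
As an aside on the goal stated at the top (spotting links that were added or removed): a possible alternative to comparing one concatenated string is to compare the link lists as sets, which also reports which links changed. Concatenating without a separator can make different lists look identical, for example "a" + "bc" and "ab" + "c". This is only a sketch; `LinkDiff`, `added` and `removed` are hypothetical names, not part of the program above.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of the posted program.
class LinkDiff {

    // Links present in 'current' but missing from 'previous'.
    static Set<String> added(String[] previous, String[] current) {
        Set<String> result = new TreeSet<>(Arrays.asList(current));
        result.removeAll(Arrays.asList(previous));
        return result;
    }

    // Links present in 'previous' but missing from 'current'.
    static Set<String> removed(String[] previous, String[] current) {
        return added(current, previous);
    }
}
```

With `previous` holding the links from the last pass and `current` those from this pass, a non-empty result from either method would mean the page's links changed.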

Okay, now, when I type something like:

java DomainChangeCheck www.positive-websolutions.co.uk /Index.htm

It works fine. But let's assume that one puts:
java DomainChangeCheck www.experts-exchange.com /Java/

:o\ -- It doesn't work! I get the following exception from within the getLinks() method (well, the stack trace):

java.net.MalformedURLException: no protocol: /Programming/Programming_Languages/Java/
etc..

Any ideas why?

ThanQ :)
[r.D]
Question by:DrWarezz
 
LVL 86

Expert Comment

by:CEHJ
ID: 12272005
Are you sure you're not calling getLinks from elsewhere and that the only code you're running is the code you posted?
 
LVL 86

Expert Comment

by:CEHJ
ID: 12272023
Also, instead of, or in addition to:

>>System.out.println( "Exception: " + e.getMessage() + "\n" );

could you do

System.err.println( "URI was " + uriStr );

?


 
LVL 9

Author Comment

by:DrWarezz
ID: 12272141
I'm sure that all the code I'm running is what's been posted.

I inserted:
System.err.println( "URI was " + uriStr );
into it, and it outputs:

URI was http://www.experts-exchange.com/Java/

:o\ ?

ThanQ,
[r.D]
 
LVL 9

Author Comment

by:DrWarezz
ID: 12272148
According to the stack trace, it would seem that it's line 106 that the error is occurring on:

Reader rd = new InputStreamReader(conn.getInputStream());

?
 
LVL 86

Expert Comment

by:CEHJ
ID: 12272154
Yes, but I'd like to see *all* the error output with that in it too ;-)
 
LVL 9

Author Comment

by:DrWarezz
ID: 12272179
:)
Okay, here's EVERYTHING that's output:

Exception: no protocol: /Programming/Programming_Languages/Java/

URI was http://www.experts-exchange.com/Java/
java.net.MalformedURLException: no protocol: /Programming/Programming_Languages/Java/
   at java.net.URL.<init>(URL.java:579)
   at java.net.URL.<init>(URL.java:476)
   at java.net.URL.<init>(URL.java:425)
   at sun.net.www.protocol.http.HttpURLConnection.followRedirect(HttpURLConnection.java:1090)
   at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:681)
   at DomainChangeCheck.getLinks(DomainChangeCheck.java:106)
   at DomainChangeCheck.getLinks(DomainChangeCheck.java:55)
Error: Could not find any links in specified document

And here's the command I type:
java DomainChangeCheck www.experts-exchange.com /Java/

Thanks.
[r.D]
 
LVL 86

Accepted Solution

by:
CEHJ earned 2000 total points
ID: 12272229
At the risk of repeating myself (I said this yesterday and was accused of being 'unhelpful'), the HTML packages are pretty flaky and/or intolerant when it comes to parsing HTML, so you'll have to accept uneven results. That

>>http://www.experts-exchange.com/Java/

does a redirect to

'/Programming/Programming_Languages/Java/'

which is the same path that appears in the 'no protocol' message. So if you were to enter http://www.experts-exchange.com/Programming/Programming_Languages/Java/ at the outset, you might have more success.
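
For anyone hitting the same trace: the redirect target in the exception is a relative path, and the stack trace shows followRedirect failing while building a URL from it. One possible workaround, sketched here under assumptions, is to switch off automatic redirects and resolve the Location header against the requested URL yourself. The names `RedirectFollower`, `resolveLocation` and `open`, and the hop limit, are hypothetical and not part of the original program.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper, not part of the posted program.
class RedirectFollower {

    // Resolve a Location header value (absolute or relative) against the URL that returned it.
    static String resolveLocation(String requested, String location) {
        return URI.create(requested).resolve(location).toString();
    }

    // Open a stream on the page, following at most maxHops redirects by hand.
    static InputStream open(String address, int maxHops) throws IOException {
        String current = address;
        for (int hop = 0; hop <= maxHops; hop++) {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(current).toURL().openConnection();
            conn.setInstanceFollowRedirects(false); // handle 3xx responses ourselves
            int status = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            if (status < 300 || status >= 400 || location == null) {
                return conn.getInputStream(); // not a redirect: hand back the body
            }
            conn.disconnect();
            current = resolveLocation(current, location);
        }
        throw new IOException("Too many redirects starting from " + address);
    }
}
```

In getLinks(), the Reader could then be built from `RedirectFollower.open(uriStr, 5)` instead of `conn.getInputStream()`; the limit of 5 is an arbitrary choice for illustration.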
 
LVL 9

Author Comment

by:DrWarezz
ID: 12272240
lol - You were right! :D -- It's the redirection.

Thanks a lot CEHJ :)
 
LVL 86

Expert Comment

by:CEHJ
ID: 12272245
8-)