Problem in following the redirects!!

Hi,

I am trying to follow a redirect of this form:

<META HTTP-EQUIV="REFRESH" CONTENT="0;URL=some relative path here">

But my code is not working: the AttributeSet "attrs" comes back null.

Can anyone help?

String redirectURL = null;
try
{
    Reader reader = new StringReader(urlContent);
    // urlContent contains the HTML source of the web page
    EditorKit kit = new HTMLEditorKit();
    HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
    doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
    kit.read(reader, doc, 0);
    HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.META);
    while (it.isValid())
    {
        AttributeSet attrs = it.getAttributes();
        String httpEquiv = (String) attrs.getAttribute(HTML.Attribute.HTTPEQUIV);
        String content = (String) attrs.getAttribute(HTML.Attribute.CONTENT);
        if ("REFRESH".equalsIgnoreCase(httpEquiv) && content != null)
        {
            String[] strings = content.split(";");
            String timeAttr = strings[0].trim();
            String urlAttr = strings[1].replaceAll(" ", "");
            System.out.println("time => " + timeAttr);
            System.out.println("urlAttr => " + urlAttr);
            if ("0".equals(timeAttr) && urlAttr.toLowerCase().indexOf("url=") == 0)
            {
                redirectURL = urlAttr.substring(4);
                break;
            }
        }
        it.next();
    }
}
catch (Exception e)
{
    e.printStackTrace();
}
Asked by: sumantedla
 
aozarov Commented:
Try:
if ("0".equals(timeAttr) && urlAttr.toLowerCase().indexOf("url=") >=  0)
 
sumantedla (Author) Commented:

Where should I put that code? I didn't follow.

Let me explain again. The problem is with

=>  AttributeSet attrs =  it.getAttributes();

attrs is null here. There are META tags in urlContent, but the iterator is unable to retrieve their attributes.

To be exact, urlContent is
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html lang="en,us">
<HEAD>  
<META http-equiv="REFRESH" content="0;URL=/pls/portal/portalp.home"></HEAD><body></BODY></HTML>

 
aozarov Commented:
Sorry, I didn't see that you were tokenizing on ";", so I suggested
urlAttr.toLowerCase().indexOf("url=")>= 0
instead of
urlAttr.toLowerCase().indexOf("url=")== 0

I have never used HTMLDocument.Iterator, but shouldn't you call next() "before" each iteration (like JDBC's ResultSet.next() or standard iterators)?
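Separately from the null attrs problem, the content tokenizing discussed here can be made more forgiving. The following is only a sketch, assuming the content value has the usual "delay;URL=target" shape; the class MetaRefreshContent and its methods extractUrl and resolve are hypothetical names made up for this illustration. It takes everything after the first ";" (so a ";" inside the URL survives), tolerates spaces and optional quotes around the target, and resolves a relative path against the page URL with java.net.URI.

```java
import java.net.URI;

/** Hypothetical helper; the class and method names are illustrative only. */
class MetaRefreshContent {

    /**
     * Returns the URL part of a meta refresh content value such as
     * "0;URL=/pls/portal/portalp.home", or null if no URL is present.
     */
    static String extractUrl(String content) {
        if (content == null) {
            return null;
        }
        int semicolon = content.indexOf(';');
        if (semicolon < 0) {
            return null; // delay only, e.g. content="5"
        }
        // Keep everything after the FIRST ';' so a ';' inside the URL is not lost.
        String rest = content.substring(semicolon + 1).trim();
        if (!rest.regionMatches(true, 0, "url", 0, 3)) {
            return null; // does not start with "url" (case-insensitive)
        }
        rest = rest.substring(3).trim();
        if (!rest.startsWith("=")) {
            return null;
        }
        String url = rest.substring(1).trim();
        // Strip one pair of matching quotes, if any.
        if (url.length() >= 2) {
            char first = url.charAt(0);
            char last = url.charAt(url.length() - 1);
            if ((first == '\'' || first == '"') && first == last) {
                url = url.substring(1, url.length() - 1);
            }
        }
        return url.isEmpty() ? null : url;
    }

    /** Resolves a possibly relative redirect target against the page's own URL. */
    static URI resolve(String pageUrl, String target) {
        return URI.create(pageUrl).resolve(target);
    }
}
```

For example, MetaRefreshContent.resolve("http://example.com/app/index.html", "/pls/portal/portalp.home") yields http://example.com/pls/portal/portalp.home (example.com is a placeholder host, not the asker's server).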
 
aozarov Commented:
Looking at the source code of HTMLDocument.Iterator (which is actually LeafIterator), it doesn't seem that you need to create next before.
 
aozarov Commented:
Typo: create next before -> call next before.
 
aozarov Commented:
Did you try calling it.getTag().toString() instead?
 
sumantedla (Author) Commented:
I tried:

System.out.println("Tag =>" + it.getTag());

It prints "meta".

But attrs is still null. Does the getAttributes() method of HTMLDocument.Iterator work correctly?
 
aozarov Commented:
I think so:
http://www.javaalmanac.com/cgi-bin/search/find.pl?words=HTMLDocument

If that doesn't work for you, then you can have a look at http://httpunit.sourceforge.net/, which can do something similar.
See: http://httpunit.sourceforge.net/doc/cookbook.html
 
objects Commented:
What version of Java are you running it on?
 
sumantedla (Author) Commented:
I tried it on both Java 1.4 and 1.5.
 
aozarov Commented:
If you decide to go with HttpUnit, then
http://httpunit.sourceforge.net/doc/api/com/meterware/httpunit/WebResponse.html#getMetaTagContent(java.lang.String, java.lang.String)
is probably what you are looking for:

WebConversation wc   = new WebConversation();
WebRequest      req  = new GetMethodWebRequest( "http://www.meterware.com/testpage.html" );
WebResponse     resp = wc.getResponse( req );
...
resp.getMetaTagContent("HTTP-EQUIV", "REFRESH");
 
objects Commented:
Try this:

HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.META);
while (it.isValid())
{
    AttributeSet attrs = it.getAttributes();
    if (attrs != null)
    {
        String httpEquiv = (String) attrs.getAttribute(HTML.Attribute.HTTPEQUIV);
        String content = (String) attrs.getAttribute(HTML.Attribute.CONTENT);
        if ("REFRESH".equalsIgnoreCase(httpEquiv) && content != null)
        {
            String[] strings = content.split(";");
            String timeAttr = strings[0].trim();
            String urlAttr = strings[1].replaceAll(" ", "");
            System.out.println("time => " + timeAttr);
            System.out.println("urlAttr => " + urlAttr);
            if ("0".equals(timeAttr) && urlAttr.toLowerCase().indexOf("url=") == 0)
            {
                redirectURL = urlAttr.substring(4);
                break;
            }
        }
    }
    it.next();
}
 
sumantedla (Author) Commented:
It is still not working.

All I want to do is extract the links. For that I have to get the page content, but when there is a client-side redirect I am unable to get it.
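
One approach not tried in this thread is to skip HTMLDocument altogether and let the Swing HTML parser report tags to an HTMLEditorKit.ParserCallback, which receives each tag's attributes directly. This is only a sketch, assuming the parser reports META through handleSimpleTag (handleStartTag is overridden as well, in case it does not). The class MetaRefreshFinder and the method findRefreshContent are hypothetical names for this illustration, not part of any library.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Hypothetical callback that remembers the first META refresh it sees. */
class MetaRefreshFinder extends HTMLEditorKit.ParserCallback {

    private String content; // raw content attribute, e.g. "0;URL=/some/path"

    @Override
    public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        check(tag, attrs);
    }

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        check(tag, attrs);
    }

    private void check(HTML.Tag tag, MutableAttributeSet attrs) {
        if (content != null || tag != HTML.Tag.META) {
            return; // already found one, or not a META tag
        }
        Object httpEquiv = attrs.getAttribute(HTML.Attribute.HTTPEQUIV);
        Object value = attrs.getAttribute(HTML.Attribute.CONTENT);
        if (httpEquiv != null && value != null
                && "refresh".equalsIgnoreCase(httpEquiv.toString())) {
            content = value.toString();
        }
    }

    /** Returns the content value of the first META refresh tag, or null. */
    static String findRefreshContent(String html) {
        MetaRefreshFinder finder = new MetaRefreshFinder();
        try {
            // true = ignore charset directives, like the IgnoreCharsetDirective
            // property set in the question's code.
            new ParserDelegator().parse(new StringReader(html), finder, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return finder.content;
    }
}
```

Called with the urlContent posted earlier, this should hand back the raw content value, which still has to be split into delay and URL and resolved against the page's own address before the redirect target can be fetched.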