sumantedla


Problem in following the redirects!!

Hi,

I am trying to follow a redirect of this form:

<META HTTP-EQUIV="REFRESH" CONTENT="0;URL=some relative path here">

But my code is not working. The AttributeSet "attrs" is null.

Can anyone help me?

String redirectURL = null;
try
{
    Reader reader = new StringReader(urlContent);
    // here urlContent contains the HTML code of any webpage
    EditorKit kit = new HTMLEditorKit();
    HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
    doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
    kit.read(reader, doc, 0);
    HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.META);
    while (it.isValid())
    {
        AttributeSet attrs = it.getAttributes();
        String httpEquiv = (String) attrs.getAttribute(HTML.Attribute.HTTPEQUIV);
        String content = (String) attrs.getAttribute(HTML.Attribute.CONTENT);
        if ("REFRESH".equalsIgnoreCase(httpEquiv) && content != null)
        {
            String[] strings = content.split(";");
            String timeAttr = strings[0].trim();
            String urlAttr = strings[1].replaceAll(" ", "");
            System.out.println("time => " + timeAttr);
            System.out.println("urlAttr => " + urlAttr);
            if ("0".equals(timeAttr) && urlAttr.toLowerCase().indexOf("url=") == 0)
            {
                redirectURL = urlAttr.substring(4);
                break;
            }
        }
        it.next();
    }
}
catch (Exception e)
{
    e.printStackTrace();
}
aozarov

try:
if ("0".equals(timeAttr) && urlAttr.toLowerCase().indexOf("url=") >=  0)

ASKER


Where do I put that code? I didn't follow you.

I will explain it once again. The problem is with

=>  AttributeSet attrs =  it.getAttributes();

attrs is null. There are META tags in urlContent, but the iterator is unable to retrieve their attributes.

To be exact, the urlContent is
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html lang="en,us">
<HEAD>  
<META http-equiv="REFRESH" content="0;URL=/pls/portal/portalp.home"></HEAD><body></BODY></HTML>

Sorry, didn't see your ";" tokenizing so I suggested
urlAttr.toLowerCase().indexOf("url=")>= 0
instead of
urlAttr.toLowerCase().indexOf("url=")== 0
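To make the tokenizing step concrete, here is a minimal sketch of pulling the URL out of a refresh `content` value on its own, independent of the Swing parser. The class and method names (`RefreshContent`, `extractUrl`) are made up for illustration and are not from the original code. It splits on the first ";" only, so a URL that itself contains ";" should stay intact, and it checks for the "url=" prefix without caring about case.

```java
final class RefreshContent {

    /**
     * Returns the URL part of a META refresh content value such as
     * "0;URL=/some/path", or null if no "url=" token is present.
     * (Hypothetical helper, for illustration only.)
     */
    static String extractUrl(String content) {
        // Split on the first ';' only: [delay, rest]
        String[] parts = content.split(";", 2);
        if (parts.length < 2) {
            return null; // no ';' at all, so there is no URL token
        }
        String urlPart = parts[1].trim();
        // Case-insensitive check that the token starts with "url="
        if (!urlPart.regionMatches(true, 0, "url=", 0, 4)) {
            return null;
        }
        return urlPart.substring(4).trim();
    }

    public static void main(String[] args) {
        System.out.println(extractUrl("0;URL=/pls/portal/portalp.home"));
    }
}
```

Checking `parts.length` first also avoids the ArrayIndexOutOfBoundsException that `strings[1]` would throw for a content value like "5" with no URL.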

Never used HTMLDocument.Iterator, but shouldn't you call next "before" each iteration (like JDBC's ResultSet.next() or standard iterators)?
Looking at the source code of HTMLDocument.Iterator (which is actually LeafIterator), it doesn't seem that you need to call next before.
Did you try calling  it.getTag().toString()  instead?
I tried,

System.out.println("Tag =>" + it.getTag());

It is printing "meta".

But attrs is still null. Does the getAttributes() method of HTMLDocument.Iterator work correctly?
I think so
http://www.javaalmanac.com/cgi-bin/search/find.pl?words=HTMLDocument

If that doesn't work for you then you can have a look at http://httpunit.sourceforge.net/ which can function in a similar fashion.
see: http://httpunit.sourceforge.net/doc/cookbook.html
Mick Barry

What version of Java are you running it on?
I tried it on both versions, 1.4 and 1.5.
ASKER CERTIFIED SOLUTION
aozarov

try this:

   HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.META);
     while (it.isValid())
     {    
          AttributeSet attrs =  it.getAttributes();
          if (attrs!=null)
          {
             String httpEquiv = (String) attrs.getAttribute(HTML.Attribute.HTTPEQUIV);
             String content = (String) attrs.getAttribute(HTML.Attribute.CONTENT);
             if ("REFRESH".equalsIgnoreCase(httpEquiv) && content != null)
             {    
                 String[] strings = content.split(";");
                 String timeAttr = strings[0].trim();
                 String urlAttr = strings[1].replaceAll(" ", "");
                 System.out.println("time => " + timeAttr);
                 System.out.println("urlAttr => " + urlAttr);
                 if ("0".equals(timeAttr) && urlAttr.toLowerCase().indexOf("url=")== 0)
                 {
                    redirectURL = urlAttr.substring(4);
                    break;
                 }
             }
         }    
         it.next();
     }
It is still not working.

All I want to do is extract the links. For that I have to get the page content, but when there is a client-side redirect I am unable to get it.
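
One route that was not tried in this thread is to skip HTMLDocument.Iterator and read the META attributes straight from the parser through an HTMLEditorKit.ParserCallback. The sketch below is only that, a sketch: the class name `MetaRefreshFinder`, its methods, and the `http://example.com/...` base URL are assumptions made up for illustration. It looks for the first zero-delay refresh and then resolves the relative target against the URL of the page it came from, so the redirected page can be fetched next.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.net.URI;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

final class MetaRefreshFinder {

    /** Returns the target of the first zero-delay META refresh, or null if there is none. */
    static String findRefreshTarget(String html) {
        final String[] target = new String[1];
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // META has no end tag, so the parser reports it as a simple tag.
                if (tag != HTML.Tag.META || target[0] != null) {
                    return;
                }
                Object httpEquiv = attrs.getAttribute(HTML.Attribute.HTTPEQUIV);
                Object content = attrs.getAttribute(HTML.Attribute.CONTENT);
                if (httpEquiv == null || content == null
                        || !"refresh".equalsIgnoreCase(httpEquiv.toString())) {
                    return;
                }
                // content looks like "0;URL=/some/path": split on the first ';' only.
                String[] parts = content.toString().split(";", 2);
                if (parts.length < 2 || !"0".equals(parts[0].trim())) {
                    return;
                }
                String urlPart = parts[1].trim();
                if (urlPart.regionMatches(true, 0, "url=", 0, 4)) {
                    target[0] = urlPart.substring(4).trim();
                }
            }
        };
        try {
            // Third argument 'true' tells the parser to ignore charset directives.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target[0];
    }

    /** Resolves a possibly relative redirect target against the URL of the current page. */
    static String resolve(String pageUrl, String target) {
        return URI.create(pageUrl).resolve(target).toString();
    }

    public static void main(String[] args) {
        String html = "<html><HEAD>"
                + "<META http-equiv=\"REFRESH\" content=\"0;URL=/pls/portal/portalp.home\">"
                + "</HEAD><body></BODY></HTML>";
        String target = findRefreshTarget(html);
        if (target != null) {
            // The base URL here is a placeholder, not a real site from the question.
            System.out.println(resolve("http://example.com/start/index.html", target));
        }
    }
}
```

Once the resolved URL is known, the same callback style could be extended with handleStartTag to collect the href of each A tag from the page that the redirect leads to.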