
Problem following META refresh redirects

Hi,

I am trying to follow a redirect that looks like this:

<META HTTP-EQUIV="REFRESH" CONTENT="0;URL=some relative path here">

But my code is not working: the AttributeSet "attrs" comes back null.

Can anyone help me?

String redirectURL = null;
try
{
    // urlContent holds the HTML source of the page
    Reader reader = new StringReader(urlContent);
    EditorKit kit = new HTMLEditorKit();
    HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
    doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
    kit.read(reader, doc, 0);
    HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.META);
    while (it.isValid())
    {
        AttributeSet attrs = it.getAttributes();
        String httpEquiv = (String) attrs.getAttribute(HTML.Attribute.HTTPEQUIV);
        String content = (String) attrs.getAttribute(HTML.Attribute.CONTENT);
        if ("REFRESH".equalsIgnoreCase(httpEquiv) && content != null)
        {
            String[] strings = content.split(";");
            String timeAttr = strings[0].trim();
            String urlAttr = strings[1].replaceAll(" ", "");
            System.out.println("time => " + timeAttr);
            System.out.println("urlAttr => " + urlAttr);
            if ("0".equals(timeAttr) && urlAttr.toLowerCase().indexOf("url=") == 0)
            {
                redirectURL = urlAttr.substring(4);
                break;
            }
        }
        it.next();
    }
}
catch (Exception e)
{
    e.printStackTrace();
}

aozarov

Try changing the condition to:
if ("0".equals(timeAttr) && urlAttr.toLowerCase().indexOf("url=") >= 0)
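The split-based parsing in the question also breaks on refresh values that carry only a delay (for example "5", where strings[1] throws ArrayIndexOutOfBoundsException) or that put quotes around the URL. Here is a sketch of a more tolerant parser, assuming the usual "delay;URL=target" shape with optional spaces and quotes. RefreshContent is a hypothetical helper, not part of any library.

```java
import java.util.Locale;

/** Hypothetical helper: splits a META refresh content value such as "0;URL=/next.html". */
final class RefreshContent {
    final int delaySeconds;
    final String url; // null when the value carries only a delay

    private RefreshContent(int delaySeconds, String url) {
        this.delaySeconds = delaySeconds;
        this.url = url;
    }

    /** Returns null when content is null or its delay part is not a non-negative integer. */
    static RefreshContent parse(String content) {
        if (content == null) {
            return null;
        }
        int semi = content.indexOf(';');
        String delayPart = (semi < 0 ? content : content.substring(0, semi)).trim();
        int delay;
        try {
            delay = Integer.parseInt(delayPart);
        } catch (NumberFormatException e) {
            return null;
        }
        if (delay < 0) {
            return null;
        }
        if (semi < 0) {
            return new RefreshContent(delay, null);
        }
        String rest = content.substring(semi + 1).trim();
        // Drop an optional, case-insensitive "URL =" prefix
        if (rest.toLowerCase(Locale.ROOT).startsWith("url")) {
            String afterKey = rest.substring(3).trim();
            if (afterKey.startsWith("=")) {
                rest = afterKey.substring(1).trim();
            }
        }
        // Drop one pair of matching quotes around the target
        if (rest.length() >= 2
                && (rest.charAt(0) == '\'' || rest.charAt(0) == '"')
                && rest.charAt(rest.length() - 1) == rest.charAt(0)) {
            rest = rest.substring(1, rest.length() - 1).trim();
        }
        return new RefreshContent(delay, rest.isEmpty() ? null : rest);
    }
}
```

For example, RefreshContent.parse("0;URL=/pls/portal/portalp.home") gives a delay of 0 and the URL "/pls/portal/portalp.home", with no special-casing of where "url=" appears.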

sumantedla

ASKER

Where should I put that code? I don't follow.

Let me explain again. The problem is with this line:

AttributeSet attrs = it.getAttributes();

attrs is null, even though urlContent does contain META tags. The iterator just can't retrieve their attributes.

To be exact, urlContent is:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html lang="en,us">
<HEAD>
<META http-equiv="REFRESH" content="0;URL=/pls/portal/portalp.home"></HEAD><body></BODY></HTML>

aozarov

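Sorry, I didn't notice that you split on ";" first, so I suggested
urlAttr.toLowerCase().indexOf("url=") >= 0
instead of
urlAttr.toLowerCase().indexOf("url=") == 0
After the split and the space removal, the URL part already starts with "url=", so your original == 0 check is fine.

I've never used HTMLDocument.Iterator, but shouldn't you call next() before each iteration (as with JDBC's ResultSet.next() or a standard Iterator's hasNext()/next())?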

aozarov

Looking at the source code of HTMLDocument.Iterator (which is actually LeafIterator), it doesn't seem that you need to call next() before the first iteration.
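Going by that reading of LeafIterator, the iterator returned by getIterator() already sits on the first matching element (or is invalid if there is none), so the isValid()/getAttributes()/next() loop in the question has the right shape. A sketch of the same loop over <a> tags, whose attributes this iterator does return, assuming the page is already in a String; LinkLister is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

class LinkLister {
    /** Returns the href of every <a> in the page, in document order. */
    static List<String> hrefs(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        try {
            kit.read(new StringReader(html), doc, 0);
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException(e); // not expected for in-memory input at offset 0
        }

        List<String> result = new ArrayList<>();
        // No next() before the first pass: the iterator starts on the first match.
        for (HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.A); it.isValid(); it.next()) {
            // For <a>, the parser stores the tag's attributes as a nested set keyed by HTML.Tag.A
            AttributeSet attrs = it.getAttributes();
            Object href = (attrs == null) ? null : attrs.getAttribute(HTML.Attribute.HREF);
            if (href != null) {
                result.add(href.toString());
            }
        }
        return result;
    }
}
```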

aozarov

Did you try calling it.getTag().toString() instead?

sumantedla

ASKER

I tried:

System.out.println("Tag =>" + it.getTag());

It prints "meta".

But attrs is still null. Does HTMLDocument.Iterator's getAttributes() method work correctly?
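A likely explanation, not confirmed anywhere in this thread: HTMLDocument.Iterator.getAttributes() returns the attribute set stored under the tag key, which is how character tags such as <a> keep their attributes. META is stored as its own element, with http-equiv and content set directly on that element, so on JDKs whose iterator does not fall back to the element's own attributes the lookup comes back null. Assuming that is the cause, a minimal workaround walks the elements with javax.swing.text.ElementIterator and reads the attributes off the element itself. MetaRefreshFinder is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.Element;
import javax.swing.text.ElementIterator;
import javax.swing.text.StyleConstants;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

class MetaRefreshFinder {
    /** Returns the content attribute of the first META http-equiv="refresh", or null if none. */
    static String findRefreshContent(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        try {
            kit.read(new StringReader(html), doc, 0);
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException(e); // not expected for in-memory input at offset 0
        }

        ElementIterator it = new ElementIterator(doc);
        for (Element elem = it.first(); elem != null; elem = it.next()) {
            AttributeSet attrs = elem.getAttributes();
            // Each element records its tag under StyleConstants.NameAttribute
            if (attrs.getAttribute(StyleConstants.NameAttribute) != HTML.Tag.META) {
                continue;
            }
            // META keeps http-equiv and content directly on the element
            Object equiv = attrs.getAttribute(HTML.Attribute.HTTPEQUIV);
            Object content = attrs.getAttribute(HTML.Attribute.CONTENT);
            if (equiv != null && "refresh".equalsIgnoreCase(equiv.toString()) && content != null) {
                return content.toString();
            }
        }
        return null;
    }
}
```

Called on the urlContent quoted above, this should return "0;URL=/pls/portal/portalp.home", which can then be split as in the original code.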

aozarov

I think so
http://www.javaalmanac.com/cgi-bin/search/find.pl?words=HTMLDocument

If that doesn't work for you, have a look at HttpUnit (http://httpunit.sourceforge.net/), which can do the same job.
See: http://httpunit.sourceforge.net/doc/cookbook.html
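Another option that avoids both HTMLDocument and an extra dependency is the JDK's own HTML parser callback: ParserDelegator reports each tag to an HTMLEditorKit.ParserCallback without building a document model. The sketch below (RefreshCallback is a hypothetical name) records the first META refresh content and, since the end goal is link extraction, the href of every <a> in the same pass. It assumes the page is already available as a String.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class RefreshCallback extends HTMLEditorKit.ParserCallback {
    String refreshContent;                        // content of the first META refresh, or null
    final List<String> links = new ArrayList<>(); // href of every <a>, in document order

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        if (t == HTML.Tag.A) {
            Object href = a.getAttribute(HTML.Attribute.HREF);
            if (href != null) {
                links.add(href.toString());
            }
        }
    }

    @Override
    public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        // META has no end tag, so the parser reports it here rather than in handleStartTag
        if (t != HTML.Tag.META || refreshContent != null) {
            return;
        }
        Object equiv = a.getAttribute(HTML.Attribute.HTTPEQUIV);
        Object content = a.getAttribute(HTML.Attribute.CONTENT);
        if (equiv != null && "refresh".equalsIgnoreCase(equiv.toString()) && content != null) {
            refreshContent = content.toString();
        }
    }

    static RefreshCallback parse(String html) {
        RefreshCallback cb = new RefreshCallback();
        try {
            // true = ignore charset directives instead of throwing ChangedCharSetException
            new ParserDelegator().parse(new StringReader(html), cb, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return cb;
    }
}
```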

Mick Barry

What version of Java are you running it on?

sumantedla

ASKER

I tried it on both 1.4 and 1.5.

ASKER CERTIFIED SOLUTION

aozarov

This solution is only available to members.

Mick Barry

Try this:

HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.META);
while (it.isValid())
{
    AttributeSet attrs = it.getAttributes();
    if (attrs != null)
    {
        String httpEquiv = (String) attrs.getAttribute(HTML.Attribute.HTTPEQUIV);
        String content = (String) attrs.getAttribute(HTML.Attribute.CONTENT);
        if ("REFRESH".equalsIgnoreCase(httpEquiv) && content != null)
        {
            String[] strings = content.split(";");
            String timeAttr = strings[0].trim();
            String urlAttr = strings[1].replaceAll(" ", "");
            System.out.println("time => " + timeAttr);
            System.out.println("urlAttr => " + urlAttr);
            if ("0".equals(timeAttr) && urlAttr.toLowerCase().indexOf("url=") == 0)
            {
                redirectURL = urlAttr.substring(4);
                break;
            }
        }
    }
    it.next();
}

sumantedla

ASKER

It still isn't working.

All I want to do is extract the links. For that I need the page content, but when there is a client-side redirect I can't get the content of the page it points to.
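Getting the real page content means following the refresh target, which is usually relative (as in "/pls/portal/portalp.home" above) and has to be resolved against the URL of the page that contained it. A minimal sketch, assuming the caller supplies two hypothetical functions: one that downloads a page as HTML, and one that extracts the refresh URL (or null) from HTML. It uses java.net.URI.resolve for the relative-to-absolute step; the hop limit and the seen-set guard against redirect loops.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Function;

class RefreshFollower {
    /** Resolves a (possibly relative) refresh target against the URL of the page it came from. */
    static String resolve(String pageUrl, String target) {
        return URI.create(pageUrl).resolve(target.trim()).toString();
    }

    /**
     * Follows META refreshes from startUrl and returns the URL of the last page reached.
     * fetch (URL -> HTML) and refreshTarget (HTML -> refresh URL, or null) are hypothetical
     * hooks supplied by the caller.
     */
    static String follow(String startUrl,
                         Function<String, String> fetch,
                         Function<String, String> refreshTarget,
                         int maxHops) {
        String current = startUrl;
        Set<String> seen = new HashSet<>();
        // Stop after maxHops, or as soon as a URL repeats (a redirect loop)
        for (int hop = 0; hop < maxHops && seen.add(current); hop++) {
            String target = refreshTarget.apply(fetch.apply(current));
            if (target == null) {
                return current; // no refresh here: this page holds the real content
            }
            current = resolve(current, target);
        }
        return current;
    }
}
```

In real use, fetch could read the page with java.net.HttpURLConnection, which by default follows HTTP 3xx redirects within the same protocol but not META refreshes, and refreshTarget could combine a META parser with the URL part of the content attribute. The links are then extracted from the page at the returned URL.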