Solved

parsing question

Posted on 2004-09-02
Last Modified: 2010-03-31
Hi, I'm trying to get a simple parser going; my code is below. I need to read the full contents of a URL and store it in a database. The question is: how do I know when I have reached the end of a page, so that I know when to store everything in my database and move on to the next link?

public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int p)
{
    if (t == HTML.Tag.A)
    {
        // Remember every link so it can be visited later; skip anchors without an href.
        ahreflink = (String) a.getAttribute(HTML.Attribute.HREF);
        if (ahreflink != null)
        {
            searchList.add(ahreflink);
        }
    }

    if (t == HTML.Tag.TITLE)
    {
        // The next text chunk is the page title.
        titleFlag = true;
    }
}

public void handleText(char[] data, int pos)
{
    content = new String(data);

    if (titleFlag)
    {
        title = content;
        System.out.println("Title: " + title);
        titleFlag = false;
    }
    else
    {
        // Everything outside the title is accumulated as page text.
        text = text + " " + content;
    }
} // end of handleText


Question by:HomerrSimpson
LVL 1

Assisted Solution

by:primusmagestri
primusmagestri earned 60 total points
ID: 11962861
Look for the HTML end tag: </html>. After this tag there can be, at most, some comments.
 
LVL 35

Accepted Solution

by:TimYates
TimYates earned 240 total points
ID: 11962896
public void handleEndTag( HTML.Tag t, int pos )
 

Author Comment

by:HomerrSimpson
ID: 11963108
Do you mean something like this?

public void handleEndTag(HTML.Tag t, int pos)
{
    if (t == HTML.Tag.HTML)
    {
        // store "text" in the database here
    }
}
 
LVL 35

Expert Comment

by:TimYates
ID: 11963367
yup...that should do it...
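
For anyone landing on this thread later, here is a minimal self-contained sketch of the approach agreed on above: collect text in handleText and treat the HTML end tag in handleEndTag as the end of the page. The names PageEndDemo, PageCallback, parsePage and storedPages are made up for illustration, and the in-memory list is only a stand-in for the real database write, which the thread does not show.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class PageEndDemo {

    /** Collects links, title and body text; "stores" the page when </html> is reported. */
    static class PageCallback extends HTMLEditorKit.ParserCallback {
        final List<String> links = new ArrayList<>();
        // Hypothetical stand-in for the database: one entry per stored page.
        final List<String> storedPages = new ArrayList<>();
        String title = "";

        private final StringBuilder text = new StringBuilder();
        private boolean inTitle = false;
        private boolean pageStored = false;

        @Override
        public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
            if (t == HTML.Tag.A) {
                Object href = a.getAttribute(HTML.Attribute.HREF);
                if (href != null) {
                    links.add(href.toString());
                }
            } else if (t == HTML.Tag.TITLE) {
                inTitle = true;
            }
        }

        @Override
        public void handleText(char[] data, int pos) {
            if (inTitle) {
                title = new String(data);
            } else {
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(data);
            }
        }

        @Override
        public void handleEndTag(HTML.Tag t, int pos) {
            if (t == HTML.Tag.TITLE) {
                inTitle = false;
            } else if (t == HTML.Tag.HTML && !pageStored) {
                // End of page: replace this line with the real database insert.
                storedPages.add(text.toString());
                pageStored = true; // store at most once per callback instance
            }
        }
    }

    /** Parses one HTML string and returns the callback holding the results. */
    static PageCallback parsePage(String html) {
        PageCallback callback = new PageCallback();
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return callback;
    }

    public static void main(String[] args) {
        PageCallback cb = parsePage("<html><head><title>Demo</title></head>"
                + "<body><p>Hello <a href=\"x.html\">link</a></p></body></html>");
        System.out.println("Title: " + cb.title);
        System.out.println("Links: " + cb.links);
        System.out.println("Stored: " + cb.storedPages);
    }
}
```

To crawl several pages, one option is to create a fresh PageCallback per URL and feed its links list into the queue of pages still to visit. For a real URL, the StringReader would be replaced by a reader over the connection's input stream.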