Solved

parsing question

Posted on 2004-09-02
Last Modified: 2010-03-31
Hi, I'm trying to get a simple parser going; my code is below. I want to read all the contents of a URL and store them in a database. The question is: how do I know when I've reached the end of a page, so that I know when to store everything in my database and move on to the next link?

public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int p)
{
    if (t == HTML.Tag.A)
    {
        ahreflink = (String) a.getAttribute(HTML.Attribute.HREF);
        searchList.add(ahreflink);
    }

    if (t == HTML.Tag.TITLE)
    {
        titleFlag = true;
    }
}

public void handleText(char[] data, int pos)
{
    content = new String(data);

    if (titleFlag)
    {
        // the first text after <title> is the page title
        title = content;
        System.out.println("Title: " + title);
        titleFlag = false;
    }
    else
    {
        // everything else is accumulated as page text
        text = text + " " + content;
    }
}//end of handleText


Question by:HomerrSimpson
 

Assisted Solution

by:primusmagestri
primusmagestri earned 20 total points
Look for the HTML end tag, </html>. After this tag a page can contain, at most, some comments.

Accepted Solution

by:TimYates
TimYates earned 80 total points
Override this method in your parser callback:

public void handleEndTag( HTML.Tag t, int pos )

Author Comment

by:HomerrSimpson
Do you mean something like this?

public void handleEndTag(HTML.Tag t, int pos)
{
    if (t == HTML.Tag.HTML)
    {
        // end of the page reached: store "text" in the database here
    }
}

Expert Comment

by:TimYates
Yup, that should do it.
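
For anyone landing on this thread later, here is a minimal sketch of how the pieces discussed above might fit together. It assumes the Swing HTML parser (javax.swing.text.html.parser.ParserDelegator) is what drives the callback, which the original post does not show. The class name PageCollector, its fields, and the parse helper are made up for illustration, and the database step is left as a comment because no database code appears in the thread.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical callback: collects links, title and body text for ONE page.
class PageCollector extends HTMLEditorKit.ParserCallback {
    final List<String> links = new ArrayList<>();
    final StringBuilder text = new StringBuilder();
    String title = "";
    boolean endOfPageSeen = false;
    private boolean inTitle = false;

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        if (t == HTML.Tag.A) {
            Object href = a.getAttribute(HTML.Attribute.HREF);
            if (href != null) {            // <a name="..."> has no href
                links.add(href.toString());
            }
        } else if (t == HTML.Tag.TITLE) {
            inTitle = true;
        }
    }

    @Override
    public void handleText(char[] data, int pos) {
        if (inTitle) {
            title = new String(data);
        } else {
            text.append(' ').append(data);
        }
    }

    @Override
    public void handleEndTag(HTML.Tag t, int pos) {
        if (t == HTML.Tag.TITLE) {
            inTitle = false;
        } else if (t == HTML.Tag.HTML) {
            // End of page: this is where "text" would be written
            // to the database (storage code not shown in the thread).
            endOfPageSeen = true;
        }
    }

    // Hypothetical helper: parses one HTML string and returns the collector.
    static PageCollector parse(String html) {
        PageCollector collector = new PageCollector();
        try (Reader reader = new StringReader(html)) {
            // true = ignore any charset declared inside the document
            new ParserDelegator().parse(reader, collector, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return collector;
    }
}
```

Using a fresh collector per URL avoids having to reset the accumulated text between pages. Note also that ParserDelegator.parse only returns once the whole input has been consumed, so code placed right after that call is another spot where the page is known to be complete.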