Solved

parsing question

Posted on 2004-09-02
Last Modified: 2010-03-31
Hi, I'm trying to get a simple parser going; my code is below. I want to read the full contents of a URL and store them in a database. How do I know when I have reached the end of a page, so that I can store everything in the database and move on to the next link?

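public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int p)
{
    if (t == HTML.Tag.A)
    {
        // An anchor may have no href (e.g. a named anchor), so check for null.
        ahreflink = (String) a.getAttribute(HTML.Attribute.HREF);
        if (ahreflink != null)
        {
            searchList.add(ahreflink);
        }
    }

    if (t == HTML.Tag.TITLE)
    {
        // The next text chunk belongs to the title.
        titleFlag = true;
    }
}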

public void handleText(char[] data, int pos)
{
    content = new String(data);

    if (titleFlag)
    {
        // This chunk is the page title; print it and reset the flag.
        title = content;
        System.out.println("Title: " + title);
        titleFlag = false;
    }
    else
    {
        // Any other chunk is page text; accumulate it.
        text = text + " " + content;
    }
}//end of handleText


Question by:HomerrSimpson
LVL 1

Assisted Solution

by:primusmagestri
primusmagestri earned 20 total points
ID: 11962861
Look for the HTML end tag, </html>. After this tag there can be, at most, some comments.
LVL 35

Accepted Solution

by:TimYates
TimYates earned 80 total points
ID: 11962896
Override the end-tag callback as well:

public void handleEndTag( HTML.Tag t, int pos )

Author Comment

by:HomerrSimpson
ID: 11963108
Do you mean something like this?

public void handleEndTag(HTML.Tag t, int pos)
{
    if (t == HTML.Tag.HTML)
    {
        // store "text" in the database here
    }
}
LVL 35

Expert Comment

by:TimYates
ID: 11963367
yup...that should do it...
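
For anyone landing here later, a minimal sketch of the whole idea, assuming the Swing parser (javax.swing.text.html.parser.ParserDelegator) is what drives the callback. The names PageEndDemo, PageCallback and parsePage are hypothetical, and the in-memory list is only a stand-in for the database from the question. The title flag is reset in handleEndTag here rather than in handleText, which is a small variation on the original code.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical demo class; not part of the original poster's code.
class PageEndDemo {

    /** Accumulates page text and "stores" it once the html end tag is reported. */
    static class PageCallback extends HTMLEditorKit.ParserCallback {
        private final StringBuilder text = new StringBuilder();
        private final List<String> stored; // stand-in for the database (assumption)
        private boolean inTitle = false;

        PageCallback(List<String> stored) {
            this.stored = stored;
        }

        @Override
        public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
            if (t == HTML.Tag.TITLE) {
                inTitle = true;
            }
        }

        @Override
        public void handleText(char[] data, int pos) {
            // Skip title text; append everything else.
            if (!inTitle) {
                text.append(' ').append(data);
            }
        }

        @Override
        public void handleEndTag(HTML.Tag t, int pos) {
            if (t == HTML.Tag.TITLE) {
                inTitle = false;
            } else if (t == HTML.Tag.HTML) {
                // End of the page: hand the collected text to the "database"
                // and reset the buffer for the next page.
                stored.add(text.toString().trim());
                text.setLength(0);
            }
        }
    }

    /** Parses one HTML document and returns whatever was stored for it. */
    static List<String> parsePage(String html) throws IOException {
        List<String> stored = new ArrayList<>();
        try (Reader reader = new StringReader(html)) {
            // The third argument tells the parser to ignore charset directives.
            new ParserDelegator().parse(reader, new PageCallback(stored), true);
        }
        return stored;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><head><title>Demo</title></head>"
                + "<body><p>Hello world</p></body></html>";
        for (String page : parsePage(html)) {
            System.out.println("Stored: " + page);
        }
    }
}
```

For a real crawl you would pass a reader over the URL's input stream instead of a StringReader and replace the list with your database insert. Note that parse() is synchronous, so "parse() has returned" is another signal that the page is finished.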