priyapratheep

Parse an HTML file using Java

Hi,
I have an HTML file, which I have attached to this question.

I want to parse this HTML file and pick up the number 10 from <td>10</td>. In the same way, I have to take all the other numbers (20, 30, ...) and store them in a variable.

The HTML file will always be in this format.

Please help me do this.

Regards
some line are here
bla bla bla
 
 
<table width="600" border="0" cellpadding="1" cellspacing="0" bgcolor="#4682b4"><tr><td><table width="100%" border="0" cellpadding="4" cellspacing="1"><tr><td>Line #</td><td>Material</td><td>Brand</td><td>Quantity</td><td>CRD</td><td>Status</td></tr>
<tr class="data1">
<td>10</td>
<td>BQ2085DBTR-V1P3G4</td>
<td>TI</td>
<td>20000</td>
<td>2007-09-05</td>
<td>Rejected</td>
</tr>
<tr class="header"><TD colspan="6">Line Item Messages </td></tr>
<tr class="data2">
<td>20</td>
<td>BQ2085DBTR-V1P3G4</td>
<td>TI</td>
<td>20000</td>
<td>2007-10-05</td>
<td>Rejected</td>
</tr>
<tr class="header"><TD colspan="6">Line Item Messages </td></tr>
<tr class="data1">
<td>30</td>
<td>BQ2085DBTR-V1P3G4</td>
<td>TI</td>
<td>30000</td>
<td>2007-11-05</td>
<td>Rejected</td>
</tr>
<tr class="header"><TD colspan="6">Line Item Messages </td></tr>
<tr class="data2">
<td>40</td>
<td>BQ2085DBTR-V1P3G4</td>
<td>TI</td>
<td>30000</td>
<td>2007-12-05</td>
<td>Rejected</td>
</tr>
<tr class="header"><TD colspan="6">Line Item Messages </td></tr>
<tr class="data1">
<td>50</td>
<td>BQ29311PWRG4</td>
<td>TI</td>
<td>20000</td>
<td>2007-09-05</td>
<td>Rejected</td>
</tr>
<tr class="header"><TD colspan="6">Line Item Messages </td></tr>
<tr class="data2">
<td>60</td>
<td>BQ29311PWRG4</td>
<td>TI</td>
<td>20000</td>
<td>2007-10-05</td>
<td>Rejected</td>
</tr>
<tr class="header"><TD colspan="6">Line Item Messages </td></tr>
<tr class="data1">
<td>70</td>
<td>BQ29311PWRG4</td>
<td>TI</td>
<td>30000</td>
<td>2007-11-05</td>
<td>Rejected</td>
</tr>
<tr class="header"><TD colspan="6">Line Item Messages </td></tr>
<tr class="data2">
<td>80</td>
<td>BQ29311PWRG4</td>
<td>TI</td>
<td>30000</td>
<td>2007-12-05</td>
<td>Rejected</td>
</tr>
<tr class="header"><TD colspan="6">Line Item Messages </td></tr>
</table></td></tr></table>
<BR>Please Login to the Portal if you wish see further details.
<BR><BR>Regards
        </font>
      </TD>
  </TR>
</table>
</BODY>
</HTML>

ASKER CERTIFIED SOLUTION
pesmerg
priyapratheep

ASKER

Sorry prakash2007, I don't follow.
What I want to do is this:

In my SQL table I have a field, subject (varchar(200)), that contains an HTML file.

I want to parse this subject field and take the values between <td> and </td>. I don't want all the data between <td> and </td>, only the data after the line <tr class="header"><td>Header Message </td><td>:</td><td></td></tr>

Regards
<tr class="header"><td>Header Message </td><td>:</td><td></td></tr>
</table></td></tr></table>
<BR><BR>
<table width="600" border="0" cellpadding="1" cellspacing="0" bgcolor="#4682b4"><tr><td><table width="100%" border="0" cellpadding="4" cellspacing="1"><tr><td>Line #</td><td>Material</td><td>Brand</td><td>Quantity</td><td>CRD</td><td>Status</td></tr>
<tr class="data1">
<td>10</td>
<td>TPA6201A1ZQVR</td>
<td>TI</td>
<td>20000</td>
<td>2007-07-15</td>
<td>Rejected</td>
</tr>
<tr class="header"><TD colspan="6">Line Item Messages </td></tr>
<tr class="data2">
<td>20</td>
<td>TPA6201A1ZQVR</td>
<td>TI</td>
<td>20000</td>
<td>2007-08-20</td>
<td>Rejected</td>
</tr>
<tr class="header"><TD colspan="6">Line Item Messages </td></tr>
<tr class="data1">
<td>30</td>
<td>TPA6201A1ZQVR</td>
<td>TI</td>
<td>20000</td>
<td>2007-09-20</td>
<td>Rejected</td>
</tr>
<tr class="header"><TD colspan="6">Line Item Messages </td></tr>
<tr class="data2">
<td>40</td>
<td>TPA6201A1ZQVR</td>
<td>TI</td>
<td>20000</td>
<td>2007-10-20</td>
<td>Rejected</td>
</tr>
<tr class="header"><TD colspan="6">Line Item Messages </td></tr>
</table></td></tr></table>
<BR>Please Login to the Portal if you wish see further details.
<BR><BR>Regards
        </font>
      </TD>
  </TR>
</table>
</BODY>
</HTML>

Hi,
     The number you want will always be right after a line such as '<tr class="data1">' or '<tr class="data2">', right? So the code snippet I gave you finds that string and gets the value between <td> and </td>.

   So if you want the value of the td only after '<tr class="header">':

Add these lines before the while loop in the code above.

That way it first searches for the header class, then searches for the data class, and then takes the value between <TD> and </TD> just after the data class.

               
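// Start the search just past the first header row.
// Java method names are lower camel case: indexOf and length, not IndexOf and Length.
String Header = "<tr class=\"header\">";
Index = HTML.indexOf(Header, 0) + Header.length();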
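
To make the steps above concrete, here is a rough sketch of the whole search in one method. It is not the original snippet from earlier in the thread; the class name LineNumberExtractor, the method extractAfter, and the marker parameter are made up for illustration. It assumes the tags are written exactly as in the posted sample (lowercase <tr class="data..."> and <td>), and it returns an empty list when the marker row is missing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, named for illustration only.
class LineNumberExtractor {

    // Returns the integer found in the first <td> of every data row
    // that appears after the given marker text.
    static List<Integer> extractAfter(String html, String marker) {
        List<Integer> numbers = new ArrayList<>();
        String dataRow = "<tr class=\"data"; // matches data1, data2, ...

        int markerPos = html.indexOf(marker);
        if (markerPos < 0) {
            return numbers; // marker row not present, nothing to collect
        }
        int index = markerPos + marker.length();

        // Step 1: find the next data row. Step 2: take its first <td> value.
        while ((index = html.indexOf(dataRow, index)) >= 0) {
            int open = html.indexOf("<td>", index);
            if (open < 0) {
                break;
            }
            int close = html.indexOf("</td>", open);
            if (close < 0) {
                break;
            }
            String cell = html.substring(open + "<td>".length(), close).trim();
            try {
                numbers.add(Integer.parseInt(cell));
            } catch (NumberFormatException e) {
                // first cell of this row is not a number, skip it
            }
            index = close; // continue searching after this cell
        }
        return numbers;
    }
}
```

For the second sample you posted, calling LineNumberExtractor.extractAfter(html, "<tr class=\"header\"><td>Header Message") should collect the line numbers of the rows that follow the Header Message row and ignore everything before it.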

With the Parse class from FIT:

http://fit.c2.com/Release/Source/fit/Parse.java
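import java.io.File;
import org.apache.commons.io.FileUtils;
import fit.Parse;

public class ParseTest {
    public static void main(String[] args) throws Exception {
        // Load the HTML file into a string (FileUtils comes from commons-io).
        String content = FileUtils.readFileToString(new File("test.html"));
        // Level 2 starts the parse at "td", which gives a flat chain of cells.
        Parse p = new Parse(content, new String[]{"table", "tr", "td"}, 2, 0);
        // Follow the "more" link so that the last cell is visited too.
        for (Parse cell = p; cell != null; cell = cell.more) {
            try {
                System.out.println(Integer.parseInt(cell.text().trim()));
            } catch (NumberFormatException e) {
                // do nothing, this value is not an integer
            }
        }
    }
}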

What is this FileUtils?
It's from the Jakarta commons-io library. You don't need it; I only use it to load the HTML file into the content string so that I can test your HTML.
String content = "html content in here";
Parse p = new Parse(content, new String[]{"table", "tr", "td"}, 2, 0);

// Follow the "more" link so that the last cell is visited too.
for (Parse cell = p; cell != null; cell = cell.more) {
    String tdValue = cell.text().trim();
    try {
        System.out.println(Integer.parseInt(tdValue));
    } catch (NumberFormatException e) {
        // do nothing, this value is not an integer
    }
}

At this line I have to convert the HTML file to a string, right?

String content="html content in here"

For FileUtils, which jar file do I have to import? I googled and imported http://commons.apache.org/downloads/download_io.cgi

Please help.

If you are reading the HTML content from a file, you should convert it to a string at that line.
And yes, the link refers to commons-io.jar, which contains the FileUtils class.
http://commons.apache.org/downloads/download_io.cgi
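
If you would rather not add commons-io just to load the file, the JDK can do the same job on its own. Below is a minimal sketch, assuming Java 7 or later and a UTF-8 encoded file; the class name HtmlFileReader and the method readHtml are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, named for illustration only.
class HtmlFileReader {

    // Reads the whole file into one string.
    // Assumption: the file is UTF-8; change the charset if yours differs.
    static String readHtml(String fileName) throws IOException {
        Path path = Paths.get(fileName);
        byte[] bytes = Files.readAllBytes(path);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

You would then write String content = HtmlFileReader.readHtml("test.html"); in place of the FileUtils call and pass content to the Parse constructor as before.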