priyapratheep
asked on
Parse an HTML file using Java
Hi,
I have an HTML file (attached to this question).
I want to parse it and pick up the number 10 from <td>10</td>. In the same way I need to get all the line numbers (20, 30, ...) and store them in a variable.
The HTML file will always be in this format.
Please help me do this.
Regards
some line are here
bla bla bla
<table width="600" border="0" cellpadding="1" cellspacing="0" bgcolor="#4682b4"><tr><td><table width="100%" border="0" cellpadding="4" cellspacing="1"><tr><td>Line #</td><td>Material</td><td>Brand</td><td>Quantity</td><td>CRD</td><td>Status</td></tr>
<tr class="data1">
<td>10</td>
<td>BQ2085DBTR-V1P3G4</td>
<td>TI</td>
<td>20000</td>
<td>2007-09-05</td>
<td>Rejected</td>
</tr>
<tr class="header"><TD colspan="6">Line Item Messages </td></tr>
<tr class="data2">
<td>20</td>
<td>BQ2085DBTR-V1P3G4</td>
<td>TI</td>
<td>20000</td>
<td>2007-10-05</td>
<td>Rejected</td>
</tr>
<tr class="header"><TD colspan="6">Line Item Messages </td></tr>
<tr class="data1">
<td>30</td>
<td>BQ2085DBTR-V1P3G4</td>
<td>TI</td>
<td>30000</td>
<td>2007-11-05</td>
<td>Rejected</td>
</tr>
<tr class="header"><TD colspan="6">Line Item Messages </td></tr>
<tr class="data2">
<td>40</td>
<td>BQ2085DBTR-V1P3G4</td>
<td>TI</td>
<td>30000</td>
<td>2007-12-05</td>
<td>Rejected</td>
</tr>
<tr class="header"><TD colspan="6">Line Item Messages </td></tr>
<tr class="data1">
<td>50</td>
<td>BQ29311PWRG4</td>
<td>TI</td>
<td>20000</td>
<td>2007-09-05</td>
<td>Rejected</td>
</tr>
<tr class="header"><TD colspan="6">Line Item Messages </td></tr>
<tr class="data2">
<td>60</td>
<td>BQ29311PWRG4</td>
<td>TI</td>
<td>20000</td>
<td>2007-10-05</td>
<td>Rejected</td>
</tr>
<tr class="header"><TD colspan="6">Line Item Messages </td></tr>
<tr class="data1">
<td>70</td>
<td>BQ29311PWRG4</td>
<td>TI</td>
<td>30000</td>
<td>2007-11-05</td>
<td>Rejected</td>
</tr>
<tr class="header"><TD colspan="6">Line Item Messages </td></tr>
<tr class="data2">
<td>80</td>
<td>BQ29311PWRG4</td>
<td>TI</td>
<td>30000</td>
<td>2007-12-05</td>
<td>Rejected</td>
</tr>
<tr class="header"><TD colspan="6">Line Item Messages </td></tr>
</table></td></tr></table>
<BR>Please Login to the Portal if you wish see further details.
<BR><BR>Regards
</font>
</TD>
</TR>
</table>
</BODY>
</HTML>
ASKER
What I want to do is this:
In my SQL table I have a field called subject (varchar(200)). It contains an HTML file.
I want to parse this subject field and take the values between <td> and </td>. Not all of the data between <td> and </td>, only the data after the line <tr class="header"><td>Header Message </td><td>:</td><td></td></tr>
Regards
<tr class="header"><td>Header Message </td><td>:</td><td></td></tr>
</table></td></tr></table>
<BR><BR>
<table width="600" border="0" cellpadding="1" cellspacing="0" bgcolor="#4682b4"><tr><td><table width="100%" border="0" cellpadding="4" cellspacing="1"><tr><td>Line #</td><td>Material</td><td>Brand</td><td>Quantity</td><td>CRD</td><td>Status</td></tr>
<tr class="data1">
<td>10</td>
<td>TPA6201A1ZQVR</td>
<td>TI</td>
<td>20000</td>
<td>2007-07-15</td>
<td>Rejected</td>
</tr>
<tr class="header"><TD colspan="6">Line Item Messages </td></tr>
<tr class="data2">
<td>20</td>
<td>TPA6201A1ZQVR</td>
<td>TI</td>
<td>20000</td>
<td>2007-08-20</td>
<td>Rejected</td>
</tr>
<tr class="header"><TD colspan="6">Line Item Messages </td></tr>
<tr class="data1">
<td>30</td>
<td>TPA6201A1ZQVR</td>
<td>TI</td>
<td>20000</td>
<td>2007-09-20</td>
<td>Rejected</td>
</tr>
<tr class="header"><TD colspan="6">Line Item Messages </td></tr>
<tr class="data2">
<td>40</td>
<td>TPA6201A1ZQVR</td>
<td>TI</td>
<td>20000</td>
<td>2007-10-20</td>
<td>Rejected</td>
</tr>
<tr class="header"><TD colspan="6">Line Item Messages </td></tr>
</table></td></tr></table>
<BR>Please Login to the Portal if you wish see further details.
<BR><BR>Regards
</font>
</TD>
</TR>
</table>
</BODY>
</HTML>
Hi,
The number you want always comes right after a line such as '<tr class="data1">' or '<tr class="data2">', right? The code snippet I gave you finds that string and gets the value between <td> and </td>.
So if you want the <td> value only after '<tr class="header">', add the lines below before the while loop in the code above.
That way it first searches for the header class, then for the data class, and then takes the value between <td> and </td> that follows the data class.
String Header = "<tr class=\"header\">";
Index = HTML.indexOf(Header, 0) + Header.length();
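Since the original snippet is not shown here, the following is only a minimal sketch of the approach described above: skip to the header marker, then for each data row take the first <td> value. The class name, method name, and sample string are hypothetical, and the sketch assumes the tags in the data rows are lowercase, as in the posted HTML.

```java
import java.util.ArrayList;
import java.util.List;

public class LineNumberExtractor {

    /**
     * Returns the first <td> value of every data row found after the first
     * occurrence of afterMarker. Matching is case-sensitive (assumption:
     * data rows use lowercase tags, as in the sample).
     */
    static List<Integer> extractLineNumbers(String html, String afterMarker) {
        List<Integer> numbers = new ArrayList<>();
        int start = html.indexOf(afterMarker);
        if (start < 0) {
            return numbers; // marker not found, nothing to extract
        }
        int index = start + afterMarker.length();
        final String rowMarker = "<tr class=\"data"; // matches data1, data2, ...
        final String open = "<td>";
        final String close = "</td>";
        while ((index = html.indexOf(rowMarker, index)) >= 0) {
            int cellStart = html.indexOf(open, index);
            int cellEnd = cellStart < 0 ? -1 : html.indexOf(close, cellStart);
            if (cellEnd < 0) {
                break; // malformed or truncated row
            }
            String cell = html.substring(cellStart + open.length(), cellEnd).trim();
            try {
                numbers.add(Integer.parseInt(cell));
            } catch (NumberFormatException e) {
                // first cell of this row is not an integer, skip it
            }
            index = cellEnd; // continue searching after this cell
        }
        return numbers;
    }

    public static void main(String[] args) {
        // Hypothetical, shortened sample in the same shape as the posted HTML.
        String html =
              "<tr class=\"header\"><td>Header Message </td><td>:</td><td></td></tr>\n"
            + "<tr class=\"data1\">\n<td>10</td>\n<td>TI</td>\n</tr>\n"
            + "<tr class=\"header\"><TD colspan=\"6\">Line Item Messages </td></tr>\n"
            + "<tr class=\"data2\">\n<td>20</td>\n<td>TI</td>\n</tr>\n";
        System.out.println(extractLineNumbers(html, "<tr class=\"header\">")); // prints [10, 20]
    }
}
```

Collecting the values in a List<Integer> rather than printing them keeps them available in a variable, which is what the question asked for.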
You can also do it with the Parse class:
http://fit.c2.com/Release/Source/fit/Parse.java
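import java.io.File;
import org.apache.commons.io.FileUtils;
import fit.Parse;

// Load the whole HTML file into a string (FileUtils comes from commons-io).
String content = FileUtils.readFileToString(new File("test.html"));
// Level 2 selects the "td" tag, so p is the first cell and p.more links to the next one.
Parse p = new Parse(content, new String[]{"table", "tr", "td"}, 2, 0);
// Walk the chain through "more" so the last cell is not skipped.
for (Parse cell = p; cell != null; cell = cell.more) {
    String tdValue = cell.text().trim();
    try {
        System.out.println(Integer.parseInt(tdValue));
    } catch (NumberFormatException e) {
        // do nothing, this value is not an integer
    }
}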
ASKER
What is FileUtils?
It's from the Jakarta Commons IO library. You don't need it; I only used it to load the HTML file into the content string so I could test your HTML.
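String content = "html content in here";
// Level 2 selects the "td" tag, so p is the first cell and p.more links to the next one.
Parse p = new Parse(content, new String[]{"table", "tr", "td"}, 2, 0);
// Walk the chain through "more" so the last cell is not skipped.
for (Parse cell = p; cell != null; cell = cell.more) {
    String tdValue = cell.text().trim();
    try {
        System.out.println(Integer.parseInt(tdValue));
    } catch (NumberFormatException e) {
        // do nothing, this value is not an integer
    }
}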
ASKER
At this line I have to convert the HTML file to a string, right?
String content="html content in here"
Which jar file do I have to import for FileUtils? I googled it and imported the one from http://commons.apache.org/downloads/download_io.cgi
Please help.
If you are reading the HTML content from a file, you should convert it to a string at that line.
And yes, that link refers to commons-io.jar, which contains the FileUtils class.
http://commons.apache.org/downloads/download_io.cgi
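If you would rather not add a jar just for this, the file can also be read into a string with the standard java.nio.file classes. This is only a sketch: the class and method names are made up for illustration, and it assumes the file is encoded as UTF-8 and that "test.html" is the file you want to read.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class HtmlFileReader {

    /** Reads the whole file into a String, decoding it as UTF-8 (assumption). */
    static String readHtml(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // "test.html" is the file name used earlier in this thread.
        Path file = Paths.get(args.length > 0 ? args[0] : "test.html");
        if (!Files.exists(file)) {
            System.out.println("File not found: " + file);
            return;
        }
        String content = readHtml(file);
        System.out.println("Read " + content.length() + " characters");
    }
}
```

The returned string can then be used in place of "html content in here" in the snippet above.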
ASKER