akzah
Parsing HTML
Hi
I have HTML files on my sites which contain download links in a specific format, e.g.:
<p align="center"><font color="#FFFF00"><strong>FS2004 Aircraft</strong><br>
</font><font color="#87A9DC">Download: </font> <a href="http://www.@@@@.com/cgi-bin/download.pl?url=uploads04/apr/247hal.zip" target="main2">
<u>247hal.zip</u></a><font
color="#87A9DC"> (1936 KB)</font><img
src="images2/uploads04/apr/247hal.jpg"
align="right" hspace="0" width="189" height="73"></p>
<p><font color="#FFFF00"><strong>Author:</strong></font>
<font color="#87A9DC">Heather Sherman</font><br>
<font color="#FFFF00"><strong>Date:</strong></font> <font
color="#87A9DC">2004-04-19</font><br>
<font color="#87A9DC" size="2" face="Times New Roman">FS2004 Boeing
247D, Heather Aviation Ltd.<br>
These are textures ***ONLY*** and are applied to any of Dee Waldron's
models, this one in particular being the WhiteYellow Boeing 247,
filename: boeing247yellowwhite.zip (which contains the entire model).
This is an FS2002 model but this texture package also contains an
updated aircraft.cfg file making it compatible for FS2004.</font></p>
<hr color="#87A9DC">
This pattern is repeated for the rest of the downloads on the page. How would I use Java to get the URL link, the picture and the rest of the details on the page? Example
output using System.out.println:
FS2004 Aircraft
247hal.zip
1936 KB
Heather Sherman
FS2004 Boeing
247D, Heather Aviation Ltd.<br>
These are textures ***ONLY*** and are applied to any of Dee Waldron's
models, this one in particular being the WhiteYellow Boeing 247,
filename: boeing247yellowwhite.zip (which contains the entire model).
This is an FS2002 model but this texture package also contains an
updated aircraft.cfg file making it compatible for FS2004.
------------
I am doing this so I can turn the output into an SQL INSERT query.
Thanks in advance for any help.
Akbar
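For the final step (turning each entry into an SQL INSERT), a parameterised PreparedStatement saves you from hand-escaping text such as "Dee Waldron's". This is only a minimal sketch: the table name downloads and its columns are hypothetical, and it assumes you already have an open JDBC Connection.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Sketch: stores one parsed download entry through a parameterised INSERT. */
final class DownloadInsert {
    // Hypothetical table and column names; adjust them to the real schema.
    static final String INSERT_SQL =
        "INSERT INTO downloads (category, file_name, size_kb, author, description) "
        + "VALUES (?, ?, ?, ?, ?)";

    /** Turns a size label such as " (1936 KB)" into 1936 by keeping only the digits. */
    static int parseSizeKb(String label) {
        return Integer.parseInt(label.replaceAll("[^0-9]", ""));
    }

    /** Binds one entry to the INSERT statement and executes it on an open connection. */
    static void insert(Connection conn, String category, String fileName, int sizeKb,
                       String author, String description) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            ps.setString(1, category);
            ps.setString(2, fileName);
            ps.setInt(3, sizeKb);
            ps.setString(4, author);
            ps.setString(5, description); // apostrophes need no manual escaping here
            ps.executeUpdate();
        }
    }
}
```

Binding values this way also keeps the long, free-form description text from breaking the SQL.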
Use HTMLEditorKit.
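A minimal sketch of the HTMLEditorKit route, using its ParserCallback class with ParserDelegator (both standard Swing classes). The parser calls back for every start tag, every empty tag such as <img>, and every run of text, so the callback can collect each link's href, each picture's src and the visible text. The class name DownloadPageScanner is made up for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Collects link targets, image sources and non-blank text runs from HTML. */
final class DownloadPageScanner extends HTMLEditorKit.ParserCallback {
    final List<String> links = new ArrayList<>();
    final List<String> images = new ArrayList<>();
    final List<String> texts = new ArrayList<>();

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        if (tag == HTML.Tag.A) {
            Object href = attrs.getAttribute(HTML.Attribute.HREF);
            if (href != null) links.add(href.toString());
        }
    }

    @Override
    public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        // <img> has no end tag, so the parser reports it as a "simple" tag
        if (tag == HTML.Tag.IMG) {
            Object src = attrs.getAttribute(HTML.Attribute.SRC);
            if (src != null) images.add(src.toString());
        }
    }

    @Override
    public void handleText(char[] data, int pos) {
        String s = new String(data).trim();
        if (!s.isEmpty()) texts.add(s);
    }

    /** Parses the HTML and returns the filled-in scanner. */
    static DownloadPageScanner scan(String html) {
        DownloadPageScanner scanner = new DownloadPageScanner();
        try {
            // true = ignore any charset declared in a <meta> tag
            new ParserDelegator().parse(new StringReader(html), scanner, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return scanner;
    }

    public static void main(String[] args) {
        DownloadPageScanner s = scan("<p><a href=\"uploads04/apr/247hal.zip\"><u>247hal.zip</u></a>"
            + " (1936 KB)<img src=\"images2/uploads04/apr/247hal.jpg\"></p>");
        System.out.println(s.links);
        System.out.println(s.images);
        System.out.println(s.texts);
    }
}
```

The text runs should come out in document order, so each label (such as "Author:") sits directly before its value in the texts list.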
ASKER CERTIFIED SOLUTION
ASKER
Hi
Thanks for the link, though I don't want to get the links, just the data, as my output contains no links.
Akbar
You can use similar techniques to extract whatever data you require.
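One such technique, sketched under the assumption that the labels read exactly "Author:" and "Date:" as in the sample: watch the text runs and keep the first non-blank one after the label. LabelValueExtractor and valueAfter are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

/** Finds the first non-blank text run that follows a label such as "Author:". */
final class LabelValueExtractor extends HTMLEditorKit.ParserCallback {
    private final String label;
    private boolean labelSeen = false;
    private String value = null;

    private LabelValueExtractor(String label) {
        this.label = label;
    }

    @Override
    public void handleText(char[] data, int pos) {
        String s = new String(data).trim();
        if (s.isEmpty() || value != null) return;   // skip blanks; stop after the first hit
        if (labelSeen) {
            value = s;                              // the run right after the label
        } else if (s.equals(label)) {
            labelSeen = true;
        }
    }

    /** Parses the HTML and returns the value after the label, or null if it is absent. */
    static String valueAfter(String html, String label) {
        LabelValueExtractor cb = new LabelValueExtractor(label);
        try {
            new ParserDelegator().parse(new StringReader(html), cb, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return cb.value;
    }

    public static void main(String[] args) {
        String html = "<p><font color=\"#FFFF00\"><strong>Author:</strong></font>\n"
            + "<font color=\"#87A9DC\">Heather Sherman</font><br>\n"
            + "<font color=\"#FFFF00\"><strong>Date:</strong></font> <font\n"
            + "color=\"#87A9DC\">2004-04-19</font><br></p>";
        System.out.println(valueAfter(html, "Author:"));
        System.out.println(valueAfter(html, "Date:"));
    }
}
```

valueAfter returns only the first match, so on a full page you would run it once per record, or change it to collect every value into a list.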
ASKER
Can you explain how I can change that code to get the author's name by itself?
Also, the code doesn't run; it doesn't seem to recognise HTMLDocument doc = new HTMLDocument(); even when I add import javax.swing.*;
Akbar
import javax.swing.text.html.*;
ASKER
The Java code still doesn't compile. I have added the following imports:
import javax.swing.text.html.*;
import java.io.*;
import java.net.*;
import javax.swing.text.EditorKit;
and it has trouble on the line } catch (BadLocationException e) {
I just want to get specific data back from the HTML, not all of it.
Thanks
Akbar
What are the errors exactly?
You need to add that method to your class.
ASKER
The error from BlueJ says "cannot resolve symbol" on that line.
If I saved the HTML file's contents into a text file and then read it back in as one String, would I be able to split it up, taking each part "<p align="center">....<hr color="#87A9DC">"?
Thanks again
Akbar
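Plain String methods are enough for that split, and there is no need to go via a .txt file, since the page can be read straight into a String. A sketch using String.indexOf, assuming the start and end markers appear exactly as written (same case, attribute order and quoting). RecordSplitter is a hypothetical name, and the ISO-8859-1 charset in main is an assumption about how the files are encoded.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Splits a page into records running from START up to (not including) the next END. */
final class RecordSplitter {
    static final String START = "<p align=\"center\">";
    static final String END = "<hr color=\"#87A9DC\">";

    static List<String> split(String page) {
        List<String> records = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = page.indexOf(START, from);
            if (start < 0) break;                 // no more records
            int end = page.indexOf(END, start);
            if (end < 0) break;                   // unterminated record: stop
            records.add(page.substring(start, end));
            from = end + END.length();            // continue after this record's <hr>
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a saved copy of the page (hypothetical usage)
        String page = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.ISO_8859_1);
        for (String record : split(page)) {
            System.out.println(record);
            System.out.println("------------");
        }
    }
}
```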
import javax.swing.text.*;
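For reference, a minimal self-contained sketch with every import the HTMLDocument approach needs: BadLocationException lives in javax.swing.text, while HTMLDocument and HTMLEditorKit live in javax.swing.text.html. It loads an HTML string into an HTMLDocument and returns the document's plain text. HtmlToPlainText is a made-up name, and this is not the hidden code from the accepted solution.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

/** Loads HTML into an HTMLDocument and returns its text without the markup. */
final class HtmlToPlainText {
    static String toPlainText(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // Ignore any <meta> charset declaration instead of throwing ChangedCharSetException
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        try {
            kit.read(new StringReader(html), doc, 0);
            return doc.getText(0, doc.getLength());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (BadLocationException e) {
            // read/getText take offsets; this is thrown only if they are out of range
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toPlainText("<p><strong>Author:</strong> Heather Sherman</p>"));
    }
}
```

If this comes back empty for a live URL, printing the raw HTML string before parsing shows whether the problem lies in fetching the page or in parsing it.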
ASKER
Well, it compiles, though whatever URL I try, it returns an empty String.
I am totally lost on how to go about it. There must be a way to search a string and return certain parts of it?
Akbar
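Searching a String for parts of it is what java.util.regex is for: each pattern below captures the text between a known marker and the next tag. The patterns are built from the single sample record only, so they (and the RecordFields name) are assumptions that will need adjusting if the real pages differ. Running it once per record gives one set of fields per download.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Pulls individual fields out of one record's HTML with regular expressions. */
final class RecordFields {
    // Guesses based on the sample record; they break if the markup changes.
    static final Pattern FILE_NAME = Pattern.compile("<u>([^<]+)</u>");
    static final Pattern SIZE_KB = Pattern.compile("\\((\\d+) KB\\)");
    static final Pattern IMAGE = Pattern.compile("<img\\s+src=\"([^\"]+)\"");
    static final Pattern AUTHOR =
        Pattern.compile("Author:</strong></font>\\s*<font[^>]*>([^<]+)</font>");

    /** Returns group 1 of the first match, trimmed, or null when the pattern is absent. */
    static String first(Pattern p, String record) {
        Matcher m = p.matcher(record);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String record = "<u>247hal.zip</u></a><font\ncolor=\"#87A9DC\"> (1936 KB)</font><img\n"
            + "src=\"images2/uploads04/apr/247hal.jpg\"\nalign=\"right\"></p>\n"
            + "<p><font color=\"#FFFF00\"><strong>Author:</strong></font>\n"
            + "<font color=\"#87A9DC\">Heather Sherman</font><br>";
        System.out.println(first(FILE_NAME, record));
        System.out.println(first(SIZE_KB, record) + " KB");
        System.out.println(first(IMAGE, record));
        System.out.println(first(AUTHOR, record));
    }
}
```

The trade-off versus a real parser is brittleness: any change in attribute order, case or spacing in the markup can silently stop a pattern from matching.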
You can parse the string manually if you like, but it's a lot simpler to use an existing parser, isn't it?
SOLUTION
ASKER
Thanks for your help, guys, though both ways seem too complicated for me. I have put up another post about the string method. I will close this question soon; I'm leaving it open for now in case I get further replies.
Akbar