cutie_smily
asked on
How to parse html page
I am trying to learn how to parse an HTML page. So far I have connected to a web page and read its contents into a char array / String.
Now how do I get the data I need? I need an idea of how to proceed from here.
For example, I connect to the page http://www.ussearch.com/consumer/index.jsp, enter the first name, last name, middle initial and age, and get the result in a table. I would like to grab that result. I have the resulting page in str. How do I go on from here?
ASKER
I do not want the above links. I already have the whole page as one huge string, and I need to use string methods on it. How do I get to a particular line and extract the text I want?
Example: part of my string is shown below.
I need to get the first name, last name, city and age from it.
How do I grab them from the string below?
These are the kinds of values I need to grab:
searchCity=NEW+YORK
searchState=NY
--------------------------
Preliminary Search Results for:
"Twinky R Winky"
displayDisplayName('1', "http://www.ussearch.com/consumer/cwf?adID=10002101&action=browseproduct&searchtab=people&pid=3064&searchPerson=ENH1078249456&searchFName=TEXTILES&searchMName=&searchLName=WINKY&searchCity=NEW+YORK&searchState=NY&searchApproxAge=29&searchStateJurisdiction=NY&searchGender=&searchZip=&vid=cfc&searchAgentNotes=PREVIEW-CFC", 'TEXTILES','','WINKY', '0', 'ENH1078249456', 'off'); displayAgeCityState('-', 'NEW YORK', 'NY'); displayPremiumUrls('&searchFName=TEXTILES&searchMName=&searchLName=WINKY&searchCity=NEW+YORK&searchState=NY&searchApproxAge=29&searchStateJurisdiction=NY&searchGender=&searchZip=&vid=cfc&searchAgentNotes=PREVIEW-CFC', 'ENH1078249456', '**/**/00', '51540c04140a03510'); displayL2Result('0', 'ENH1078249456', 'off');
displayDisplayName('2', "http://www.ussearch.com/consumer/cwf?adID=10002101&action=browseproduct&searchtab=people&pid=3064&searchPerson=ENH1078249457&searchFName=TIMOTHY&searchMName=J&searchLName=WINKY&searchCity=NEW+LENOX&searchState=IL&searchApproxAge=29&searchStateJurisdiction=IL&searchGender=&searchZip=&vid=cfc&searchAgentNotes=PREVIEW-CFC", 'TIMOTHY','J','WINKY', '1', 'ENH1078249457', 'off'); displayAgeCityState('-', 'NEW LENOX', 'IL'); displayPremiumUrls('&searchFName=TIMOTHY&searchMName=J&searchLName=WINKY&searchCity=NEW+LENOX&searchState=IL&searchApproxAge=29&searchStateJurisdiction=IL&searchGender=&searchZip=&vid=cfc&searchAgentNotes=PREVIEW-CFC', 'ENH1078249457', '**/**/00', '65050a021d5c4d4a8'); displayL2Result('1', 'ENH1078249457', 'off');
import java.util.regex.*;
public class P
{
    public static void main(String[] args)
    {
        // Sample text from the question: two result records concatenated into one string.
        String str = "Twinky R Winky" +
            "displayDisplayName('1', \"http://www.ussearch.com/consumer/cwf?adID=10002101&action=browseproduct&searchtab=people&pid=3064&searchPerson=ENH1078249456&searchFName=TEXTILES&searchMName=&searchLName=WINKY&searchCity=NEW+YORK&searchState=NY&searchApproxAge=29&searchStateJurisdiction=NY&searchGender=&searchZip=&vid=cfc&searchAgentNotes=PREVIEW-CFC\", 'TEXTILES','','WINKY', '0', 'ENH1078249456', 'off'); displayAgeCityState('-', 'NEW YORK', 'NY'); displayPremiumUrls('&searchFName=TEXTILES&searchMName=&searchLName=WINKY&searchCity=NEW+YORK&searchState=NY&searchApproxAge=29&searchStateJurisdiction=NY&searchGender=&searchZip=&vid=cfc&searchAgentNotes=PREVIEW-CFC', 'ENH1078249456', '**/**/00', '51540c04140a03510'); displayL2Result('0', 'ENH1078249456', 'off');" +
            "displayDisplayName('2', \"http://www.ussearch.com/consumer/cwf?adID=10002101&action=browseproduct&searchtab=people&pid=3064&searchPerson=ENH1078249457&searchFName=TIMOTHY&searchMName=J&searchLName=WINKY&searchCity=NEW+LENOX&searchState=IL&searchApproxAge=29&searchStateJurisdiction=IL&searchGender=&searchZip=&vid=cfc&searchAgentNotes=PREVIEW-CFC\", 'TIMOTHY','J','WINKY', '1', 'ENH1078249457', 'off'); displayAgeCityState('-', 'NEW LENOX', 'IL'); displayPremiumUrls('&searchFName=TIMOTHY&searchMName=J&searchLName=WINKY&searchCity=NEW+LENOX&searchState=IL&searchApproxAge=29&searchStateJurisdiction=IL&searchGender=&searchZip=&vid=cfc&searchAgentNotes=PREVIEW-CFC', 'ENH1078249457', '**/**/00', '65050a021d5c4d4a8'); displayL2Result('1', 'ENH1078249457', 'off');";
        // group(1) is the key name after the "search" prefix, group(2) is the value up to the next '&'.
        Pattern pattern = Pattern.compile("search(City|State|Person)=([^&]*)?");
        Matcher matcher = pattern.matcher(str);
        // find() advances to the next match in str until there are none left.
        while (matcher.find())
            System.out.println(matcher.group(1) + "=" + matcher.group(2));
    }
}
Forgot the output:
G:\java-temp>java P
Person=ENH1078249456
City=NEW+YORK
State=NY
City=NEW+YORK
State=NY
Person=ENH1078249457
City=NEW+LENOX
State=IL
City=NEW+LENOX
State=IL
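City and State appear twice per person in that output because each record carries the same name=value pairs in two places: once in the URL passed to displayDisplayName and again in the argument of displayPremiumUrls. The following is only a sketch of one way to get a single set of fields per person, under these assumptions: each record starts with a displayDisplayName('n', "url", ...) call as in the sample, the class name RecordExtractor and the method extract are made up for illustration, and the sample string in main is a shortened version of the text from the question. It also uses URLDecoder to turn NEW+YORK into NEW YORK.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RecordExtractor {
    // One displayDisplayName(...) call per person; group 1 is the quoted URL.
    private static final Pattern RECORD =
            Pattern.compile("displayDisplayName\\('\\d+',\\s*\"([^\"]*)\"");
    // Any key of the form searchXxx (capital letter after "search"), so
    // "searchtab" is skipped; group 1 is the key, group 2 the raw value.
    private static final Pattern FIELD =
            Pattern.compile("search([A-Z]\\w*)=([^&]*)");

    // Returns one key -> value map per displayDisplayName(...) call found in page.
    public static List<Map<String, String>> extract(String page) {
        List<Map<String, String>> records = new ArrayList<>();
        Matcher record = RECORD.matcher(page);
        while (record.find()) {
            Map<String, String> fields = new LinkedHashMap<>();
            // Only the URL of this record is scanned, so nothing is duplicated.
            Matcher field = FIELD.matcher(record.group(1));
            while (field.find()) {
                fields.put(field.group(1), decode(field.group(2)));
            }
            records.add(fields);
        }
        return records;
    }

    // Query-string values encode a space as '+'; URLDecoder reverses that.
    private static String decode(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        // Shortened sample; the real page has more parameters per URL.
        String page = "displayDisplayName('1', \"http://www.ussearch.com/consumer/cwf?"
                + "searchtab=people&searchPerson=ENH1078249456&searchFName=TEXTILES"
                + "&searchMName=&searchLName=WINKY&searchCity=NEW+YORK&searchState=NY"
                + "&searchApproxAge=29\", 'TEXTILES','','WINKY', '0', 'ENH1078249456', 'off');"
                + " displayPremiumUrls('&searchCity=NEW+YORK&searchState=NY');";
        for (Map<String, String> r : extract(page)) {
            System.out.println(r.get("FName") + " " + r.get("LName") + ", "
                    + r.get("City") + ", " + r.get("State") + ", age " + r.get("ApproxAge"));
        }
    }
}
```

Each map then holds one person's FName, MName, LName, City, State and ApproxAge, which covers the first name, last name, city and age asked for in the question.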
ASKER
Can you explain this in detail? What does compile do, and what does the pattern represent here?
And how do you get the output, i.e. person, city, state, and why are city and state repeated?
Thanks
ASKER
thanks.
ASKER
"search(City|State|Person)=([^&]*)?"
If I want to get the Age value, should my pattern be
"search(City|State|Person|Age)=([^&]*)?"
Is search a function? It doesn't look like one.
What is search? Can you tell me?
( stands for ...?
[ stands for ...?
The caret ^ matches the position before the first character in the string.
& stands for ...?
* means repetition.
I know it is very hard to explain. I would like to know what pattern you are looking for and how you came up with it.
Thanks
>> if i would get Age value so my pattern should be "search(City|State|Person|Age)=([^&]*)?"
Yes, that is the idea. One caveat: in your sample text the key is written searchApproxAge, so the alternative has to be ApproxAge rather than Age, i.e. "search(City|State|Person|ApproxAge)=([^&]*)?".
>> what is search. Can u tell me
search is not a function. It is the literal prefix of the keys city, state, person, ... In your text they are written as searchCity, searchState, ...
(...) -> "captures" whatever matches inside the parentheses, so you can retrieve it later with group(index).
[...] -> matches any single character listed inside the square brackets, e.g. [abc] matches a character that is either a, b or c.
^ -> has two meanings. Outside brackets it anchors the match to the start of the input. In our case (right after the opening [) it negates the set: [^a] matches any character that is not a.
& is just a literal & (your name=value delimiter).
* -> repetition, as you said: zero or more occurrences of whatever precedes it.
[^&] matches any single character that is not &, and [^&]* matches zero or more such characters.
For a short regular expression tutorial, see: http://www.regular-expressions.info/quickstart.html
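To tie the pieces together, here is a small sketch that annotates each part of the pattern and also picks up the age. It assumes the age key is spelled searchApproxAge, as in the sample text; the class name PatternWalkthrough and the method pairs are invented for this example. Pattern.compile turns the pattern string into a reusable Pattern object, matcher(text) binds it to the input, and each call to find() moves to the next match so that group(1) and group(2) return what the two pairs of parentheses captured.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternWalkthrough {
    // search                          literal prefix shared by all keys
    // (City|State|Person|ApproxAge)   group 1: exactly one of these key names
    // =                               literal separator between key and value
    // ([^&]*)                         group 2: zero or more characters that are not '&'
    private static final Pattern PAIR =
            Pattern.compile("search(City|State|Person|ApproxAge)=([^&]*)");

    // Returns every matching pair in text, formatted as key=value.
    public static List<String> pairs(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = PAIR.matcher(text);
        while (m.find()) {               // advance to the next match, if any
            out.add(m.group(1) + "=" + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        // A fragment of the query string from the question.
        String text = "searchPerson=ENH1078249456&searchFName=TEXTILES&searchCity=NEW+YORK"
                + "&searchState=NY&searchApproxAge=29&searchStateJurisdiction=NY";
        for (String pair : pairs(text)) {
            System.out.println(pair);
        }
    }
}
```

searchStateJurisdiction=NY is not reported because, after State, the pattern requires = and finds J instead. searchFName is skipped because FName is not one of the listed alternatives.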
ASKER
thanks
:-)
http://www.javaalmanac.com/egs/javax.swing.text.html/GetText.html
http://www.javaalmanac.com/egs/javax.swing.text.html/GetLinks.html