  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 221

How to parse an HTML page



I am trying to learn how to parse an HTML page. So far I have connected to a web page and read its contents into a char array / string.

Now how do I get the data I need? I need an idea of how to proceed from here.

For example, I connect to the page http://www.ussearch.com/consumer/index.jsp, type in fn, ln, m and age, and the result comes back in a table. I would like to grab that table. I have the resulting page in str. Where do I go from here?



Asked by: cutie_smily
1 Solution
 
cutie_smily (Author) Commented:
I do not want the above links. I already have the whole page text in a string, and I need to use string methods. How do I get to a particular line and extract the text I want?

Example: part of my string is shown below.

From it I need to get the first name, last name, city and age.
How do I grab these from the string below?

I need to grab values such as these:
searchCity=NEW+YORK
searchState=NY
-----------------------------------------------------------
 Preliminary Search Results for:
"Twinky R Winky"
displayDisplayName('1', "http://www.ussearch.com/consumer/cwf?adID=10002101&action=browseproduct&searchtab=people&pid=3064&searchPerson=ENH1078249456&searchFName=TEXTILES&searchMName=&searchLName=WINKY&searchCity=NEW+YORK&searchState=NY&searchApproxAge=29&searchStateJurisdiction=NY&searchGender=&searchZip=&vid=cfc&searchAgentNotes=PREVIEW-CFC", 'TEXTILES','','WINKY', '0', 'ENH1078249456', 'off'); displayAgeCityState('-', 'NEW YORK', 'NY'); displayPremiumUrls('&searchFName=TEXTILES&searchMName=&searchLName=WINKY&searchCity=NEW+YORK&searchState=NY&searchApproxAge=29&searchStateJurisdiction=NY&searchGender=&searchZip=&vid=cfc&searchAgentNotes=PREVIEW-CFC', 'ENH1078249456', '**/**/00', '51540c04140a03510'); displayL2Result('0', 'ENH1078249456', 'off');
displayDisplayName('2', "http://www.ussearch.com/consumer/cwf?adID=10002101&action=browseproduct&searchtab=people&pid=3064&searchPerson=ENH1078249457&searchFName=TIMOTHY&searchMName=J&searchLName=WINKY&searchCity=NEW+LENOX&searchState=IL&searchApproxAge=29&searchStateJurisdiction=IL&searchGender=&searchZip=&vid=cfc&searchAgentNotes=PREVIEW-CFC", 'TIMOTHY','J','WINKY', '1', 'ENH1078249457', 'off'); displayAgeCityState('-', 'NEW LENOX', 'IL'); displayPremiumUrls('&searchFName=TIMOTHY&searchMName=J&searchLName=WINKY&searchCity=NEW+LENOX&searchState=IL&searchApproxAge=29&searchStateJurisdiction=IL&searchGender=&searchZip=&vid=cfc&searchAgentNotes=PREVIEW-CFC', 'ENH1078249457', '**/**/00', '65050a021d5c4d4a8'); displayL2Result('1', 'ENH1078249457', 'off');
 
aozarov Commented:
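import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class P
{
      public static void main(String[] args)
      {
            // Sample input: a fragment of the result page that was already read into a string.
            String str = "Twinky R Winky" +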
                        "displayDisplayName('1', \"http://www.ussearch.com/consumer/cwf?adID=10002101&action=browseproduct&searchtab=people&pid=3064&searchPerson=ENH1078249456&searchFName=TEXTILES&searchMName=&searchLName=WINKY&searchCity=NEW+YORK&searchState=NY&searchApproxAge=29&searchStateJurisdiction=NY&searchGender=&searchZip=&vid=cfc&searchAgentNotes=PREVIEW-CFC\", 'TEXTILES','','WINKY', '0', 'ENH1078249456', 'off'); displayAgeCityState('-', 'NEW YORK', 'NY'); displayPremiumUrls('&searchFName=TEXTILES&searchMName=&searchLName=WINKY&searchCity=NEW+YORK&searchState=NY&searchApproxAge=29&searchStateJurisdiction=NY&searchGender=&searchZip=&vid=cfc&searchAgentNotes=PREVIEW-CFC', 'ENH1078249456', '**/**/00', '51540c04140a03510'); displayL2Result('0', 'ENH1078249456', 'off');" +
                             "displayDisplayName('2', \"http://www.ussearch.com/consumer/cwf?adID=10002101&action=browseproduct&searchtab=people&pid=3064&searchPerson=ENH1078249457&searchFName=TIMOTHY&searchMName=J&searchLName=WINKY&searchCity=NEW+LENOX&searchState=IL&searchApproxAge=29&searchStateJurisdiction=IL&searchGender=&searchZip=&vid=cfc&searchAgentNotes=PREVIEW-CFC\", 'TIMOTHY','J','WINKY', '1', 'ENH1078249457', 'off'); displayAgeCityState('-', 'NEW LENOX', 'IL'); displayPremiumUrls('&searchFName=TIMOTHY&searchMName=J&searchLName=WINKY&searchCity=NEW+LENOX&searchState=IL&searchApproxAge=29&searchStateJurisdiction=IL&searchGender=&searchZip=&vid=cfc&searchAgentNotes=PREVIEW-CFC', 'ENH1078249457', '**/**/00', '65050a021d5c4d4a8'); displayL2Result('1', 'ENH1078249457', 'off');";

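            // Match searchCity=, searchState= or searchPerson= and capture the value up to the next '&'.
            Pattern pattern = Pattern.compile("search(City|State|Person)=([^&]*)?");
            Matcher matcher = pattern.matcher(str);

            // Print every name=value pair found in the input.
            while (matcher.find())
            {
                  System.out.println(matcher.group(1) + "=" + matcher.group(2));
            }
      }
}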
 
aozarov Commented:
Forgot the output:
G:\java-temp>java P
Person=ENH1078249456
City=NEW+YORK
State=NY
City=NEW+YORK
State=NY
Person=ENH1078249457
City=NEW+LENOX
State=IL
City=NEW+LENOX
State=IL
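
City and State are printed twice per person because each record contains the same parameters twice: once in the URL passed to displayDisplayName and once in the argument of displayPremiumUrls. If only one set per person is wanted, something along these lines might work. This is a rough sketch, not part of the original answer: it assumes Java 8 or later and that every record starts with a displayDisplayName( call, and the names RecordExtractor, extractRecords and decode are made up for illustration. It also stops a value at a quote character, which the pattern above does not do.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RecordExtractor
{
    // A parameter name followed by its value, which ends at '&' or at a quote.
    private static final Pattern PARAM =
        Pattern.compile("search(City|State|Person)=([^&\"']*)");

    // URL-decodes a value, e.g. "NEW+YORK" becomes "NEW YORK".
    private static String decode(String value)
    {
        try
        {
            return URLDecoder.decode(value, "UTF-8");
        }
        catch (UnsupportedEncodingException e)
        {
            // UTF-8 is always supported, so this is not expected to happen.
            throw new IllegalStateException(e);
        }
    }

    // Returns one map per record; a repeated key keeps its first value.
    public static List<Map<String, String>> extractRecords(String page)
    {
        List<Map<String, String>> records = new ArrayList<>();
        // Assumption: each record begins with a displayDisplayName( call.
        for (String chunk : page.split("(?=displayDisplayName\\()"))
        {
            Map<String, String> record = new LinkedHashMap<>();
            Matcher m = PARAM.matcher(chunk);
            while (m.find())
            {
                record.putIfAbsent(m.group(1), decode(m.group(2)));
            }
            if (!record.isEmpty())
            {
                records.add(record);
            }
        }
        return records;
    }

    public static void main(String[] args)
    {
        // Shortened, made-up input with the same shape as the page text.
        String page =
            "displayDisplayName('1', \"x?searchPerson=A1&searchCity=NEW+YORK&searchState=NY&vid=cfc\");"
            + " displayPremiumUrls('&searchCity=NEW+YORK&searchState=NY&vid=cfc');"
            + "displayDisplayName('2', \"x?searchPerson=A2&searchCity=NEW+LENOX&searchState=IL&vid=cfc\");";
        for (Map<String, String> record : extractRecords(page))
        {
            System.out.println(record);
        }
        // prints {Person=A1, City=NEW YORK, State=NY}
        // prints {Person=A2, City=NEW LENOX, State=IL}
    }
}
```

A LinkedHashMap keeps the keys in the order they were first seen, so each record prints as Person, City, State.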
 
cutie_smily (Author) Commented:
Can you explain this in detail? What is compile doing, and what does the pattern represent here?

And how are you getting the output, i.e. Person, City, State, and then the same values repeated?

Thanks
 
aozarov Commented:
I am using the Java regular expression libraries.

Javadoc: http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html
and http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Matcher.html

"search(City|State|Person)=([^&]*)?" -> this is the search pattern. It has two groups: group 1 = (City|State|Person) and group 2 = ([^&]*), where [^&]* means any run of characters other than &.
Pattern.compile(...) -> turns the pattern text into a Pattern object that can be applied to input
pattern.matcher(str); -> creates a Matcher that applies the search pattern to the given input

while (matcher.find()) --> loops as long as another occurrence of the pattern is found in the input
    System.out.println(matcher.group(1)  + "=" + matcher.group(2)); -> group(1) returns the text captured by the first pair of parentheses in the pattern, group(2) the text captured by the second pair


For a short tutorial -> http://java.sun.com/docs/books/tutorial/extra/regex/
For more examples of how to use the regexp package: http://www.javaalmanac.com/cgi-bin/search/find.pl?words=regex
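
To make the group numbering concrete, here is a small sketch on a made-up input string. The class name GroupDemo and the method describeMatches are only for illustration. The trailing ? of the pattern is left out here because [^&]* already matches an empty value, so the results should be the same.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo
{
    // Lists the whole match, group 1 and group 2 for every occurrence.
    public static List<String> describeMatches(String input)
    {
        Pattern pattern = Pattern.compile("search(City|State|Person)=([^&]*)");
        Matcher matcher = pattern.matcher(input);
        List<String> lines = new ArrayList<>();
        while (matcher.find())
        {
            lines.add("group(0)=" + matcher.group(0)
                    + " group(1)=" + matcher.group(1)
                    + " group(2)=" + matcher.group(2));
        }
        return lines;
    }

    public static void main(String[] args)
    {
        // searchZip is not one of the three alternatives, so it is skipped.
        String input = "a=1&searchCity=NEW+YORK&searchState=NY&searchZip=";
        for (String line : describeMatches(input))
        {
            System.out.println(line);
        }
        // prints group(0)=searchCity=NEW+YORK group(1)=City group(2)=NEW+YORK
        // prints group(0)=searchState=NY group(1)=State group(2)=NY
    }
}
```

group(0) is always the entire matched text; the numbered groups follow the order of the opening parentheses in the pattern.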

 
cutie_smily (Author) Commented:
Thanks.
 
cutie_smily (Author) Commented:
"search(City|State|Person)=([^&]*)?"

If I also want to get the Age value, should my pattern be

"search(City|State|Person|Age)=([^&]*)?"

Is search a function? It doesn't look like one.

What is search? Can you tell me?

( -- stands for?
[ -- stands for?
The caret ^ matches the position before the first character in the string
& -- ?
* is repetitive


I know it is very hard to explain. I would like to know what pattern you are looking for and how you came up with it.

Thanks
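 
aozarov Commented:
>> if i would get Age value so my pattern should be "search(City|State|Person|Age)=([^&]*)?"
Yes, that is the right idea. Note, though, that in your text the age parameter is written as searchApproxAge, so the alternative has to be ApproxAge: "search(City|State|Person|ApproxAge)=([^&]*)?"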
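
In the sample text from this thread the age appears as searchApproxAge=29, so an alternative spelled Age would need the literal text "searchAge=" and would find nothing. A small sketch comparing the two patterns on a shortened, made-up input (the names AgePatternDemo and findPairs are only for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AgePatternDemo
{
    // Returns every name=value pair that the given pattern finds in the input.
    public static List<String> findPairs(String regex, String input)
    {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<String> pairs = new ArrayList<>();
        while (matcher.find())
        {
            pairs.add(matcher.group(1) + "=" + matcher.group(2));
        }
        return pairs;
    }

    public static void main(String[] args)
    {
        String input = "searchCity=NEW+YORK&searchState=NY"
                     + "&searchApproxAge=29&searchStateJurisdiction=NY";

        // "Age" requires the literal text "searchAge=", which the input lacks.
        System.out.println(findPairs("search(City|State|Person|Age)=([^&]*)", input));
        // prints [City=NEW+YORK, State=NY]

        // "ApproxAge" matches the parameter name as it is actually written.
        System.out.println(findPairs("search(City|State|Person|ApproxAge)=([^&]*)", input));
        // prints [City=NEW+YORK, State=NY, ApproxAge=29]
    }
}
```

searchStateJurisdiction is skipped by both patterns for the same reason: after searchState the pattern requires an = sign, not more letters.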

>> what is search. Can u tell me
search is not a function. It is literal text, the prefix of City, State, Person... -> in your text they are written as searchCity, searchState, ...

(...) -> "captures" the match inside the parentheses so you can later get it via the group(index) method
[...] -> matches any single character listed inside the square brackets, e.g. [abc] matches one character that is either a, b or c
^ -> has two meanings. In our case (inside [...], as the first character) it negates the set: the class matches any character that is NOT listed after it. Hence [^a] matches any character that is not a. Outside brackets, ^ matches the start of the input, which is the meaning you described.
& is just a literal & (the delimiter between your name=value pairs)
* -> is repetition (right): zero or more occurrences of whatever precedes it
[^&] matches a single character that is not &, and [^&]* matches zero or more such characters

For short regular expression tutorials check this: http://www.regular-expressions.info/quickstart.html
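
Why [^&]* and not simply .* for the value? A short sketch on a made-up query string shows the difference. The names NegatedClassDemo and firstValue are only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NegatedClassDemo
{
    // Returns what group 1 captures on the first match, or null if nothing matches.
    public static String firstValue(String regex, String input)
    {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args)
    {
        String input = "searchCity=NEW+YORK&searchState=NY";

        // [^&]* stops in front of the first '&'.
        System.out.println(firstValue("searchCity=([^&]*)", input));
        // prints NEW+YORK

        // .* is greedy and runs on to the end of the line.
        System.out.println(firstValue("searchCity=(.*)", input));
        // prints NEW+YORK&searchState=NY

        // Zero characters is also a valid match, so empty values work.
        System.out.println("[" + firstValue("searchMName=([^&]*)", "searchMName=&searchLName=WINKY") + "]");
        // prints []
    }
}
```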
 
cutie_smily (Author) Commented:
Thanks.
 
aozarov Commented:
:-)