[Okta Webinar] Learn how to a build a cloud-first strategyRegister Now

x
?
Solved

Getting Data from Web Query

Posted on 2011-05-02
7
Medium Priority
?
694 Views
Last Modified: 2012-05-11
Have a website....  Parent page contains multiple license numbers....  Need to "click" each license number...  Then next page presents with data I need t capture in a web query within excel...  

There are a total of 15 entries on the parent page; however, there are approximately 615 of the parent pages, with the undelying data I need to get at on a seperate page that is accessed by clicking one item on the parent page.... I know that I can create a Web Query that will extract information from one page; however, I need to progrommatically step through each parent page, and then bring up each of the 15 "child" pages, and extract the data from the "child page...

Any suggestions, or resources I could consult to accomplish??

Thanks in advance.

Rick D Norris, CPA
Conrad, Burnett, Norris, and Gordon, LLC
 
0
Comment
Question by:Rick Norris
  • 3
  • 2
  • 2
7 Comments
 
LVL 39

Expert Comment

by:nutsch
ID: 35510015
Does the URL of the child page expose the codes of the parents and or children entries? The same way the following url exposes the fact that I'm searching for internet vba, and that I'm looking at results # 20 and after:

http://www.google.com/#q=internet+vba&hl=en&biw=1680&bih=905&prmd=ivnsfd&ei=_H2_TdbNIZO-sQPAo6jQAw&start=20&sa=N&bav=on.2,or.r_gc.r_pw.&fp=c9365e657b732f11

In that case, you can run a loop on web query in excel.

If not, you might have to analyze the source code of the page further. You could post a template for more precise input.

Thomas
0
 
LVL 11

Expert Comment

by:thydzik
ID: 35511507
is this a once off thing or needs to performed regularly?

if you could provide some example parent and child pages, with obfuscated text, that would help.


0
 

Author Comment

by:Rick Norris
ID: 35512251
Code below is from "Parent" page...  Relevant info to capture here is the License Number....  When license number is "clicked", carries you to relevant child page with additional info...  Posting code of "Child" page after this...

======================  BEGIN PARENT PAGE  ===================================

      <table cellspacing="0" cellpadding="4" rules="all" border="1" id="ctl00_ContentPlaceHolder1_GridView1" style="color:#333333;width:95%;border-collapse:collapse;">
            <tr style="color:White;background-color:#990000;font-weight:bold;">
                  <th scope="col"><a href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView1','Sort$LicenseNo')" style="color:White;">License Number</a></th><th scope="col"><a href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView1','Sort$Name')" style="color:White;">Name</a></th><th scope="col"><a href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView1','Sort$City')" style="color:White;">City</a></th><th scope="col"><a href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView1','Sort$State')" style="color:White;">State</a></th>
            </tr><tr style="color:#333333;background-color:#FFFBD6;">
                  <td><a href="detail.aspx?licenseno=S-43893">S-43893</a></td><td>1 RELIABLE FIRE PROTECTION INCORPORATED</td><td>BYRON CENTER</td><td>MI</td>
            </tr><tr style="color:#333333;background-color:White;">
                  <td><a href="detail.aspx?licenseno=42894">42894</a></td><td>184 BUILDERS LLC</td><td>BLACKSBURG</td><td>SC</td>
            </tr><tr style="color:#333333;background-color:#FFFBD6;">
                  <td><a href="detail.aspx?licenseno=S-46463">S-46463</a></td><td>1ST CHOICE ELECTRICAL LLC</td><td>WARRIOR</td><td>AL</td>
            </tr><tr style="color:#333333;background-color:White;">
                  <td><a href="detail.aspx?licenseno=S-41346">S-41346</a></td><td>2000 CONCRETE LLC</td><td>ATLANTA</td><td>GA</td>
            </tr><tr style="color:#333333;background-color:#FFFBD6;">
                  <td><a href="detail.aspx?licenseno=S-45260">S-45260</a></td><td>2000 CONCRETE STRUCTURES LLC</td><td>ATLANTA</td><td>GA</td>
            </tr><tr style="color:#333333;background-color:White;">
                  <td><a href="detail.aspx?licenseno=46799">46799</a></td><td>270 CONSTRUCTION INC</td><td>SAMSON</td><td>AL</td>
            </tr><tr style="color:#333333;background-color:#FFFBD6;">
                  <td><a href="detail.aspx?licenseno=25428">25428</a></td><td>2-K STEEL PRODUCTS INC</td><td>ASHVILLE</td><td>AL</td>
            </tr><tr style="color:#333333;background-color:White;">
                  <td><a href="detail.aspx?licenseno=26016">26016</a></td><td>2-WAY, INC</td><td>METAIRIE</td><td>LA</td>
            </tr><tr style="color:#333333;background-color:#FFFBD6;">
                  <td><a href="detail.aspx?licenseno=S-45836">S-45836</a></td><td>2-WD INC</td><td>HUNTSVILLE</td><td>AL</td>
            </tr><tr style="color:#333333;background-color:White;">
                  <td><a href="detail.aspx?licenseno=S-41436">S-41436</a></td><td>3 D MECHANICAL LLC</td><td>ALABASTER</td><td>AL</td>
            </tr><tr style="color:#333333;background-color:#FFFBD6;">
                  <td><a href="detail.aspx?licenseno=S-41887">S-41887</a></td><td>3 G METALWORKS INC</td><td>DEATSVILLE</td><td>AL</td>
            </tr><tr style="color:#333333;background-color:White;">
                  <td><a href="detail.aspx?licenseno=S-45965">S-45965</a></td><td>360 CONSTRUCTION, LLC</td><td>OZARK</td><td>AL</td>
            </tr><tr style="color:#333333;background-color:#FFFBD6;">
                  <td><a href="detail.aspx?licenseno=40687">40687</a></td><td>3M COMPANY</td><td>ST PAUL</td><td>MN</td>
            </tr><tr style="color:#333333;background-color:White;">
                  <td><a href="detail.aspx?licenseno=39086">39086</a></td><td>4 H CONSTRUCTION CORPORATION</td><td>CLEVELAND</td><td>MS</td>
            </tr><tr style="color:#333333;background-color:#FFFBD6;">
                  <td><a href="detail.aspx?licenseno=S-41734">S-41734</a></td><td>4 SONS CONSTRUCTION LLC</td><td>WINSTON</td><td>GA</td>
            </tr><tr align="center" style="color:#333333;background-color:#FFCC66;">
                  <td colspan="4"><table border="0">
                        <tr>
                              <td><span>1</span></td><td><a href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView1','Page$2')" style="color:#333333;">2</a></td><td><a href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView1','Page$3')" style="color:#333333;">3</a></td><td><a href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView1','Page$4')" style="color:#333333;">4</a></td><td><a href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView1','Page$5')" style="color:#333333;">5</a></td><td><a href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView1','Page$6')" style="color:#333333;">6</a></td><td><a href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView1','Page$7')" style="color:#333333;">7</a></td><td><a href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView1','Page$8')" style="color:#333333;">8</a></td><td><a href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView1','Page$9')" style="color:#333333;">9</a></td><td><a href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView1','Page$10')" style="color:#333333;">10</a></td><td><a href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView1','Page$11')" style="color:#333333;">...</a></td><td><a href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView1','Page$Last')" style="color:#333333;">&gt;&gt;</a></td>
                        </tr>
                  </table></td>
            </tr>
      </table>

=======================  END PARENT PAGE  ===============================

An example, the "child" code following represents the source code obtained when the FIRST License number in the above "Parent" page is "clicked"....  licenseno=S-43893......  Child code follows:

====================== BEGIN CHILD CODE  ===============================

 <table border="0" cellpadding="0" cellspacing="0" style="width: 100%">
                                                        <tr>
                                                            <td>
                                                               
      <h4>
        <font color="#005594">Search Results Detail</font><br />
        &nbsp;
   
    <font face="Arial" size="3">&nbsp;
   
   
    </FONT>
    </h4><table cellspacing="0" align="Center" border="0" id="ctl00_ContentPlaceHolder1_FormView1" style="width:90%;border-collapse:collapse;">
      <tr>
            <td colspan="2">
            <table border="1" bordercolor="#800000" cellpadding="4" cellspacing="0" style="border-collapse: collapse"
                width="100%">
                <tr>
                    <td align="right" valign="top" width="25%">
                        <b><font color="#005594" size="2">License Number</font></b></td>
                    <td align="left" valign="top" width="75%">
                        <b><font size="2">
                            <span id="ctl00_ContentPlaceHolder1_FormView1_LicenseNoLabel">S-43893</span><span
                                lang="en-us">&nbsp; </span></font></b>
                    </td>
                </tr>
                <tr>
                    <td align="right" valign="top" width="25%">
                        <b><font color="#005594" size="2">Name</font></b></td>
                    <td align="left" valign="top" width="75%">
                        <font size="2">
                            <span id="ctl00_ContentPlaceHolder1_FormView1_NameLabel">1 RELIABLE FIRE PROTECTION INCORPORATED</span></font><font
                                size="2"><span lang="en-us">&nbsp; </span></font>
                    </td>
                </tr>
                <tr>
                    <td align="right" valign="top" width="25%">
                        <b><font color="#005594" size="2">Address</font></b></td>
                    <td align="left" valign="top" width="75%">
                        <font size="2">
                            <span id="ctl00_ContentPlaceHolder1_FormView1_AddressLabel">1000 100TH SW, STE A</span></font><span
                                lang="en-us"></span></td>
                </tr>
                <tr>
                    <td align="right" valign="top" width="25%">
                        <b><font color="#005594" size="2">City</font></b></td>
                    <td align="left" valign="top" width="75%">
                        <font size="2">
                            <span id="ctl00_ContentPlaceHolder1_FormView1_CityLabel">BYRON CENTER</span></font></td>
                </tr>
                <tr>
                    <td align="right" valign="top" width="25%">
                        <b><font color="#005594" size="2">State</font></b></td>
                    <td align="left" valign="top" width="75%">
                        <font size="2">
                            <span id="ctl00_ContentPlaceHolder1_FormView1_StateLabel">MI</span></font></td>
                </tr>
                <tr>
                    <td align="right" valign="top" width="25%">
                        <b><font color="#005594" size="2">Zip Code</font></b></td>
                    <td align="left" valign="top" width="75%">
                        <font size="2">
                            <span id="ctl00_ContentPlaceHolder1_FormView1_ZipLabel">49315</span></font></td>
                </tr>
                <tr>
                    <td align="right" valign="top" width="25%">
                        <b><font color="#005594" size="2">Phone Number</font></b></td>
                    <td align="left" valign="top" width="75%">
                        <font size="2">
                            <span id="ctl00_ContentPlaceHolder1_FormView1_PhonenoLabel">(616) 877-3300</span></font><span
                                lang="en-us"></span></td>
                </tr>
                <tr>
                    <font size="3"></font><font face="Arial" size="3"></font>
                    <td align="right" valign="top" width="25%">
                        <b><font color="#005594" size="2">Fax Number</font></b></td>
                    <td align="left" valign="top" width="75%">
                        <font size="2">
                            <span id="ctl00_ContentPlaceHolder1_FormView1_FaxLabel">(616) 877-3308</span></font><span
                                lang="en-us"></span></td>
                </tr>
                <tr>
                    <font size="3"></font>
                    <td align="right" valign="top" width="25%">
                        <b><font color="#005594" size="2">Bid Limit</font></b></td>
                    <td align="left" valign="top" width="75%">
                        <font size="2">
                            <span id="ctl00_ContentPlaceHolder1_FormView1_BidlimitLabel"></span></font></td>
                </tr>
                <tr>
                    <td align="right" valign="top" width="25%">
                        <b><font color="#005594" size="2">Specialty</font></b></td>
                    <td align="left" valign="top" width="75%">
                        <font size="2">
                            <span id="ctl00_ContentPlaceHolder1_FormView1_SpecialtyLabel">SUBCONTRACTOR:  FIRE PROTECTION SYSTEMS</span></font></td>
                </tr>
                <tr>
                    <td align="right" valign="top" width="25%">
                        <span lang="en-us"><font color="#005594"><b>Expiration Date</b></font></span></td>
                    <td align="left" valign="top" width="75%">
                        <span lang="en-us">
                            <span id="ctl00_ContentPlaceHolder1_FormView1_Expr1Label">5/31/2011</span><br />
                            &nbsp;<a href="../Renewal_Info.htm"><span style="color: #0000ff">More information on
                                License Renewals.</span></a></span></td>
                </tr>
                <tr>
                    <td align="right" valign="top" width="25%">
                        <span lang="en-us"><font color="#005594"><b>Extension Date</b></font></span></td>
                    <td align="left" valign="top" width="75%">
                        <span lang="en-us">
                            <span id="ctl00_ContentPlaceHolder1_FormView1_ExtensionLabel"></span><br /></td>
                </tr>
            </table>
======================  END OF CHILD PAGE  ==================================


License #.....  can be captured either at "parent" page or "child" page ......

Ideal solution would be the ability to loop through all the pages to get detail as below in an excel spreadsheet......  There are approximately 9300 records in total to capture.

Relevant info in above example of the data to be captured from the Child Page is as follows:

License #:  S-43893
Name:        1 RELIABLE FIRE PROTECTION INCORPORATED  
Address:   1000 100TH SW, STE A
City:           BYRON CENTER
State:         MI
Zip:            49315
Phone:       (616) 877-3300
Fax:           (616) 877-3308
Bid Limit:                                       ***  blank on this record; however, non blank in most records
Specialty:  SUBCONTRACTOR: FIRE PROTECTION SYSTEMS
Expiration Date:  5/31/2011

In answer to question posed by thydzik....  This is something that would ideally be run on a quarterly basis.

Any help is greatly appreciated......

If someone needs additional info, please let me know.

Thanks again,

Rick D Norris, CPA/CFE
Conrad, Burnett, Norris and Gordon, LLC
Birmingham, AL
0
Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 11

Expert Comment

by:thydzik
ID: 35512341
these are the two approaches I would take;

1.
Download all the html, with a program like WebZip.
Create some vba code to parse the html to a worksheet.

2.
install php executable on your machine locally, create some code to fetch the parents/childs, parse the html and output to some intermediate file (csv file).
Read this into Excel.

It is all possible, though a lot of work. I am not sure how much detail you want use to go into?

also, It might be better suited to parse this information into a database.
0
 

Author Comment

by:Rick Norris
ID: 35514960
thydzik....

I think my major problem is in your suggestions... suggesstion 1 - "create some vba code...
suggestion 2....  creatae some code to fetch parents/childs....

If I knew HOW to fetch ALL of the relevant html, then I COULD write some vb code to parse.....  My major problem is how to get at ALL of the child html pages.....?????

Rick
0
 
LVL 39

Expert Comment

by:nutsch
ID: 35514975
What's the url of the parent page, or the source of the page that links to the parent page?
0
 
LVL 11

Accepted Solution

by:
thydzik earned 2000 total points
ID: 35515494
rick_norris,

have a look at these examples;

parse html with vba

parse html wil php
0

Featured Post

Prep for the ITIL® Foundation Certification Exam

December’s Course of the Month is now available! Enroll to learn ITIL® Foundation best practices for delivering IT services effectively and efficiently.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Some code to ensure data integrity when using macros within Excel. Also included code that helps secure your data within an Excel workbook.
Microsoft's Excel has many features that most people will never need nor take advantage of.  Conditional formatting is one feature that you may find a necessity once you start using it.
The viewer will learn how to create a normally distributed random variable in Excel, use a normal distribution to simulate the return on an investment over a period of years, Create a Monte Carlo simulation using a normal random variable, and calcul…
This Micro Tutorial demonstrate the bugs in Microsoft Excel for Mac with Pivot Charts.

873 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question