• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 758
  • Last Modified:

Getting Data from Web Query

Have a website....  Parent page contains multiple license numbers....  Need to "click" each license number...  Then next page presents with data I need t capture in a web query within excel...  

There are a total of 15 entries on the parent page; however, there are approximately 615 of the parent pages, with the undelying data I need to get at on a seperate page that is accessed by clicking one item on the parent page.... I know that I can create a Web Query that will extract information from one page; however, I need to progrommatically step through each parent page, and then bring up each of the 15 "child" pages, and extract the data from the "child page...

Any suggestions, or resources I could consult to accomplish??

Thanks in advance.

Rick D Norris, CPA
Conrad, Burnett, Norris, and Gordon, LLC
 
0
Rick Norris
Asked:
Rick Norris
  • 3
  • 2
  • 2
1 Solution
 
nutschCommented:
Does the URL of the child page expose the codes of the parents and or children entries? The same way the following url exposes the fact that I'm searching for internet vba, and that I'm looking at results # 20 and after:

http://www.google.com/#q=internet+vba&hl=en&biw=1680&bih=905&prmd=ivnsfd&ei=_H2_TdbNIZO-sQPAo6jQAw&start=20&sa=N&bav=on.2,or.r_gc.r_pw.&fp=c9365e657b732f11

In that case, you can run a loop on web query in excel.

If not, you might have to analyze the source code of the page further. You could post a template for more precise input.

Thomas
0
 
thydzikCommented:
is this a once off thing or needs to performed regularly?

if you could provide some example parent and child pages, with obfuscated text, that would help.


0
 
Rick NorrisAuthor Commented:
Code below is from "Parent" page...  Relevant info to capture here is the License Number....  When license number is "clicked", carries you to relevant child page with additional info...  Posting code of "Child" page after this...

======================  BEGIN PARENT PAGE  ===================================

      <table cellspacing="0" cellpadding="4" rules="all" border="1" id="ctl00_ContentPlaceHolder1_GridView1" style="color:#333333;width:95%;border-collapse:collapse;">
            <tr style="color:White;background-color:#990000;font-weight:bold;">
                  <th scope="col"><a href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView1','Sort$LicenseNo')" style="color:White;">License Number</a></th><th scope="col"><a href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView1','Sort$Name')" style="color:White;">Name</a></th><th scope="col"><a href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView1','Sort$City')" style="color:White;">City</a></th><th scope="col"><a href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView1','Sort$State')" style="color:White;">State</a></th>
            </tr><tr style="color:#333333;background-color:#FFFBD6;">
                  <td><a href="detail.aspx?licenseno=S-43893">S-43893</a></td><td>1 RELIABLE FIRE PROTECTION INCORPORATED</td><td>BYRON CENTER</td><td>MI</td>
            </tr><tr style="color:#333333;background-color:White;">
                  <td><a href="detail.aspx?licenseno=42894">42894</a></td><td>184 BUILDERS LLC</td><td>BLACKSBURG</td><td>SC</td>
            </tr><tr style="color:#333333;background-color:#FFFBD6;">
                  <td><a href="detail.aspx?licenseno=S-46463">S-46463</a></td><td>1ST CHOICE ELECTRICAL LLC</td><td>WARRIOR</td><td>AL</td>
            </tr><tr style="color:#333333;background-color:White;">
                  <td><a href="detail.aspx?licenseno=S-41346">S-41346</a></td><td>2000 CONCRETE LLC</td><td>ATLANTA</td><td>GA</td>
            </tr><tr style="color:#333333;background-color:#FFFBD6;">
                  <td><a href="detail.aspx?licenseno=S-45260">S-45260</a></td><td>2000 CONCRETE STRUCTURES LLC</td><td>ATLANTA</td><td>GA</td>
            </tr><tr style="color:#333333;background-color:White;">
                  <td><a href="detail.aspx?licenseno=46799">46799</a></td><td>270 CONSTRUCTION INC</td><td>SAMSON</td><td>AL</td>
            </tr><tr style="color:#333333;background-color:#FFFBD6;">
                  <td><a href="detail.aspx?licenseno=25428">25428</a></td><td>2-K STEEL PRODUCTS INC</td><td>ASHVILLE</td><td>AL</td>
            </tr><tr style="color:#333333;background-color:White;">
                  <td><a href="detail.aspx?licenseno=26016">26016</a></td><td>2-WAY, INC</td><td>METAIRIE</td><td>LA</td>
            </tr><tr style="color:#333333;background-color:#FFFBD6;">
                  <td><a href="detail.aspx?licenseno=S-45836">S-45836</a></td><td>2-WD INC</td><td>HUNTSVILLE</td><td>AL</td>
            </tr><tr style="color:#333333;background-color:White;">
                  <td><a href="detail.aspx?licenseno=S-41436">S-41436</a></td><td>3 D MECHANICAL LLC</td><td>ALABASTER</td><td>AL</td>
            </tr><tr style="color:#333333;background-color:#FFFBD6;">
                  <td><a href="detail.aspx?licenseno=S-41887">S-41887</a></td><td>3 G METALWORKS INC</td><td>DEATSVILLE</td><td>AL</td>
            </tr><tr style="color:#333333;background-color:White;">
                  <td><a href="detail.aspx?licenseno=S-45965">S-45965</a></td><td>360 CONSTRUCTION, LLC</td><td>OZARK</td><td>AL</td>
            </tr><tr style="color:#333333;background-color:#FFFBD6;">
                  <td><a href="detail.aspx?licenseno=40687">40687</a></td><td>3M COMPANY</td><td>ST PAUL</td><td>MN</td>
            </tr><tr style="color:#333333;background-color:White;">
                  <td><a href="detail.aspx?licenseno=39086">39086</a></td><td>4 H CONSTRUCTION CORPORATION</td><td>CLEVELAND</td><td>MS</td>
            </tr><tr style="color:#333333;background-color:#FFFBD6;">
                  <td><a href="detail.aspx?licenseno=S-41734">S-41734</a></td><td>4 SONS CONSTRUCTION LLC</td><td>WINSTON</td><td>GA</td>
            </tr><tr align="center" style="color:#333333;background-color:#FFCC66;">
                  <td colspan="4"><table border="0">
                        <tr>
                              <td><span>1</span></td><td><a href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView1','Page$2')" style="color:#333333;">2</a></td><td><a href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView1','Page$3')" style="color:#333333;">3</a></td><td><a href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView1','Page$4')" style="color:#333333;">4</a></td><td><a href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView1','Page$5')" style="color:#333333;">5</a></td><td><a href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView1','Page$6')" style="color:#333333;">6</a></td><td><a href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView1','Page$7')" style="color:#333333;">7</a></td><td><a href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView1','Page$8')" style="color:#333333;">8</a></td><td><a href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView1','Page$9')" style="color:#333333;">9</a></td><td><a href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView1','Page$10')" style="color:#333333;">10</a></td><td><a href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView1','Page$11')" style="color:#333333;">...</a></td><td><a href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$GridView1','Page$Last')" style="color:#333333;">&gt;&gt;</a></td>
                        </tr>
                  </table></td>
            </tr>
      </table>

=======================  END PARENT PAGE  ===============================

An example, the "child" code following represents the source code obtained when the FIRST License number in the above "Parent" page is "clicked"....  licenseno=S-43893......  Child code follows:

====================== BEGIN CHILD CODE  ===============================

 <table border="0" cellpadding="0" cellspacing="0" style="width: 100%">
                                                        <tr>
                                                            <td>
                                                               
      <h4>
        <font color="#005594">Search Results Detail</font><br />
        &nbsp;
   
    <font face="Arial" size="3">&nbsp;
   
   
    </FONT>
    </h4><table cellspacing="0" align="Center" border="0" id="ctl00_ContentPlaceHolder1_FormView1" style="width:90%;border-collapse:collapse;">
      <tr>
            <td colspan="2">
            <table border="1" bordercolor="#800000" cellpadding="4" cellspacing="0" style="border-collapse: collapse"
                width="100%">
                <tr>
                    <td align="right" valign="top" width="25%">
                        <b><font color="#005594" size="2">License Number</font></b></td>
                    <td align="left" valign="top" width="75%">
                        <b><font size="2">
                            <span id="ctl00_ContentPlaceHolder1_FormView1_LicenseNoLabel">S-43893</span><span
                                lang="en-us">&nbsp; </span></font></b>
                    </td>
                </tr>
                <tr>
                    <td align="right" valign="top" width="25%">
                        <b><font color="#005594" size="2">Name</font></b></td>
                    <td align="left" valign="top" width="75%">
                        <font size="2">
                            <span id="ctl00_ContentPlaceHolder1_FormView1_NameLabel">1 RELIABLE FIRE PROTECTION INCORPORATED</span></font><font
                                size="2"><span lang="en-us">&nbsp; </span></font>
                    </td>
                </tr>
                <tr>
                    <td align="right" valign="top" width="25%">
                        <b><font color="#005594" size="2">Address</font></b></td>
                    <td align="left" valign="top" width="75%">
                        <font size="2">
                            <span id="ctl00_ContentPlaceHolder1_FormView1_AddressLabel">1000 100TH SW, STE A</span></font><span
                                lang="en-us"></span></td>
                </tr>
                <tr>
                    <td align="right" valign="top" width="25%">
                        <b><font color="#005594" size="2">City</font></b></td>
                    <td align="left" valign="top" width="75%">
                        <font size="2">
                            <span id="ctl00_ContentPlaceHolder1_FormView1_CityLabel">BYRON CENTER</span></font></td>
                </tr>
                <tr>
                    <td align="right" valign="top" width="25%">
                        <b><font color="#005594" size="2">State</font></b></td>
                    <td align="left" valign="top" width="75%">
                        <font size="2">
                            <span id="ctl00_ContentPlaceHolder1_FormView1_StateLabel">MI</span></font></td>
                </tr>
                <tr>
                    <td align="right" valign="top" width="25%">
                        <b><font color="#005594" size="2">Zip Code</font></b></td>
                    <td align="left" valign="top" width="75%">
                        <font size="2">
                            <span id="ctl00_ContentPlaceHolder1_FormView1_ZipLabel">49315</span></font></td>
                </tr>
                <tr>
                    <td align="right" valign="top" width="25%">
                        <b><font color="#005594" size="2">Phone Number</font></b></td>
                    <td align="left" valign="top" width="75%">
                        <font size="2">
                            <span id="ctl00_ContentPlaceHolder1_FormView1_PhonenoLabel">(616) 877-3300</span></font><span
                                lang="en-us"></span></td>
                </tr>
                <tr>
                    <font size="3"></font><font face="Arial" size="3"></font>
                    <td align="right" valign="top" width="25%">
                        <b><font color="#005594" size="2">Fax Number</font></b></td>
                    <td align="left" valign="top" width="75%">
                        <font size="2">
                            <span id="ctl00_ContentPlaceHolder1_FormView1_FaxLabel">(616) 877-3308</span></font><span
                                lang="en-us"></span></td>
                </tr>
                <tr>
                    <font size="3"></font>
                    <td align="right" valign="top" width="25%">
                        <b><font color="#005594" size="2">Bid Limit</font></b></td>
                    <td align="left" valign="top" width="75%">
                        <font size="2">
                            <span id="ctl00_ContentPlaceHolder1_FormView1_BidlimitLabel"></span></font></td>
                </tr>
                <tr>
                    <td align="right" valign="top" width="25%">
                        <b><font color="#005594" size="2">Specialty</font></b></td>
                    <td align="left" valign="top" width="75%">
                        <font size="2">
                            <span id="ctl00_ContentPlaceHolder1_FormView1_SpecialtyLabel">SUBCONTRACTOR:  FIRE PROTECTION SYSTEMS</span></font></td>
                </tr>
                <tr>
                    <td align="right" valign="top" width="25%">
                        <span lang="en-us"><font color="#005594"><b>Expiration Date</b></font></span></td>
                    <td align="left" valign="top" width="75%">
                        <span lang="en-us">
                            <span id="ctl00_ContentPlaceHolder1_FormView1_Expr1Label">5/31/2011</span><br />
                            &nbsp;<a href="../Renewal_Info.htm"><span style="color: #0000ff">More information on
                                License Renewals.</span></a></span></td>
                </tr>
                <tr>
                    <td align="right" valign="top" width="25%">
                        <span lang="en-us"><font color="#005594"><b>Extension Date</b></font></span></td>
                    <td align="left" valign="top" width="75%">
                        <span lang="en-us">
                            <span id="ctl00_ContentPlaceHolder1_FormView1_ExtensionLabel"></span><br /></td>
                </tr>
            </table>
======================  END OF CHILD PAGE  ==================================


License #.....  can be captured either at "parent" page or "child" page ......

Ideal solution would be the ability to loop through all the pages to get detail as below in an excel spreadsheet......  There are approximately 9300 records in total to capture.

Relevant info in above example of the data to be captured from the Child Page is as follows:

License #:  S-43893
Name:        1 RELIABLE FIRE PROTECTION INCORPORATED  
Address:   1000 100TH SW, STE A
City:           BYRON CENTER
State:         MI
Zip:            49315
Phone:       (616) 877-3300
Fax:           (616) 877-3308
Bid Limit:                                       ***  blank on this record; however, non blank in most records
Specialty:  SUBCONTRACTOR: FIRE PROTECTION SYSTEMS
Expiration Date:  5/31/2011

In answer to question posed by thydzik....  This is something that would ideally be run on a quarterly basis.

Any help is greatly appreciated......

If someone needs additional info, please let me know.

Thanks again,

Rick D Norris, CPA/CFE
Conrad, Burnett, Norris and Gordon, LLC
Birmingham, AL
0
Free Tool: Path Explorer

An intuitive utility to help find the CSS path to UI elements on a webpage. These paths are used frequently in a variety of front-end development and QA automation tasks.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

 
thydzikCommented:
these are the two approaches I would take;

1.
Download all the html, with a program like WebZip.
Create some vba code to parse the html to a worksheet.

2.
install php executable on your machine locally, create some code to fetch the parents/childs, parse the html and output to some intermediate file (csv file).
Read this into Excel.

It is all possible, though a lot of work. I am not sure how much detail you want use to go into?

also, It might be better suited to parse this information into a database.
0
 
Rick NorrisAuthor Commented:
thydzik....

I think my major problem is in your suggestions... suggesstion 1 - "create some vba code...
suggestion 2....  creatae some code to fetch parents/childs....

If I knew HOW to fetch ALL of the relevant html, then I COULD write some vb code to parse.....  My major problem is how to get at ALL of the child html pages.....?????

Rick
0
 
nutschCommented:
What's the url of the parent page, or the source of the page that links to the parent page?
0
 
thydzikCommented:
rick_norris,

have a look at these examples;

parse html with vba

parse html wil php
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Featured Post

Cloud Class® Course: Microsoft Office 2010

This course will introduce you to the interfaces and features of Microsoft Office 2010 Word, Excel, PowerPoint, Outlook, and Access. You will learn about the features that are shared between all products in the Office suite, as well as the new features that are product specific.

  • 3
  • 2
  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now