Link to home
Start Free TrialLog in
Avatar of njohnson6378
njohnson6378Flag for United States of America

asked on

Scraping data from web, web scrape, screen scrape

I have script that goes to https://www.dmdc.osd.mil/appj/scra/scraHome.do 
 and then enters info which pulls a results page in a new window

I cannot figure out how to get the data from this page back to excel. i need to pull all of the data that shows in the table (im assuming its a table).

thanks in advance
Sub GetData()
Dim IE As Object
Dim doc As Object
Dim strURL As String
Dim cpy As Object
Dim rngssn As Range
Dim rnglastame As Range
Dim rngfirstname As Range
Dim rng As Range


    strURL = "https://www.dmdc.osd.mil/appj/scra/scraHome.do"

    Set IE = CreateObject("InternetExplorer.Application")

    With IE
        .navigate strURL

        Do While .Busy: DoEvents: Loop
        Do While .readyState <> 4: DoEvents: Loop
        .Visible = True

        Set doc = IE.document


    End With

    Set IE = Nothing

Set rngssn = Range("d2")
Set rnglastname = Range("b2")
Set rngfirstname = Range("c2")

 While rngsn.Value <> ""
While rnglasname.Value <> ""
While rngfirstame.Value <> ""


            Application.Wait Now() + TimeValue("00:00:08")
  	doc.getElementByID("ssn").value = rngssn.calue 
	doc.getElementByID("repeatssn").value = rngsn.value
	doc.getElementByID("lastname").value = rnglastname.value
	doc.getElementByID("repeatlastname").value = rnglastname.value
	doc.getElementByID("firstname").value = rngfirstname.value
	doc.getElementByID("repeatfirstname").value = rngfirstname.value

	doc.getElementById("submit").Click 



    ' wait for page to load and grab data
 rng.Offset(, 4).Value = doc.getElementById("start date").innerText

  rng.Offset(, 5).Value = doc.getElementById("status").innerText

  rng.Offset(, 6).Value = doc.getElementById("end date").innerText
 
  rng.Offset(, 7).Value = doc.getElementById("ageny").innerText


  Set rng = rngssn.Offset(1,0) '                goto next ssn number number et.
  Set rng = rnglastname.Offset(1,0)
  Set rng = rngfirstname.Offset(1,0)

 With IE
        .navigate strURL

        
        Do While .Busy: DoEvents: Loop
        Do While .readyState <> 4: DoEvents: Loop
        .Visible = True


        Wend

    End With

    Set IE = Nothing

End Sub

Open in new window

source code for results page


<meta http-equiv='PRAGMA' content='NO-CACHE'> 
<meta http-equiv='EXPIRES' content='0'>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"></meta>
<link REL="StyleSheet" HREF="/scra/css/info.css" TYPE="text/css" MEDIA="screen"/>
<html>
<head>
<title>
    Request for Military Status
  </title>
</head>
<body bgcolor="#f3f2e6" ondragstart="return false" onselectstart="return false">





<a Name="top"/>
<table width="100%">
<tr>
<td>
Department of Defense Manpower Data Center
</td>
<td align="right">Dec-10-2011 15:10:38
</td>
</tr>
<tr><td colspan=2><table><tr><td>
<IMG SRC="/scra/images/webdod71.gif" WIDTH="71" HEIGHT="71" ALT="Department of Defense" BORDER="0"></td>
<td valign="center">Military Status Report<br>Pursuant to the Service Members Civil Relief Act</td></tr></table></td></tr>
</table>

<table cellpadding=1 cellspacing=0 border=1 width="100%">
<tr bgcolor="#FFCC99">
<th id="TH"> 
  <a Href="javascript:opener.v_objselect=15;window.opener.objselect()">
  <IMG SRC="/scra/images/lftpfeil.gif" WIDTH="15" HEIGHT="13" ALT="Return to SSN Selection" BORDER="0"></a> Last Name
</th>
<th id="center">First/Middle</th>
<th id="center">Begin Date</th>
<th id="center">Active Duty Status</th>
<th id="center">Active Duty End Date</th>
<th id="center">Service<BR>Agency</th>
</tr>
<tr>
  <td align="center">smith</td>
  <td align="center">John &nbsp;</td>
  
    <td colspan=4>
       Based on the information you have furnished, 
       the DMDC does not possess any information indicating 
       the individual status.
    </td>
  
  
  
  
</tr>
</table>



<p>
  Upon searching the information data banks of the Department of Defense Manpower Data Center,
   based on the information that you provided,
   the above is the current status of the individual as to all branches of the 
   Uniformed Services (Army, Navy, Marine Corps, Air Force, NOAA, Public Health, 
   and Coast Guard).

</p>




<table>
   <tr>
      <td>
          <img src="/scra/images/snavely-dixon_signature_transparent.gif"
              width="269" height="53" alt="Mary M. Snavely-Dixon" border="0"></img>
      </td>
   </tr>
   <tr>
      <td>________________________________</td>
   </tr>
   <tr>
      <td>Mary M. Snavely-Dixon, Director</td>
   </tr>
   <tr>
      <td>Department of Defense - Manpower Data Center</td>
   </tr>
   <tr>
      <td>1600 Wilson Blvd., Suite 400</td>
   </tr>
   <tr>
      <td>Arlington, VA 22209-2593</td>
   </tr>
</table>
<p>The Defense Manpower Data Center (DMDC) is an organization of the Department
   of Defense that maintains the Defense Enrollment and Eligibility Reporting
   System (DEERS) database which is the official source of data on eligibility
   for military medical care and other eligibility systems.</p>

<p>The DoD strongly supports the enforcement of the Service Members Civil Relief Act
   (50 USC App. &sect;&sect; 501 et seq, as amended)
   (SCRA) (formerly known as the Soldiers' and Sailors' Civil Relief Act of 1940).
   DMDC has issued hundreds of thousands of &quot;does not possess any information
   indicating that the individual is currently on active duty&quot; responses,
   and has experienced a small error rate. In the event the individual
   referenced above, or any family member, friend, or representative asserts in
   any manner that the individual is on active duty, or is otherwise entitled to
   the protections of the SCRA, you are strongly encouraged to obtain further
   verification of the person's status by contacting that person's Service via
   the &quot;defenselink.mil&quot; URL <a href="http://www.defenselink.mil/faq/pis/PC09SLDR.html">
   http://www.defenselink.mil/faq/pis/PC09SLDR.html</a>.
   If you have evidence the person is on active duty and you fail to obtain this
   additional Service verification, punitive provisions of the SCRA may be
   invoked against you. See 50 USC App. &sect;521(c).</p>
<p>If you obtain additional information about the person (e.g., an SSN, improved
   accuracy of DOB, a middle name), you can submit your request again at this
   Web site and we will provide a new certificate for that query.</p>
<p>This response reflects <b>active duty status</b> including date the individual 
   was last on active duty, if it was within the preceding 367 days.
   For historical information, please contact the Service SCRA points-of-contact. 
   
<p><i><b>More information on "Active Duty Status"</b></i><br>
   Active duty status as reported in this certificate is defined in accordance with 10
   USC &sect; 101(d)(1) for a period of more than 30 consecutive days.  In the case
   of a member of the National Guard, includes service under a call to active service 
   authorized by the President or the Secretary of Defense for a period of more than 30 
   consecutive days under 32 USC &sect; 502(f) for purposes of responding to a national 
   emergency declared by the President and supported by Federal funds.  All Active Guard Reserve (AGR)
   members must be assigned against an authorized mobilization position in the unit they support.  
   This includes Navy TARs, Marine Corps ARs and Coast Guard RPAs.  Active Duty
   status also applies to a Uniformed Service member who is an active duty commissioned 
   officer of the U.S. Public Health Service or the National Oceanic and Atmospheric Administration
   (NOAA Commissioned Corps) for a period of more than 30 consecutive days.
   <p><i><b>Coverage Under the SCRA is Broader in Some Cases</b></i><br>
   Coverage under the SCRA is broader in some cases and includes some categories of persons
   on active duty for purposes of the SCRA who would not be reported as on Active Duty under 
   this certificate.
   <p>
   Many times orders are amended to extend the period of active duty, which would extend SCRA protections.
   Persons seeking to rely on this website certification should check to make sure the orders on which 
   SCRA protections are based have not been amended to extend the inclusive dates of service.
   Furthermore, some protections of the SCRA may extend to persons who have received orders to report 
   for active duty or to be inducted, but who have not actually begun active duty or actually reported
   for induction.  The Last Date on Active Duty entry is important because a number of protections 
   of SCRA extend beyond the last dates of active duty.
   <p>
   Those who would rely on this certificate are urged to seek qualified legal counsel to ensure that all 
   rights guaranteed to Service members under the SCRA are protected.

</p>

WARNING: This certificate was provided based on a name and SSN provided by the requester. 
Providing an erroneous name or SSN will cause an erroneous certificate to be provided.<br> 

<span id="highlight">Report ID:5BCN37BT21</span>  
</body>
</html>

Open in new window

Avatar of Norie
Norie

The URL you posted isn't accessible, for me anyway.

I get some sort of  security alert that I've rarely seen.

Anyway, if you can get the URL for the window with the data you want you could try opening it in a second instance of IE.

Provided that works getting the data from the table is striaghtforward.
Avatar of njohnson6378

ASKER

Yeah there  is a problem with their security certificate., but the link does work.
But even after that I can't go any further, and I'm not sure if i want to - as far as I can see this is a DoD (or something related) site.

Did you try my suggestion?
Yeah but the real issue is if you look at the sourc code from the results page I'm not sure where the table names are. Could you look  at that and let me know what you think?
As far as I can see there's only 1 table and you can get a reference to it like this.
Set objTbl = doc.getElementsByTagName("TABLE")(0)

Open in new window

You would need to have navigated succesfuly to the page and set  a reference to it's document.

To get the data from the table you can use something like this.
Sub GetTableData(ByRef tbl, rng As Range)
Dim cl As Object
Dim rw As Object
Dim I As Long

    For Each rw In tbl.Rows

        For Each cl In rw.Cells
            rng.Value = cl.outerText
            Set rng = rng.Offset(, 1)
        Next cl

        Set rng = Cells(rng.Row + 1, 1)
    Next rw

End Sub

Open in new window

That sub requires that you pass a range and a reference to a table.

So you would call it something like this.
Set rngDest = Worksheets("Sheet1").Range("A1")

Set objTbl = doc.getElementsByTagName("TABLE")(0)

' get data from table
GetTableData objTbl, rngDest 

Open in new window

You could try testing by saving the source code you posted to a local file and doing the IE navigation thing with that.
Ok I'll try this later this evening and let you know
Ok i couldnt figure out how to combine the code you gaev me to the code i already have..Think you could help with that? I would be much appreciateive
Do you have the URL of the page with the data you want?
but it will just give you a blank page.. You have to key in the first last and social and then hit lookup before it will populate a page with a table...But i provided the source code for the table page above
Yes and I provided the code to get the data from the table in the source code.:)

Another thing you could try would be to stop the page being opened in a new window.

Hard to tell if that's even possible without accessing the site properly.

You probably want to look at what's behind the Lookup button.

Were you going to be able to combine the code?
Combine what exactly?

You can just use similar code to what you have but with the URL of the page you want to get the data from.
    strURL = "https:\\www.themissingpage.com"

    Set IE = CreateObject("InternetExplorer.Application")

    With IE
            .navigate strURL

            Do While .Busy: DoEvents: Loop    
            Do While .readyState <> 4: DoEvents: Loop

            Set doc = .document

     End With

            Set rngDest = Worksheets("Sheet1").Range("A1")

            Set objTbl = doc.getElementsByTagName("TABLE")(0)

            ' get data from table and put it on worksheet

            GetTableData objTbl, rngDest 

Open in new window

I already haev the url of the page i want to get the data from. It is https://www.dmdc.osd.mil/appj/scra/scraHome.do 

You fill out the form and click lookup then it opens a new window automatically with the results of the query. you cant navigate to it.
if you fill out your first and last name and social you will see that it just checks wheather or not you are active military or not. I think this is whre the confusion comes into play
That'd be hard for me to do - I don't have a US social security number, and I don't think my NI no will work.:)

I'm assuming this is a US site of course.

Can you give any examples?
So this is the search page.. Data is populated from my spreadsheet to the text boxes on here.. then the lookup button is pressed..
After the button is pressed it opens a child window that displays the results (see pic below)
My code as posted here works until the child window opens..(i am unable to scrape the data from this to paste into excel)
then the whole thing loops again
I have the code you gave me commented out because i count get it to work.

So the search result will either show what it has in the image or it will be populated with dates of service etc.. Please help
Option Explicit

Sub GetData()
Dim IE As Object
Dim doc As Object
Dim strURL As String
Dim cpy As Object
Dim rngssn As Range
Dim rnglastname As Range
Dim rngfirstname As Range
Dim rng As Range
Dim btndosearch As Object
Dim submitbuttons As Object
Dim cl As Object
Dim rw As Object
Dim I As Long
Dim rngdest As Range
Dim objtable As Object
Dim objTbl As Object
Dim GetTableData As Object
Dim ByReftbl As DataTable
Dim tbl As DataTable



    strURL = "https://www.dmdc.osd.mil/appj/scra/scraHome.do"

    Set IE = CreateObject("InternetExplorer.Application")

    With IE
        .navigate strURL

        Do While .Busy: DoEvents: Loop
        Do While .readyState <> 4: DoEvents: Loop
        .Visible = True

        Set doc = IE.document

    
Set rngssn = Range("d2")
Set rnglastname = Range("b2")
Set rngfirstname = Range("c2")
 
 While rngssn.Value <> ""
'While rnglastname.Value <> ""
'While rngfirstname.Value <> ""
' wait for page to load and grab data
            Do While .Busy: DoEvents: Loop
            Do While .readyState <> 4: DoEvents: Loop
            Do While doc.readyState <> "complete": DoEvents: Loop

         
    doc.getElementById("ssn").Value = rngssn.Value
    doc.getElementById("repeatssn").Value = rngssn.Value
    doc.getElementById("lastname").Value = rnglastname.Value
    doc.getElementById("repeatlastname").Value = rnglastname.Value
    doc.getElementById("firstname").Value = rngfirstname.Value
    doc.getElementById("repeatfirstname").Value = rngfirstname.Value
    Application.Wait Now() + TimeValue("00:00:05")
    Set submitbuttons = doc.getElementsByName("p_select")
    
'MsgBox submitbuttons.Length
    submitbuttons(0).Click
     Application.Wait Now() + TimeValue("00:00:05")
 
     'Set doc = IE.document
 'Set rngdest = Worksheets("Sheet1").Range("E1")
 'Set objTbl = doc.getElementsByTagName("Top")(0)

' get data from table
'GetTableData objTbl, rngdest


'For Each rw In tbl.Rows

        'For Each cl In rw.Cells
            'rng.Value = cl.outerText
            'Set rng = rng.Offset(, 1)
       ' Next cl

        'Set rng = Cells(rng.Row + 1, 1)
   ' Next rw




    'doc.getElementsByTagName("Lookup").Click
    'doc.execScript ("mkaux")
    'doc.getElementById("p_select[0]").Click
    
     'Set doc = IE.document
    
  ' Application.Wait Now() + TimeValue("00:00:010")  Set doc = IE.document

    ' wait for page to load and grab data ' rng.Offset(, 4).Value = doc.getElementById("start date").innerText

 ' rng.Offset(, 5).Value = doc.getElementById("status").innerText

  'rng.Offset(, 6).Value = doc.getElementById("end date").innerText
 
  'rng.Offset(, 7).Value = doc.getElementById("ageny").innerText

'goto next ssn number number et.
  Set rngssn = rngssn.Offset(1, 0)
  Set rnglastname = rnglastname.Offset(1, 0)
  Set rngfirstname = rngfirstname.Offset(1, 0)


       With IE
        .navigate strURL
        
        
        Application.Wait Now() + TimeValue("00:00:05")
        


        
        
    End With
Wend
 
    Set IE = Nothing
End With
End Sub

Open in new window

search-results-page.bmp
main-search.bmp
Can you provide some sample data?

Even one SSN and name might help.
Just use John smith for the name 1234556789 as the social and  click lookup
Sorry 123456789 or any 9 digit made up number will do
I'll give it a try.
Sounds good . Thank you!
Well I eventually found an SSN that worked.

I'll take a look at this further tomorrow - I think there is code that can be used to grab the page of interest.
Great I  appreciate it.
Any luck yet?
I've found the code but I'm afraid I won't be able to do anything with it tonight.
Ok, not trying to be pushy but this project is due by 11am tomorrow US Central time. Do you think you would have anyting by then, or should i try and postpone?
So that's in 24 hours approximately?
yes sir
I'll post something before the deadline.

It's already night here anyway.
ok sounds great thank you for your help. I really do appreciate it
ASKER CERTIFIED SOLUTION
Avatar of Norie
Norie

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
You are my HERO today sir. Thanks a million
Fantastic
is there  a way to view the code for this so that i can change something?
nevermind