Solved

Import XML info embedded in HTML output from a website

Posted on 2012-03-31
13
432 Views
Last Modified: 2012-04-15
I use MS Access to track my business activities.  I am an independent contractor that gets assignments from several different online sources.  When I get a new work order I input all the info into my DB by doing a lot of copy and paste.  This is VERY time consuming and tedious.

I am attaching output from one of these websites that was downloaded using Firefox.

I have tried to have VBA in Access parse the output from this website, but I believe the XML is wrapped up in HTML, so the parser is having trouble with it.

Dim objXML As DOMDocument60
Set objXML = New DOMDocument60
objXML.loadXML strXMLfile

Open in new window


I think my question is:  How do I separate the HTML from the XML so Access can import it?

Thanks!
-wosDetails.php.htm
0
Comment
Question by:AviationAce
  • 7
  • 5
13 Comments
 

Author Comment

by:AviationAce
Comment Utility
NOTE: I didn't include the sub directory for the attached .HTM file.  It contains various .CSS files among others.  Let me know if any of these are necessary.
0
 
LVL 33

Expert Comment

by:Norie
Comment Utility
What data are you dealing with on that page?

I've only had a quick look and I can't see any XML.
0
 

Author Comment

by:AviationAce
Comment Utility
OK.  I am assuming there is XML embedded in it.  If not, how is the data contained in it?

Just loading it in notepad and doing a search for "Site Information" will get you to some of the information I need.  This is the address of a site I will be doing work at.
0
 
LVL 82

Expert Comment

by:Dave Baldwin
Comment Utility
That section is just a table with the info, no XML.
0
 
LVL 33

Expert Comment

by:Norie
Comment Utility
The Site Information is in an HTML table along with Assignment Date/Times,  Tech Pay and Contact Information.

I can't see any XML, and it's not an XML file.
0
 
LVL 33

Expert Comment

by:Norie
Comment Utility
I've got code that can extract the data but it uses IE and Excel, so I realise it's not ideal.

Anyway, here it is.
Option Explicit

Sub test()
Dim IE As Object
Dim doc As Object
Dim tbl As Object
Dim strURL As String

strURL = "C:\wosDetails.php.htm"


    Set IE = CreateObject("InternetExplorer.Application")
    

    IE.Navigate strURL
    
    Set doc = IE.Document
    
    Set tbl = doc.getElementById("TechWOWorkPlaceLeft_box").ChildNodes(0)
    
    GetTableData tbl, Range("A1")
    
    IE.Quit
End Sub

Sub GetTableData(ByRef tbl As Object, rng As Range)
Dim cl As Object
Dim rw As Object
Dim I As Long

    For Each rw In tbl.Rows
    
        For Each cl In rw.Cells
            rng.Value = cl.outerText
            Set rng = rng.Offset(, 1)
        Next cl
        
        Set rng = rng.Parent.Cells(rng.Row + 1, 1)
    Next rw
End Sub

Open in new window

0
How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

 

Author Comment

by:AviationAce
Comment Utility
Great insight!  I believe I am getting pointed in the right direction now.
What I would like to do now is enumerate each element in the HTML so I can decide which elements are important to me.

The following code was adapted from the link below:
http://www.vbaexpress.com/forum/showthread.php?t=31831

'Requires references to Microsoft Internet Controls and Microsoft HTML Object Library

Sub testHTMLelements()
Dim URL As String
    Dim IE As InternetExplorer
    Dim HTMLdoc As HTMLDocument
    'Dim TDelements As IHTMLElementCollection
    'Dim TDelement As HTMLTableCell
    
        URL = XMLfile
    
    Set IE = New InternetExplorer
    
    With IE
        .navigate URL
        .Visible = True
    
        'Wait for page to load
        While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
    
        Set HTMLdoc = .Document
    End With

    'How do I enumerate each element at this point

    IE.Quit
Debug.Print "done"
End Sub

Open in new window

0
 
LVL 33

Assisted Solution

by:Norie
Norie earned 400 total points
Comment Utility
Are you sure you want to enumerate every element on the page?

Even on that small page there are 951.

If you really want to though, try this.
Option Explicit

Sub test()
Dim IE As Object
Dim doc As Object
Dim elem As Object
Dim strURL As String
Dim rng As Range

    strURL = "C:\wosDetails.php.htm"

    Set rng = Range("A1")


    Set IE = CreateObject("InternetExplorer.Application")


    IE.Navigate strURL

    Do Until IE.ReadyState = 4: DoEvents: Loop

    Set doc = IE.Document

    For Each elem In doc.all
        rng.Value = elem.innerHTML
        rng.Offset(, 1).Value = elem.innerText
        Set rng = rng.Offset(1)
    Next elem

    IE.Quit
End Sub

Open in new window

Not sure how that will help with fnding the elements of interest, perhaps you should find out what data you want from the page and which elements hold it.

A 'better' way to do that would be to look at the source code using some of the web tools available with most browsers.

I mainly use IE which has a Developer Tools which you can activate using F12 - it's how I found the ID for the table with the data.

I'm sure FireFox has something similar, it might be called Web Develober or WebKit.
0
 

Author Comment

by:AviationAce
Comment Utility
That F12 thing is good!  If you would take a look at the screen shot I've attached.  How can I write the code to retrieve just that one piece of information.  If I can get that, I can figure out how to get it all.
Thanks!
ScrnShot01.jpg
0
 

Author Comment

by:AviationAce
Comment Utility
Increasing points to 400.
0
 

Accepted Solution

by:
AviationAce earned 0 total points
Comment Utility
This code is using the "brute force" method for getting the info I need, but it works!

Function ParseFSworkorder() As Boolean
ParseFSworkorder = False
Dim bTemp As Boolean

If Not bInit Then: init
    
    Dim IE As InternetExplorer
    Dim HTMLdoc As HTMLDocument
    Dim TDelements As IHTMLElementCollection
    Dim TDelement As HTMLTableCell
    
    Set IE = New InternetExplorer
    
    With IE
        .navigate HTMLfile
        .Visible = True
    
        'Wait for page to load
        While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
    
        Set HTMLdoc = .Document
    End With

    Set TDelements = HTMLdoc.getElementsByName("mainContainer")
    Set TDelement = TDelements.Item(, 1)
    
    Dim ChildElemments As IHTMLElementCollection, ChildElem As HTMLTableCell
    Set ChildElemments = TDelement.all
    For Each ChildElem In ChildElemments
        Dim sUid As String
        sUid = ChildElem.getAttribute("UniqueID")
        Select Case sUid
            Case getClientNumber_id_const
                'Debug.Print ChildElem.innerText
                sClientNumber = ChildElem.innerText
            Case getCustName_id_const
                'Debug.Print ChildElem.innerText
                sCustName = parseCustName(ChildElem.innerText)
            Case getSiteAddress_id_const
                'Debug.Print ChildElem.innerText
                bTemp = parseAddress(ChildElem.innerText)
            Case getSitePhone_id_const
                'Debug.Print ChildElem.innerText
                sCustPhone = parsePhoneNumber(ChildElem.innerText)
            Case getWorkDescription_id_const
                'Debug.Print ChildElem.innerText
                sGeneralNotes = ChildElem.innerText
            Case getTechReq_id_const
                'Debug.Print ChildElem.innerText
                sTechReq = ChildElem.innerText
            Case getStartDate_Time_id_const
                'Debug.Print ChildElem.innerText
                bTemp = parseDateTime(ChildElem.innerText)
            Case getWinNumber_id_const
                'Debug.Print ChildElem.innerText
                sWinNumber = ChildElem.innerText
        End Select
    Next ChildElem
IE.Quit
Debug.Print "sWinNumber: " & sWinNumber
Debug.Print "sClientNumber: " & sClientNumber
Debug.Print "dtStartDate: " & dtStartDate
Debug.Print "dtStartTime: " & dtStartTime
Debug.Print "sCustName: " & sCustName
Debug.Print "sCustAddr1: " & sCustAddr1
Debug.Print "sCustAddr2: " & sCustAddr2
Debug.Print "sCustCity: " & sCustCity
Debug.Print "sCustState: " & sCustState
Debug.Print "sCustZip: " & sCustZip
Debug.Print "sCustPhone: " & sCustPhone
Debug.Print "sDeviceBrand: " & sDeviceBrand
Debug.Print "sGeneralNotes: " & sGeneralNotes
Debug.Print "sTechReq: " & sTechReq

End Function

Open in new window

0
 
LVL 33

Expert Comment

by:Norie
Comment Utility
I don't know how but I missed your last post.

Anyway, I've looked at the screenshot and it's not really enough information to go on.

To be honest it would be easier to get all the data from the table and work with it afterwards..

For that to work you could find the table you want and then use a small sub like the one, GetTableData, I posted earlier.

You can loop through all the tables on the page like this.
Set doc = IE.Document

Set tbls = doc.getElementsByTagName("TABLE")

For Each tbl In tbls
      ' code to identify table
Next tbl

Open in new window

0
 

Author Closing Comment

by:AviationAce
Comment Utility
Thanks for helping me analyze the sample file.  Your comments pointed me in the right direction.
0

Featured Post

Do You Know the 4 Main Threat Actor Types?

Do you know the main threat actor types? Most attackers fall into one of four categories, each with their own favored tactics, techniques, and procedures.

Join & Write a Comment

Introduction The Visual Basic for Applications (VBA) language is at the heart of every application that you write. It is your key to taking Access beyond the world of wizards into a world where anything is possible. This article introduces you to…
In a multiple monitor setup, if you don't want to use AutoCenter to position your popup forms, you have a problem: where will they appear?  Sometimes you may have an additional problem: where the devil did they go?  If you last had a popup form open…
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…
In Microsoft Access, when working with VBA, learn some techniques for writing readable and easily maintained code.

771 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

10 Experts available now in Live!

Get 1:1 Help Now