Solved

Import XML info embedded in HTML output from a website

Posted on 2012-03-31
13
489 Views
Last Modified: 2012-04-15
I use MS Access to track my business activities.  I am an independent contractor that gets assignments from several different online sources.  When I get a new work order I input all the info into my DB by doing a lot of copy and paste.  This is VERY time consuming and tedious.

I am attaching output from one of these websites that was downloaded using Firefox.

I have tried to have VBA in Access parse the output from this website, but I believe the XML is wrapped up in HTML, so the parser is having trouble with it.

Dim objXML As DOMDocument60
Set objXML = New DOMDocument60
objXML.loadXML strXMLfile

Open in new window


I think my question is:  How do I separate the HTML from the XML so Access can import it?

Thanks!
-wosDetails.php.htm
0
Comment
Question by:AviationAce
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 7
  • 5
13 Comments
 

Author Comment

by:AviationAce
ID: 37791093
NOTE: I didn't include the sub directory for the attached .HTM file.  It contains various .CSS files among others.  Let me know if any of these are necessary.
0
 
LVL 34

Expert Comment

by:Norie
ID: 37791185
What data are you dealing with on that page?

I've only had a quick look and I can't see any XML.
0
 

Author Comment

by:AviationAce
ID: 37791253
OK.  I am assuming there is XML embedded in it.  If not, how is the data contained in it?

Just loading it in notepad and doing a search for "Site Information" will get you to some of the information I need.  This is the address of a site I will be doing work at.
0
Don't Cry: How Liquid Web is Ensuring Security

WannaCry is just the start. Read how Liquid Web is protecting itself and its customers against new threats.

 
LVL 83

Expert Comment

by:Dave Baldwin
ID: 37791275
That section is just a table with the info, no XML.
0
 
LVL 34

Expert Comment

by:Norie
ID: 37791282
The Site Information is in an HTML table along with Assignment Date/Times,  Tech Pay and Contact Information.

I can't see any XML, and it's not an XML file.
0
 
LVL 34

Expert Comment

by:Norie
ID: 37791299
I've got code that can extract the data but it uses IE and Excel, so I realise it's not ideal.

Anyway, here it is.
Option Explicit

Sub test()
Dim IE As Object
Dim doc As Object
Dim tbl As Object
Dim strURL As String

strURL = "C:\wosDetails.php.htm"


    Set IE = CreateObject("InternetExplorer.Application")
    

    IE.Navigate strURL
    
    Set doc = IE.Document
    
    Set tbl = doc.getElementById("TechWOWorkPlaceLeft_box").ChildNodes(0)
    
    GetTableData tbl, Range("A1")
    
    IE.Quit
End Sub

Sub GetTableData(ByRef tbl As Object, rng As Range)
Dim cl As Object
Dim rw As Object
Dim I As Long

    For Each rw In tbl.Rows
    
        For Each cl In rw.Cells
            rng.Value = cl.outerText
            Set rng = rng.Offset(, 1)
        Next cl
        
        Set rng = rng.Parent.Cells(rng.Row + 1, 1)
    Next rw
End Sub

Open in new window

0
 

Author Comment

by:AviationAce
ID: 37791527
Great insight!  I believe I am getting pointed in the right direction now.
What I would like to do now is enumerate each element in the HTML so I can decide which elements are important to me.

The following code was adapted from the link below:
http://www.vbaexpress.com/forum/showthread.php?t=31831

'Requires references to Microsoft Internet Controls and Microsoft HTML Object Library

Sub testHTMLelements()
Dim URL As String
    Dim IE As InternetExplorer
    Dim HTMLdoc As HTMLDocument
    'Dim TDelements As IHTMLElementCollection
    'Dim TDelement As HTMLTableCell
    
        URL = XMLfile
    
    Set IE = New InternetExplorer
    
    With IE
        .navigate URL
        .Visible = True
    
        'Wait for page to load
        While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
    
        Set HTMLdoc = .Document
    End With

    'How do I enumerate each element at this point

    IE.Quit
Debug.Print "done"
End Sub

Open in new window

0
 
LVL 34

Assisted Solution

by:Norie
Norie earned 400 total points
ID: 37791569
Are you sure you want to enumerate every element on the page?

Even on that small page there are 951.

If you really want to though, try this.
Option Explicit

Sub test()
Dim IE As Object
Dim doc As Object
Dim elem As Object
Dim strURL As String
Dim rng As Range

    strURL = "C:\wosDetails.php.htm"

    Set rng = Range("A1")


    Set IE = CreateObject("InternetExplorer.Application")


    IE.Navigate strURL

    Do Until IE.ReadyState = 4: DoEvents: Loop

    Set doc = IE.Document

    For Each elem In doc.all
        rng.Value = elem.innerHTML
        rng.Offset(, 1).Value = elem.innerText
        Set rng = rng.Offset(1)
    Next elem

    IE.Quit
End Sub

Open in new window

Not sure how that will help with fnding the elements of interest, perhaps you should find out what data you want from the page and which elements hold it.

A 'better' way to do that would be to look at the source code using some of the web tools available with most browsers.

I mainly use IE which has a Developer Tools which you can activate using F12 - it's how I found the ID for the table with the data.

I'm sure FireFox has something similar, it might be called Web Develober or WebKit.
0
 

Author Comment

by:AviationAce
ID: 37793006
That F12 thing is good!  If you would take a look at the screen shot I've attached.  How can I write the code to retrieve just that one piece of information.  If I can get that, I can figure out how to get it all.
Thanks!
ScrnShot01.jpg
0
 

Author Comment

by:AviationAce
ID: 37793011
Increasing points to 400.
0
 

Accepted Solution

by:
AviationAce earned 0 total points
ID: 37827104
This code is using the "brute force" method for getting the info I need, but it works!

Function ParseFSworkorder() As Boolean
ParseFSworkorder = False
Dim bTemp As Boolean

If Not bInit Then: init
    
    Dim IE As InternetExplorer
    Dim HTMLdoc As HTMLDocument
    Dim TDelements As IHTMLElementCollection
    Dim TDelement As HTMLTableCell
    
    Set IE = New InternetExplorer
    
    With IE
        .navigate HTMLfile
        .Visible = True
    
        'Wait for page to load
        While .Busy Or .readyState <> READYSTATE_COMPLETE: DoEvents: Wend
    
        Set HTMLdoc = .Document
    End With

    Set TDelements = HTMLdoc.getElementsByName("mainContainer")
    Set TDelement = TDelements.Item(, 1)
    
    Dim ChildElemments As IHTMLElementCollection, ChildElem As HTMLTableCell
    Set ChildElemments = TDelement.all
    For Each ChildElem In ChildElemments
        Dim sUid As String
        sUid = ChildElem.getAttribute("UniqueID")
        Select Case sUid
            Case getClientNumber_id_const
                'Debug.Print ChildElem.innerText
                sClientNumber = ChildElem.innerText
            Case getCustName_id_const
                'Debug.Print ChildElem.innerText
                sCustName = parseCustName(ChildElem.innerText)
            Case getSiteAddress_id_const
                'Debug.Print ChildElem.innerText
                bTemp = parseAddress(ChildElem.innerText)
            Case getSitePhone_id_const
                'Debug.Print ChildElem.innerText
                sCustPhone = parsePhoneNumber(ChildElem.innerText)
            Case getWorkDescription_id_const
                'Debug.Print ChildElem.innerText
                sGeneralNotes = ChildElem.innerText
            Case getTechReq_id_const
                'Debug.Print ChildElem.innerText
                sTechReq = ChildElem.innerText
            Case getStartDate_Time_id_const
                'Debug.Print ChildElem.innerText
                bTemp = parseDateTime(ChildElem.innerText)
            Case getWinNumber_id_const
                'Debug.Print ChildElem.innerText
                sWinNumber = ChildElem.innerText
        End Select
    Next ChildElem
IE.Quit
Debug.Print "sWinNumber: " & sWinNumber
Debug.Print "sClientNumber: " & sClientNumber
Debug.Print "dtStartDate: " & dtStartDate
Debug.Print "dtStartTime: " & dtStartTime
Debug.Print "sCustName: " & sCustName
Debug.Print "sCustAddr1: " & sCustAddr1
Debug.Print "sCustAddr2: " & sCustAddr2
Debug.Print "sCustCity: " & sCustCity
Debug.Print "sCustState: " & sCustState
Debug.Print "sCustZip: " & sCustZip
Debug.Print "sCustPhone: " & sCustPhone
Debug.Print "sDeviceBrand: " & sDeviceBrand
Debug.Print "sGeneralNotes: " & sGeneralNotes
Debug.Print "sTechReq: " & sTechReq

End Function

Open in new window

0
 
LVL 34

Expert Comment

by:Norie
ID: 37828243
I don't know how but I missed your last post.

Anyway, I've looked at the screenshot and it's not really enough information to go on.

To be honest it would be easier to get all the data from the table and work with it afterwards..

For that to work you could find the table you want and then use a small sub like the one, GetTableData, I posted earlier.

You can loop through all the tables on the page like this.
Set doc = IE.Document

Set tbls = doc.getElementsByTagName("TABLE")

For Each tbl In tbls
      ' code to identify table
Next tbl

Open in new window

0
 

Author Closing Comment

by:AviationAce
ID: 37847936
Thanks for helping me analyze the sample file.  Your comments pointed me in the right direction.
0

Featured Post

Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This article describes two methods for creating a combo box that can be used to add new items to the row source -- one for simple lookup tables, and one for a more complex row source where the new item needs data for several fields.
Microsoft Access is a place to store data within tables and represent this stored data using multiple database objects such as in form of macros, forms, reports, etc. After a MS Access database is created there is need to improve the performance and…
Basics of query design. Shows you how to construct a simple query by adding tables, perform joins, defining output columns, perform sorting, and apply criteria.
In Microsoft Access, learn the trick to repeating sub-report headings at the top of each page. The problem with sub-reports and headings: Add a dummy group to the sub report using the expression =1: Set the “Repeat Section” property of the dummy…

726 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question