Parse HTML code into access database

Posted on 2006-04-04
Medium Priority
Last Modified: 2010-02-09
I'd like to write some code that could automatically parse various HTML pages and insert the table data into an access database.

The webpages are used for tracking courier data.

You can see an example here: http://www.hdnl.co.uk/tracker.aspx?UPI=806290025850a

The tracking number is held between <strong> tags, so it shouldn't be too difficult to isolate this.
I'd like to then extract the <tr> and <td> data and dump it into a table next to the tracking number.

Then I'll be able to query this information and find consignments that aren't being delivered on time.
Any help or advice you can give will be great.

Thanks very much
Question by:cobybenson
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
LVL 58

Accepted Solution

harfang earned 500 total points
ID: 16370005

I guess that you would first need to save the HTML page locally. Something like in this question: http:Q_21597349.html

Then you can read the entire file from VB (to extract the portion between <strong> and </strong> I guess and import the table contained in the HTML body as an Access table. I would have to check out how to automate that part, though.

Hope this helps
LVL 58

Expert Comment

ID: 16370018
Addendum: in the link above I used ....documentElement.innerText. In your case it would have to be ....documentElement.innerHTML of course.
LVL 53

Assisted Solution

by:Ryan Chong
Ryan Chong earned 500 total points
ID: 16370027
You can try use M$ XML Object Library to get the content of an URL (you can have many other methods to download/get web content as well)


'Get Status using Microsoft XML 3.0 Object Library (MSXML)
Public Function getHTTPRequest_XML30(url As String) As String
    On Error GoTo EH
    DeleteUrlCacheEntry url
    Dim xmlHttp As MSXML2.XMLHTTP30
    Set xmlHttp = New MSXML2.XMLHTTP30
    xmlHttp.Open "GET", url, False
    'Debug.Print xmlHttp.Status
    'Debug.Print xmlHttp.responseText
    getHTTPRequest_XML30 = xmlHttp.responseText
    Set xmlHttp = Nothing
    Exit Function
    Debug.Print Err.Number & ": " & Err.Description
    On Error Resume Next
    Set xmlHttp = Nothing
End Function

and to parse the content, you may need this function posted by waty:


hope this helps
What does it mean to be "Always On"?

Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.

LVL 58

Expert Comment

ID: 16370089
Once you have the file locally, this will import the central table:

DoCmd.TransferText acImportHTML, _
    TableName:="<choose table name here>", _
    FileName:="C:\Full Path To File\tracker.aspx.html", _
    HasFieldNames:=True, _
    HTMLTableName:="Home Delivery Network Limited"

LVL 58

Expert Comment

ID: 16370199
Again, once you have the file locally, this would extract the parcel number:

    Dim strLine As String
    Open "C:\Full Path To File\tracker.aspx.html" For Input As #1
    Do While Not EOF(1)
        Line Input #1, strLine
        If InStr(strLine, "<strong>") Then Exit Do
        strLine = ""
    Close #1
    If strLine <> "" Then
        strLine = Split(Split(strLine, "<strong>")(1), "</strong>")(0)
        MsgBox strLine
    End If

Good luck!
LVL 65

Assisted Solution

rockiroads earned 500 total points
ID: 16370797
From what I understand u want to read the values in <TD> as well?

try this

where I put debug.print statements of sData, that is where one line is read
You can do whatever u want with that then
c:\tracker.html is your html file

Ive tried with your example and it seems to work. Ive tried to code it for multiple tracking numbers, that is untested

Public Sub ProcessFile()

    Dim iHandle As Integer
    Dim sLine As String
    Dim bReadOff As Boolean
    Dim sTrackingNumber As String
    Dim bLoop As Boolean
    Dim sData(4) As String
    Dim iDataCnt As Integer
    Dim bFirst As Boolean
    On Error Resume Next
    iHandle = FreeFile
    Open "c:\tracker.html" For Input As #iHandle
    bFirst = True
    bReadOff = False
    sTrackingNumber = ""
    Do While Not EOF(iHandle)
        Input #iHandle, sLine
        'Debug.Print sLine
        If InStr(1, UCase$(sLine), "<STRONG>") > 0 Then
            'Display last lot read
            If bFirst = False Then
                Debug.Print "Tracking Number : ", sTrackingNumber
                Debug.Print "Date", sData(0)
                Debug.Print "Depot", sData(1)
                Debug.Print "Action", sData(2)
                Debug.Print "Reason", sData(3)
            End If
            For iDataCnt = 0 To UBound(sData)
                sData(iDataCnt) = ""
            Next iDataCnt
            bFirst = False
            bReadOff = True
            sTrackingNumber = Mid$(sLine, InStr(1, UCase$(sLine), "<STRONG>") + 9, Len(sLine) - InStr(1, UCase$(sLine), "</STRONG>") - 1)
            Debug.Print "Found", sTrackingNumber
        End If
        If bReadOff = True And InStr(1, UCase$(sLine), "<TR>") > 0 Then
            'Read next line - expect TH or TD
            Input #iHandle, sLine
            'Ignore headings
            If InStr(1, UCase$(sLine), "<TD>") > 0 Then
                'Read off
                bLoop = True
                iDataCnt = 0
                Do While bLoop = True
                    If EOF(iHandle) = True Then
                        bLoop = False
                    ElseIf InStr(1, UCase$(sLine), "<TABLE>") > 0 Then
                        bLoop = False
                    ElseIf InStr(1, UCase$(sLine), "</TR>") > 0 Then
                        Debug.Print "Tracking Number : ", sTrackingNumber
                        Debug.Print "Date", sData(0)
                        Debug.Print "Depot", sData(1)
                        Debug.Print "Action", sData(2)
                        Debug.Print "Reason", sData(3)
                        iDataCnt = 0
                    ElseIf InStr(1, UCase$(sLine), "<TD>") > 0 Then
                        If iDataCnt < 4 Then
                            sData(iDataCnt) = Replace(Mid$(sLine, InStr(1, UCase$(sLine), "<TD>") + 4), UCase$("</TD>"), "")
                            Debug.Print sData(iDataCnt)
                            iDataCnt = iDataCnt + 1
                        End If
                    End If
                    Input #iHandle, sLine
                bReadOff = False
            End If
        End If
    If sTrackingNumber <> "" Then
        Debug.Print "Tracking Number : ", sTrackingNumber
        Debug.Print "Date", sData(0)
        Debug.Print "Depot", sData(1)
        Debug.Print "Action", sData(2)
        Debug.Print "Reason", sData(3)
    End If

    Close #iHandle
End Sub

LVL 51

Expert Comment

by:Gustav Brock
ID: 16371146
You can also use Excel to read the data live, no copy.
Use this iqy query:



Now, update this, link the worksheet to Access.

LVL 51

Assisted Solution

by:Gustav Brock
Gustav Brock earned 500 total points
ID: 16371667
Looks like you need to download the page, then link to this with the html link wizard.

First download:

Private Declare Function URLDownloadToFile Lib "urlmon" Alias "URLDownloadToFileA" ( _
  ByVal pCaller As Long, _
  ByVal szURL As String, _
  ByVal szFileName As String, _
  ByVal dwReserved As Long, _
  ByVal lpfnCB As Long) _
  As Long

Public Function DownloadFile( _
  ByVal strURL As String, _
  ByVal strLocalFilename As String) _
  As Long
' Download file or page with public access from the web.
' 2004-12-17. Cactus Data ApS, CPH.

' Usage, download a file:
' lngRet = DownloadFile("http://www.databaseadvisors.com/Graphics/conf2002/2002ConferencePicsbySmolin/images/dba02smolin27.jpg", "c:\happybassett.jpg")
' Usage, download a page:
' lngRet = DownloadFile("http://www.databaseadvisors.com/conf2002/conf200202.htm", "c:\dbaconference.htm")

' Returns 0 if success, error code if not.
' Error codes:
' -2146697210 "file not found".
' -2146697211 "domain not found".

' Limitation.
' Does not check if local file was created successfully.

  Dim lngRetVal As Long
  lngRetVal = URLDownloadToFile(0, strURL & vbNullChar, strLocalFilename & vbNullChar, 0, 0)
  DownloadFile = lngRetVal
End Function

Now, any time before you open the linked table in Access, run the function to import. It's a split second.

LVL 51

Expert Comment

by:Gustav Brock
ID: 16371761
In Html wizard select Tables, not Lists.


Featured Post

Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

As tax season makes its return, so does the increase in cyber crime and tax refund phishing that comes with it
Microsoft Access is a place to store data within tables and represent this stored data using multiple database objects such as in form of macros, forms, reports, etc. After a MS Access database is created there is need to improve the performance and…
Learn how to number pages in an Access report over each group. Activate two pass printing by referencing the pages property: Add code to the Page Footers OnFormat event to capture the pages as there occur for each group. Use the pages property to …
Basics of query design. Shows you how to construct a simple query by adding tables, perform joins, defining output columns, perform sorting, and apply criteria.
Suggested Courses

764 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question