Parse HTML code into access database

Hi,
I'd like to write some code that could automatically parse various HTML pages and insert the table data into an access database.

The webpages are used for tracking courier data.

You can see an example here: http://www.hdnl.co.uk/tracker.aspx?UPI=806290025850a

The tracking number is held between <strong> tags, so it shouldn't be too difficult to isolate this.
I'd like to then extract the <tr> and <td> data and dump it into a table next to the tracking number.

Then I'll be able to query this information and find consignments that aren't being delivered on time.
Any help or advice you can give will be great.

Thanks very much
cobybensonAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

harfangCommented:
Hello,

I guess that you would first need to save the HTML page locally. Something like in this question: http:Q_21597349.html

Then you can read the entire file from VB (to extract the portion between <strong> and </strong> I guess and import the table contained in the HTML body as an Access table. I would have to check out how to automate that part, though.

Hope this helps
(°v°)
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
harfangCommented:
Addendum: in the link above I used ....documentElement.innerText. In your case it would have to be ....documentElement.innerHTML of course.
(°v°)
0
Ryan ChongCommented:
You can try use M$ XML Object Library to get the content of an URL (you can have many other methods to download/get web content as well)

like:

'Get Status using Microsoft XML 3.0 Object Library (MSXML)
Public Function getHTTPRequest_XML30(url As String) As String
    On Error GoTo EH
    DeleteUrlCacheEntry url
   
    Dim xmlHttp As MSXML2.XMLHTTP30
    Set xmlHttp = New MSXML2.XMLHTTP30
   
    xmlHttp.Open "GET", url, False
    xmlHttp.send
    'Debug.Print xmlHttp.Status
    'Debug.Print xmlHttp.responseText
    getHTTPRequest_XML30 = xmlHttp.responseText
    Set xmlHttp = Nothing
    Exit Function
EH:
    Debug.Print Err.Number & ": " & Err.Description
    On Error Resume Next
    Set xmlHttp = Nothing
End Function


and to parse the content, you may need this function posted by waty:

http:Q_20898133.html

hope this helps
0
Ultimate Tool Kit for Technology Solution Provider

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy now.

harfangCommented:
Once you have the file locally, this will import the central table:

DoCmd.TransferText acImportHTML, _
    TableName:="<choose table name here>", _
    FileName:="C:\Full Path To File\tracker.aspx.html", _
    HasFieldNames:=True, _
    HTMLTableName:="Home Delivery Network Limited"

(°v°)
0
harfangCommented:
Again, once you have the file locally, this would extract the parcel number:

    Dim strLine As String
   
    Open "C:\Full Path To File\tracker.aspx.html" For Input As #1
    Do While Not EOF(1)
        Line Input #1, strLine
        If InStr(strLine, "<strong>") Then Exit Do
        strLine = ""
    Loop
    Close #1
   
    If strLine <> "" Then
        strLine = Split(Split(strLine, "<strong>")(1), "</strong>")(0)
        MsgBox strLine
    End If

Good luck!
(°v°)
0
rockiroadsCommented:
From what I understand u want to read the values in <TD> as well?

try this


where I put debug.print statements of sData, that is where one line is read
You can do whatever u want with that then
c:\tracker.html is your html file

Ive tried with your example and it seems to work. Ive tried to code it for multiple tracking numbers, that is untested


Public Sub ProcessFile()

    Dim iHandle As Integer
    Dim sLine As String
    Dim bReadOff As Boolean
    Dim sTrackingNumber As String
    Dim bLoop As Boolean
    Dim sData(4) As String
    Dim iDataCnt As Integer
    Dim bFirst As Boolean
   
   
    On Error Resume Next
   
    iHandle = FreeFile
    Open "c:\tracker.html" For Input As #iHandle
   
    bFirst = True
    bReadOff = False
    sTrackingNumber = ""
    Do While Not EOF(iHandle)
        Input #iHandle, sLine
        'Debug.Print sLine
       
        If InStr(1, UCase$(sLine), "<STRONG>") > 0 Then
            'Display last lot read
            If bFirst = False Then
                Debug.Print "Tracking Number : ", sTrackingNumber
                Debug.Print "Date", sData(0)
                Debug.Print "Depot", sData(1)
                Debug.Print "Action", sData(2)
                Debug.Print "Reason", sData(3)
            End If
           
            For iDataCnt = 0 To UBound(sData)
                sData(iDataCnt) = ""
            Next iDataCnt
           
            bFirst = False
            bReadOff = True
            sTrackingNumber = Mid$(sLine, InStr(1, UCase$(sLine), "<STRONG>") + 9, Len(sLine) - InStr(1, UCase$(sLine), "</STRONG>") - 1)
            Debug.Print "Found", sTrackingNumber
        End If
       
        If bReadOff = True And InStr(1, UCase$(sLine), "<TR>") > 0 Then
            'Read next line - expect TH or TD
            Input #iHandle, sLine
            'Ignore headings
            If InStr(1, UCase$(sLine), "<TD>") > 0 Then
                'Read off
                bLoop = True
                iDataCnt = 0
                Do While bLoop = True
                    If EOF(iHandle) = True Then
                        bLoop = False
                    ElseIf InStr(1, UCase$(sLine), "<TABLE>") > 0 Then
                        bLoop = False
                    ElseIf InStr(1, UCase$(sLine), "</TR>") > 0 Then
                        Debug.Print "Tracking Number : ", sTrackingNumber
                        Debug.Print "Date", sData(0)
                        Debug.Print "Depot", sData(1)
                        Debug.Print "Action", sData(2)
                        Debug.Print "Reason", sData(3)
                        iDataCnt = 0
                    ElseIf InStr(1, UCase$(sLine), "<TD>") > 0 Then
                        If iDataCnt < 4 Then
                            sData(iDataCnt) = Replace(Mid$(sLine, InStr(1, UCase$(sLine), "<TD>") + 4), UCase$("</TD>"), "")
                            Debug.Print sData(iDataCnt)
                            iDataCnt = iDataCnt + 1
                        End If
                    End If
                    Input #iHandle, sLine
                Loop
                bReadOff = False
            End If
        End If
    Loop
       
    If sTrackingNumber <> "" Then
        Debug.Print "Tracking Number : ", sTrackingNumber
        Debug.Print "Date", sData(0)
        Debug.Print "Depot", sData(1)
        Debug.Print "Action", sData(2)
        Debug.Print "Reason", sData(3)
    End If

    Close #iHandle
           
End Sub



0
Gustav BrockCIOCommented:
You can also use Excel to read the data live, no copy.
Use this iqy query:

WEB
1
http://www.hdnl.co.uk/tracker.aspx?UPI=806290025850a

Selection=1
Formatting=None
PreFormattedTextToColumns=True
ConsecutiveDelimitersAsOne=True
SingleBlockTextImport=False
DisableDateRecognition=False

Now, update this, link the worksheet to Access.

/gustav
0
Gustav BrockCIOCommented:
Looks like you need to download the page, then link to this with the html link wizard.

First download:

Declaration:
Private Declare Function URLDownloadToFile Lib "urlmon" Alias "URLDownloadToFileA" ( _
  ByVal pCaller As Long, _
  ByVal szURL As String, _
  ByVal szFileName As String, _
  ByVal dwReserved As Long, _
  ByVal lpfnCB As Long) _
  As Long

Public Function DownloadFile( _
  ByVal strURL As String, _
  ByVal strLocalFilename As String) _
  As Long
 
' Download file or page with public access from the web.
' 2004-12-17. Cactus Data ApS, CPH.

' Usage, download a file:
' lngRet = DownloadFile("http://www.databaseadvisors.com/Graphics/conf2002/2002ConferencePicsbySmolin/images/dba02smolin27.jpg", "c:\happybassett.jpg")
'
' Usage, download a page:
' lngRet = DownloadFile("http://www.databaseadvisors.com/conf2002/conf200202.htm", "c:\dbaconference.htm")

' Returns 0 if success, error code if not.
' Error codes:
' -2146697210 "file not found".
' -2146697211 "domain not found".

' Limitation.
' Does not check if local file was created successfully.

  Dim lngRetVal As Long
   
  lngRetVal = URLDownloadToFile(0, strURL & vbNullChar, strLocalFilename & vbNullChar, 0, 0)
 
  DownloadFile = lngRetVal
 
End Function

Now, any time before you open the linked table in Access, run the function to import. It's a split second.

/gustav
0
Gustav BrockCIOCommented:
In Html wizard select Tables, not Lists.

/gustav
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft Access

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.