Solved

Parse HTML code into access database

Posted on 2006-04-04
11
1,083 Views
Last Modified: 2010-02-09
Hi,
I'd like to write some code that could automatically parse various HTML pages and insert the table data into an access database.

The webpages are used for tracking courier data.

You can see an example here: http://www.hdnl.co.uk/tracker.aspx?UPI=806290025850a

The tracking number is held between <strong> tags, so it shouldn't be too difficult to isolate this.
I'd like to then extract the <tr> and <td> data and dump it into a table next to the tracking number.

Then I'll be able to query this information and find consignments that aren't being delivered on time.
Any help or advice you can give will be great.

Thanks very much
0
Comment
Question by:cobybenson
11 Comments
 
LVL 58

Accepted Solution

by:
harfang earned 125 total points
Comment Utility
Hello,

I guess that you would first need to save the HTML page locally. Something like in this question: http:Q_21597349.html

Then you can read the entire file from VB (to extract the portion between <strong> and </strong> I guess and import the table contained in the HTML body as an Access table. I would have to check out how to automate that part, though.

Hope this helps
(°v°)
0
 
LVL 58

Expert Comment

by:harfang
Comment Utility
Addendum: in the link above I used ....documentElement.innerText. In your case it would have to be ....documentElement.innerHTML of course.
(°v°)
0
 
LVL 49

Assisted Solution

by:Ryan Chong
Ryan Chong earned 125 total points
Comment Utility
You can try use M$ XML Object Library to get the content of an URL (you can have many other methods to download/get web content as well)

like:

'Get Status using Microsoft XML 3.0 Object Library (MSXML)
Public Function getHTTPRequest_XML30(url As String) As String
    On Error GoTo EH
    DeleteUrlCacheEntry url
   
    Dim xmlHttp As MSXML2.XMLHTTP30
    Set xmlHttp = New MSXML2.XMLHTTP30
   
    xmlHttp.Open "GET", url, False
    xmlHttp.send
    'Debug.Print xmlHttp.Status
    'Debug.Print xmlHttp.responseText
    getHTTPRequest_XML30 = xmlHttp.responseText
    Set xmlHttp = Nothing
    Exit Function
EH:
    Debug.Print Err.Number & ": " & Err.Description
    On Error Resume Next
    Set xmlHttp = Nothing
End Function


and to parse the content, you may need this function posted by waty:

http:Q_20898133.html

hope this helps
0
 
LVL 58

Expert Comment

by:harfang
Comment Utility
Once you have the file locally, this will import the central table:

DoCmd.TransferText acImportHTML, _
    TableName:="<choose table name here>", _
    FileName:="C:\Full Path To File\tracker.aspx.html", _
    HasFieldNames:=True, _
    HTMLTableName:="Home Delivery Network Limited"

(°v°)
0
How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

 
LVL 58

Expert Comment

by:harfang
Comment Utility
Again, once you have the file locally, this would extract the parcel number:

    Dim strLine As String
   
    Open "C:\Full Path To File\tracker.aspx.html" For Input As #1
    Do While Not EOF(1)
        Line Input #1, strLine
        If InStr(strLine, "<strong>") Then Exit Do
        strLine = ""
    Loop
    Close #1
   
    If strLine <> "" Then
        strLine = Split(Split(strLine, "<strong>")(1), "</strong>")(0)
        MsgBox strLine
    End If

Good luck!
(°v°)
0
 
LVL 65

Assisted Solution

by:rockiroads
rockiroads earned 125 total points
Comment Utility
From what I understand u want to read the values in <TD> as well?

try this


where I put debug.print statements of sData, that is where one line is read
You can do whatever u want with that then
c:\tracker.html is your html file

Ive tried with your example and it seems to work. Ive tried to code it for multiple tracking numbers, that is untested


Public Sub ProcessFile()

    Dim iHandle As Integer
    Dim sLine As String
    Dim bReadOff As Boolean
    Dim sTrackingNumber As String
    Dim bLoop As Boolean
    Dim sData(4) As String
    Dim iDataCnt As Integer
    Dim bFirst As Boolean
   
   
    On Error Resume Next
   
    iHandle = FreeFile
    Open "c:\tracker.html" For Input As #iHandle
   
    bFirst = True
    bReadOff = False
    sTrackingNumber = ""
    Do While Not EOF(iHandle)
        Input #iHandle, sLine
        'Debug.Print sLine
       
        If InStr(1, UCase$(sLine), "<STRONG>") > 0 Then
            'Display last lot read
            If bFirst = False Then
                Debug.Print "Tracking Number : ", sTrackingNumber
                Debug.Print "Date", sData(0)
                Debug.Print "Depot", sData(1)
                Debug.Print "Action", sData(2)
                Debug.Print "Reason", sData(3)
            End If
           
            For iDataCnt = 0 To UBound(sData)
                sData(iDataCnt) = ""
            Next iDataCnt
           
            bFirst = False
            bReadOff = True
            sTrackingNumber = Mid$(sLine, InStr(1, UCase$(sLine), "<STRONG>") + 9, Len(sLine) - InStr(1, UCase$(sLine), "</STRONG>") - 1)
            Debug.Print "Found", sTrackingNumber
        End If
       
        If bReadOff = True And InStr(1, UCase$(sLine), "<TR>") > 0 Then
            'Read next line - expect TH or TD
            Input #iHandle, sLine
            'Ignore headings
            If InStr(1, UCase$(sLine), "<TD>") > 0 Then
                'Read off
                bLoop = True
                iDataCnt = 0
                Do While bLoop = True
                    If EOF(iHandle) = True Then
                        bLoop = False
                    ElseIf InStr(1, UCase$(sLine), "<TABLE>") > 0 Then
                        bLoop = False
                    ElseIf InStr(1, UCase$(sLine), "</TR>") > 0 Then
                        Debug.Print "Tracking Number : ", sTrackingNumber
                        Debug.Print "Date", sData(0)
                        Debug.Print "Depot", sData(1)
                        Debug.Print "Action", sData(2)
                        Debug.Print "Reason", sData(3)
                        iDataCnt = 0
                    ElseIf InStr(1, UCase$(sLine), "<TD>") > 0 Then
                        If iDataCnt < 4 Then
                            sData(iDataCnt) = Replace(Mid$(sLine, InStr(1, UCase$(sLine), "<TD>") + 4), UCase$("</TD>"), "")
                            Debug.Print sData(iDataCnt)
                            iDataCnt = iDataCnt + 1
                        End If
                    End If
                    Input #iHandle, sLine
                Loop
                bReadOff = False
            End If
        End If
    Loop
       
    If sTrackingNumber <> "" Then
        Debug.Print "Tracking Number : ", sTrackingNumber
        Debug.Print "Date", sData(0)
        Debug.Print "Depot", sData(1)
        Debug.Print "Action", sData(2)
        Debug.Print "Reason", sData(3)
    End If

    Close #iHandle
           
End Sub



0
 
LVL 49

Expert Comment

by:Gustav Brock
Comment Utility
You can also use Excel to read the data live, no copy.
Use this iqy query:

WEB
1
http://www.hdnl.co.uk/tracker.aspx?UPI=806290025850a

Selection=1
Formatting=None
PreFormattedTextToColumns=True
ConsecutiveDelimitersAsOne=True
SingleBlockTextImport=False
DisableDateRecognition=False

Now, update this, link the worksheet to Access.

/gustav
0
 
LVL 49

Assisted Solution

by:Gustav Brock
Gustav Brock earned 125 total points
Comment Utility
Looks like you need to download the page, then link to this with the html link wizard.

First download:

Declaration:
Private Declare Function URLDownloadToFile Lib "urlmon" Alias "URLDownloadToFileA" ( _
  ByVal pCaller As Long, _
  ByVal szURL As String, _
  ByVal szFileName As String, _
  ByVal dwReserved As Long, _
  ByVal lpfnCB As Long) _
  As Long

Public Function DownloadFile( _
  ByVal strURL As String, _
  ByVal strLocalFilename As String) _
  As Long
 
' Download file or page with public access from the web.
' 2004-12-17. Cactus Data ApS, CPH.

' Usage, download a file:
' lngRet = DownloadFile("http://www.databaseadvisors.com/Graphics/conf2002/2002ConferencePicsbySmolin/images/dba02smolin27.jpg", "c:\happybassett.jpg")
'
' Usage, download a page:
' lngRet = DownloadFile("http://www.databaseadvisors.com/conf2002/conf200202.htm", "c:\dbaconference.htm")

' Returns 0 if success, error code if not.
' Error codes:
' -2146697210 "file not found".
' -2146697211 "domain not found".

' Limitation.
' Does not check if local file was created successfully.

  Dim lngRetVal As Long
   
  lngRetVal = URLDownloadToFile(0, strURL & vbNullChar, strLocalFilename & vbNullChar, 0, 0)
 
  DownloadFile = lngRetVal
 
End Function

Now, any time before you open the linked table in Access, run the function to import. It's a split second.

/gustav
0
 
LVL 49

Expert Comment

by:Gustav Brock
Comment Utility
In Html wizard select Tables, not Lists.

/gustav
0

Featured Post

Highfive + Dolby Voice = No More Audio Complaints!

Poor audio quality is one of the top reasons people don’t use video conferencing. Get the crispest, clearest audio powered by Dolby Voice in every meeting. Highfive and Dolby Voice deliver the best video conferencing and audio experience for every meeting and every room.

Join & Write a Comment

In the article entitled Working with Objects – Part 1 (http://www.experts-exchange.com/Microsoft/Development/MS_Access/A_4942-Working-with-Objects-Part-1.html), you learned the basics of working with objects, properties, methods, and events. In Work…
In a multiple monitor setup, if you don't want to use AutoCenter to position your popup forms, you have a problem: where will they appear?  Sometimes you may have an additional problem: where the devil did they go?  If you last had a popup form open…
What’s inside an Access Desktop Database. Will look at the basic interface, Navigation Pane (Database Container), Tables, Queries, Forms, Report, Macro’s, and VBA code.
In Microsoft Access, when working with VBA, learn some techniques for writing readable and easily maintained code.

762 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

6 Experts available now in Live!

Get 1:1 Help Now