Save data from Msxml2.DOMDocument.6.0 object

I get a string returned containing the data but oRow is always = nothing, not sure why, I need to save that data as a row in a table currently its saving the whole xml string.

ive deleted references to the website so as not to break the rules suffice to say everything works upto  the for loop

I have tried looking at the DOM object model but not understanding how I pull the parts out.

Private Sub SubmitVIN(strVIN As String, strVeh As Long)
Dim rsmc As Dao.Recordset
  Dim oRE As Object
  Dim strHTML As String
  Dim oM As Object
    Dim sSearchString As String
    sSearchString = strVIN

    Set winHttpReq = CreateObject("WinHttp.WinHttpRequest.5.1")
    winHttpReq.SetTimeouts 0, 600000, 600000, 600000

    sHTML = winHttpReq.responseText

    sHTML = "<xml>" & sHTML & "</xml>"
    sHTML = Replace(sHTML, "</A></span>", "")
    sHTML = Replace(sHTML, "</A>", "</a>")
    sHTML = Replace(sHTML, "<TR>", "<tr>")
    sHTML = Replace(sHTML, "<TD>", "<td>")
    sHTML = Replace(sHTML, "</TD>", "</td>")
    Dim objDOM As Object
    Set objDOM = CreateObject("Msxml2.DOMDocument.6.0")
    objDOM.LoadXML sHTML
    Debug.Print sHTML
    Debug.Print "********************end get xml ***************"
    *********************** works fine till here ******************
    Dim oRow As Object
    'parse the result

    ' READ ROW 1 BY 1
    For Each oRow In objDOM.SelectNodes("//tr[position()>1]")
        MsgBox "TradeName: " & oRow.SelectSingleNode("td[1]").Text & " - Pattern:" & oRow.SelectSingleNode("td[2]").Text & " - Country:" & oRow.SelectSingleNode("td[3]").Text

 ' save the whole  return string for now
    rsmc.Fields("StrXML").Value = sHTML
    rsmc.Fields("Vin").Value = strVIN
    rsmc.Fields("VehicleID").Value = strVeh

End Sub

Open in new window

string output:
    <td><a href='moto-honda-identification-ST1100AV-8290.html'>
    PAN EUROPEAN ST 1100 ABS</a>
    <br><span class="art-indexbutton-wrapper"><span class="art-indexbutton-l"> </span><span class="art-indexbutton-r"> </span><a href="moto-honda-identification-ST1100AV-8290.html" class="art-indexbutton">parts for sale</td>
    <td><a href='moto-honda-identification-ST1100AV-8291.html'>
    PAN EUROPEAN ST 1100 ABS</a>
    <br><span class="art-indexbutton-wrapper"><span class="art-indexbutton-l"> </span><span class="art-indexbutton-r"> </span><a href="moto-honda-identification-ST1100AV-8291.html" class="art-indexbutton">parts for sale</td>
    <td>UNITED KINGDOM</td>
    <td><a href='moto-honda-identification-ST1100AV-8292.html'>
    PAN EUROPEAN ST 1100 ABS</a>
    <br><span class="art-indexbutton-wrapper"><span class="art-indexbutton-l"> </span><span class="art-indexbutton-r"> </span><a href="moto-honda-identification-ST1100AV-8292.html" class="art-indexbutton">parts for sale</td>
    <td><a href='moto-honda-identification-ST1100AV-8295.html'>
    PAN EUROPEAN ST 1100 ABS</a>
    <br><span class="art-indexbutton-wrapper"><span class="art-indexbutton-l"> </span><span class="art-indexbutton-r"> </span><a href="moto-honda-identification-ST1100AV-8295.html" class="art-indexbutton">parts for sale</td>

Open in new window

Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Robert ShermanOwnerCommented:
One thing I notice off the top is that you've got some anchor tags that are missing closing tags.   Not sure if the object is that strict about properly formatted XML/XHTML, but that might be the culprit.  

See line 6 and 19 of your output code (2nd code window)... You'll notice an open anchor tag, but then the next tag is a closing /td tag.  

I think when you are loading the XML with "objDOM.LoadXML sHTML", that method returns a boolean to indicate whether the load was successful or not... try debug.print'ing that value out to see if it's a parse error.
PeterBaileyUkAuthor Commented:
ok i amended to this and it returns an error as you said but not sure how to correct that:

    Dim objDOM As Object
    Set objDOM = CreateObject("Msxml2.DOMDocument.6.0")
    If Not objDOM.loadXML(sHTML) Then
           Err.Raise objDOM.parseError.errorCode, , objDOM.parseError.reason
    End If
Robert ShermanOwnerCommented:
So, that error confirms my suspicion that the parser is expecting well-formed XML.  

(Note: As I'm editing my comment here, my "can of worms" detector is about to explode.)

You already have some code that is doing some pre-processing of the response you receive from the web server your are polling.   From the looks of it you are already correcting for several non-conforming tags, so you could just do some additional correcting to make the string well-formed.  

However, now that I'm looking more closely, the missing /a tag may not be the only error.  From the snippet you provided, it looks like there is an opening "span" tag that is also never properly closed (in the same line as the problem-causing anchor tag).  

It's entirely possible to use Regular Expressions to add the necessary close tags, but keep in mind that future changes to the site on the server end could break things at any time.   If you have any influence over the web content that you are trying to digest, it may be easier in the end to have the site's developers fix the poor HTML they are serving.  

I'm no Regular Expression pro, but it looks like the lines in question all start with a <br> tag.   (assuming that the line breaks are correctly represented here on EE)   So you could conceivably come up with a regex that basically looks for lines that begin with "<br>" followed by an opening <span ..."> tag.   You would do a replace to add the stuff that the regex found up to that opening span tag, and insert a closing span tag along with the remainder of the line.  

Then, you look at lines that begin with <br> followed by some stuff that isn't an opening anchor tag followed by an opening anchor tag followed by some text that isn't a "</td>".  You then replace all that stuff with itself followed by an "</a>" followed by the remainder of the line.  (just a "</td>" at this point)

Sound complicated enough?   I'm not finished answering yet, but in the interest of getting this thought out there and perhaps open it up to a regex guru that can come up with an expression in her sleep, I'm going to submit this and follow up after stepping away from the computer for a bit.  :D

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Big Business Goals? Which KPIs Will Help You

The most successful MSPs rely on metrics – known as key performance indicators (KPIs) – for making informed decisions that help their businesses thrive, rather than just survive. This eBook provides an overview of the most important KPIs used by top MSPs.

PeterBaileyUkAuthor Commented:
It might be worth leaving it as I have no influence over the return server side as I am scraping some data from the web and rules dont allow me to post the site url.

I wonder then to just strip out the tags and leave only the data maybe with regx?
PeterBaileyUkAuthor Commented:
ok ive almost got it, have gone with this:
Function stripHTML(strHTML)
'Strips the HTML tags from strHTML
   Dim oRE As Object
  Dim objRegExp, strOutput
  Set oRE = CreateObject("VBScript.RegExp")

  oRE.IgnoreCase = True
  oRE.Global = True
  oRE.Pattern = "<(.|\n)+?>"

  'Replace all HTML tag matches with the empty string
  strOutput = oRE.Replace(strHTML, "")
  'Replace all < and > with &lt; and &gt;
  strOutput = Replace(strOutput, "<", "&lt;")
  strOutput = Replace(strOutput, ">", "&gt;")
  stripHTML = strOutput    'Return the value of strOutput

  Set oRE = Nothing
End Function

which results in the out:
      parts for sale...............................****************** only rogue part left
PeterBaileyUkAuthor Commented:
your comments have been very very helpful, thank you
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft Access

From novice to tech pro — start learning today.