Save data from Msxml2.DOMDocument.6.0 object

I get a string returned containing the data but oRow is always = nothing, not sure why, I need to save that data as a row in a table currently its saving the whole xml string.

ive deleted references to the website so as not to break the rules suffice to say everything works upto  the for loop

I have tried looking at the DOM object model but not understanding how I pull the parts out.

Private Sub SubmitVIN(strVIN As String, strVeh As Long)
Dim rsmc As Dao.Recordset
  Dim oRE As Object
  Dim strHTML As String
  Dim oM As Object
  
    Dim sSearchString As String
    
    sSearchString = strVIN


    'GETTING THE HTML CONTENT OF THE PAGE
    Set winHttpReq = CreateObject("WinHttp.WinHttpRequest.5.1")
    winHttpReq.SetTimeouts 0, 600000, 600000, 600000

    sHTML = winHttpReq.responseText

    
    
    'CLEANING AND FIXING NOT-WELL FORMED TAGS TO BE ABLE TO PROCESS IT IN AN XML DOM
    sHTML = "<xml>" & sHTML & "</xml>"
    sHTML = Replace(sHTML, "</A></span>", "")
    sHTML = Replace(sHTML, "</A>", "</a>")
    sHTML = Replace(sHTML, "<TR>", "<tr>")
    sHTML = Replace(sHTML, "<TD>", "<td>")
    sHTML = Replace(sHTML, "</TD>", "</td>")
    
    
    Dim objDOM As Object
    Set objDOM = CreateObject("Msxml2.DOMDocument.6.0")
    objDOM.LoadXML sHTML
    Debug.Print sHTML
    Debug.Print "********************end get xml ***************"
    *********************** works fine till here ******************
    Dim oRow As Object
    
    
    
    'parse the result

    ' READ ROW 1 BY 1
    For Each oRow In objDOM.SelectNodes("//tr[position()>1]")
        MsgBox "TradeName: " & oRow.SelectSingleNode("td[1]").Text & " - Pattern:" & oRow.SelectSingleNode("td[2]").Text & " - Country:" & oRow.SelectSingleNode("td[3]").Text
    Next


 ' save the whole  return string for now
    rsmc.AddNew
 
    rsmc.Fields("StrXML").Value = sHTML
    rsmc.Fields("Vin").Value = strVIN
    rsmc.Fields("VehicleID").Value = strVeh
    rsmc.Update

End Sub

Open in new window


string output:
        </tr>
            
    <tr>
    <td><a href='moto-honda-identification-ST1100AV-8290.html'>
    PAN EUROPEAN ST 1100 ABS</a>
    <br><span class="art-indexbutton-wrapper"><span class="art-indexbutton-l"> </span><span class="art-indexbutton-r"> </span><a href="moto-honda-identification-ST1100AV-8290.html" class="art-indexbutton">parts for sale</td>
    <td>ST1100AV</td>
    <td>AUSTRIA</td>
    <td>1997</td>
    <td>SC26B</td> 
    <td>JH2SC26B*</td>
    <td>VM100001</td>
    <td>VM100401</td>
  </tr>
      
    <tr>
    <td><a href='moto-honda-identification-ST1100AV-8291.html'>
    PAN EUROPEAN ST 1100 ABS</a>
    <br><span class="art-indexbutton-wrapper"><span class="art-indexbutton-l"> </span><span class="art-indexbutton-r"> </span><a href="moto-honda-identification-ST1100AV-8291.html" class="art-indexbutton">parts for sale</td>
    <td>ST1100AV</td>
    <td>UNITED KINGDOM</td>
    <td>1997</td>
    <td>SC26B</td> 
    <td>JH2SC26B*</td>
    <td>VM100001</td>
    <td>VM100635</td>
  </tr>
      
    <tr>
    <td><a href='moto-honda-identification-ST1100AV-8292.html'>
    PAN EUROPEAN ST 1100 ABS</a>
    <br><span class="art-indexbutton-wrapper"><span class="art-indexbutton-l"> </span><span class="art-indexbutton-r"> </span><a href="moto-honda-identification-ST1100AV-8292.html" class="art-indexbutton">parts for sale</td>
    <td>ST1100AV</td>
    <td>EUROPE</td>
    <td>1997</td>
    <td>SC26B</td> 
    <td>JH2SC26B*</td>
    <td>VM100001</td>
    <td>VM100624</td>
  </tr>
      
    <tr>
    <td><a href='moto-honda-identification-ST1100AV-8295.html'>
    PAN EUROPEAN ST 1100 ABS</a>
    <br><span class="art-indexbutton-wrapper"><span class="art-indexbutton-l"> </span><span class="art-indexbutton-r"> </span><a href="moto-honda-identification-ST1100AV-8295.html" class="art-indexbutton">parts for sale</td>
    <td>ST1100AV</td>
    <td>ITALY</td>
    <td>1997</td>
    <td>SC26B</td> 
    <td>JH2SC26B*</td>
    <td>VM100006</td>
    <td>VM100445</td>
  </tr>
  </table></xml>

Open in new window

PeterBaileyUkAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Robert ShermanOwnerCommented:
One thing I notice off the top is that you've got some anchor tags that are missing closing tags.   Not sure if the object is that strict about properly formatted XML/XHTML, but that might be the culprit.  

See line 6 and 19 of your output code (2nd code window)... You'll notice an open anchor tag, but then the next tag is a closing /td tag.  

I think when you are loading the XML with "objDOM.LoadXML sHTML", that method returns a boolean to indicate whether the load was successful or not... try debug.print'ing that value out to see if it's a parse error.
0
PeterBaileyUkAuthor Commented:
ok i amended to this and it returns an error as you said but not sure how to correct that:

    Dim objDOM As Object
    Set objDOM = CreateObject("Msxml2.DOMDocument.6.0")
    If Not objDOM.loadXML(sHTML) Then
           Err.Raise objDOM.parseError.errorCode, , objDOM.parseError.reason
    End If
eeex.JPG
0
Robert ShermanOwnerCommented:
So, that error confirms my suspicion that the parser is expecting well-formed XML.  

(Note: As I'm editing my comment here, my "can of worms" detector is about to explode.)

You already have some code that is doing some pre-processing of the response you receive from the web server your are polling.   From the looks of it you are already correcting for several non-conforming tags, so you could just do some additional correcting to make the string well-formed.  

However, now that I'm looking more closely, the missing /a tag may not be the only error.  From the snippet you provided, it looks like there is an opening "span" tag that is also never properly closed (in the same line as the problem-causing anchor tag).  

It's entirely possible to use Regular Expressions to add the necessary close tags, but keep in mind that future changes to the site on the server end could break things at any time.   If you have any influence over the web content that you are trying to digest, it may be easier in the end to have the site's developers fix the poor HTML they are serving.  

I'm no Regular Expression pro, but it looks like the lines in question all start with a <br> tag.   (assuming that the line breaks are correctly represented here on EE)   So you could conceivably come up with a regex that basically looks for lines that begin with "<br>" followed by an opening <span ..."> tag.   You would do a replace to add the stuff that the regex found up to that opening span tag, and insert a closing span tag along with the remainder of the line.  

Then, you look at lines that begin with <br> followed by some stuff that isn't an opening anchor tag followed by an opening anchor tag followed by some text that isn't a "</td>".  You then replace all that stuff with itself followed by an "</a>" followed by the remainder of the line.  (just a "</td>" at this point)

Sound complicated enough?   I'm not finished answering yet, but in the interest of getting this thought out there and perhaps open it up to a regex guru that can come up with an expression in her sleep, I'm going to submit this and follow up after stepping away from the computer for a bit.  :D
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Ultimate Tool Kit for Technology Solution Provider

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy now.

PeterBaileyUkAuthor Commented:
It might be worth leaving it as I have no influence over the return server side as I am scraping some data from the web and rules dont allow me to post the site url.

I wonder then to just strip out the tags and leave only the data maybe with regx?
0
PeterBaileyUkAuthor Commented:
ok ive almost got it, have gone with this:
Function stripHTML(strHTML)
'Strips the HTML tags from strHTML
   Dim oRE As Object
  Dim objRegExp, strOutput
  Set oRE = CreateObject("VBScript.RegExp")

  oRE.IgnoreCase = True
  oRE.Global = True
  oRE.Pattern = "<(.|\n)+?>"

  'Replace all HTML tag matches with the empty string
  strOutput = oRE.Replace(strHTML, "")
 
  'Replace all < and > with &lt; and &gt;
  strOutput = Replace(strOutput, "<", "&lt;")
  strOutput = Replace(strOutput, ">", "&gt;")
 
  stripHTML = strOutput    'Return the value of strOutput

  Set oRE = Nothing
End Function


which results in the out:
 PAN EUROPEAN 1300 ABS
      parts for sale...............................****************** only rogue part left
    ST1300A2
    UNITED KINGDOM
    2002
    SC51A
    JH2SC51A*
    2M000001
    2M099999
0
PeterBaileyUkAuthor Commented:
your comments have been very very helpful, thank you
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft Access

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.