Link to home
Start Free TrialLog in
Avatar of keschuster
keschuster

asked on

Parsing XML in Access

I have a field in Access that contains
<data-xml><asset-columns misc-10-enabled="yes" misc-10-name="PC" misc-1-enabled="yes" misc-1-name="Make" misc-2-enabled="yes" misc-2-name="Model" misc-3-enabled="yes" misc-3-name="Serial" misc-4-enabled="no" misc-5-enabled="no" misc-6-enabled="no" misc-7-enabled="no" misc-8-enabled="yes" misc-8-name="Status Code" misc-9-enabled="yes" misc-9-name="Original Company" /><location-columns misc-10-enabled="yes" misc-10-name="PC" misc-1-enabled="no" misc-2-enabled="no" misc-3-enabled="no" misc-4-enabled="no" misc-5-enabled="no" misc-6-enabled="no" misc-7-enabled="no" misc-8-enabled="no" misc-9-enabled="yes" misc-9-name="Original Company" /></data-xml>

Is there a way to parse this out so it is usable - perhaps into another table?
Avatar of Rey Obrero (Capricorn1)
Rey Obrero (Capricorn1)
Flag of United States of America image

what do you mean by usable?
What is your definition of 'usable' ?

mx
Avatar of keschuster
keschuster

ASKER

name value pairs in two columns
misc-10-name="PC" should output to misc-1 and PC
misc-2-name="Model" => misc-2 and Model

there will always be the two groups in the XML "asset columns" and "location columns"
Within each there will always be misc-1 thru misc-10
The values assinged to misc1 - 10 will always change
do you have a table to hold the values?

post the table structure here.
you have to use VBA and open the table as recordset to do this.

how is your VBA?
I don't but it would simply be
id - autoincrement
field - text
value - text
i don't know what you are thinking...

post sample data you want to see from the table after parsing the XML field.
This xml is comes in one column in an access table just as it appears in my initial post.  From this I would like to see

type                              field                                 value
asset                       misc-10                               PC
asset                       misc-1                                 Make
....
location                    misc-10                               PC
location                    misc-1                                 null
location                    misc-2                                 null
....
location                     misc-9                                Original Company
<data-xml>
<asset-columns misc-10-enabled="yes" misc-10-name="PC" misc-1-enabled="yes" misc-1-name="Make" misc-2-enabled="yes" misc-2-name="Model" misc-3-enabled="yes" misc-3-name="Serial" misc-4-enabled="no" misc-5-enabled="no" misc-6-enabled="no" misc-7-enabled="no" misc-8-enabled="yes" misc-8-name="Status Code" misc-9-enabled="yes" misc-9-name="Original Company" />
 
<location-columns misc-10-enabled="yes" misc-10-name="PC" misc-1-enabled="no" misc-2-enabled="no" misc-3-enabled="no" misc-4-enabled="no" misc-5-enabled="no" misc-6-enabled="no" misc-7-enabled="no" misc-8-enabled="no" misc-9-enabled="yes" misc-9-name="Original Company" />
 
</data-xml>

Open in new window

Bad link
"Bad link"
Sorry ... it was meant to jump to where you showed the Name/Value pair examples.

mx
This XML data is not well formed. The data is presented as attributes instead of fields. There is no way Access will understand that (it normally skips attributes). If your XML was well formed, you could save it as a file and import it into a table.

As it stands, the only way I see would be to write your own parser. For example: find the string "<asset-columns" and from that point the next ">". Store that portion as string strAsset. Then look in turn for "misc-1-name" to "misc-10-name", and then extract whatever is between the next pair of double quotes. Note that you can neglect the "misc-?-enabled" attributes: they simply indicate if there is a value for the matching "name" attribute.

This can all be done with the functions InStr() and Mid(), but it's an actual development project. Small, but still a VB project in its own right.

Are you certain you cannot obtain the data in a more straightforward format?

(°v°)
thats how the data is coming.  It's stored in the db like that.  SO I would need to build a parser to move through the xml, grab the values, then insert each name/value into a table when a pair is found.
ASKER CERTIFIED SOLUTION
Avatar of Markus Fischer
Markus Fischer
Flag of Switzerland image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial

   Nice work harfang..right on the money.   I just went down this road myself not too long ago.  

   It can be a little tricky walking the XML file.  Best approach is to consider it a typical Tree structure.  I found this:

http://www.w3schools.com/xml/default.asp

  which is an excellent step by step on understanding XML.

  I also found this:

http://support.microsoft.com/kb/285329

  Which seems to be a generic way of handling this problem, but haven't had a chance to try it out yet.  I went the route you did.

HTH,
JimD
harfang this looks good.  It should work as an access sub right?  
 FWIW, I posted the procedure I came up with to parse an XML file which contained time card data in a denormalized fashion (there was Emp1, Emp2, Emp3, ....Emp7 fields for each section of the time card - weird setup).

  I'm posting this because:

1. It shows how to load a XML file directly from disk.
2. Shows both loop and fixed fetch logic.

  Hopefully this will be of benefit to someone.

JimD.
 
PS - NOT expecting any points!
Public Function ImportXMLFile(ByVal strFile As String) As Boolean
 
        ' OCS - JRD - 04/05/08 - Initial write.
 
        ' This function will process a single "time form" XML file
        ' and makes fundamental assumptions about the format of the XML file.
        ' If the XML file format is changed in any way, this code will fail.
 
        ' Each file is expected to have the following data:
        '
        '     SmartformInstance        (multiple per file)
        '       Form                   (one set per form instance)
        '         JobInfo              (one set per form instance)
        '         TravelStart          (one set per form instance)
        '         TravelEnd            (one set per form instance)
        '         ClockIn              (one set per form instance)
        '         StartBreak           (one set per form instance)
        '         EndBreak             (one set per form instance)
        '         ClockOut             (one set per form instance)
        '
        ' For the sections (Travel Start through Clock Out), seven (7)
        ' employee fields are expected.
 
        Dim objDomDoc As New DOMDocument
        Dim objNodeSmartForm As IXMLDOMNode
        Dim objNode_Sections As IXMLDOMNode
        Dim objNodeData As IXMLDOMNode
        Dim objNodeItem As IXMLDOMNode
 
        Dim dbs As DAO.Database
        Dim rst As DAO.Recordset
        Dim strSQL As String
 
        Dim lngForm As Long
        Dim intSection As Integer
        Dim intEmployee As Integer
 
        Dim strInstanceID As String
        Dim strInstanceDate As String
        Dim strInstanceType As String
        Dim strFormName As String
        Dim strJobNumber As String
        Dim strPerDiem As String
        Dim strValue As String
 
10      On Error GoTo ImportXMLFile_Error
 
20      ImportXMLFile = False
 
        ' Open a DAO Database connection
30      Set dbs = CurrentDb()
 
        ' Verify that such a file exists.  If not, then exit
40      If Dir(strFile) <> "" Then
 
          ' Open DOM object by loading the xml file.
50        objDomDoc.async = False
60        objDomDoc.Load strFile
 
          ' Loop through the form nodes.
70        For Each objNodeSmartForm In objDomDoc.documentElement.childNodes
 
 
            '  Get the header info
80          strInstanceID = Nz(objNodeSmartForm.childNodes(0).Text, "")
90          strInstanceDate = Nz(objNodeSmartForm.childNodes(1).Text, "")
100         strInstanceType = Nz(objNodeSmartForm.childNodes(2).Text, "")
110         strFormName = Nz(objNodeSmartForm.childNodes(3).childNodes(0).Text, "")
120         strJobNumber = Nz(objNodeSmartForm.childNodes(3).childNodes(2).childNodes(1).childNodes(2).Text, "")
130         strPerDiem = Nz(objNodeSmartForm.childNodes(3).childNodes(2).childNodes(2).childNodes(2).Text, "")
 
 
            ' Save header info
140         strSQL = "SELECT * FROM tblSmartFormInstance WHERE 1=0"
150         Set rst = dbs.OpenRecordset(strSQL, dbOpenDynaset)
160         rst.AddNew
170         rst![FormInstanceID] = strInstanceID
180         rst![FormInstanceDate] = CDate(Mid(strInstanceDate, 6, 2) & "/" & Mid(strInstanceDate, 9, 2) & "/" & Mid(strInstanceDate, 1, 4) & " " & Mid(strInstanceDate, 12, 8))
190         rst![TypeCode] = strInstanceType
200         rst![FormName] = strFormName
210         rst![JobNumber] = strJobNumber
220         rst![PerDiem] = strPerDiem
230         rst.Update
 
240         rst.Close
 
            ' Now get the details for the form
 
            ' Open detail table
250         strSQL = "SELECT * FROM tblSmartFormInstanceDetail WHERE 1=0"
260         Set rst = dbs.OpenRecordset(strSQL, dbOpenDynaset)
 
            ' Process the sections.  The code below depends on the same sections
            ' being there each time, having the same number of elements,
            ' and being in the same order.
            ' A more flexible approach would be to check the element data,
            ' then do a lookup on the section and field names and map to the tables.
            ' However we are doing only one form at the moment and the data will
            ' always be provided in the current format.
270         Set objNode_Sections = objNodeSmartForm.childNodes(3)
 
280         For intSection = 3 To 8
290           Set objNodeData = objNode_Sections.childNodes(intSection)
              ' objNodeData.childNodes(0).text is the section name
300           For intEmployee = 1 To 7
310             Set objNodeItem = objNodeData.childNodes(intEmployee)
320             If Nz(objNodeItem.childNodes(2).Text, "") <> "" Then
330               strValue = Nz(objNodeItem.childNodes(2).Text, "")
340               If strValue <> "" Then
350                 rst.AddNew
360                 rst![FormInstanceID] = strInstanceID
370                 rst![SectionTypeID] = intSection - 2
380                 rst![EmployeeListOrder] = intEmployee
390                 rst![EmployeeName] = strValue
400                 rst.Update
410               End If
420             End If
430             Set objNodeItem = Nothing
440           Next intEmployee
450           Set objNodeData = Nothing
460         Next intSection
 
470         rst.Close
 
480       Next
 
490       ImportXMLFile = True
 
500     End If
 
ImportXMLFile_Exit:
510     On Error Resume Next
 
        ' Clean up
520     Set rst = Nothing
530     Set dbs = Nothing
 
540     Set objNodeItem = Nothing
550     Set objNodeData = Nothing
560     Set objNode_Sections = Nothing
570     Set objNodeSmartForm = Nothing
580     Set objDomDoc = Nothing
 
590     Exit Function
 
ImportXMLFile_Error:
600     MsgBox "Unexpected Error"
610     Resume ImportXMLFile_Exit
 
End Function

Open in new window

Good stuff - I'll try to work with these today
> It should work as an access sub right?  

Let's say it's a template. The sub ParseXML accepts a string containing XML data in a format similar to your sample, and prints out all items in the first "root" element, the single <data-xml> in this case, along with their attributes.

You probably don't want to print them to the Immediate pane. You will have to adapt the code to do something useful with the data, for example storing it in a table. But the way to do this is certainly related to other fields of the table, so I left that up to you: do you want to update the same table? create a new table? if so what would be the key? how are you going to use the data? does it need to be normalized as properties or as fields? ...

@JimD: "I'm posting this because..."

You also demonstrate how to store the retrieved data in a table, and I'm sure this will be useful. Btw, I haven't seen numbered lines in over 20 years... I wonder why you did that.

Cheers!
(°v°)
<<You also demonstrate how to store the retrieved data in a table, and I'm sure this will be useful. Btw, I haven't seen numbered lines in over 20 years... I wonder why you did that.>>

  I use the undocumented vba.erl to return the line number in my error handlers.  Speeds up troubleshooting.

JimD.
> the undocumented vba.erl
I see, thanks for the info.
(°v°)
 
  I should explain a bit more.  The code above was done for someone else and out of habit, I added the line numbers.  But my normal error handler looks like this:

410     UnexpectedError ModuleName, RoutineName, Version, Err.Number, Err.Description, Err.Source, VBA.Erl

  So I get feedback on the module, routine, and exact line number where an error occured, which speeds up the trouble shooting process.  The UnexpectedError procedure writes an entry to a log file and optionally e-mails me with an error.

JimD.
That makes perfect sense. I know of many instances where this would have saved me hours! Too bad the simple "source text" line number isn't available... but then again writing a code numbering tool is quite simple as well. I should seriously consider this!
(°v°)
<<That makes perfect sense. I know of many instances where this would have saved me hours! Too bad the simple "source text" line number isn't available... but then again writing a code numbering tool is quite simple as well. I should seriously consider this!>>

  Check out MZ Tools:

http://www.mztools.com/v3/mztools3.aspx

  It has the line numbering built-in along with a host of other features.  Best part; it's free<g>.

JimD.