<

A MySQL Tidbit: In-line XML Parsing

Published on
15,833 Points
8,033 Views
3 Endorsements
Last Modified:

Introduction

In this installment of my SQL tidbits, I will be looking at parsing Extensible Markup Language (XML) directly passed as string parameters to MySQL 5.1.5 or higher. These would be instances where LOAD_FILE (version 4.1 or higher) and LOAD XML (version 5.5 or higher) are not optimal either due to permission/file or application restrictions.

Background

This particular instance was true for EE member JagdishDevaku, who asked the following question: XML Parsing in MySQL.

We are planning for a database migration from MS SQL Server to MySQL. Most of our procedures in SQL Server are working with XML data, but we are issues where trying to convert these procedures to MySQL.

Below is the sample XML file. We are sending XML data (not the XML file) directly as a input to the procedure.
[step=""]
<Attendance_Info Date="07/28/2010 00:00:00" Flag="1">
   <Attendance Attendance="2" PersonalId="33304" SerialNumber="0"/>
   <Attendance Attendance="2" PersonalId="33305" SerialNumber="1"/>
   <Attendance Attendance="0" PersonalId="33306" SerialNumber="2"/>
   <Attendance Attendance="2" PersonalId="33307" SerialNumber="3"/>
   <Attendance Attendance="2" PersonalId="33308" SerialNumber="4"/>
   <Attendance Attendance="2" PersonalId="33309" SerialNumber="5"/>
   <Attendance Attendance="2" PersonalId="33310" SerialNumber="6"/>
   <Attendance Attendance="2" PersonalId="33311" SerialNumber="7"/>
   <Attendance Attendance="2" PersonalId="33312" SerialNumber="8"/>
   <Attendance Attendance="2" PersonalId="33313" SerialNumber="9"/>
   <Attendance Attendance="2" PersonalId="33314" SerialNumber="10"/>
   <Attendance Attendance="2" PersonalId="33377" SerialNumber="11"/>
   <Attendance Attendance="2" PersonalId="33651" SerialNumber="12"/>
   <Attendance Attendance="2" PersonalId="33652" SerialNumber="13"/>
   <Attendance Attendance="2" PersonalId="33734" SerialNumber="14"/>
   <Attendance Attendance="2" PersonalId="34086" SerialNumber="15"/>
   <Attendance Attendance="2" PersonalId="34170" SerialNumber="16"/>
</Attendance_Info>

Open in new window

[/step]
With the XML we want the data in the following tabular format in MySQL
[step=""]
Date | Attendance | Personal_Id

Open in new window

[/step]

The Approach
When it was determined that the other approaches I mentioned earlier from the MySQL tech resources article on XML, Using XML in MySQL 5.1 and 6.0, another EE Expert, aikimark, adapted the xmldump_load procedure from the MySQL XML article's Importing XML and XML Data section to the following (cleaned a bit for syntax correctness):
SET nrows = ExtractValue(xml_parm, 'count(//Attendance_Info/Attendance)');

SET rownum = 1;

WHILE rownum <= nrows DO
   SET At_Attendance = ExtractValue(xml_parm, '//Attendance_Info/Attendance[$rownum]/@Attendance');
   SET At_PersonalId = ExtractValue(xml_parm, '//Attendance_Info/Attendance[$rownum]/@PersonalId');
  
   SET rownum = rownum + 1;
END WHILE;

Open in new window


So in short:
Determine the total count of "Attendance" child nodes that appear in the XML passed using ExtractValue(), one of two XML functions introduced with MySQL 5.1.5, which provides a mechanism to "[return] the text (CDATA) of the first text node which is a child of the element(s) matched by the XPath [1.0] expression".
[step=""]Note: at the time of writing this article, all versions of MySQL above 5.1.5 still only support XPath 1.0 in a limited fashion, so you may find some pre XPath 2.0 features that are also non-functional. For a full listing of the limits|bugs, please refer to manual. Consequently, some nice location paths do exist, so some shorthand to above is possible like:
extractvalue(@xml, '//child::*[1]/@PersonalId')

Open in new window

extractvalue(@xml, '/*/@Date')

Open in new window

[/step]
Using this count as the upper bound, we iterate using a while loop until we have counted from 1 to total number of rows.
Again using ExtractValue() we take advantage of our loop index rownum as the index to our XPath results using '$'+{variable-name} or $rownum in this example.

Bueller, ... Bueller, ... Bueller
So with an attendance table structure like:
drop table if exists attendance;
create table `attendance` (
  `att_date` datetime,
  `att_attend_id` int(11),
  `att_person_id` int(11),
  `att_serial_no` int(11),
  `att_id` bigint(20) not null auto_increment,
  primary key (`att_id`)
) engine=InnoDB default charset=latin1;

Open in new window

...the iterative procedure may look something like this:
drop procedure if exists InsertAttendanceXmlProc;

delimiter //

create procedure InsertAttendanceXmlProc(in xml varchar(8000))
begin
   declare dt datetime;
   declare nrows int;
   declare rownum int default 1;
   
   -- to be safe, explicitly convert date from xml to datetime - expects mm/dd/yyyy hh:mi:ss 
   -- http://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_str-to-date
   set dt = str_to_date(coalesce(nullif(extractvalue(xml, '//@Date'), ''), now()), '%m/%d/%Y %H:%i:%s');

   set nrows = extractvalue(xml, 'count(//Attendance)');
   
   -- decrement nrows to save on creating another variable
   while rownum <= nrows do
      insert into attendance(att_date, att_attend_id, att_person_id, att_serial_no)
      select dt
           , coalesce(nullif(extractvalue(xml, '//child::*[$rownum]/@Attendance'), ''),0)
           , coalesce(nullif(extractvalue(xml, '//child::*[$rownum]/@PersonalId'), ''),0)
           , coalesce(nullif(extractvalue(xml, '//child::*[$rownum]/@SerialNumber'), ''),0);
   
      set rownum = rownum + 1;
   end while;

end//

delimiter ;

Open in new window

As shown in the question thread, it is also a possibility, if the sequential order of XML nodes doesn't matter, to use the same variable used to store count of rows as index for our while loop but using a decrement approach instead.

[step=""]««test»»
See how our procedure works.
set @xml = '<Attendance_Info Date="07/28/2010 00:00:00" Flag="1">
   <Attendance Attendance="2" PersonalId="33304" SerialNumber="0"/>
   <Attendance Attendance="2" PersonalId="33305" SerialNumber="1"/>
   <Attendance Attendance="0" PersonalId="33306" SerialNumber="2"/>
   <Attendance Attendance="2" PersonalId="33307" SerialNumber="3"/>
   <Attendance Attendance="2" PersonalId="33308" SerialNumber="4"/>
   <Attendance Attendance="2" PersonalId="33309" SerialNumber="5"/>
   <Attendance Attendance="2" PersonalId="33310" SerialNumber="6"/>
   <Attendance Attendance="2" PersonalId="33311" SerialNumber="7"/>
   <Attendance Attendance="2" PersonalId="33312" SerialNumber="8"/>
   <Attendance Attendance="2" PersonalId="33313" SerialNumber="9"/>
   <Attendance Attendance="2" PersonalId="33314" SerialNumber="10"/>
   <Attendance Attendance="2" PersonalId="33377" SerialNumber="11"/>
   <Attendance Attendance="2" PersonalId="33651" SerialNumber="12"/>
   <Attendance Attendance="2" PersonalId="33652" SerialNumber="13"/>
   <Attendance Attendance="2" PersonalId="33734" SerialNumber="14"/>
   <Attendance Attendance="2" PersonalId="34086" SerialNumber="15"/>
   <Attendance Attendance="2" PersonalId="34170" SerialNumber="16"/>
</Attendance_Info>';
call InsertAttendanceXmlProc( @xml );

Open in new window

[/step]

But that is just background, right ?

It's All About Rownum

I brought you here to see some in-line SQL code and we haven't seen any yet now have we...well, for your patience, here you go (explanation to follow):
set @xml = '<Attendance_Info Date="07/28/2010 00:00:00" Flag="1">
   <Attendance Attendance="2" PersonalId="33304" SerialNumber="0"/>
   <Attendance Attendance="2" PersonalId="33305" SerialNumber="1"/>
   <Attendance Attendance="0" PersonalId="33306" SerialNumber="2"/>
   <Attendance Attendance="2" PersonalId="33307" SerialNumber="3"/>
   <Attendance Attendance="2" PersonalId="33308" SerialNumber="4"/>
   <Attendance Attendance="2" PersonalId="33309" SerialNumber="5"/>
   <Attendance Attendance="2" PersonalId="33310" SerialNumber="6"/>
   <Attendance Attendance="2" PersonalId="33311" SerialNumber="7"/>
   <Attendance Attendance="2" PersonalId="33312" SerialNumber="8"/>
   <Attendance Attendance="2" PersonalId="33313" SerialNumber="9"/>
   <Attendance Attendance="2" PersonalId="33314" SerialNumber="10"/>
   <Attendance Attendance="2" PersonalId="33377" SerialNumber="11"/>
   <Attendance Attendance="2" PersonalId="33651" SerialNumber="12"/>
   <Attendance Attendance="2" PersonalId="33652" SerialNumber="13"/>
   <Attendance Attendance="2" PersonalId="33734" SerialNumber="14"/>
   <Attendance Attendance="2" PersonalId="34086" SerialNumber="15"/>
   <Attendance Attendance="2" PersonalId="34170" SerialNumber="16"/>
</Attendance_Info>';
select extractvalue(@xml, '//@Date') as att_date
     , extractvalue(@xml, '//child::*[$@rownum]/@Attendance') as att_attend_id
     , extractvalue(@xml, '//child::*[$@rownum]/@PersonalId') as att_person_id
     , extractvalue(@xml, '//child::*[$@rownum]/@SerialNumber') as att_serial_no
from util.numbers
where n between 1 and extractvalue(@xml, 'count(//Attendance)')
and @rownum:=n
;

Open in new window

With luck, you will see that results of this query are similar to what our procedure added to the attendance table during our stored procedure.
XML Parse to Table Results
Since this solution is specific to a particular XML string, it is useful to be able to do this very painlessly and in-line at times as you now have control at time of writing as to how fields will be converted/handled versus getting how it was programmed in the procedure each time. Think that is cool? Well, here is how it works:
Rule 1: Use your tool belt, you built it for a reason. We looked previously at creating a sequential numbers table with the premise it had value. Well here we use util.numbers to create rows for each Attendance node we find in our XML based on the same XPath aggregate count() function we used in our iterative approach.
Rule 2: Continue to use what you know. Since we can't just stop at one nice utility in a solution, the next bit you will notice is "@rownum:=n" which is our ranking technique modified a bit. In this case, the numbers sequence is already getting us our row numbers; however, the XPath expression for ExtractValue() needs to be constant so concatenating "n" directly will not work, so we use @rownum (can be named @xyz by the way -- no magic here, just a plain user-defined variable) that we can now use in our XPath with $ as we did in our procedure.

Bueller
So now that we only have to take attendance at one time, we can update our procedure to something like this:
drop procedure if exists InsertAttendanceXmlProc2;

delimiter //

create procedure InsertAttendanceXmlProc2(in xml varchar(8000))
begin
   insert into attendance(att_date, att_attend_id, att_person_id, att_serial_no)
   select str_to_date(coalesce(nullif(extractvalue(@xml, '//@Date'), ''), now()), '%m/%d/%Y %H:%i:%s')
        , coalesce(nullif(extractvalue(@xml, '//child::*[$@rownum]/@Attendance'), ''),0)
        , coalesce(nullif(extractvalue(@xml, '//child::*[$@rownum]/@PersonalId'), ''),0)
        , coalesce(nullif(extractvalue(@xml, '//child::*[$@rownum]/@SerialNumber'), ''),0)
   from util.numbers
   where n between 1 and extractvalue(@xml, 'count(//Attendance)')
   and @rownum:=n
   ;
end//

delimiter ;

Open in new window


For such a small record set, there was not much performance issues to speak of; however, a nice side benefit of this neat trick is that it eliminates the extra overhead of all the added variables and iteration previously needed, hopefully bringing it in line with speed of file import approaches. As I get more factual statistics on this, I will be sure to post, but would love your feedback also.

That's it !

Conclusion

Yet another SQL tool to add to the arsenal. We saw here how useful and powerful our numbers table can be and another useful application of user-defined variables. In addition, we learned about  ExtractValue(), one of the two MySQL XML functions. With this knowledge, you can probably make nice use of UpdateXML() which is the counterpart to ExtractValue() that allows you to replace a particular fragment of XML matched by XPath to a new XML fragment.

Thank you, all, for reading. It is my sincere hope that this trick serves you well!

Best regards and happy coding,

Kevin C. Cross, Sr. (mwvisa1)

References

Related Articles
http://dev.mysql.com/tech-resources/articles/xml-in-mysql5.1-6.0.html
http://www.experts-exchange.com/Database/MySQL/A_3573-A-MySQL-Tidbit-Quick-Numbers-Table-Generation.html
http://www.experts-exchange.com/Database/Miscellaneous/A_1555-Analytical-SQL-Where-do-you-rank.html

Related MySQL Documentation
http://dev.mysql.com/doc/refman/5.1/en/string-functions.html#function_load-file
http://dev.mysql.com/doc/refman/5.5/en/load-xml.html
http://dev.mysql.com/doc/refman/5.1/en/user-variables.html
http://dev.mysql.com/doc/refman/5.1/en/xml-functions.html

Related Q&A
http://www.experts-exchange.com/Database/MySQL/Q_26384151.html

 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
If you found this article helpful, please click the Yes button after the question just below. This will give me (the author) a few points and might encourage me to write more articles.

If you didn't or otherwise feel the need to vote No, please first leave a comment to give me a chance to answer and perhaps to improve this article.

Thank you!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 
3
Comment
Author:Kevin Cross
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
11 Comments
 
LVL 46

Expert Comment

by:aikimark
Excellent article, Kevin.  (and quickly written, too)

If the SQL optimizer is smart enough, then it will replace the str_to_date() column with a fixed value.  Otherwise, the dt variable would reduce function execution and perform faster -- probably undetectable for relatively small XML parameters.

=============
Question: In the Where clause (below), why doesn't the "and @rownum:=n" expression limit the rows, rather than increment the @rownum variable?  It's a really neat trick, but I don't understand how it works.

where n between 1 and extractvalue(@xml, 'count(//Attendance)')
   and @rownum:=n

Open in new window

0
 
LVL 60

Author Comment

by:Kevin Cross
Good question, Mark!

The := is an assignment operator in this instance, so @rownum will always be equal to n; therefore, it is a filter that doesn't impact your rows but will successfully filter out @rownum = null so that you don't get an invalid row where the index passes to XPath is null|0.

For other readers' reference:
You can also assign a value to a user variable in statements other than SET. In this case, the assignment operator must be := and not = because = is treated as a comparison operator in non-SET statements:

Given the goal was to select specific number of columns for an INSERT statement, I used this form of the trick; however, this form may be easier to understand from what the @rownum portion of the trick is doing.
set @xml = '<Attendance_Info Date="07/28/2010 00:00:00" Flag="1">
   <Attendance Attendance="2" PersonalId="33304" SerialNumber="0"/>
   <Attendance Attendance="2" PersonalId="33305" SerialNumber="1"/>
   <Attendance Attendance="0" PersonalId="33306" SerialNumber="2"/>
   <Attendance Attendance="2" PersonalId="33307" SerialNumber="3"/>
   <Attendance Attendance="2" PersonalId="33308" SerialNumber="4"/>
   <Attendance Attendance="2" PersonalId="33309" SerialNumber="5"/>
   <Attendance Attendance="2" PersonalId="33310" SerialNumber="6"/>
   <Attendance Attendance="2" PersonalId="33311" SerialNumber="7"/>
   <Attendance Attendance="2" PersonalId="33312" SerialNumber="8"/>
   <Attendance Attendance="2" PersonalId="33313" SerialNumber="9"/>
   <Attendance Attendance="2" PersonalId="33314" SerialNumber="10"/>
   <Attendance Attendance="2" PersonalId="33377" SerialNumber="11"/>
   <Attendance Attendance="2" PersonalId="33651" SerialNumber="12"/>
   <Attendance Attendance="2" PersonalId="33652" SerialNumber="13"/>
   <Attendance Attendance="2" PersonalId="33734" SerialNumber="14"/>
   <Attendance Attendance="2" PersonalId="34086" SerialNumber="15"/>
   <Attendance Attendance="2" PersonalId="34170" SerialNumber="16"/>
</Attendance_Info>';
select @rownum:=n as row_num
     , extractvalue(@xml, '//@Date') as att_date
     , extractvalue(@xml, '//child::*[$@rownum]/@Attendance') as att_attend_id
     , extractvalue(@xml, '//child::*[$@rownum]/@PersonalId') as att_person_id
     , extractvalue(@xml, '//child::*[$@rownum]/@SerialNumber') as att_serial_no
from util.numbers
where n between 1 and extractvalue(@xml, 'count(//Attendance)')
;

Open in new window


Guess it can be argued that instead of providing 2 tricks, I probably provided 3 with the 3rd being that you can assign the user variables in the where clause if you don't want the result of the assignment to be displayed as a separate column.

Hope that makes more sense now.

Thanks for reading and the kind words about the article.

Regards,
Kevin
0
 
LVL 60

Author Comment

by:Kevin Cross
P.S. to the younger readers who might not have caught the reference from my weak attempt at "attendance" humor, see http://en.wikipedia.org/wiki/Ferris_Bueller's_Day_Off
0
Prepare for your VMware VCP6-DCV exam.

Josh Coen and Jason Langer have prepared the latest edition of VCP study guide. Both authors have been working in the IT field for more than a decade, and both hold VMware certifications. This 163-page guide covers all 10 of the exam blueprint sections.

 
LVL 46

Expert Comment

by:aikimark
@Kevin

Thanks.  I missed the colon and only read the equals.

The more tricks, the better.

========
I recommend adding "attribute" or "attributes" tag, since that is where we are getting our data values.  This article may be the only one that leads the reader through the process of attribute data extraction.
0
 
LVL 60

Author Comment

by:Kevin Cross
You are most welcome!
Thanks, the keywords idea is a good one. Will update article tags.
0
 
LVL 46

Expert Comment

by:aikimark
@Kevin

While you're at it, add a "parameter" tag and some tag that would indicate this applies to stored procs and user-defined functions.
0
 
LVL 14

Expert Comment

by:Jagdish Devaku
Great article...
0
 
LVL 60

Author Comment

by:Kevin Cross
Thanks, Jagdish!
0
 
LVL 60

Author Comment

by:Kevin Cross
0
 

Expert Comment

by:Satish Kumar
Hello,

My question is I have  XMLData I am trying to insert into 2 tables.
<Details>
    <Levels>
        <Level>
            <SectionId>681</SectionId>
            <RiskId>0</RiskId>
            <WAC>Test </WAC>
            <DOCReview>Test</DOCReview>
            <MAT>1</MAT>
            <SDV>1</SDV>
            <Questions>
                <QuestionId>1186</QuestionId>
                <QuestionId>1187</QuestionId>
            </Questions>
        </Level>
        <Level>
            <SectionId>681</SectionId>
            <RiskId>1</RiskId>
            <WAC>Test </WAC>
            <DOCReview>Test</DOCReview>
            <MAT>2</MAT>
            <SDV>2</SDV>
            <Questions>
                <QuestionId>1186</QuestionId>
                <QuestionId>1187</QuestionId>
            </Questions>
        </Level>
        <Level>
            <SectionId>681</SectionId>
            <RiskId>2</RiskId>
            <WAC>Test </WAC>
            <DOCReview>Test</DOCReview>
            <MAT>1</MAT>
            <SDV>3</SDV>
            <Questions>
                <QuestionId>1209</QuestionId>
                <QuestionId>1210</QuestionId>
            </Questions>
        </Level>
        <Level>
            <SectionId>681</SectionId>
            <RiskId>3</RiskId>
            <WAC>Test </WAC>
            <DOCReview>Test</DOCReview>
            <MAT>2</MAT>
            <SDV>1</SDV>
            <Questions>
                <QuestionId>1186</QuestionId>
                <QuestionId>1187</QuestionId>
                <QuestionId>1209</QuestionId>
                <QuestionId>1210</QuestionId>
            </Questions>
        </Level>
    </Levels>
</Details>

Open in new window


I can insert data into the first table which has no problem but the second table has a problem. Each Level has child elements " Questions " I unable to insert child data into the second table can you please help me.
Thanks in Advance.
0
 
 

Administrative Comment

by:Mr. Wolfe
@Satish Kumar,  typically questions like this will not be answered in the article comments. You will get much better response by asking a question. Just click on the blue, "Ask a Question" button at the top.  By asking a question in the QA thread, our Experts will be alerted to your query.

http://support.experts-exchange.com/customer/en/portal/articles/336330-how-do-i-ask-a-question-
http://support.experts-exchange.com/customer/portal/articles/756544-how-to-succeed-at-ee-as-an-asker

Mr Wolfe
EE Moderator
0

Featured Post

Three Reasons Why Backup is Strategic

Backup is strategic to your business because your data is strategic to your business. Without backup, your business will fail. This white paper explains why it is vital for you to design and immediately execute a backup strategy to protect 100 percent of your data.

Join & Write a Comment

In this video, Percona Solution Engineer Rick Golba discuss how (and why) you implement high availability in a database environment. To discuss how Percona Consulting can help with your design and architecture needs for your database and infrastr…
In this video, Percona Solutions Engineer Barrett Chambers discusses some of the basic syntax differences between MySQL and MongoDB. To learn more check out our webinar on MongoDB administration for MySQL DBA: https://www.percona.com/resources/we…
Suggested Courses

Keep in touch with Experts Exchange

Tech news and trends delivered to your inbox every month