Community Pick: Many members of our community have endorsed this article.

A MySQL Tidbit: In-line XML Parsing

Kevin CrossChief Technology Officer
CERTIFIED EXPERT
Father, husband and general problem solver who loves coding SQL, C#, Salesforce Apex or whatever.
Published:
Updated:

Introduction

In this installment of my SQL tidbits, I will be looking at parsing Extensible Markup Language (XML) directly passed as string parameters to MySQL 5.1.5 or higher. These would be instances where LOAD_FILE (version 4.1 or higher) and LOAD XML (version 5.5 or higher) are not optimal either due to permission/file or application restrictions.

Background

This particular instance was true for EE member JagdishDevaku, who asked the following question: XML Parsing in MySQL.

We are planning for a database migration from MS SQL Server to MySQL. Most of our procedures in SQL Server are working with XML data, but we are issues where trying to convert these procedures to MySQL.

Below is the sample XML file. We are sending XML data (not the XML file) directly as a input to the procedure.
[step=""]
<Attendance_Info Date="07/28/2010 00:00:00" Flag="1">
                         <Attendance Attendance="2" PersonalId="33304" SerialNumber="0"/>
                         <Attendance Attendance="2" PersonalId="33305" SerialNumber="1"/>
                         <Attendance Attendance="0" PersonalId="33306" SerialNumber="2"/>
                         <Attendance Attendance="2" PersonalId="33307" SerialNumber="3"/>
                         <Attendance Attendance="2" PersonalId="33308" SerialNumber="4"/>
                         <Attendance Attendance="2" PersonalId="33309" SerialNumber="5"/>
                         <Attendance Attendance="2" PersonalId="33310" SerialNumber="6"/>
                         <Attendance Attendance="2" PersonalId="33311" SerialNumber="7"/>
                         <Attendance Attendance="2" PersonalId="33312" SerialNumber="8"/>
                         <Attendance Attendance="2" PersonalId="33313" SerialNumber="9"/>
                         <Attendance Attendance="2" PersonalId="33314" SerialNumber="10"/>
                         <Attendance Attendance="2" PersonalId="33377" SerialNumber="11"/>
                         <Attendance Attendance="2" PersonalId="33651" SerialNumber="12"/>
                         <Attendance Attendance="2" PersonalId="33652" SerialNumber="13"/>
                         <Attendance Attendance="2" PersonalId="33734" SerialNumber="14"/>
                         <Attendance Attendance="2" PersonalId="34086" SerialNumber="15"/>
                         <Attendance Attendance="2" PersonalId="34170" SerialNumber="16"/>
                      </Attendance_Info>

Open in new window

[/step]
With the XML we want the data in the following tabular format in MySQL
[step=""]
Date | Attendance | Personal_Id

Open in new window

[/step]

The Approach
When it was determined that the other approaches I mentioned earlier from the MySQL tech resources article on XML, Using XML in MySQL 5.1 and 6.0, another EE Expert, aikimark, adapted the xmldump_load procedure from the MySQL XML article's Importing XML and XML Data section to the following (cleaned a bit for syntax correctness):
SET nrows = ExtractValue(xml_parm, 'count(//Attendance_Info/Attendance)');
                      
                      SET rownum = 1;
                      
                      WHILE rownum <= nrows DO
                         SET At_Attendance = ExtractValue(xml_parm, '//Attendance_Info/Attendance[$rownum]/@Attendance');
                         SET At_PersonalId = ExtractValue(xml_parm, '//Attendance_Info/Attendance[$rownum]/@PersonalId');
                        
                         SET rownum = rownum + 1;
                      END WHILE;

Open in new window


So in short:
Determine the total count of "Attendance" child nodes that appear in the XML passed using ExtractValue(), one of two XML functions introduced with MySQL 5.1.5, which provides a mechanism to "[return] the text (CDATA) of the first text node which is a child of the element(s) matched by the XPath [1.0] expression".
[step=""]Note: at the time of writing this article, all versions of MySQL above 5.1.5 still only support XPath 1.0 in a limited fashion, so you may find some pre XPath 2.0 features that are also non-functional. For a full listing of the limits|bugs, please refer to manual. Consequently, some nice location paths do exist, so some shorthand to above is possible like:
extractvalue(@xml, '//child::*[1]/@PersonalId')

Open in new window

extractvalue(@xml, '/*/@Date')

Open in new window

[/step]
Using this count as the upper bound, we iterate using a while loop until we have counted from 1 to total number of rows.
Again using ExtractValue() we take advantage of our loop index rownum as the index to our XPath results using '$'+{variable-name} or $rownum in this example.

Bueller, ... Bueller, ... Bueller
So with an attendance table structure like:
drop table if exists attendance;
                      create table `attendance` (
                        `att_date` datetime,
                        `att_attend_id` int(11),
                        `att_person_id` int(11),
                        `att_serial_no` int(11),
                        `att_id` bigint(20) not null auto_increment,
                        primary key (`att_id`)
                      ) engine=InnoDB default charset=latin1;

Open in new window

...the iterative procedure may look something like this:
drop procedure if exists InsertAttendanceXmlProc;
                      
                      delimiter //
                      
                      create procedure InsertAttendanceXmlProc(in xml varchar(8000))
                      begin
                         declare dt datetime;
                         declare nrows int;
                         declare rownum int default 1;
                         
                         -- to be safe, explicitly convert date from xml to datetime - expects mm/dd/yyyy hh:mi:ss 
                         -- http://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_str-to-date
                         set dt = str_to_date(coalesce(nullif(extractvalue(xml, '//@Date'), ''), now()), '%m/%d/%Y %H:%i:%s');
                      
                         set nrows = extractvalue(xml, 'count(//Attendance)');
                         
                         -- decrement nrows to save on creating another variable
                         while rownum <= nrows do
                            insert into attendance(att_date, att_attend_id, att_person_id, att_serial_no)
                            select dt
                                 , coalesce(nullif(extractvalue(xml, '//child::*[$rownum]/@Attendance'), ''),0)
                                 , coalesce(nullif(extractvalue(xml, '//child::*[$rownum]/@PersonalId'), ''),0)
                                 , coalesce(nullif(extractvalue(xml, '//child::*[$rownum]/@SerialNumber'), ''),0);
                         
                            set rownum = rownum + 1;
                         end while;
                      
                      end//
                      
                      delimiter ;

Open in new window

As shown in the question thread, it is also a possibility, if the sequential order of XML nodes doesn't matter, to use the same variable used to store count of rows as index for our while loop but using a decrement approach instead.

[step=""]««test»»
See how our procedure works.
set @xml = '<Attendance_Info Date="07/28/2010 00:00:00" Flag="1">
                         <Attendance Attendance="2" PersonalId="33304" SerialNumber="0"/>
                         <Attendance Attendance="2" PersonalId="33305" SerialNumber="1"/>
                         <Attendance Attendance="0" PersonalId="33306" SerialNumber="2"/>
                         <Attendance Attendance="2" PersonalId="33307" SerialNumber="3"/>
                         <Attendance Attendance="2" PersonalId="33308" SerialNumber="4"/>
                         <Attendance Attendance="2" PersonalId="33309" SerialNumber="5"/>
                         <Attendance Attendance="2" PersonalId="33310" SerialNumber="6"/>
                         <Attendance Attendance="2" PersonalId="33311" SerialNumber="7"/>
                         <Attendance Attendance="2" PersonalId="33312" SerialNumber="8"/>
                         <Attendance Attendance="2" PersonalId="33313" SerialNumber="9"/>
                         <Attendance Attendance="2" PersonalId="33314" SerialNumber="10"/>
                         <Attendance Attendance="2" PersonalId="33377" SerialNumber="11"/>
                         <Attendance Attendance="2" PersonalId="33651" SerialNumber="12"/>
                         <Attendance Attendance="2" PersonalId="33652" SerialNumber="13"/>
                         <Attendance Attendance="2" PersonalId="33734" SerialNumber="14"/>
                         <Attendance Attendance="2" PersonalId="34086" SerialNumber="15"/>
                         <Attendance Attendance="2" PersonalId="34170" SerialNumber="16"/>
                      </Attendance_Info>';
                      call InsertAttendanceXmlProc( @xml );

Open in new window

[/step]

But that is just background, right ?

It's All About Rownum

I brought you here to see some in-line SQL code and we haven't seen any yet now have we...well, for your patience, here you go (explanation to follow):
set @xml = '<Attendance_Info Date="07/28/2010 00:00:00" Flag="1">
                         <Attendance Attendance="2" PersonalId="33304" SerialNumber="0"/>
                         <Attendance Attendance="2" PersonalId="33305" SerialNumber="1"/>
                         <Attendance Attendance="0" PersonalId="33306" SerialNumber="2"/>
                         <Attendance Attendance="2" PersonalId="33307" SerialNumber="3"/>
                         <Attendance Attendance="2" PersonalId="33308" SerialNumber="4"/>
                         <Attendance Attendance="2" PersonalId="33309" SerialNumber="5"/>
                         <Attendance Attendance="2" PersonalId="33310" SerialNumber="6"/>
                         <Attendance Attendance="2" PersonalId="33311" SerialNumber="7"/>
                         <Attendance Attendance="2" PersonalId="33312" SerialNumber="8"/>
                         <Attendance Attendance="2" PersonalId="33313" SerialNumber="9"/>
                         <Attendance Attendance="2" PersonalId="33314" SerialNumber="10"/>
                         <Attendance Attendance="2" PersonalId="33377" SerialNumber="11"/>
                         <Attendance Attendance="2" PersonalId="33651" SerialNumber="12"/>
                         <Attendance Attendance="2" PersonalId="33652" SerialNumber="13"/>
                         <Attendance Attendance="2" PersonalId="33734" SerialNumber="14"/>
                         <Attendance Attendance="2" PersonalId="34086" SerialNumber="15"/>
                         <Attendance Attendance="2" PersonalId="34170" SerialNumber="16"/>
                      </Attendance_Info>';
                      select extractvalue(@xml, '//@Date') as att_date
                           , extractvalue(@xml, '//child::*[$@rownum]/@Attendance') as att_attend_id
                           , extractvalue(@xml, '//child::*[$@rownum]/@PersonalId') as att_person_id
                           , extractvalue(@xml, '//child::*[$@rownum]/@SerialNumber') as att_serial_no
                      from util.numbers
                      where n between 1 and extractvalue(@xml, 'count(//Attendance)')
                      and @rownum:=n
                      ;

Open in new window

With luck, you will see that results of this query are similar to what our procedure added to the attendance table during our stored procedure.
XML Parse to Table Results
Since this solution is specific to a particular XML string, it is useful to be able to do this very painlessly and in-line at times as you now have control at time of writing as to how fields will be converted/handled versus getting how it was programmed in the procedure each time. Think that is cool? Well, here is how it works:
Rule 1: Use your tool belt, you built it for a reason. We looked previously at creating a sequential numbers table with the premise it had value. Well here we use util.numbers to create rows for each Attendance node we find in our XML based on the same XPath aggregate count() function we used in our iterative approach.
Rule 2: Continue to use what you know. Since we can't just stop at one nice utility in a solution, the next bit you will notice is "@rownum:=n" which is our ranking technique modified a bit. In this case, the numbers sequence is already getting us our row numbers; however, the XPath expression for ExtractValue() needs to be constant so concatenating "n" directly will not work, so we use @rownum (can be named @xyz by the way -- no magic here, just a plain user-defined variable) that we can now use in our XPath with $ as we did in our procedure.

Bueller
So now that we only have to take attendance at one time, we can update our procedure to something like this:
drop procedure if exists InsertAttendanceXmlProc2;
                      
                      delimiter //
                      
                      create procedure InsertAttendanceXmlProc2(in xml varchar(8000))
                      begin
                         insert into attendance(att_date, att_attend_id, att_person_id, att_serial_no)
                         select str_to_date(coalesce(nullif(extractvalue(@xml, '//@Date'), ''), now()), '%m/%d/%Y %H:%i:%s')
                              , coalesce(nullif(extractvalue(@xml, '//child::*[$@rownum]/@Attendance'), ''),0)
                              , coalesce(nullif(extractvalue(@xml, '//child::*[$@rownum]/@PersonalId'), ''),0)
                              , coalesce(nullif(extractvalue(@xml, '//child::*[$@rownum]/@SerialNumber'), ''),0)
                         from util.numbers
                         where n between 1 and extractvalue(@xml, 'count(//Attendance)')
                         and @rownum:=n
                         ;
                      end//
                      
                      delimiter ;

Open in new window


For such a small record set, there was not much performance issues to speak of; however, a nice side benefit of this neat trick is that it eliminates the extra overhead of all the added variables and iteration previously needed, hopefully bringing it in line with speed of file import approaches. As I get more factual statistics on this, I will be sure to post, but would love your feedback also.

That's it !

Conclusion

Yet another SQL tool to add to the arsenal. We saw here how useful and powerful our numbers table can be and another useful application of user-defined variables. In addition, we learned about  ExtractValue(), one of the two MySQL XML functions. With this knowledge, you can probably make nice use of UpdateXML() which is the counterpart to ExtractValue() that allows you to replace a particular fragment of XML matched by XPath to a new XML fragment.

Thank you, all, for reading. It is my sincere hope that this trick serves you well!

Best regards and happy coding,

Kevin C. Cross, Sr. (mwvisa1)

References

Related Articles
http://dev.mysql.com/tech-resources/articles/xml-in-mysql5.1-6.0.html
https://www.experts-exchange.com/Database/MySQL/A_3573-A-MySQL-Tidbit-Quick-Numbers-Table-Generation.html
https://www.experts-exchange.com/Database/Miscellaneous/A_1555-Analytical-SQL-Where-do-you-rank.html

Related MySQL Documentation
http://dev.mysql.com/doc/refman/5.1/en/string-functions.html#function_load-file
http://dev.mysql.com/doc/refman/5.5/en/load-xml.html
http://dev.mysql.com/doc/refman/5.1/en/user-variables.html
http://dev.mysql.com/doc/refman/5.1/en/xml-functions.html

Related Q&A
https://www.experts-exchange.com/questions/26384151/XML-Parsing-in-MySQL.html

 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
If you found this article helpful, please click the Yes button after the question just below. This will give me (the author) a few points and might encourage me to write more articles.

If you didn't or otherwise feel the need to vote No, please first leave a comment to give me a chance to answer and perhaps to improve this article.

Thank you!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 
3
12,389 Views
Kevin CrossChief Technology Officer
CERTIFIED EXPERT
Father, husband and general problem solver who loves coding SQL, C#, Salesforce Apex or whatever.

Comments (10)

aikimarkGet vaccinated; Social distance; Wear a mask
CERTIFIED EXPERT
Top Expert 2014

Commented:
@Kevin

While you're at it, add a "parameter" tag and some tag that would indicate this applies to stored procs and user-defined functions.
Jagdish DevakuSenior Data Architect

Commented:
Great article...
Kevin CrossChief Technology Officer
CERTIFIED EXPERT
Most Valuable Expert 2011

Author

Commented:
Thanks, Jagdish!
Kevin CrossChief Technology Officer
CERTIFIED EXPERT
Most Valuable Expert 2011

Author

Commented:
Hello,

My question is I have  XMLData I am trying to insert into 2 tables.
<Details>
    <Levels>
        <Level>
            <SectionId>681</SectionId>
            <RiskId>0</RiskId>
            <WAC>Test </WAC>
            <DOCReview>Test</DOCReview>
            <MAT>1</MAT>
            <SDV>1</SDV>
            <Questions>
                <QuestionId>1186</QuestionId>
                <QuestionId>1187</QuestionId>
            </Questions>
        </Level>
        <Level>
            <SectionId>681</SectionId>
            <RiskId>1</RiskId>
            <WAC>Test </WAC>
            <DOCReview>Test</DOCReview>
            <MAT>2</MAT>
            <SDV>2</SDV>
            <Questions>
                <QuestionId>1186</QuestionId>
                <QuestionId>1187</QuestionId>
            </Questions>
        </Level>
        <Level>
            <SectionId>681</SectionId>
            <RiskId>2</RiskId>
            <WAC>Test </WAC>
            <DOCReview>Test</DOCReview>
            <MAT>1</MAT>
            <SDV>3</SDV>
            <Questions>
                <QuestionId>1209</QuestionId>
                <QuestionId>1210</QuestionId>
            </Questions>
        </Level>
        <Level>
            <SectionId>681</SectionId>
            <RiskId>3</RiskId>
            <WAC>Test </WAC>
            <DOCReview>Test</DOCReview>
            <MAT>2</MAT>
            <SDV>1</SDV>
            <Questions>
                <QuestionId>1186</QuestionId>
                <QuestionId>1187</QuestionId>
                <QuestionId>1209</QuestionId>
                <QuestionId>1210</QuestionId>
            </Questions>
        </Level>
    </Levels>
</Details>

Open in new window


I can insert data into the first table which has no problem but the second table has a problem. Each Level has child elements " Questions " I unable to insert child data into the second table can you please help me.
Thanks in Advance.

View More

Have a question about something in this article? You can receive help directly from the article author. Sign up for a free trial to get started.