A MySQL Tidbit: In-line XML Parsing

AID: 3574
  • Status: Published

4630 points

  • By
  • TypeTips/Tricks
  • Posted on2010-08-16 at 00:08:40
Awards
  • Community Pick

Introduction


In this installment of my SQL tidbits, I will be looking at parsing Extensible Markup Language (XML) directly passed as string parameters to MySQL 5.1.5 or higher. These would be instances where LOAD_FILE (version 4.1 or higher) and LOAD XML (version 5.5 or higher) are not optimal either due to permission/file or application restrictions.

Background


This particular instance was true for EE member JagdishDevaku, who asked the following question: XML Parsing in MySQL.

We are planning for a database migration from MS SQL Server to MySQL. Most of our procedures in SQL Server are working with XML data, but we are issues where trying to convert these procedures to MySQL.

Below is the sample XML file. We are sending XML data (not the XML file) directly as a input to the procedure.

""]

<Attendance_Info Date="07/28/2010 00:00:00" Flag="1">
   <Attendance Attendance="2" PersonalId="33304" SerialNumber="0"/>
   <Attendance Attendance="2" PersonalId="33305" SerialNumber="1"/>
   <Attendance Attendance="0" PersonalId="33306" SerialNumber="2"/>
   <Attendance Attendance="2" PersonalId="33307" SerialNumber="3"/>
   <Attendance Attendance="2" PersonalId="33308" SerialNumber="4"/>
   <Attendance Attendance="2" PersonalId="33309" SerialNumber="5"/>
   <Attendance Attendance="2" PersonalId="33310" SerialNumber="6"/>
   <Attendance Attendance="2" PersonalId="33311" SerialNumber="7"/>
   <Attendance Attendance="2" PersonalId="33312" SerialNumber="8"/>
   <Attendance Attendance="2" PersonalId="33313" SerialNumber="9"/>
   <Attendance Attendance="2" PersonalId="33314" SerialNumber="10"/>
   <Attendance Attendance="2" PersonalId="33377" SerialNumber="11"/>
   <Attendance Attendance="2" PersonalId="33651" SerialNumber="12"/>
   <Attendance Attendance="2" PersonalId="33652" SerialNumber="13"/>
   <Attendance Attendance="2" PersonalId="33734" SerialNumber="14"/>
   <Attendance Attendance="2" PersonalId="34086" SerialNumber="15"/>
   <Attendance Attendance="2" PersonalId="34170" SerialNumber="16"/>
</Attendance_Info>
                                    
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:

Select allOpen in new window


With the XML we want the data in the following tabular format in MySQL

""]

Date | Attendance | Personal_Id
                                    
1:

Select allOpen in new window



The Approach
When it was determined that the other approaches I mentioned earlier from the MySQL tech resources article on XML, Using XML in MySQL 5.1 and 6.0, another EE Expert, aikimark, adapted the xmldump_load procedure from the MySQL XML article's Importing XML and XML Data section to the following (cleaned a bit for syntax correctness):
SET nrows = ExtractValue(xml_parm, 'count(//Attendance_Info/Attendance)');

SET rownum = 1;

WHILE rownum <= nrows DO
   SET At_Attendance = ExtractValue(xml_parm, '//Attendance_Info/Attendance[$rownum]/@Attendance');
   SET At_PersonalId = ExtractValue(xml_parm, '//Attendance_Info/Attendance[$rownum]/@PersonalId');
  
   SET rownum = rownum + 1;
END WHILE;
                                    
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:

Select allOpen in new window



So in short:
  • Determine the total count of "Attendance" child nodes that appear in the XML passed using ExtractValue(), one of two XML functions introduced with MySQL 5.1.5, which provides a mechanism to "[return] the text (CDATA) of the first text node which is a child of the element(s) matched by the XPath [1.0] expression".
""]Note: at the time of writing this article, all versions of MySQL above 5.1.5 still only support XPath 1.0 in a limited fashion, so you may find some pre XPath 2.0 features that are also non-functional. For a full listing of the limits|bugs, please refer to manual. Consequently, some nice location paths do exist, so some shorthand to above is possible like:

extractvalue(@xml, '//child::*[1]/@PersonalId')
                                    
1:

Select allOpen in new window

extractvalue(@xml, '/*/@Date')
                                    
1:

Select allOpen in new window


  • Using this count as the upper bound, we iterate using a while loop until we have counted from 1 to total number of rows.
  • Again using ExtractValue() we take advantage of our loop index rownum as the index to our XPath results using '$'+{variable-name} or $rownum in this example.


Bueller, ... Bueller, ... Bueller
So with an attendance table structure like:
drop table if exists attendance;
create table `attendance` (
  `att_date` datetime,
  `att_attend_id` int(11),
  `att_person_id` int(11),
  `att_serial_no` int(11),
  `att_id` bigint(20) not null auto_increment,
  primary key (`att_id`)
) engine=InnoDB default charset=latin1;
                                    
1:
2:
3:
4:
5:
6:
7:
8:
9:

Select allOpen in new window


...the iterative procedure may look something like this:
drop procedure if exists InsertAttendanceXmlProc;

delimiter //

create procedure InsertAttendanceXmlProc(in xml varchar(8000))
begin
   declare dt datetime;
   declare nrows int;
   declare rownum int default 1;
   
   -- to be safe, explicitly convert date from xml to datetime - expects mm/dd/yyyy hh:mi:ss 
   -- http://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_str-to-date
   set dt = str_to_date(coalesce(nullif(extractvalue(xml, '//@Date'), ''), now()), '%m/%d/%Y %H:%i:%s');

   set nrows = extractvalue(xml, 'count(//Attendance)');
   
   -- decrement nrows to save on creating another variable
   while rownum <= nrows do
      insert into attendance(att_date, att_attend_id, att_person_id, att_serial_no)
      select dt
           , coalesce(nullif(extractvalue(xml, '//child::*[$rownum]/@Attendance'), ''),0)
           , coalesce(nullif(extractvalue(xml, '//child::*[$rownum]/@PersonalId'), ''),0)
           , coalesce(nullif(extractvalue(xml, '//child::*[$rownum]/@SerialNumber'), ''),0);
   
      set rownum = rownum + 1;
   end while;

end//

delimiter ;
                                    
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
21:
22:
23:
24:
25:
26:
27:
28:
29:
30:

Select allOpen in new window


As shown in the question thread, it is also a possibility, if the sequential order of XML nodes doesn't matter, to use the same variable used to store count of rows as index for our while loop but using a decrement approach instead.

""]««test»»
See how our procedure works.

set @xml = '<Attendance_Info Date="07/28/2010 00:00:00" Flag="1">
   <Attendance Attendance="2" PersonalId="33304" SerialNumber="0"/>
   <Attendance Attendance="2" PersonalId="33305" SerialNumber="1"/>
   <Attendance Attendance="0" PersonalId="33306" SerialNumber="2"/>
   <Attendance Attendance="2" PersonalId="33307" SerialNumber="3"/>
   <Attendance Attendance="2" PersonalId="33308" SerialNumber="4"/>
   <Attendance Attendance="2" PersonalId="33309" SerialNumber="5"/>
   <Attendance Attendance="2" PersonalId="33310" SerialNumber="6"/>
   <Attendance Attendance="2" PersonalId="33311" SerialNumber="7"/>
   <Attendance Attendance="2" PersonalId="33312" SerialNumber="8"/>
   <Attendance Attendance="2" PersonalId="33313" SerialNumber="9"/>
   <Attendance Attendance="2" PersonalId="33314" SerialNumber="10"/>
   <Attendance Attendance="2" PersonalId="33377" SerialNumber="11"/>
   <Attendance Attendance="2" PersonalId="33651" SerialNumber="12"/>
   <Attendance Attendance="2" PersonalId="33652" SerialNumber="13"/>
   <Attendance Attendance="2" PersonalId="33734" SerialNumber="14"/>
   <Attendance Attendance="2" PersonalId="34086" SerialNumber="15"/>
   <Attendance Attendance="2" PersonalId="34170" SerialNumber="16"/>
</Attendance_Info>';
call InsertAttendanceXmlProc( @xml );
                                    
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:

Select allOpen in new window



But that is just background, right ?

It's All About Rownum


I brought you here to see some in-line SQL code and we haven't seen any yet now have we...well, for your patience, here you go (explanation to follow):
set @xml = '<Attendance_Info Date="07/28/2010 00:00:00" Flag="1">
   <Attendance Attendance="2" PersonalId="33304" SerialNumber="0"/>
   <Attendance Attendance="2" PersonalId="33305" SerialNumber="1"/>
   <Attendance Attendance="0" PersonalId="33306" SerialNumber="2"/>
   <Attendance Attendance="2" PersonalId="33307" SerialNumber="3"/>
   <Attendance Attendance="2" PersonalId="33308" SerialNumber="4"/>
   <Attendance Attendance="2" PersonalId="33309" SerialNumber="5"/>
   <Attendance Attendance="2" PersonalId="33310" SerialNumber="6"/>
   <Attendance Attendance="2" PersonalId="33311" SerialNumber="7"/>
   <Attendance Attendance="2" PersonalId="33312" SerialNumber="8"/>
   <Attendance Attendance="2" PersonalId="33313" SerialNumber="9"/>
   <Attendance Attendance="2" PersonalId="33314" SerialNumber="10"/>
   <Attendance Attendance="2" PersonalId="33377" SerialNumber="11"/>
   <Attendance Attendance="2" PersonalId="33651" SerialNumber="12"/>
   <Attendance Attendance="2" PersonalId="33652" SerialNumber="13"/>
   <Attendance Attendance="2" PersonalId="33734" SerialNumber="14"/>
   <Attendance Attendance="2" PersonalId="34086" SerialNumber="15"/>
   <Attendance Attendance="2" PersonalId="34170" SerialNumber="16"/>
</Attendance_Info>';
select extractvalue(@xml, '//@Date') as att_date
     , extractvalue(@xml, '//child::*[$@rownum]/@Attendance') as att_attend_id
     , extractvalue(@xml, '//child::*[$@rownum]/@PersonalId') as att_person_id
     , extractvalue(@xml, '//child::*[$@rownum]/@SerialNumber') as att_serial_no
from util.numbers
where n between 1 and extractvalue(@xml, 'count(//Attendance)')
and @rownum:=n
;
                                    
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
21:
22:
23:
24:
25:
26:
27:

Select allOpen in new window


With luck, you will see that results of this query are similar to what our procedure added to the attendance table during our stored procedure.
XmlParse-AttendanceImportResults.PNG
  • 32 KB
  • XML Parse to Table Results
XML Parse to Table Results


Since this solution is specific to a particular XML string, it is useful to be able to do this very painlessly and in-line at times as you now have control at time of writing as to how fields will be converted/handled versus getting how it was programmed in the procedure each time. Think that is cool? Well, here is how it works:
  • Rule 1: Use your tool belt, you built it for a reason. We looked previously at creating a sequential numbers table with the premise it had value. Well here we use util.numbers to create rows for each Attendance node we find in our XML based on the same XPath aggregate count() function we used in our iterative approach.
  • Rule 2: Continue to use what you know. Since we can't just stop at one nice utility in a solution, the next bit you will notice is "@rownum:=n" which is our ranking technique modified a bit. In this case, the numbers sequence is already getting us our row numbers; however, the XPath expression for ExtractValue() needs to be constant so concatenating "n" directly will not work, so we use @rownum (can be named @xyz by the way -- no magic here, just a plain user-defined variable) that we can now use in our XPath with $ as we did in our procedure.


Bueller
So now that we only have to take attendance at one time, we can update our procedure to something like this:
drop procedure if exists InsertAttendanceXmlProc2;

delimiter //

create procedure InsertAttendanceXmlProc2(in xml varchar(8000))
begin
   insert into attendance(att_date, att_attend_id, att_person_id, att_serial_no)
   select str_to_date(coalesce(nullif(extractvalue(@xml, '//@Date'), ''), now()), '%m/%d/%Y %H:%i:%s')
        , coalesce(nullif(extractvalue(@xml, '//child::*[$@rownum]/@Attendance'), ''),0)
        , coalesce(nullif(extractvalue(@xml, '//child::*[$@rownum]/@PersonalId'), ''),0)
        , coalesce(nullif(extractvalue(@xml, '//child::*[$@rownum]/@SerialNumber'), ''),0)
   from util.numbers
   where n between 1 and extractvalue(@xml, 'count(//Attendance)')
   and @rownum:=n
   ;
end//

delimiter ;
                                    
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:

Select allOpen in new window



For such a small record set, there was not much performance issues to speak of; however, a nice side benefit of this neat trick is that it eliminates the extra overhead of all the added variables and iteration previously needed, hopefully bringing it in line with speed of file import approaches. As I get more factual statistics on this, I will be sure to post, but would love your feedback also.

That's it !

Conclusion


Yet another SQL tool to add to the arsenal. We saw here how useful and powerful our numbers table can be and another useful application of user-defined variables. In addition, we learned about  ExtractValue(), one of the two MySQL XML functions. With this knowledge, you can probably make nice use of UpdateXML() which is the counterpart to ExtractValue() that allows you to replace a particular fragment of XML matched by XPath to a new XML fragment.

Thank you, all, for reading. It is my sincere hope that this trick serves you well!

Best regards and happy coding,

Kevin C. Cross, Sr. (mwvisa1)

References


Related Articles
http://dev.mysql.com/tech-resources/articles/xml-in-mysql5.1-6.0.html
http://www.experts-exchange.com/Database/MySQL/A_3573-A-MySQL-Tidbit-Quick-Numbers-Table-Generation.html
http://www.experts-exchange.com/Database/Miscellaneous/A_1555-Analytical-SQL-Where-do-you-rank.html

Related MySQL Documentation
http://dev.mysql.com/doc/refman/5.1/en/string-functions.html#function_load-file
http://dev.mysql.com/doc/refman/5.5/en/load-xml.html
http://dev.mysql.com/doc/refman/5.1/en/user-variables.html
http://dev.mysql.com/doc/refman/5.1/en/xml-functions.html

Related Q&A
http://www.experts-exchange.com/Database/MySQL/Q_26384151.html

 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
If you found this article helpful, please click the Yes button after the question just below. This will give me (the author) a few points and might encourage me to write more articles.

If you didn't or otherwise feel the need to vote No, please first leave a comment to give me a chance to answer and perhaps to improve this article.

Thank you!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 
    Asked On
    2010-08-16 at 00:08:40ID3574
    Tags

    MySQL

    ,

    SQL

    ,

    in-line

    ,

    xml

    ,

    elements

    ,

    attributes

    ,

    parsing

    ,

    import

    ,

    string

    ,

    parameter

    ,

    extractvalue()

    ,

    tabular data

    Topic

    MySQL Server

    Views
    2388

    Comments

    Expert Comment

    by: aikimark on 2010-08-16 at 06:15:25ID: 18223

    Excellent article, Kevin.  (and quickly written, too)

    If the SQL optimizer is smart enough, then it will replace the str_to_date() column with a fixed value.  Otherwise, the dt variable would reduce function execution and perform faster -- probably undetectable for relatively small XML parameters.

    =============
    Question: In the Where clause (below), why doesn't the "and @rownum:=n" expression limit the rows, rather than increment the @rownum variable?  It's a really neat trick, but I don't understand how it works.
    where n between 1 and extractvalue(@xml, 'count(//Attendance)')
       and @rownum:=n
    
                                            
    1:
    2:
    

    Select allOpen in new window

    Author Comment

    by: mwvisa1 on 2010-08-16 at 07:47:05ID: 18228

    Good question, Mark!

    The := is an assignment operator in this instance, so @rownum will always be equal to n; therefore, it is a filter that doesn't impact your rows but will successfully filter out @rownum = null so that you don't get an invalid row where the index passes to XPath is null|0.

    For other readers' reference:
    You can also assign a value to a user variable in statements other than SET. In this case, the assignment operator must be := and not = because = is treated as a comparison operator in non-SET statements:


    Given the goal was to select specific number of columns for an INSERT statement, I used this form of the trick; however, this form may be easier to understand from what the @rownum portion of the trick is doing.
    set @xml = '<Attendance_Info Date="07/28/2010 00:00:00" Flag="1">
       <Attendance Attendance="2" PersonalId="33304" SerialNumber="0"/>
       <Attendance Attendance="2" PersonalId="33305" SerialNumber="1"/>
       <Attendance Attendance="0" PersonalId="33306" SerialNumber="2"/>
       <Attendance Attendance="2" PersonalId="33307" SerialNumber="3"/>
       <Attendance Attendance="2" PersonalId="33308" SerialNumber="4"/>
       <Attendance Attendance="2" PersonalId="33309" SerialNumber="5"/>
       <Attendance Attendance="2" PersonalId="33310" SerialNumber="6"/>
       <Attendance Attendance="2" PersonalId="33311" SerialNumber="7"/>
       <Attendance Attendance="2" PersonalId="33312" SerialNumber="8"/>
       <Attendance Attendance="2" PersonalId="33313" SerialNumber="9"/>
       <Attendance Attendance="2" PersonalId="33314" SerialNumber="10"/>
       <Attendance Attendance="2" PersonalId="33377" SerialNumber="11"/>
       <Attendance Attendance="2" PersonalId="33651" SerialNumber="12"/>
       <Attendance Attendance="2" PersonalId="33652" SerialNumber="13"/>
       <Attendance Attendance="2" PersonalId="33734" SerialNumber="14"/>
       <Attendance Attendance="2" PersonalId="34086" SerialNumber="15"/>
       <Attendance Attendance="2" PersonalId="34170" SerialNumber="16"/>
    </Attendance_Info>';
    select @rownum:=n as row_num
         , extractvalue(@xml, '//@Date') as att_date
         , extractvalue(@xml, '//child::*[$@rownum]/@Attendance') as att_attend_id
         , extractvalue(@xml, '//child::*[$@rownum]/@PersonalId') as att_person_id
         , extractvalue(@xml, '//child::*[$@rownum]/@SerialNumber') as att_serial_no
    from util.numbers
    where n between 1 and extractvalue(@xml, 'count(//Attendance)')
    ;
                                            
    1:
    2:
    3:
    4:
    5:
    6:
    7:
    8:
    9:
    10:
    11:
    12:
    13:
    14:
    15:
    16:
    17:
    18:
    19:
    20:
    21:
    22:
    23:
    24:
    25:
    26:
    27:
    

    Select allOpen in new window



    Guess it can be argued that instead of providing 2 tricks, I probably provided 3 with the 3rd being that you can assign the user variables in the where clause if you don't want the result of the assignment to be displayed as a separate column.

    Hope that makes more sense now.

    Thanks for reading and the kind words about the article.

    Regards,
    Kevin

    Author Comment

    by: mwvisa1 on 2010-08-16 at 07:56:24ID: 18229

    P.S. to the younger readers who might not have caught the reference from my weak attempt at "attendance" humor, see http://en.wikipedia.org/wiki/Ferris_Bueller's_Day_Off

    Expert Comment

    by: aikimark on 2010-08-16 at 08:26:31ID: 18230

    @Kevin

    Thanks.  I missed the colon and only read the equals.

    The more tricks, the better.

    ========
    I recommend adding "attribute" or "attributes" tag, since that is where we are getting our data values.  This article may be the only one that leads the reader through the process of attribute data extraction.

    Author Comment

    by: mwvisa1 on 2010-08-16 at 08:38:38ID: 18231

    You are most welcome!
    Thanks, the keywords idea is a good one. Will update article tags.

    Expert Comment

    by: aikimark on 2010-08-16 at 09:06:35ID: 18234

    @Kevin

    While you're at it, add a "parameter" tag and some tag that would indicate this applies to stored procs and user-defined functions.

    Expert Comment

    by: JagdishDevaku on 2010-08-16 at 22:49:58ID: 18265

    Great article...

    Author Comment

    by: mwvisa1 on 2010-08-17 at 04:49:49ID: 18284

    Thanks, Jagdish!

    Author Comment

    by: mwvisa1 on 2010-08-25 at 21:37:53ID: 18745

    Add your Comment

    Please Sign up or Log in to comment on this article.

    Join Experts Exchange Today

    Gain Access to all our Tech Resources

    Get personalized answers

    Ask unlimited questions

    Access Proven Solutions

    Search 3.2 million solutions

    Read In-Depth How-To Guides

    1000+ articles, demos, & tips

    Watch Step by Step Tutorials

    Learn direct from top tech pros

    And Much More!

    Your complete tech resource

    See Plans and Pricing

    30-day free trial. Register in 60 seconds.

    Loading Advertisement...

    Top MySQL Server Experts

    1. johanntagle

      286,814

      Guru

      6,000 points yesterday

      Profile
      Rank: Sage
    2. Ray_Paseur

      216,557

      Guru

      0 points yesterday

      Profile
      Rank: Savant
    3. DaveBaldwin

      119,595

      Master

      1,400 points yesterday

      Profile
      Rank: Genius
    4. angelIII

      61,340

      Master

      0 points yesterday

      Profile
      Rank: Elite
    5. mwvisa1

      57,185

      Master

      30 points yesterday

      Profile
      Rank: Genius
    6. HainKurt

      41,850

      0 points yesterday

      Profile
      Rank: Genius
    7. ralmada

      39,250

      0 points yesterday

      Profile
      Rank: Genius
    8. Roads_Roads

      33,080

      0 points yesterday

      Profile
      Rank: Genius
    9. arnold

      29,812

      0 points yesterday

      Profile
      Rank: Genius
    10. theGhost_k8

      29,785

      0 points yesterday

      Profile
      Rank: Sage
    11. Kdo

      29,682

      0 points yesterday

      Profile
      Rank: Genius
    12. bportlock

      26,604

      0 points yesterday

      Profile
      Rank: Genius
    13. jason1178

      23,574

      0 points yesterday

      Profile
      Rank: Genius
    14. maeltar

      23,236

      0 points yesterday

      Profile
      Rank: Guru
    15. StingRaY

      21,500

      0 points yesterday

      Profile
      Rank: Wizard
    16. smadeira

      19,968

      0 points yesterday

      Profile
      Rank: Wizard
    17. fundacionrts

      18,200

      0 points yesterday

      Profile
      Rank: Master
    18. gr8gonzo

      17,019

      0 points yesterday

      Profile
      Rank: Sage
    19. ChrisStanyon

      16,964

      0 points yesterday

      Profile
      Rank: Sage
    20. pratima_mcs

      16,614

      0 points yesterday

      Profile
      Rank: Genius
    21. TempDBA

      16,400

      0 points yesterday

      Profile
      Rank: Sage
    22. Sharath_123

      16,268

      0 points yesterday

      Profile
      Rank: Genius
    23. for_yan

      16,000

      0 points yesterday

      Profile
      Rank: Genius
    24. matthewspatrick

      15,800

      0 points yesterday

      Profile
      Rank: Savant
    25. AielloJ

      13,732

      0 points yesterday

      Profile
      Rank: Wizard

    Hall Of Fame