Introduction
In this installment of my SQL tidbits, I will be looking at parsing Extensible Markup Language (XML) directly passed as string parameters to MySQL 5.1.5 or higher. These would be instances where
LOAD_FILE (version 4.1 or higher) and
LOAD XML (version 5.5 or higher) are not optimal either due to permission/file or application restrictions.
Background
This particular instance was true for EE member
JagdishDevaku, who asked the following question:
XML Parsing in MySQL.
We are planning for a database migration from MS SQL Server to MySQL. Most of our procedures in SQL Server are working with XML data, but we are issues where trying to convert these procedures to MySQL.
Below is the sample XML file. We are sending XML data (not the XML file) directly as a input to the procedure.
<Attendance_Info Date="07/28/2010 00:00:00" Flag="1">
<Attendance Attendance="2" PersonalId="33304" SerialNumber="0"/>
<Attendance Attendance="2" PersonalId="33305" SerialNumber="1"/>
<Attendance Attendance="0" PersonalId="33306" SerialNumber="2"/>
<Attendance Attendance="2" PersonalId="33307" SerialNumber="3"/>
<Attendance Attendance="2" PersonalId="33308" SerialNumber="4"/>
<Attendance Attendance="2" PersonalId="33309" SerialNumber="5"/>
<Attendance Attendance="2" PersonalId="33310" SerialNumber="6"/>
<Attendance Attendance="2" PersonalId="33311" SerialNumber="7"/>
<Attendance Attendance="2" PersonalId="33312" SerialNumber="8"/>
<Attendance Attendance="2" PersonalId="33313" SerialNumber="9"/>
<Attendance Attendance="2" PersonalId="33314" SerialNumber="10"/>
<Attendance Attendance="2" PersonalId="33377" SerialNumber="11"/>
<Attendance Attendance="2" PersonalId="33651" SerialNumber="12"/>
<Attendance Attendance="2" PersonalId="33652" SerialNumber="13"/>
<Attendance Attendance="2" PersonalId="33734" SerialNumber="14"/>
<Attendance Attendance="2" PersonalId="34086" SerialNumber="15"/>
<Attendance Attendance="2" PersonalId="34170" SerialNumber="16"/>
</Attendance_Info>
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
Select allOpen in new window
With the XML we want the data in the following tabular format in MySQL
The ApproachWhen it was determined that the other approaches I mentioned earlier from the MySQL tech resources article on XML,
Using XML in MySQL 5.1 and 6.0, another EE Expert,
aikimark, adapted the xmldump_load procedure from the MySQL XML article's
Importing XML and XML Data section to the following (cleaned a bit for syntax correctness):
SET nrows = ExtractValue(xml_parm, 'count(//Attendance_Info/Attendance)');
SET rownum = 1;
WHILE rownum <= nrows DO
SET At_Attendance = ExtractValue(xml_parm, '//Attendance_Info/Attendance[$rownum]/@Attendance');
SET At_PersonalId = ExtractValue(xml_parm, '//Attendance_Info/Attendance[$rownum]/@PersonalId');
SET rownum = rownum + 1;
END WHILE;
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
Select allOpen in new window
So in short:
- Determine the total count of "Attendance" child nodes that appear in the XML passed using ExtractValue(), one of two XML functions introduced with MySQL 5.1.5, which provides a mechanism to "[return] the text (CDATA) of the first text node which is a child of the element(s) matched by the XPath [1.0] expression".
- ""]Note: at the time of writing this article, all versions of MySQL above 5.1.5 still only support XPath 1.0 in a limited fashion, so you may find some pre XPath 2.0 features that are also non-functional. For a full listing of the limits|bugs, please refer to manual. Consequently, some nice location paths do exist, so some shorthand to above is possible like:
- Using this count as the upper bound, we iterate using a while loop until we have counted from 1 to total number of rows.
- Again using ExtractValue() we take advantage of our loop index rownum as the index to our XPath results using '$'+{variable-name} or $rownum in this example.
Bueller, ... Bueller, ... BuellerSo with an attendance table structure like:
drop table if exists attendance;
create table `attendance` (
`att_date` datetime,
`att_attend_id` int(11),
`att_person_id` int(11),
`att_serial_no` int(11),
`att_id` bigint(20) not null auto_increment,
primary key (`att_id`)
) engine=InnoDB default charset=latin1;
1:
2:
3:
4:
5:
6:
7:
8:
9:
Select allOpen in new window
...the iterative procedure may look something like this:
drop procedure if exists InsertAttendanceXmlProc;
delimiter //
create procedure InsertAttendanceXmlProc(in xml varchar(8000))
begin
declare dt datetime;
declare nrows int;
declare rownum int default 1;
-- to be safe, explicitly convert date from xml to datetime - expects mm/dd/yyyy hh:mi:ss
-- http://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_str-to-date
set dt = str_to_date(coalesce(nullif(extractvalue(xml, '//@Date'), ''), now()), '%m/%d/%Y %H:%i:%s');
set nrows = extractvalue(xml, 'count(//Attendance)');
-- decrement nrows to save on creating another variable
while rownum <= nrows do
insert into attendance(att_date, att_attend_id, att_person_id, att_serial_no)
select dt
, coalesce(nullif(extractvalue(xml, '//child::*[$rownum]/@Attendance'), ''),0)
, coalesce(nullif(extractvalue(xml, '//child::*[$rownum]/@PersonalId'), ''),0)
, coalesce(nullif(extractvalue(xml, '//child::*[$rownum]/@SerialNumber'), ''),0);
set rownum = rownum + 1;
end while;
end//
delimiter ;
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
21:
22:
23:
24:
25:
26:
27:
28:
29:
30:
Select allOpen in new window
As shown in
the question thread, it is also a possibility, if the sequential order of XML nodes doesn't matter, to use the same variable used to store count of rows as index for our while loop but using a decrement approach instead.
- ""]««test»»
See how our procedure works.
set @xml = '<Attendance_Info Date="07/28/2010 00:00:00" Flag="1">
<Attendance Attendance="2" PersonalId="33304" SerialNumber="0"/>
<Attendance Attendance="2" PersonalId="33305" SerialNumber="1"/>
<Attendance Attendance="0" PersonalId="33306" SerialNumber="2"/>
<Attendance Attendance="2" PersonalId="33307" SerialNumber="3"/>
<Attendance Attendance="2" PersonalId="33308" SerialNumber="4"/>
<Attendance Attendance="2" PersonalId="33309" SerialNumber="5"/>
<Attendance Attendance="2" PersonalId="33310" SerialNumber="6"/>
<Attendance Attendance="2" PersonalId="33311" SerialNumber="7"/>
<Attendance Attendance="2" PersonalId="33312" SerialNumber="8"/>
<Attendance Attendance="2" PersonalId="33313" SerialNumber="9"/>
<Attendance Attendance="2" PersonalId="33314" SerialNumber="10"/>
<Attendance Attendance="2" PersonalId="33377" SerialNumber="11"/>
<Attendance Attendance="2" PersonalId="33651" SerialNumber="12"/>
<Attendance Attendance="2" PersonalId="33652" SerialNumber="13"/>
<Attendance Attendance="2" PersonalId="33734" SerialNumber="14"/>
<Attendance Attendance="2" PersonalId="34086" SerialNumber="15"/>
<Attendance Attendance="2" PersonalId="34170" SerialNumber="16"/>
</Attendance_Info>';
call InsertAttendanceXmlProc( @xml );
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
Select allOpen in new window
But that is just background, right ?It's All About Rownum
I brought you here to see some in-line SQL code and we haven't seen any yet now have we...well, for your patience, here you go (explanation to follow):
set @xml = '<Attendance_Info Date="07/28/2010 00:00:00" Flag="1">
<Attendance Attendance="2" PersonalId="33304" SerialNumber="0"/>
<Attendance Attendance="2" PersonalId="33305" SerialNumber="1"/>
<Attendance Attendance="0" PersonalId="33306" SerialNumber="2"/>
<Attendance Attendance="2" PersonalId="33307" SerialNumber="3"/>
<Attendance Attendance="2" PersonalId="33308" SerialNumber="4"/>
<Attendance Attendance="2" PersonalId="33309" SerialNumber="5"/>
<Attendance Attendance="2" PersonalId="33310" SerialNumber="6"/>
<Attendance Attendance="2" PersonalId="33311" SerialNumber="7"/>
<Attendance Attendance="2" PersonalId="33312" SerialNumber="8"/>
<Attendance Attendance="2" PersonalId="33313" SerialNumber="9"/>
<Attendance Attendance="2" PersonalId="33314" SerialNumber="10"/>
<Attendance Attendance="2" PersonalId="33377" SerialNumber="11"/>
<Attendance Attendance="2" PersonalId="33651" SerialNumber="12"/>
<Attendance Attendance="2" PersonalId="33652" SerialNumber="13"/>
<Attendance Attendance="2" PersonalId="33734" SerialNumber="14"/>
<Attendance Attendance="2" PersonalId="34086" SerialNumber="15"/>
<Attendance Attendance="2" PersonalId="34170" SerialNumber="16"/>
</Attendance_Info>';
select extractvalue(@xml, '//@Date') as att_date
, extractvalue(@xml, '//child::*[$@rownum]/@Attendance') as att_attend_id
, extractvalue(@xml, '//child::*[$@rownum]/@PersonalId') as att_person_id
, extractvalue(@xml, '//child::*[$@rownum]/@SerialNumber') as att_serial_no
from util.numbers
where n between 1 and extractvalue(@xml, 'count(//Attendance)')
and @rownum:=n
;
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
21:
22:
23:
24:
25:
26:
27:
Select allOpen in new window
With luck, you will see that results of this query are similar to what our procedure added to the attendance table during our stored procedure.
Since this solution is specific to a particular XML string, it is useful to be able to do this very painlessly and in-line at times as you now have control at time of writing as to how fields will be converted/handled versus getting how it was programmed in the procedure each time. Think that is cool? Well, here is how it works:
- Rule 1: Use your tool belt, you built it for a reason. We looked previously at creating a sequential numbers table with the premise it had value. Well here we use util.numbers to create rows for each Attendance node we find in our XML based on the same XPath aggregate count() function we used in our iterative approach.
- Rule 2: Continue to use what you know. Since we can't just stop at one nice utility in a solution, the next bit you will notice is "@rownum:=n" which is our ranking technique modified a bit. In this case, the numbers sequence is already getting us our row numbers; however, the XPath expression for ExtractValue() needs to be constant so concatenating "n" directly will not work, so we use @rownum (can be named @xyz by the way -- no magic here, just a plain user-defined variable) that we can now use in our XPath with $ as we did in our procedure.
BuellerSo now that we only have to take attendance at one time, we can update our procedure to something like this:
drop procedure if exists InsertAttendanceXmlProc2;
delimiter //
create procedure InsertAttendanceXmlProc2(in xml varchar(8000))
begin
insert into attendance(att_date, att_attend_id, att_person_id, att_serial_no)
select str_to_date(coalesce(nullif(extractvalue(@xml, '//@Date'), ''), now()), '%m/%d/%Y %H:%i:%s')
, coalesce(nullif(extractvalue(@xml, '//child::*[$@rownum]/@Attendance'), ''),0)
, coalesce(nullif(extractvalue(@xml, '//child::*[$@rownum]/@PersonalId'), ''),0)
, coalesce(nullif(extractvalue(@xml, '//child::*[$@rownum]/@SerialNumber'), ''),0)
from util.numbers
where n between 1 and extractvalue(@xml, 'count(//Attendance)')
and @rownum:=n
;
end//
delimiter ;
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
Select allOpen in new window
For such a small record set, there was not much performance issues to speak of; however, a nice side benefit of this neat trick is that it eliminates the extra overhead of all the added variables and iteration previously needed, hopefully bringing it in line with speed of file import approaches. As I get more factual statistics on this, I will be sure to post, but would love your feedback also.
That's it !Conclusion
Yet another SQL tool to add to the arsenal. We saw here how useful and powerful our numbers table can be and another useful application of user-defined variables. In addition, we learned about ExtractValue(), one of the two MySQL XML functions. With this knowledge, you can probably make nice use of UpdateXML() which is the counterpart to ExtractValue() that allows you to replace a particular fragment of XML matched by XPath to a new XML fragment.
Thank you, all, for reading. It is my sincere hope that this trick serves you well!
Best regards and happy coding,
Kevin C. Cross, Sr. (
mwvisa1)
References
Related Articleshttp://dev.mysql.com/tech-resources/articles/xml-in-mysql5.1-6.0.htmlhttp://www.experts-exchange.com/Database/MySQL/A_3573-A-MySQL-Tidbit-Quick-Numbers-Table-Generation.htmlhttp://www.experts-exchange.com/Database/Miscellaneous/A_1555-Analytical-SQL-Where-do-you-rank.htmlRelated MySQL Documentationhttp://dev.mysql.com/doc/refman/5.1/en/string-functions.html#function_load-filehttp://dev.mysql.com/doc/refman/5.5/en/load-xml.htmlhttp://dev.mysql.com/doc/refman/5.1/en/user-variables.htmlhttp://dev.mysql.com/doc/refman/5.1/en/xml-functions.htmlRelated Q&Ahttp://www.experts-exchange.com/Database/MySQL/Q_26384151.html =-=-=-=-=-=-=-=-=-=-=-=-=-
=-=-=-=-=-
=-=-=-=-=-
=-=-=-=-=-
=-=-=-=-=-
=-=-=-=-=-
=-=-=
If you found this article helpful, please click the
Yes button after the question just below. This will give me (the author) a few points and might encourage me to write more articles.
If you didn't or otherwise feel the need to vote No, please first leave a comment to give me a chance to answer and perhaps to improve this article.
Thank you!=-=-=-=-=-=-=-=-=-=-=-=-=-
=-=-=-=-=-
=-=-=-=-=-
=-=-=-=-=-
=-=-=-=-=-
=-=-=-=-=-
=-=-=
by: aikimark on 2010-08-16 at 06:15:25ID: 18223
If the SQL optimizer is smart enough, then it will replace the str_to_date() column with a fixed value. Otherwise, the dt variable would reduce function execution and perform faster -- probably undetectable for relatively small XML parameters.
=============
Question: In the Where clause (below), why doesn't the "and @rownum:=n" expression limit the rows, rather than increment the @rownum variable? It's a really neat trick, but I don't understand how it works.
Select allOpen in new window