<

A MySQL Tidbit: In-line CSV Parsing

Published on
15,666 Points
4,966 Views
7 Endorsements
Last Modified:
Awarded
More Fun with XML and MySQL – Parsing Delimited String with a Single SQL Statement

Are you ready for another of my SQL tidbits?  Hopefully so, as in this adventure, I will be covering a topic that comes up a lot which is parsing a comma (or otherwise) delimited string into rows (or columns) of data.

The methodology to be used takes advantage of some of MySQL's Extensible Markup Language (XML) functionality and so this Article will apply to MySQL 5.1.5 or higher.  Consequently, if you are a MySQL developer using the newer version of MySQL or even just an avid reader, please read on and enjoy!

Sections

Background
Thinking XML
XML Parsing Trick in Action

1. Background


Recently, EE Member transitnetwork, asked a Question, How do I make delimited fields come out in a list with totals, which my technique fit perfectly (at least as one method of getting the job done).

The Request
Let's say I have a list of five restaurants with two fields, name and cuisine: Figure 1.1
I want to make the results look like this: Figure 1.2, 3.1.

[step=""]««setup»»
To follow along in the article, here are scripts to get started with.

(table structure -- create statement)
CREATE TABLE `restaurants` (
  `id` INT(11) NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(45) DEFAULT NULL,
  `cuisine` VARCHAR(100) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Open in new window

(sample data -- insert statement)
INSERT INTO `restaurants`(`name`, `cuisine`)
SELECT 'restaurant 1', 'pho, vietnamese, canadian'
UNION SELECT 'restaurant 2', 'asian, pho, vietnamese, sushi'
UNION SELECT 'restaurant 3', 'asian, pho, vietnamese, sushi'
UNION SELECT 'restaurant 4', 'asian, vietnamese'
UNION SELECT 'restaurant 5', 'vietnamese, spanish, mexican, canadian'
;

Open in new window

With a quick selection of the above data, you can verify that your input looks like Figure 1.1.
SELECT `name`, `cuisine` FROM `restaurants`;

Open in new window

In addition, we will take advantage of a having a numbers utility table in our database, so run this:
NumbersUtilityTable.sql (explanation: A MySQL Tidbit: Quick Numbers Table Generation)
[/step]

2. Thinking XML


My mind has been on XML because of what I am researching at work coupled with other Article code I have been playing with of late.  Others and I have used this same logic in T-SQL here at EE before; therefore, for me it isn't too far a stretch to consider taking delimited text and converting it to well-formed XML, so good bad or indifferent that is what this technique is all about!

To accomplish this transformation, we simply replace comma (or other delimiter) with element tags we want, ensuring to wrap (concatenate) the result with a start and end tag.
CONCAT('<cuisine>', REPLACE(cuisine, ', ', '</cuisine><cuisine>'), '</cuisine>')

Open in new window

(resulting in something like this for example)
'<cuisine>pho</cuisine><cuisine>vietnamese</cuisine><cuisine>canadian</cuisine>'

Open in new window


In the code of the Question, I put a root element (e.g., <cuisines>) around the above because it didn't feel right to not make a valid XML document, but note that the XML fragment without root node would work just as fine.  I understand that as programmers, we love anything that will save keystrokes.  Therefore, on that note, keep in mind that this XML is just a means to an end, so it doesn't matter how it looks.  I spelled out elements, originally, to make it clearer what I was doing, but we can make it real short hand (i.e., more generic and reusable like r = row; c = column):
'<r><c>vietnamese</c><c>spanish</c><c>mexican</c><c>canadian</c></r>'

Open in new window


(and now the big unveil) TADA!
SELECT `name`
     , EXTRACTVALUE(xml, '//c[1]/text()') AS `cuisine1`
     , EXTRACTVALUE(xml, '//c[2]/text()') AS `cuisine2`
     , EXTRACTVALUE(xml, '//c[3]/text()') AS `cuisine3`
     , EXTRACTVALUE(xml, '//c[4]/text()') AS `cuisine4`
     , EXTRACTVALUE(xml, '//c[5]/text()') AS `cuisine5`
FROM (
   SELECT *,
      CONCAT('<r><c>', REPLACE(cuisine, ', ', '</c><c>'), '</c></r>') AS xml
   FROM restaurants
) AS tbl
;

Open in new window

(results) Figure 2.1
Putting our query into a derived table for ease of referencing the field xml, the solution above makes use of ExtractValue(), one of the two MySQL XML functions.  As a refresher, ExtractValue() takes the following form:
EXTRACTVALUE(xml_frag, xpath_expr)

Open in new window


So, the xml column we crafted for our trick is the first parameter and the second is standard XPath, which requires you specify an index (e.g., 1-5) when there are multiple nodes that can match a specific path.  Note:  The requirement for the index (or more precisely having to code a finite number of columns) is a limitation or at a consideration.  In the example, I simply chose 5 cuisines to display.  If we had restaurants with more, only the first 5 would show and, as you can see by the highlighted cell above, those with less simply show with blank spots ('').

But having to type out (really, cut and paste) the formula for each column isn't too bad, but look at how the limitation is exaggerated when you want to get rows instead.
-- create view to lessen repetition of xml concatenation
CREATE OR REPLACE VIEW restaurants_xml
AS
SELECT *,
   CONCAT('<r><c>', REPLACE(cuisine, ', ', '</c><c>'), '</c></r>') AS xml
FROM restaurants
;

-- generate rows using xml parsing
SELECT *
FROM (
   SELECT `name`
        , EXTRACTVALUE(xml, '//c[1]/text()') AS `cuisine`
   FROM restaurants_xml
   UNION
   SELECT `name`
        , EXTRACTVALUE(xml, '//c[2]/text()') AS `cuisine`
   FROM restaurants_xml
   UNION
   SELECT `name`
        , EXTRACTVALUE(xml, '//c[3]/text()') AS `cuisine`
   FROM restaurants_xml
   UNION
   SELECT `name`
        , EXTRACTVALUE(xml, '//c[4]/text()') AS `cuisine`
   FROM restaurants_xml
   UNION
   SELECT `name`
        , EXTRACTVALUE(xml, '//c[5]/text()') AS `cuisine`
   FROM restaurants_xml
) derived
WHERE `cuisine` <> ''
;

Open in new window


It works, though, so there's nothing wrong with using it if it comes in handy within a given scenario.

But there is a more efficient way, right ?

3. XML Parsing Trick in Action


Now that we have built a piece of XML string, we can apply our MySQL in-line XML parsing trick to it.  Therefore, putting the creation of XML from the CSV string together with our numbers table technique for parsing XML in a single query could result in a solution to the original problem that looks like this:
SELECT EXTRACTVALUE(xml, '//c[$@rownum]/text()') AS `cuisine`
     , COUNT(*) AS `count`
FROM util.numbers AS nums
INNER JOIN (
   SELECT
      CONCAT('<r><c>', REPLACE(cuisine, ', ', '</c><c>'), '</c></r>') AS xml
   FROM restaurants
) AS tbl ON nums.n BETWEEN 1 AND EXTRACTVALUE(xml, 'count(//c)')
   AND @rownum:=n
GROUP BY EXTRACTVALUE(xml, '//c[$@rownum]/text()')
ORDER BY `cuisine`
; 

Open in new window

(results) Figure 1.2, 3.1
[step=""]««bonus»»
With MySQL's Group_Concat, we can get list of restaurants in a delimited list by each count.
(http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_group-concat)
 
SELECT EXTRACTVALUE(xml, '//c[$@rownum]/text()') AS `cuisine`
     , COUNT(*) AS `count`
     , GROUP_CONCAT(`name` SEPARATOR ', ') AS `restaurants`
FROM util.numbers AS nums
INNER JOIN (
   SELECT *,
      CONCAT('<r><c>', REPLACE(cuisine, ', ', '</c><c>'), '</c></r>') AS xml
   FROM restaurants
) AS tbl ON nums.n BETWEEN 1 AND EXTRACTVALUE(xml, 'count(//c)')
   AND @rownum:=n
GROUP BY EXTRACTVALUE(xml, '//c[$@rownum]/text()')
ORDER BY cuisine
;

Open in new window

(results) Figure 3.2[/step]
'tis complete!


Conclusion

Simple enough to follow?  By this point, I hope that you are comfortable with using Replace() and Concat() to transform a delimited string value into XML; subsequently, I hope that you are now knowledgeable on the ExtractValue() technique for parsing this same XML into columns or rows of data.  Have you been following along?  You can test for yourself and you should see the power and results work magically in front of your eyes, proving this is a useful tip and not just Article trickery by using the same image *smile* for both code block results (lazy author *laughing*).

As usual, I would love to hear what you think and any questions regarding this technique in comments.  If you have a specific application of this that you are having difficulty with, just remember the Q&A topic area for MySQL is the best place to go.

Thank you, all, for reading…until next time!

Best regards and happy coding,

Kevin C. Cross, Sr. (mwvisa1)

References

Related Articles
http://www.experts-exchange.com/Database/MySQL/A_3573-A-MySQL-Tidbit-Quick-Numbers-Table-Generation.html
http://www.experts-exchange.com/Database/MySQL/A_3574-A-MySQL-Tidbit-In-line-XML-Parsing.html
http://www.experts-exchange.com/Database/MySQL/A_2250-Load-Delimited-Data-into-MySQL-Server.html
http://www.experts-exchange.com/Database/Miscellaneous/A_1536-delimited-list-as-parameter-what-are-the-options.html

Related MySQL Documentation
http://dev.mysql.com/doc/refman/5.1/en/user-variables.html
http://dev.mysql.com/doc/refman/5.1/en/xml-functions.html
http://dev.mysql.com/doc/refman/5.1/en/string-functions.html
http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html

Related Q&A
http://www.experts-exchange.com/Database/MySQL/Q_26429210.html

 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
If you found this article helpful, please click the Yes button after the question just below. This will give me (the author) a few points and might encourage me to write more articles.

If you didn't or otherwise feel the need to vote No, please first leave a comment to give me a chance to answer and perhaps to improve this article.

Thank you!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
7
Comment
Author:Kevin Cross
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
3 Comments
 
LVL 46

Expert Comment

by:aikimark
@Kevin

It's good to see you're getting some mileage out of the XML parsing trick.  This is an excellent sequel.  As usual, I've learned new things like Group_Concat()

Thank you for the http://e-e.com/A_1536.html link.  During the editing process, I asked about performance and the author ran a 150,000 item test.  It performed quite well.
0
 
LVL 60

Author Comment

by:Kevin Cross
Thanks!

And, good idea! I actually considered mentioning performance when I used the word efficient as in the context I was referring mainly to the efficiency gained by programmer in amount of code to type; however, that doesn't always translate well to query optimization and efficiency.

So let's look at some measures ...


Performance Metrics

From true performance perspective, of course the XML adds some overhead as a3 postulated; however, here is a look at how the methods above stack up.

Parsing to columns took between 0.016 (duration) / 2.4333 (fetch) to 0.374 / 1.997 seconds operating on 150,005 rows.  The low end is using the `restaurants_xml` view and the upper end the derived table.

Comparatively, the execution of union approach to generate rows and an ungrouped version of the numbers utility table based query, both using the view instead of derived tables, ran for 0.016 / 10.311 to ? / ? seconds (I stopped the UNION query after 2006.672 seconds of duration, so never got to fetch).  Another test of my XML parsing trick yielded results in 21.169 / 3.822, which is slower than columns method; however, the UNION approach is a drastic slowdown, but very explainable, I guess, since we are exploding 150,000+ rows to 500,000+ in the first place then couple that with the multiple table scans.

Moral: I would not recommend doing this (more so the UNION for obvious reasons) on a large record set on a routine basis, at least not live.  When needing to do statistical analysis, I know I for one don't care as much about performance as I need to get at a particular answer; however, performance has to be reasonable as impatience with computers is typical human characteristic.


Case and point, the scenario we have hear called for a grouping of the results by cuisine and when using the source code from section #3 that gets the cuisine and aggregate count(), the performance is 29.094 / 0.000.  Not the greatest, but acceptable for me to get the answer ... Now, each environment and tolerance for lag may be different, so please keep this in mind.


As with anything, there are always consequences positive and negative to utilizing certain methodologies.  I probably don't say it enough, but please always test and use EXPLAIN to determine what will work with your record set.


Thanks again, aikimark.  This is what I love about Articles: the research and discussion.

Best regards,

Kevin
0
 
LVL 60

Author Comment

by:Kevin Cross
New Related MySQL Tidbit:
A MySQL Tidbit: Dynamic Export To_XML Stored Procedure

The "Export" section of this new article shows how to use MySQL SQL code to generate CSV files and the overall dynamic procedure can be (or at least, should be) easily modified to export to CSV versus XML.  Hope you like it ...
0

Featured Post

Enterprise Mobility and BYOD For Dummies

Like “For Dummies” books, you can read this in whatever order you choose and learn about mobility and BYOD; and how to put a competitive mobile infrastructure in place. Developed for SMBs and large enterprises alike, you will find helpful use cases, planning, and implementation.

Join & Write a Comment

In this video, Percona Solution Engineer Dimitri Vanoverbeke discusses why you want to use at least three nodes in a database cluster. To discuss how Percona Consulting can help with your design and architecture needs for your database and infras…
In this video, Percona Solution Engineer Rick Golba discuss how (and why) you implement high availability in a database environment. To discuss how Percona Consulting can help with your design and architecture needs for your database and infrastr…
Suggested Courses
Next Article:

Keep in touch with Experts Exchange

Tech news and trends delivered to your inbox every month