<

A MySQL Tidbit: In-line CSV Parsing

Published on
16,604 Points
5,904 Views
7 Endorsements
Last Modified:
Awarded
Community Pick
Kevin Cross
Father, husband and general problem solver who loves coding SQL, C#, Salesforce Apex or whatever.
More Fun with XML and MySQL – Parsing Delimited String with a Single SQL Statement

Are you ready for another of my SQL tidbits?  Hopefully so, as in this adventure, I will be covering a topic that comes up a lot which is parsing a comma (or otherwise) delimited string into rows (or columns) of data.

The methodology to be used takes advantage of some of MySQL's Extensible Markup Language (XML) functionality and so this Article will apply to MySQL 5.1.5 or higher.  Consequently, if you are a MySQL developer using the newer version of MySQL or even just an avid reader, please read on and enjoy!

Sections

Background
Thinking XML
XML Parsing Trick in Action

1. Background


Recently, EE Member transitnetwork, asked a Question, How do I make delimited fields come out in a list with totals, which my technique fit perfectly (at least as one method of getting the job done).

The Request
Let's say I have a list of five restaurants with two fields, name and cuisine: Figure 1.1
I want to make the results look like this: Figure 1.2, 3.1.

[step=""]««setup»»
To follow along in the article, here are scripts to get started with.

(table structure -- create statement)
CREATE TABLE `restaurants` (
  `id` INT(11) NOT NULL AUTO_INCREMENT,
  `name` VARCHAR(45) DEFAULT NULL,
  `cuisine` VARCHAR(100) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Open in new window

(sample data -- insert statement)
INSERT INTO `restaurants`(`name`, `cuisine`)
SELECT 'restaurant 1', 'pho, vietnamese, canadian'
UNION SELECT 'restaurant 2', 'asian, pho, vietnamese, sushi'
UNION SELECT 'restaurant 3', 'asian, pho, vietnamese, sushi'
UNION SELECT 'restaurant 4', 'asian, vietnamese'
UNION SELECT 'restaurant 5', 'vietnamese, spanish, mexican, canadian'
;

Open in new window

With a quick selection of the above data, you can verify that your input looks like Figure 1.1.
SELECT `name`, `cuisine` FROM `restaurants`;

Open in new window

In addition, we will take advantage of a having a numbers utility table in our database, so run this:
NumbersUtilityTable.sql (explanation: A MySQL Tidbit: Quick Numbers Table Generation)
[/step]

2. Thinking XML


My mind has been on XML because of what I am researching at work coupled with other Article code I have been playing with of late.  Others and I have used this same logic in T-SQL here at EE before; therefore, for me it isn't too far a stretch to consider taking delimited text and converting it to well-formed XML, so good bad or indifferent that is what this technique is all about!

To accomplish this transformation, we simply replace comma (or other delimiter) with element tags we want, ensuring to wrap (concatenate) the result with a start and end tag.
CONCAT('<cuisine>', REPLACE(cuisine, ', ', '</cuisine><cuisine>'), '</cuisine>')

Open in new window

(resulting in something like this for example)
'<cuisine>pho</cuisine><cuisine>vietnamese</cuisine><cuisine>canadian</cuisine>'

Open in new window


In the code of the Question, I put a root element (e.g., <cuisines>) around the above because it didn't feel right to not make a valid XML document, but note that the XML fragment without root node would work just as fine.  I understand that as programmers, we love anything that will save keystrokes.  Therefore, on that note, keep in mind that this XML is just a means to an end, so it doesn't matter how it looks.  I spelled out elements, originally, to make it clearer what I was doing, but we can make it real short hand (i.e., more generic and reusable like r = row; c = column):
'<r><c>vietnamese</c><c>spanish</c><c>mexican</c><c>canadian</c></r>'

Open in new window


(and now the big unveil) TADA!
SELECT `name`
     , EXTRACTVALUE(xml, '//c[1]/text()') AS `cuisine1`
     , EXTRACTVALUE(xml, '//c[2]/text()') AS `cuisine2`
     , EXTRACTVALUE(xml, '//c[3]/text()') AS `cuisine3`
     , EXTRACTVALUE(xml, '//c[4]/text()') AS `cuisine4`
     , EXTRACTVALUE(xml, '//c[5]/text()') AS `cuisine5`
FROM (
   SELECT *,
      CONCAT('<r><c>', REPLACE(cuisine, ', ', '</c><c>'), '</c></r>') AS xml
   FROM restaurants
) AS tbl
;

Open in new window

(results) Figure 2.1
Putting our query into a derived table for ease of referencing the field xml, the solution above makes use of ExtractValue(), one of the two MySQL XML functions.  As a refresher, ExtractValue() takes the following form:
EXTRACTVALUE(xml_frag, xpath_expr)

Open in new window


So, the xml column we crafted for our trick is the first parameter and the second is standard XPath, which requires you specify an index (e.g., 1-5) when there are multiple nodes that can match a specific path.  Note:  The requirement for the index (or more precisely having to code a finite number of columns) is a limitation or at a consideration.  In the example, I simply chose 5 cuisines to display.  If we had restaurants with more, only the first 5 would show and, as you can see by the highlighted cell above, those with less simply show with blank spots ('').

But having to type out (really, cut and paste) the formula for each column isn't too bad, but look at how the limitation is exaggerated when you want to get rows instead.
-- create view to lessen repetition of xml concatenation
CREATE OR REPLACE VIEW restaurants_xml
AS
SELECT *,
   CONCAT('<r><c>', REPLACE(cuisine, ', ', '</c><c>'), '</c></r>') AS xml
FROM restaurants
;

-- generate rows using xml parsing
SELECT *
FROM (
   SELECT `name`
        , EXTRACTVALUE(xml, '//c[1]/text()') AS `cuisine`
   FROM restaurants_xml
   UNION
   SELECT `name`
        , EXTRACTVALUE(xml, '//c[2]/text()') AS `cuisine`
   FROM restaurants_xml
   UNION
   SELECT `name`
        , EXTRACTVALUE(xml, '//c[3]/text()') AS `cuisine`
   FROM restaurants_xml
   UNION
   SELECT `name`
        , EXTRACTVALUE(xml, '//c[4]/text()') AS `cuisine`
   FROM restaurants_xml
   UNION
   SELECT `name`
        , EXTRACTVALUE(xml, '//c[5]/text()') AS `cuisine`
   FROM restaurants_xml
) derived
WHERE `cuisine` <> ''
;

Open in new window


It works, though, so there's nothing wrong with using it if it comes in handy within a given scenario.

But there is a more efficient way, right ?

3. XML Parsing Trick in Action


Now that we have built a piece of XML string, we can apply our MySQL in-line XML parsing trick to it.  Therefore, putting the creation of XML from the CSV string together with our numbers table technique for parsing XML in a single query could result in a solution to the original problem that looks like this:
SELECT EXTRACTVALUE(xml, '//c[$@rownum]/text()') AS `cuisine`
     , COUNT(*) AS `count`
FROM util.numbers AS nums
INNER JOIN (
   SELECT
      CONCAT('<r><c>', REPLACE(cuisine, ', ', '</c><c>'), '</c></r>') AS xml
   FROM restaurants
) AS tbl ON nums.n BETWEEN 1 AND EXTRACTVALUE(xml, 'count(//c)')
   AND @rownum:=n
GROUP BY EXTRACTVALUE(xml, '//c[$@rownum]/text()')
ORDER BY `cuisine`
; 

Open in new window

(results) Figure 1.2, 3.1
[step=""]««bonus»»
With MySQL's Group_Concat, we can get list of restaurants in a delimited list by each count.
(http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_group-concat)
 
SELECT EXTRACTVALUE(xml, '//c[$@rownum]/text()') AS `cuisine`
     , COUNT(*) AS `count`
     , GROUP_CONCAT(`name` SEPARATOR ', ') AS `restaurants`
FROM util.numbers AS nums
INNER JOIN (
   SELECT *,
      CONCAT('<r><c>', REPLACE(cuisine, ', ', '</c><c>'), '</c></r>') AS xml
   FROM restaurants
) AS tbl ON nums.n BETWEEN 1 AND EXTRACTVALUE(xml, 'count(//c)')
   AND @rownum:=n
GROUP BY EXTRACTVALUE(xml, '//c[$@rownum]/text()')
ORDER BY cuisine
;

Open in new window

(results) Figure 3.2[/step]
'tis complete!


Conclusion

Simple enough to follow?  By this point, I hope that you are comfortable with using Replace() and Concat() to transform a delimited string value into XML; subsequently, I hope that you are now knowledgeable on the ExtractValue() technique for parsing this same XML into columns or rows of data.  Have you been following along?  You can test for yourself and you should see the power and results work magically in front of your eyes, proving this is a useful tip and not just Article trickery by using the same image *smile* for both code block results (lazy author *laughing*).

As usual, I would love to hear what you think and any questions regarding this technique in comments.  If you have a specific application of this that you are having difficulty with, just remember the Q&A topic area for MySQL is the best place to go.

Thank you, all, for reading…until next time!

Best regards and happy coding,

Kevin C. Cross, Sr. (mwvisa1)

References

Related Articles
http://www.experts-exchange.com/Database/MySQL/A_3573-A-MySQL-Tidbit-Quick-Numbers-Table-Generation.html
http://www.experts-exchange.com/Database/MySQL/A_3574-A-MySQL-Tidbit-In-line-XML-Parsing.html
http://www.experts-exchange.com/Database/MySQL/A_2250-Load-Delimited-Data-into-MySQL-Server.html
http://www.experts-exchange.com/Database/Miscellaneous/A_1536-delimited-list-as-parameter-what-are-the-options.html

Related MySQL Documentation
http://dev.mysql.com/doc/refman/5.1/en/user-variables.html
http://dev.mysql.com/doc/refman/5.1/en/xml-functions.html
http://dev.mysql.com/doc/refman/5.1/en/string-functions.html
http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html

Related Q&A
http://www.experts-exchange.com/Database/MySQL/Q_26429210.html

 =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
If you found this article helpful, please click the Yes button after the question just below. This will give me (the author) a few points and might encourage me to write more articles.

If you didn't or otherwise feel the need to vote No, please first leave a comment to give me a chance to answer and perhaps to improve this article.

Thank you!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
7
Author:Kevin Cross
Ask questions about what you read
If you have a question about something within an article, you can receive help directly from the article author. Experts Exchange article authors are available to answer questions and further the discussion.
Get 7 days free