asked on

Parsing data from 1 field to multiple fields in Oracle Database

I have a column containing data as below

data
aaaaa.2
bbbbb.4/5
aaaaa.4/6, ccccc.6/10
aaaaa.1/2/10, bbbbb.7

I need to parse the data into separate columns per below

data1 data2 data3 data4
aaaaa.2
bbbbb.4 bbbbb.5
aaaaa.4 aaaaa.6 ccccc.6 ccccc.10
aaaaa.1 aaaaa.2 aaaaa.10 bbbbb.7

Can anyone help?

Sean Stuber

SELECT MAX(CASE WHEN rn = 1 THEN s3 END) data1,
         MAX(CASE WHEN rn = 2 THEN s3 END) data2,
         MAX(CASE WHEN rn = 3 THEN s3 END) data3,
         MAX(CASE WHEN rn = 4 THEN s3 END) data4
    FROM (SELECT data, s2 || x s3, ROW_NUMBER() OVER(PARTITION BY data ORDER BY n, m) rn
            FROM (SELECT data,
                         s,
                         n,
                         m,
                         REGEXP_SUBSTR(s, '^.+\.')
                             s2,
                         TO_NUMBER(REGEXP_SUBSTR(s,
                                                 '[0-9]+',
                                                 INSTR(s, '.'),
                                                 m))
                             x
                    FROM (SELECT data,
                                 n,
                                 TRIM(REGEXP_SUBSTR(data,
                                                    '[^,]+',
                                                    1,
                                                    n))
                                     s
                            FROM yourtable,
                                 (    SELECT LEVEL n
                                        FROM DUAL
                                  CONNECT BY LEVEL <= 4)),
                         (    SELECT LEVEL m
                                FROM DUAL
                          CONNECT BY LEVEL <= 4)
                   WHERE s IS NOT NULL)
           WHERE x IS NOT NULL)
GROUP BY data;

Open in new window

or, using PIVOT syntax

SELECT data1,
       data2,
       data3,
       data4
  FROM (SELECT data,
               s,
               x,
               ROW_NUMBER() OVER(PARTITION BY data ORDER BY n, m) rn
          FROM (SELECT data,
                       n,
                       m,
                       REGEXP_SUBSTR(s, '^.+\.')
                           s,
                       TO_NUMBER(REGEXP_SUBSTR(s,
                                               '[0-9]+',
                                               INSTR(s, '.'),
                                               m))
                           x
                  FROM (SELECT data,
                               n,
                               TRIM(REGEXP_SUBSTR(data,
                                                  '[^,]+',
                                                  1,
                                                  n))
                                   s
                          FROM yourtable,
                               (    SELECT LEVEL n
                                      FROM DUAL
                                CONNECT BY LEVEL <= 4)),
                       (    SELECT LEVEL m
                              FROM DUAL
                        CONNECT BY LEVEL <= 4)
                 WHERE s IS NOT NULL)
         WHERE x IS NOT NULL)
       PIVOT (MAX(s || x) FOR rn IN (1 AS data1, 2 AS data2, 3 AS data3, 4 AS data4))

Open in new window

you didn't specify so I made the following assumptions
You are using 11gR2 or higher
Your main field delimiters will always be commas
Your sub-field delimiters will always be forward slashes
Your numeric portion will always exist and be delimited by a period from the text portion.
You will never have more than 4 main (text) fields
You will never have more then 4 sub-fields (numeric) within a main field
You will never have more than 4 total data elements (columns) after all subfields have been extracted from all main fields.

If any of these assumptions are not correct, please provide the correct rule to clarify, plus sample data and expected results

Dianna Chua

ASKER

Thanks for the quick response.

I am using Oracle 11.2.0.3.0.
The rest of the assumptions are correct.

But I get the following error when I execute the command
ORA-01428: argument '0' is out of range

Please advise. Thanks,

Sean Stuber

That probably means one of the assumptions is NOT correct.
Most likely this one
---- Your numeric portion will always exist and be delimited by a period from the text portion.

Which would cause this INSTR(s, '.') to return 0, which would make the regexp function fail with the error you noted.

Can you provide some sample data where the error occurs as well as what you expect the result to be

Sean Stuber

I'm just taking a guess at what your data might look like, try this...

SELECT data1,
       data2,
       data3,
       data4
  FROM (SELECT data,
                  REGEXP_SUBSTR(s, '[^.]+\.?')
               || REGEXP_SUBSTR(s,
                                '[0-9]+',
                                GREATEST(INSTR(s, '.'), 1),
                                m.COLUMN_VALUE)
                   s,
               ROW_NUMBER() OVER(PARTITION BY data ORDER BY n, m.COLUMN_VALUE)
                   rn
          FROM (SELECT data,
                       n.COLUMN_VALUE
                           n,
                       TRIM(REGEXP_SUBSTR(data,
                                          '[^,]+',
                                          1,
                                          n.COLUMN_VALUE))
                           s
                  FROM yourtable,
                       TABLE(    SELECT COLLECT(LEVEL)
                                   FROM DUAL
                             CONNECT BY LEVEL <= REGEXP_COUNT(data, '[^,]+')) n),
               TABLE(    SELECT COLLECT(LEVEL)
                           FROM DUAL
                     CONNECT BY LEVEL <= REGEXP_COUNT(s, '[0-9]+', GREATEST(INSTR(s, '.'), 1))) m)
       PIVOT (MAX(s) FOR rn IN (1 AS data1, 2 AS data2, 3 AS data3, 4 AS data4))

Open in new window

same assumptions as earlier except I've removed this one:
--- Your numeric portion will always exist and be delimited by a period from the text portion.

So data like this 'abcd,xyz.1/2,mnop'
would be parsed into 4 columns
abcd xyz.1 xyz.2 mnop

Also, I removed the arbitrary limits of 4 on the text and numeric parsing, but I did keep the total column limit at 4 because the columns must be defined in the sql, they can't be derived on the fly.

But, if you want more columns all you would have to change are these 2 sections to add more columns, the rest of the sql will generate the "rn" counter of each field and subfield regardless of size.

SELECT data1, data2, data3, data4

PIVOT (MAX(s) FOR rn IN (1 AS data1, 2 AS data2, 3 AS data3, 4 AS data4))

Here is the sample data I used in my tests of the query above

WITH
    yourtable(data)
    AS
        (SELECT 'aaaaa.2' FROM DUAL
         UNION ALL
         SELECT 'bbbbb.4/5' FROM DUAL
         UNION ALL
         SELECT 'aaaaa.4/6, ccccc.6/10' FROM DUAL
         UNION ALL
         SELECT 'aaaaa.1/2/10, bbbbb.7' FROM DUAL
         UNION ALL
         SELECT 'ddddd.1/2/3/4' FROM DUAL
         UNION ALL
         SELECT 'aaa.1,bbb.2,ccc.3,ddd.4' FROM DUAL
         UNION ALL
         SELECT 'abcd,xyz.1/2,mnop' FROM DUAL
         UNION ALL
         SELECT NULL
           FROM DUAL)

Open in new window

Dianna Chua

ASKER

Oops, you are right. I didn't understand your question earlier but you are correct, the numeric portion does not always exist.

The query runs now, but when the data is as below
abcd,efgh

the result i get is
abcdabc efghefgh

Please help. Thanks.

Sean Stuber

I get results like this from 'abcd,efgh'

abcd efgh

I got these results using the query from post 42418913

did you make any changes to the query?

Dianna Chua

ASKER

Hi

No changes made to the query, I just replaced yourtable with my query for extracting the data.

Dianna Chua

ASKER

Hi

I encountered the following error when I included the query in a procedure.

PL/SQL: ORA-22905: cannot access rows from a non-nested table item

with the following line highlighted

TABLE( SELECT COLLECT(LEVEL)

Sean Stuber

The query works when I tested it. Please post your entire query. So we can compare what might be different. Even something that appears minor might have caused the problem.

Dianna Chua

ASKER

The query is per below. It works when I execute it in PLSQL with line 2 marked off, but when I include it into my procedure, the procedure fails to compile with the ORA-22905 error

SELECT data1, data2, data3, data4
into m_cartlot1, m_cartlot2, m_cartlot3, m_cartlot4
FROM (SELECT data, REGEXP_SUBSTR(s, '[^.]+\.?') || REGEXP_SUBSTR(s, '[0-9]+', GREATEST(INSTR(s, '.'), 1), m.COLUMN_VALUE) s,
ROW_NUMBER() OVER(PARTITION BY data ORDER BY n, m.COLUMN_VALUE) rn
FROM (SELECT data, n.COLUMN_VALUE n, TRIM(REGEXP_SUBSTR(data, '[^,]+', 1, n.COLUMN_VALUE)) s
FROM (select charactervalue as data from tres_rawdata_lot
where testopno='PAC02.02'
and itemprompt = 'Welded Cartridge Lot #'
and lotid ='964291.2'),
TABLE( SELECT COLLECT(LEVEL)
FROM DUAL
CONNECT BY LEVEL <= REGEXP_COUNT(data, '[^,]+')) n),
TABLE( SELECT COLLECT(LEVEL)
FROM DUAL
CONNECT BY LEVEL <= REGEXP_COUNT(s, '[0-9]+', GREATEST(INSTR(s, '.'), 1))) m)
PIVOT (MAX(s) FOR rn IN (1 AS data1, 2 AS data2, 3 AS data3, 4 AS data4));

Dianna Chua

ASKER

Hi

Is the information provided sufficient? Do you need me to provide the full procedure?

Thanks.

ASKER CERTIFIED SOLUTION

Sean Stuber

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Dianna Chua

ASKER

Thanks! The query works fine within the procedure now.