Link to home
Start Free TrialLog in
Avatar of Iver Erling Arva
Iver Erling ArvaFlag for Norway

asked on

FInd a Regex that spits string into actual pieces even when they are empty

I am struggeling a bit with Oracle and Regex. I want to use Oracle to split strings into pieces or to select certain pieces from a string, if you like.

E.g.

select regexp_substr('This;is;a;small;;test','¨[^;]+',1,1) from dual

Open in new window

works fine., 'This' is returned.
So does:
select regexp_substr('This;is;a;small;;test','¨[^;]+',1.4) from dual;

Open in new window

where 'small' is returned,
But putting in
select regexp_substr('This;is;a;small;;test','¨[^;]+',1,5) from dual;

Open in new window

returns 'test' and not NULL.

Piece 5 is empty, and still 'test' is returned.
Doing the same for piece 6 returns NULL. This immediately makes more sense due to the fact that there isn't a ';' after 'test', but you see my point, I would like to have returned
1   This
2   is
3   a
4   small
5   (NULL)
6   test

Open in new window

Is there an easy way to achieve this? After all it is very common that some pieces in a delimited string are empty.

Thanks!

Brgds
IVer in Oslo
Avatar of Bill Prew
Bill Prew

It felt like this might / should be a way to work around this, but it isn't working quite the way I would expect here.  Notice that the first and last token would be treated as a special case, and then the "middle" tokens would be surrounded by semi-colons, and could be empty.  Not sure what Oracle's regex is doing though that isn't behaving as I expected.  Sharing here in case a different perspective sees something I am missing...

select trim(';' from regexp_substr('This;is;a;small;;test','^[^;]*;',1,1)) as Col1 from dual;
select trim(';' from regexp_substr('This;is;a;small;;test',';[^;]*;',1,1)) as Col2 from dual;
select trim(';' from regexp_substr('This;is;a;small;;test',';[^;]*;',1,2)) as Col3 from dual;
select trim(';' from regexp_substr('This;is;a;small;;test',';[^;]*;',1,3)) as Col4 from dual;
select trim(';' from regexp_substr('This;is;a;small;;test',';[^;]*;',1,4)) as Col5 from dual;
select trim(';' from regexp_substr('This;is;a;small;;test',';[^;]*$',1,1)) as Col6 from dual;

Open in new window


»bp
ASKER CERTIFIED SOLUTION
Avatar of Terry Woods
Terry Woods
Flag of New Zealand image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Bill, your suggestion was very close to working. The reason it didn't quite work is because of the semicolon character at the end of your pattern, which is "consumed" in each match, causes an incorrect result where you're taking a match beyond the first. You can remedy this with a lookahead (?=;) but it's actually not necessary in the pattern as the * wildcard is greedy.
Good, glad we sorted that out.

The only case it doesn't seem to handle well is a leading semicolon in the string, it seems to skip over the first non blank token.

http://sqlfiddle.com/#!4/8203ff/74


»bp
Avatar of Iver Erling Arva

ASKER

Yes, that is true! So you get pieces 1,3,4,5,6 etc.

IVer
Iver, did you see my solution? It works...
Yes, sorry, Terry! It works. Thanks! I would probably use REPLACE rather than TRIM FROM, but would be interrested in hearing if there is a reason to prefer TRIM FROM.

All the best from Iver