Finding parts of a string with regexp

I would like to return a string using a regular expression where anything with a dash and data either side is returned.
Example.
select
'test data 123-456 and more test data 6789-789b and more test data' as col1
from dual;

So, I would like the following returned with a comma seperating each find - is this possible and how?
'123-456, 6789-789b'
PHIL SawyerAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Wasim Akram ShaikCommented:
try something like this

select    
        regexp_replace(regexp_replace(
             'Test data 123-456 and More test data 6789-789b and more test data 1245-092',
             '[A-Za-z]+', ',') ,',', '')
      as Col1
 from dual;

http://www.sqlfiddle.com/#!4/d41d8/32928/0
0
PHIL SawyerAuthor Commented:
Can you explain how this works - Thanks
0
sdstuberCommented:
Actually, the first post doesn't work.
Note the "b" is stripped off the end of the second substring.  Also it doesn't do the comma delimiters, it leaves multiple characters of whitespace instead.


Try this, I made the counter go up to 10, if you might have more substrings than that, then simply make the 10 something bigger.


SELECT LISTAGG(
           REGEXP_SUBSTR(
               col1,
               '[^ ]+-[^ ]+',
               1,
               n
           ),
           ','
       )
       WITHIN GROUP (ORDER BY n)
  FROM (SELECT 'test data 123-456 and more test data 6789-789b and more test data' AS col1
          FROM DUAL),
       (    SELECT LEVEL n
              FROM DUAL
        CONNECT BY LEVEL < 10)
 WHERE REGEXP_SUBSTR(
           col1,
           '[^ ]+-[^ ]+',
           1,
           n
       )
           IS NOT NULL

Open in new window

;

The way this works is it generates a list of numbers N 1-10 (but you can change that).

Then for each N it looks for patterns of  '[^ ]+-[^ ]+'
which means one or more non-space characters followed by a "-" followed by  one or more non-space characters.

Since this example only has 2 substrings, N 3-10 will find nothing and hence return NULL, so the where clause exludes those.

LISTAGG pulls the 2 substrings together, ordered by the sequence found in the original string and puts a comma between them.
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Ultimate Tool Kit for Technology Solution Provider

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy now.

Wasim Akram ShaikCommented:
I completely agree with sdstuber.. though thats the best i could do with regex..

I think Sdstuber would be able to explain it in a better way.. lets wait for him..

mean while i would explain what i did..

regexp_replace(
              'Test data 123-456 and More test data 6789-789b and more test data 1245-092',
              '[A-Za-z]+', ',')

the first regex_replace will find out the alphabetical characters in a given string and it will replace them with commas..

see the range A-Z and a-z.. this will leave the -(hyphen) and digits(numbers) as is..
and the second regex is replacing the , (comma) character with whitespace..
0
sdstuberCommented:
I'm not sure what you would like me to explain all the code above has been described adequately I think.  If there is something specific you'd like me to discuss let me know.


This isn't necessary but... here's another variation that might be a little easier to follow but does require creating a type and function.

SELECT LISTAGG(str, ',') WITHIN GROUP (ORDER BY rn)
  FROM (SELECT COLUMN_VALUE str, ROWNUM rn
          FROM TABLE(
                   str2tbl(
                       'Test data 123-456 and More test data 6789-789b and more test data 1245-092',
                       ' '
                   )
               )
         WHERE COLUMN_VALUE LIKE '%_-_%');

Open in new window


Here again the idea is to use the spaces as delimiters.  str2tbl will take the string and split each word into its own row of one column called COLUMN_VALUE.   Then it's just a matter of looking for those strings that have a dash in them and then aggregate into a comma delimited string as shown in my previous post.



You can find the str2tbl and associated type definitions here:
http://www.e-e.com/Q_27233295.html
0
PHIL SawyerAuthor Commented:
Brilliant - thanks
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Oracle Database

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.