?
Solved

Finding parts of a string with regexp

Posted on 2014-07-21
6
Medium Priority
?
329 Views
Last Modified: 2014-07-21
I would like to return a string using a regular expression where anything with a dash and data either side is returned.
Example.
select
'test data 123-456 and more test data 6789-789b and more test data' as col1
from dual;

So, I would like the following returned with a comma seperating each find - is this possible and how?
'123-456, 6789-789b'
0
Comment
Question by:PHIL Sawyer
  • 2
  • 2
  • 2
6 Comments
 
LVL 16

Expert Comment

by:Wasim Akram Shaik
ID: 40208531
try something like this

select    
        regexp_replace(regexp_replace(
             'Test data 123-456 and More test data 6789-789b and more test data 1245-092',
             '[A-Za-z]+', ',') ,',', '')
      as Col1
 from dual;

http://www.sqlfiddle.com/#!4/d41d8/32928/0
0
 

Author Comment

by:PHIL Sawyer
ID: 40208638
Can you explain how this works - Thanks
0
 
LVL 74

Accepted Solution

by:
sdstuber earned 2000 total points
ID: 40208776
Actually, the first post doesn't work.
Note the "b" is stripped off the end of the second substring.  Also it doesn't do the comma delimiters, it leaves multiple characters of whitespace instead.


Try this, I made the counter go up to 10, if you might have more substrings than that, then simply make the 10 something bigger.


SELECT LISTAGG(
           REGEXP_SUBSTR(
               col1,
               '[^ ]+-[^ ]+',
               1,
               n
           ),
           ','
       )
       WITHIN GROUP (ORDER BY n)
  FROM (SELECT 'test data 123-456 and more test data 6789-789b and more test data' AS col1
          FROM DUAL),
       (    SELECT LEVEL n
              FROM DUAL
        CONNECT BY LEVEL < 10)
 WHERE REGEXP_SUBSTR(
           col1,
           '[^ ]+-[^ ]+',
           1,
           n
       )
           IS NOT NULL

Open in new window

;

The way this works is it generates a list of numbers N 1-10 (but you can change that).

Then for each N it looks for patterns of  '[^ ]+-[^ ]+'
which means one or more non-space characters followed by a "-" followed by  one or more non-space characters.

Since this example only has 2 substrings, N 3-10 will find nothing and hence return NULL, so the where clause exludes those.

LISTAGG pulls the 2 substrings together, ordered by the sequence found in the original string and puts a comma between them.
0
Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 16

Expert Comment

by:Wasim Akram Shaik
ID: 40208800
I completely agree with sdstuber.. though thats the best i could do with regex..

I think Sdstuber would be able to explain it in a better way.. lets wait for him..

mean while i would explain what i did..

regexp_replace(
              'Test data 123-456 and More test data 6789-789b and more test data 1245-092',
              '[A-Za-z]+', ',')

the first regex_replace will find out the alphabetical characters in a given string and it will replace them with commas..

see the range A-Z and a-z.. this will leave the -(hyphen) and digits(numbers) as is..
and the second regex is replacing the , (comma) character with whitespace..
0
 
LVL 74

Expert Comment

by:sdstuber
ID: 40208916
I'm not sure what you would like me to explain all the code above has been described adequately I think.  If there is something specific you'd like me to discuss let me know.


This isn't necessary but... here's another variation that might be a little easier to follow but does require creating a type and function.

SELECT LISTAGG(str, ',') WITHIN GROUP (ORDER BY rn)
  FROM (SELECT COLUMN_VALUE str, ROWNUM rn
          FROM TABLE(
                   str2tbl(
                       'Test data 123-456 and More test data 6789-789b and more test data 1245-092',
                       ' '
                   )
               )
         WHERE COLUMN_VALUE LIKE '%_-_%');

Open in new window


Here again the idea is to use the spaces as delimiters.  str2tbl will take the string and split each word into its own row of one column called COLUMN_VALUE.   Then it's just a matter of looking for those strings that have a dash in them and then aggregate into a comma delimited string as shown in my previous post.



You can find the str2tbl and associated type definitions here:
http://www.e-e.com/Q_27233295.html
0
 

Author Closing Comment

by:PHIL Sawyer
ID: 40209089
Brilliant - thanks
0

Featured Post

Free Tool: Site Down Detector

Helpful to verify reports of your own downtime, or to double check a downed website you are trying to access.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Working with Network Access Control Lists in Oracle 11g (part 2) Part 1: http://www.e-e.com/A_8429.html Previously, I introduced the basics of network ACL's including how to create, delete and modify entries to allow and deny access.  For many…
Note: this article covers simple compression. Oracle introduced in version 11g release 2 a new feature called Advanced Compression which is not covered here. General principle of Oracle compression Oracle compression is a way of reducing the d…
Via a live example show how to connect to RMAN, make basic configuration settings changes and then take a backup of a demo database
This video shows how to Export data from an Oracle database using the Original Export Utility.  The corresponding Import utility, which works the same way is referenced, but not demonstrated.
Suggested Courses

839 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question