Solved

Oracle - Replace repeated characters in string

Posted on 2015-02-16
5
536 Views
Last Modified: 2015-02-17
Hi Experts,

I've a requirement to take a string from a CSV file, non qualified. I'm loading this into a staging table using UTL_FILE, and then splitting out the line into columns using the 'replace' function shown as column 'Existing' below.

The issue is that multiple columns can be empty, meaning that the line can have consecutive commas in it, and the replace function cannot deal with this. While I can nest numerous replace functions, I'm after advice on if this is the best approach.

I could write a function to keep applying replaces to a variable if it finds consecutive commas in place, however this seems incredibly inefficient.

select a, 
       replace(replace(replace(x.a,',,',', ,'),',,', ', ,'),',,', ', ,') "Existing",
       regexp_replace(x.a,',,',', ,') "RegEx1",
       regexp_replace(x.a, '(,,)', ', ,') "RegEx2",
       regexp_replace(x.a, '(,){2}', ', , ') "RegEx3"
from   (select '1,2,3,4,5,6,7,8,9,10,11' a from dual union all
        select ',2,,4,,6,,8,,10,' a from dual union all
        select ',,3,,,6,,,9,,' a from dual union all
        select ',,,4,,,,8,,,' a from dual union all
        select '1,,,4,5,,,,,,11' a from dual union all
        select '1,,,,,,,,,,11' a from dual union all
        select ',,' from dual
       ) x

Open in new window


The most promising is the 'RegEx3' column, however this hasn't put a space at the end, but if '11' is populated, it's prefixed with a space.

The end goal for this is to take this string, split it into columns (Example code below), whilst replacing 'null' or empty instances with space.

select LineNumber,
              replace(trim(REGEXP_SUBSTR(replace(replace(LineContent,',,',', ,'),',,',', ,'), '[^,]+', 1, 1)), chr(13), '')  a1, 
              replace(trim(REGEXP_SUBSTR(replace(replace(LineContent,',,',', ,'),',,',', ,'), '[^,]+', 1, 2)), chr(13), '')  a2,
              replace(trim(REGEXP_SUBSTR(replace(replace(LineContent,',,',', ,'),',,',', ,'), '[^,]+', 1, 3)), chr(13), '')  a3,
              replace(trim(REGEXP_SUBSTR(replace(replace(LineContent,',,',', ,'),',,',', ,'), '[^,]+', 1, 4)), chr(13), '')  a4,
              replace(trim(REGEXP_SUBSTR(replace(replace(LineContent,',,',', ,'),',,',', ,'), '[^,]+', 1, 5)), chr(13), '')  a5,
              replace(trim(REGEXP_SUBSTR(replace(replace(LineContent,',,',', ,'),',,',', ,'), '[^,]+', 1, 6)), chr(13), '')  a6,
              replace(trim(REGEXP_SUBSTR(replace(replace(LineContent,',,',', ,'),',,',', ,'), '[^,]+', 1, 7)), chr(13), '')  a7,
              replace(trim(REGEXP_SUBSTR(replace(replace(LineContent,',,',', ,'),',,',', ,'), '[^,]+', 1, 8)), chr(13), '')  a8,
              replace(trim(REGEXP_SUBSTR(replace(replace(LineContent,',,',', ,'),',,',', ,'), '[^,]+', 1, 9)), chr(13), '')  a9,
              replace(trim(REGEXP_SUBSTR(replace(replace(LineContent,',,',', ,'),',,',', ,'), '[^,]+', 1, 10)), chr(13), '') a10,
              replace(trim(REGEXP_SUBSTR(replace(replace(LineContent,',,',', ,'),',,',', ,'), '[^,]+', 1, 11)), chr(13), '') a11,
              replace(trim(REGEXP_SUBSTR(replace(replace(LineContent,',,',', ,'),',,',', ,'), '[^,]+', 1, 12)), chr(13), '') a12

Open in new window


Note the above also replaces {CR} with nothing as well.
0
Comment
Question by:UBB
  • 3
  • 2
5 Comments
 
LVL 74

Expert Comment

by:sdstuber
ID: 40612536
just one replace is sufficient and then simple regexp_substr to extract.
you do still need to cleanup the added spaces though when done.



SELECT linenumber,
trim(regexp_substr(linecontent,'[^,]+',1,1))a1,
trim(regexp_substr(linecontent,'[^,]+',1,2))a2,
trim(regexp_substr(linecontent,'[^,]+',1,3))a3,
trim(regexp_substr(linecontent,'[^,]+',1,4))a4,
trim(regexp_substr(linecontent,'[^,]+',1,5))a5,
trim(regexp_substr(linecontent,'[^,]+',1,6))a6,
trim(regexp_substr(linecontent,'[^,]+',1,7))a7,
trim(regexp_substr(linecontent,'[^,]+',1,8))a8,
trim(regexp_substr(linecontent,'[^,]+',1,9))a9,
trim(regexp_substr(linecontent,'[^,]+',1,10))a10,
trim(regexp_substr(linecontent,'[^,]+',1,11))a11,
trim(regexp_substr(linecontent,'[^,]+',1,12))a12
from (select linenumber,replace(linecontent,',',' ,') linecontent      
  FROM yourdata)
0
 
LVL 74

Accepted Solution

by:
sdstuber earned 500 total points
ID: 40612550
or, with no replace at all

SELECT linenumber,
rtrim(regexp_substr(linecontent,'[^,]*(,|$)',1,1),',')a1,
rtrim(regexp_substr(linecontent,'[^,]*(,|$)',1,2),',')a2,
rtrim(regexp_substr(linecontent,'[^,]*(,|$)',1,3),',')a3,
rtrim(regexp_substr(linecontent,'[^,]*(,|$)',1,4),',')a4,
rtrim(regexp_substr(linecontent,'[^,]*(,|$)',1,5),',')a5,
rtrim(regexp_substr(linecontent,'[^,]*(,|$)',1,6),',')a6,
rtrim(regexp_substr(linecontent,'[^,]*(,|$)',1,7),',')a7,
rtrim(regexp_substr(linecontent,'[^,]*(,|$)',1,8),',')a8,
rtrim(regexp_substr(linecontent,'[^,]*(,|$)',1,9),',')a9,
rtrim(regexp_substr(linecontent,'[^,]*(,|$)',1,10),',')a10,
rtrim(regexp_substr(linecontent,'[^,]*(,|$)',1,11),',')a11,
rtrim(regexp_substr(linecontent,'[^,]*(,|$)',1,12),',')a12
from yourdata
0
 

Author Comment

by:UBB
ID: 40613972
Thanks ststubber,

That works perfectly. Could you walk me through the regex pattern please - I can see that we're looking for zero or 1 comma, followed by either a comma or end of line - how does the regex know to pick up between the first and proceeding comma.

I shouldn't question too much but i'm nervous about deploying code I don't 100% understand!

Thanks
0
 

Author Closing Comment

by:UBB
ID: 40613973
Perfect - Understood the context of the question and provided exactly what was asked!
0
 
LVL 74

Expert Comment

by:sdstuber
ID: 40614364
>>> can see that we're looking for zero or 1 comma,

no, it's looking for 0 or more non-commas followed by a comma or end of line.

that's why it also requires the RTRIM when done to strip off the trailing comma from each field read
0

Featured Post

Secure Your Active Directory - April 20, 2017

Active Directory plays a critical role in your company’s IT infrastructure and keeping it secure in today’s hacker-infested world is a must.
Microsoft published 300+ pages of guidance, but who has the time, money, and resources to implement? Register now to find an easier way.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Note: this article covers simple compression. Oracle introduced in version 11g release 2 a new feature called Advanced Compression which is not covered here. General principle of Oracle compression Oracle compression is a way of reducing the d…
This post first appeared at Oracleinaction  (http://oracleinaction.com/undo-and-redo-in-oracle/)by Anju Garg (Myself). I  will demonstrate that undo for DML’s is stored both in undo tablespace and online redo logs. Then, we will analyze the reaso…
Via a live example, show how to take different types of Oracle backups using RMAN.
This video shows how to copy an entire tablespace from one database to another database using Transportable Tablespace functionality.

726 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question