?
Solved

Oracle - Replace repeated characters in string

Posted on 2015-02-16
5
Medium Priority
?
646 Views
Last Modified: 2015-02-17
Hi Experts,

I've a requirement to take a string from a CSV file, non qualified. I'm loading this into a staging table using UTL_FILE, and then splitting out the line into columns using the 'replace' function shown as column 'Existing' below.

The issue is that multiple columns can be empty, meaning that the line can have consecutive commas in it, and the replace function cannot deal with this. While I can nest numerous replace functions, I'm after advice on if this is the best approach.

I could write a function to keep applying replaces to a variable if it finds consecutive commas in place, however this seems incredibly inefficient.

select a, 
       replace(replace(replace(x.a,',,',', ,'),',,', ', ,'),',,', ', ,') "Existing",
       regexp_replace(x.a,',,',', ,') "RegEx1",
       regexp_replace(x.a, '(,,)', ', ,') "RegEx2",
       regexp_replace(x.a, '(,){2}', ', , ') "RegEx3"
from   (select '1,2,3,4,5,6,7,8,9,10,11' a from dual union all
        select ',2,,4,,6,,8,,10,' a from dual union all
        select ',,3,,,6,,,9,,' a from dual union all
        select ',,,4,,,,8,,,' a from dual union all
        select '1,,,4,5,,,,,,11' a from dual union all
        select '1,,,,,,,,,,11' a from dual union all
        select ',,' from dual
       ) x

Open in new window


The most promising is the 'RegEx3' column, however this hasn't put a space at the end, but if '11' is populated, it's prefixed with a space.

The end goal for this is to take this string, split it into columns (Example code below), whilst replacing 'null' or empty instances with space.

select LineNumber,
              replace(trim(REGEXP_SUBSTR(replace(replace(LineContent,',,',', ,'),',,',', ,'), '[^,]+', 1, 1)), chr(13), '')  a1, 
              replace(trim(REGEXP_SUBSTR(replace(replace(LineContent,',,',', ,'),',,',', ,'), '[^,]+', 1, 2)), chr(13), '')  a2,
              replace(trim(REGEXP_SUBSTR(replace(replace(LineContent,',,',', ,'),',,',', ,'), '[^,]+', 1, 3)), chr(13), '')  a3,
              replace(trim(REGEXP_SUBSTR(replace(replace(LineContent,',,',', ,'),',,',', ,'), '[^,]+', 1, 4)), chr(13), '')  a4,
              replace(trim(REGEXP_SUBSTR(replace(replace(LineContent,',,',', ,'),',,',', ,'), '[^,]+', 1, 5)), chr(13), '')  a5,
              replace(trim(REGEXP_SUBSTR(replace(replace(LineContent,',,',', ,'),',,',', ,'), '[^,]+', 1, 6)), chr(13), '')  a6,
              replace(trim(REGEXP_SUBSTR(replace(replace(LineContent,',,',', ,'),',,',', ,'), '[^,]+', 1, 7)), chr(13), '')  a7,
              replace(trim(REGEXP_SUBSTR(replace(replace(LineContent,',,',', ,'),',,',', ,'), '[^,]+', 1, 8)), chr(13), '')  a8,
              replace(trim(REGEXP_SUBSTR(replace(replace(LineContent,',,',', ,'),',,',', ,'), '[^,]+', 1, 9)), chr(13), '')  a9,
              replace(trim(REGEXP_SUBSTR(replace(replace(LineContent,',,',', ,'),',,',', ,'), '[^,]+', 1, 10)), chr(13), '') a10,
              replace(trim(REGEXP_SUBSTR(replace(replace(LineContent,',,',', ,'),',,',', ,'), '[^,]+', 1, 11)), chr(13), '') a11,
              replace(trim(REGEXP_SUBSTR(replace(replace(LineContent,',,',', ,'),',,',', ,'), '[^,]+', 1, 12)), chr(13), '') a12

Open in new window


Note the above also replaces {CR} with nothing as well.
0
Comment
Question by:UBB
  • 3
  • 2
5 Comments
 
LVL 74

Expert Comment

by:sdstuber
ID: 40612536
just one replace is sufficient and then simple regexp_substr to extract.
you do still need to cleanup the added spaces though when done.



SELECT linenumber,
trim(regexp_substr(linecontent,'[^,]+',1,1))a1,
trim(regexp_substr(linecontent,'[^,]+',1,2))a2,
trim(regexp_substr(linecontent,'[^,]+',1,3))a3,
trim(regexp_substr(linecontent,'[^,]+',1,4))a4,
trim(regexp_substr(linecontent,'[^,]+',1,5))a5,
trim(regexp_substr(linecontent,'[^,]+',1,6))a6,
trim(regexp_substr(linecontent,'[^,]+',1,7))a7,
trim(regexp_substr(linecontent,'[^,]+',1,8))a8,
trim(regexp_substr(linecontent,'[^,]+',1,9))a9,
trim(regexp_substr(linecontent,'[^,]+',1,10))a10,
trim(regexp_substr(linecontent,'[^,]+',1,11))a11,
trim(regexp_substr(linecontent,'[^,]+',1,12))a12
from (select linenumber,replace(linecontent,',',' ,') linecontent      
  FROM yourdata)
0
 
LVL 74

Accepted Solution

by:
sdstuber earned 2000 total points
ID: 40612550
or, with no replace at all

SELECT linenumber,
rtrim(regexp_substr(linecontent,'[^,]*(,|$)',1,1),',')a1,
rtrim(regexp_substr(linecontent,'[^,]*(,|$)',1,2),',')a2,
rtrim(regexp_substr(linecontent,'[^,]*(,|$)',1,3),',')a3,
rtrim(regexp_substr(linecontent,'[^,]*(,|$)',1,4),',')a4,
rtrim(regexp_substr(linecontent,'[^,]*(,|$)',1,5),',')a5,
rtrim(regexp_substr(linecontent,'[^,]*(,|$)',1,6),',')a6,
rtrim(regexp_substr(linecontent,'[^,]*(,|$)',1,7),',')a7,
rtrim(regexp_substr(linecontent,'[^,]*(,|$)',1,8),',')a8,
rtrim(regexp_substr(linecontent,'[^,]*(,|$)',1,9),',')a9,
rtrim(regexp_substr(linecontent,'[^,]*(,|$)',1,10),',')a10,
rtrim(regexp_substr(linecontent,'[^,]*(,|$)',1,11),',')a11,
rtrim(regexp_substr(linecontent,'[^,]*(,|$)',1,12),',')a12
from yourdata
0
 

Author Comment

by:UBB
ID: 40613972
Thanks ststubber,

That works perfectly. Could you walk me through the regex pattern please - I can see that we're looking for zero or 1 comma, followed by either a comma or end of line - how does the regex know to pick up between the first and proceeding comma.

I shouldn't question too much but i'm nervous about deploying code I don't 100% understand!

Thanks
0
 

Author Closing Comment

by:UBB
ID: 40613973
Perfect - Understood the context of the question and provided exactly what was asked!
0
 
LVL 74

Expert Comment

by:sdstuber
ID: 40614364
>>> can see that we're looking for zero or 1 comma,

no, it's looking for 0 or more non-commas followed by a comma or end of line.

that's why it also requires the RTRIM when done to strip off the trailing comma from each field read
0

Featured Post

Free Tool: Subnet Calculator

The subnet calculator helps you design networks by taking an IP address and network mask and returning information such as network, broadcast address, and host range.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Working with Network Access Control Lists in Oracle 11g (part 1) Part 2: http://www.e-e.com/A_9074.html So, you upgraded to a shiny new 11g database and all of a sudden every program that used UTL_MAIL, UTL_SMTP, UTL_TCP, UTL_HTTP or any oth…
This post first appeared at Oracleinaction  (http://oracleinaction.com/undo-and-redo-in-oracle/)by Anju Garg (Myself). I  will demonstrate that undo for DML’s is stored both in undo tablespace and online redo logs. Then, we will analyze the reaso…
This video shows information on the Oracle Data Dictionary, starting with the Oracle documentation, explaining the different types of Data Dictionary views available by group and permissions as well as giving examples on how to retrieve data from th…
Video by: Steve
Using examples as well as descriptions, step through each of the common simple join types, explaining differences in syntax, differences in expected outputs and showing how the queries run along with the actual outputs based upon a simple set of dem…
Suggested Courses

840 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question