FutureDBA-
asked on
REGEXP_SUBSTR help
Hello All,
I am trying to clean up an address table with thousands of records.
using
REGEXP_SUBSTR(ADDR_LINE1, '^[^@]+ ')
REGEXP_SUBSTR(ADDR_LINE1, '^[^&]+ ')
I've been able to clean up the addresses that had an @ or & in the middle of addressline1, delete the character and anything after.
Now I need to go through it and do the same for Double spaces ' ' or certain keywords. like
ST
STREET
BETWEEN
This will allow our software to properly geocode these addresses.
the trick with the previous is that unlike @ and &, i do not want to delete the words ST (just what comes afterwords)
I do however want to delete the words BETWEEN and anything that comes afterwords.
thank you for reading
I am trying to clean up an address table with thousands of records.
using
REGEXP_SUBSTR(ADDR_LINE1, '^[^@]+ ')
REGEXP_SUBSTR(ADDR_LINE1, '^[^&]+ ')
I've been able to clean up the addresses that had an @ or & in the middle of addressline1, delete the character and anything after.
Now I need to go through it and do the same for Double spaces ' ' or certain keywords. like
ST
STREET
BETWEEN
This will allow our software to properly geocode these addresses.
the trick with the previous is that unlike @ and &, i do not want to delete the words ST (just what comes afterwords)
I do however want to delete the words BETWEEN and anything that comes afterwords.
thank you for reading
Here is a function that I described that uses the two samples I posted above.
You just need to keep adding more scrubbing requirements to the function.
You just need to keep adding more scrubbing requirements to the function.
create or replace function clean_my_data(p_str in varchar2) return varchar2
is
v_str varchar2(4000);
begin
v_str := p_str;
--remove everything AFTER BETWEEN
v_str := substr(v_str,1,instr(v_str,'BETWEEN')-1);
--remove mutiple spaces
v_str := regexp_replace(v_str,'[ ]+',' ');
return v_str;
end;
/
show errors
select clean_my_data('Hello World BETWEEN 1 and 0') from dual;
how about something like this?
SELECT addr_line1,
REGEXP_REPLACE(
RTRIM(REGEXP_SUBSTR(addr_l ine1, '^.*?(@|&| ST| BETWEEN| |$)'), ' @&'),
'BETWEEN$'
)
clean_line
FROM yourtable
SELECT addr_line1,
REGEXP_REPLACE(
RTRIM(REGEXP_SUBSTR(addr_l
'BETWEEN$'
)
clean_line
FROM yourtable
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
We can provide specific code for your specific examples. We have a hard time providing code for general requirements.
Each of your requirements seems doable with individual calls. The problem is REGEXP functions are very inefficient and costly.
I have a similar requirement for cleaning up strings. I wrote a custom function that allows me more flexibility than many nested SQL function calls. It also allows for some 'custom' features that cannot be done with straight SQL.
Here is an example of the SQL for two of your requirements:
Open in new window