Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

REGEXP_SUBSTR help

Posted on 2014-09-18
4
Medium Priority
?
217 Views
Last Modified: 2014-10-02
Hello All,

I am trying to clean up an address table with thousands of records.

using
REGEXP_SUBSTR(ADDR_LINE1, '^[^@]+ ')
REGEXP_SUBSTR(ADDR_LINE1, '^[^&]+ ')

I've been able to clean up the addresses that had an @ or & in the middle of addressline1, delete the character and anything after.

Now I need to go through it and do the same for Double spaces '  ' or certain keywords. like

ST
STREET
BETWEEN

This will allow our software to properly geocode these addresses.

the trick with the previous is that unlike @ and &, i do not want to delete the words ST (just what comes afterwords)

I do however want to delete the words BETWEEN and anything that comes afterwords.

thank you for reading
0
Comment
Question by:FutureDBA-
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
  • 2
4 Comments
 
LVL 77

Expert Comment

by:slightwv (䄆 Netminder)
ID: 40331648
PLEASE:  Sample data and expeected results

We can provide specific code for your specific examples.  We have a hard time providing code for general requirements.

Each of your requirements seems doable with individual calls.  The problem is REGEXP functions are very inefficient and costly.

I have a similar requirement for cleaning up strings.  I wrote a custom function that allows me more flexibility than many nested SQL function calls.  It also allows for some 'custom' features that cannot be done with straight SQL.

Here is an example of the SQL for two of your requirements:
--remove everything AFTER BETWEEN
with mydata as
(
select 'This is BETWEEN 0 and 1' mydata from dual
)
select substr(mydata,1,instr(mydata,'BETWEEN')-1) from mydata;


--remove multiple spaces
select regexp_replace('a        b','[ ]+',' ') from dual;

Open in new window

0
 
LVL 77

Expert Comment

by:slightwv (䄆 Netminder)
ID: 40331654
Here is a function that I described that uses the two samples I posted above.

You just need to keep adding more scrubbing requirements to the function.

create or replace function clean_my_data(p_str in varchar2) return varchar2
is
v_str varchar2(4000);
begin

	v_str := p_str;

	--remove everything AFTER BETWEEN
	v_str := substr(v_str,1,instr(v_str,'BETWEEN')-1);

	--remove mutiple spaces
	v_str :=  regexp_replace(v_str,'[ ]+',' ');

	return v_str;

end;
/

show errors


select clean_my_data('Hello             World BETWEEN 1 and 0') from dual;

Open in new window

0
 
LVL 74

Expert Comment

by:sdstuber
ID: 40332650
how about something like this?

SELECT addr_line1,
       REGEXP_REPLACE(
           RTRIM(REGEXP_SUBSTR(addr_line1, '^.*?(@|&| ST| BETWEEN|  |$)'), ' @&'),
           'BETWEEN$'
       )
           clean_line
  FROM yourtable
0
 
LVL 74

Accepted Solution

by:
sdstuber earned 2000 total points
ID: 40332664
breaking my previous post down...

REGEXP_SUBSTR(addr_line1, '^.*?(@|&| ST| BETWEEN|  |$)')

  read until you find one of the elements in the pipe-delimited list, you can expand that, just make sure you include $ as the last choice

RTRIM(....,' @&')

remove any single character trash from the end

REGEXP_REPLACE(...,'BETWEEN$')

remove whole words from the end

if you want to create a list of words then change it to something like this, be sure to include the same words in the first regexp too

REGEXP_REPLACE(...,'(BETWEEN|SOMEWORD|ANOTHERWORD)$')


As long as your requirements are fairly simple like these the regexp will probably be fine, but if they get complicated you may want to go with a pl/sql solution as slightwv suggested.
0

Featured Post

What does it mean to be "Always On"?

Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Introduction A previously published article on Experts Exchange ("Joins in Oracle", http://www.experts-exchange.com/Database/Oracle/A_8249-Joins-in-Oracle.html) makes a statement about "Oracle proprietary" joins and mixes the join syntax with gen…
Using SQL Scripts we can save all the SQL queries as files that we use very frequently on our database later point of time. This is one of the feature present under SQL Workshop in Oracle Application Express.
This video explains at a high level about the four available data types in Oracle and how dates can be manipulated by the user to get data into and out of the database.
This video explains what a user managed backup is and shows how to take one, providing a couple of simple example scripts.

721 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question