Solved

Oracle SQL - REGEXP_REPLACE

Posted on 2014-03-06
20
986 Views
Last Modified: 2014-03-11
Hi, I have a long text field in my table (which is in 'BLOB' type) with the following data format:

:10: ABCDEF
:20: Happy Customer
:30: \646-798-2312

From this column, I want to extract only the characters that come between :20: and :30: which is 'Happy Customer' in this case.
how can I construct this pattern using RegExp_Replace?

Thank you!
0
Comment
Question by:iamnamja
  • 11
  • 7
  • 2
20 Comments
 
LVL 23

Expert Comment

by:David
ID: 39910604
What do you have in mind for first converting the binary large object into VARCHAR2?  Otherwise you're basically doing a simple function, instr(schema.table,20,10) as NEWCOL.
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39910608
>>how can I construct this pattern using RegExp_Replace?

From your last question on this:  I would stay away from regex calls because they are expensive.  There are alternatives.

I assume all that data is in a single blob.

I assume you are wanting to parse that data into three individual rows?

Is there a carriage-return (CR) or a carriage-return/linefeed (CR/LF) at the end of each line?

What is the maximum size of the BLOB data in your database?
0
 

Author Comment

by:iamnamja
ID: 39910619
Hi,

Yes, all that data is in a single blob column. I want to parse the data into three individual columns (not rows).
There is a carriage-return/linefeed (CR/LF) at the end of each line.
The maximum size of the BLOB data is over 4000+ characters.

I don't need to convert the column to varchar2 as long as I can parse it and put in a new column.
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39910631
>>The maximum size of the BLOB data is over 4000+ characters.

What is the maximum size?

>> I want to parse the data into three individual columns (not rows).

So every row in a single blob will be multiple columns?

Do you have a maximum number of columns you may have in the BLOB?


Can you change the original app to NOT use BLOBs?
0
 

Author Comment

by:iamnamja
ID: 39910652
I'm not sure how I can find out the maximum size of the BLOB column.
I don't need to parse every row in a single blob to be put in a column.
I just need to find a particular pattern which is :20: in this example, and put whatever text is in between that pattern and the next pattern (:30).
I cannot change the original app to not use BLOBs.
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39910687
>>I'm not sure how I can find out the maximum size of the BLOB column.

select max(dbms_lob.getlength(BLOB_COLUMN)) from some_table;

>>I don't need to parse every row in a single blob to be put in a column


I mis-read the question.  You want to search the BLOB and extract text from between two delimiters you provide?

For example: given your sample, if you provided :10: and :30: you want a single string returned containing:

ABCDEF
:20: Happy Customer

If so, what is the maximum size of string that can be returned?
0
 

Author Comment

by:iamnamja
ID: 39910708
Yes, I want to search the BLOB and extract text from between two delimiters.

>>For example, given your sample if you provided :10: and :30: you want a single string >>returned containing:

>>ABCDEF
>>:20: Happy Customer

Correct. Except that I would only want something between :10: and :20:, or :20: and :30:. Never between :10: and :30:.

The maximum size of string that can be returned would be less than 1000. I would say, max 200-300.

It may take a while to run the max size query for my BLOB column.
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39910717
Based on you closing your other question, you can already convert the BLOB to a varchar2, correct?

If not, you shouldn't have closed that question.  I also have a comment in it about the grade you provided.
0
 

Author Comment

by:iamnamja
ID: 39910760
Hi, sorry about that.
I still cannot convert the BLOB to a varchar2. I thought the command you provided was not relevant because I would have to do it in the remote server and I only have read access on the remote server.

I gave the grade B because I couldn't use any solutions provided, but it was still very useful information. I could change to grade A, but not sure how to do it at this point.
Also, I'll be more careful in giving grades in the future.
Could you assist with this question? I'm just leaving the BLOB field as is, and just searching in it.
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39910771
>> I thought the command you provided was not relevant because I would have to do it in the remote server and I only have read access on the remote server

Incorrect.  You should have asked for clarification.

I'll unaccept the other question so it can be closed properly.

Let's leave this question on parsing the data once you get it converted.

I'll still like the answers on sizes etc...

No I would also like more about the requirements:  Is this going to be an on-going thing or a 1 time extract/parse?
0
PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

 

Author Comment

by:iamnamja
ID: 39910817
Ok, I updated my other question. Thank you for that.

This is going to be just 1 time parse. What I really would like to know here is if this was a normal string (varchar2) column, how should I write regexp_replace to find that kind of a pattern.

Not sure what difference it makes whether the column is in BLOB or VARCHAR2 in terms of writing the regexp function.

(The purpose of asking this question was also so that I can learn how to write regexp_replace pattern).
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39910825
>>Ok, I updated my other question. Thank you for that.

I unaccepted it again.  Let's keep these two questions.

>>Not sure what difference it makes whether the column is in BLOB or VARCHAR2 in terms of writing the regexp function.

Yes, BLOBs cannot be parsed with regexp functions.
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39911074
Here are two examples using your data (once in varchar2 state).

drop table tab1 purge;
create table tab1(col1 varchar2(100));

insert into tab1 values('
:10: ABCDEF
:20: Happy Customer
:30: \646-798-2312
');
commit;

select substr(col1, instr(col1,':20:')+5, instr(col1, ':30:')-(instr(col1,':20:')+5)-1)
from tab1;

select regexp_replace(col1,'(.*:20: )(.*)([[:space:]]*:30:.*)','\2',1,1,'n')
from tab1;

Open in new window


I have an idea about working with the native BLOBs but I need to know your exact database version (all 4 numbers, like 10.2.0.4) and if you have Oracle Text installed?

A quick check for Text is:
select username from dba_users where username='CTXSYS';
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39911173
OK, I'm going to assume you can install Oracle Text on the local database if it isn't already installed.

Here's a neat trick to convert a BLOB into filtered text.

See if this will work for you.  It does it all in one step (except selecting the BLOB across a database link).

drop table tab1 purge;
create table tab1(col1 blob);

insert into tab1 values(utl_raw.cast_to_raw('
:10: ABCDEF
:20: Happy Customer
:30: \646-798-2312
'));
commit;

exec ctx_ddl.drop_policy(policy_name => 'MyPolicy');
exec ctx_ddl.create_policy(policy_name => 'MyPolicy');

create or replace function blobSearch (
	p_blob in blob,
	p_start_delimiter in varchar2,
	p_end_delimiter in varchar2 ) return varchar2
is
	myClob clob;
begin
	ctx_doc.policy_filter('MyPolicy', p_blob, myClob, TRUE, null, null, null);

	return trim(substr(myClob, instr(myClob,p_start_delimiter)+length(p_start_delimiter)+1, instr(myClob, p_end_delimiter)-(instr(myClob,p_start_delimiter)+length(p_start_delimiter)+1)-1));

end;
/

show errors

select blobSearch(col1,':20:',':30:') from tab1;
select blobSearch(col1,':10:',':20:') from tab1;

Open in new window

0
 
LVL 23

Expert Comment

by:David
ID: 39911215
This is a fun one!
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39911245
dvz,
This is the easy part.  check out the first piece of the puzzle:
http://www.experts-exchange.com/Database/Oracle/Q_28381989.html

Query remote BLOBs.
0
 

Author Comment

by:iamnamja
ID: 39913043
Thank you for all your help and comments.

From your function:
select regexp_replace(col1,'(.*:20: )(.*)([[:space:]]*:30:.*)','\2',1,1,'n')
from tab1;

what does the last three part mean?
1,1,'n'

I think this works, but because my column has many different versions of this tag, I need to write something flexible. For example, :10: can be changed to :10A:, so I need to add something to detect whether there's a character after :10:

Also, this doesn't take care of line spacing (CR LF) in contents between tags. I need to capture everything whether there's a line space or not.

Sorry for making this so complicated.

The max size of the BLOB column is 7557.
0
 
LVL 76

Expert Comment

by:slightwv (䄆 Netminder)
ID: 39913804
>>what does the last three part mean?

Check the online docs for that function call:
http://docs.oracle.com/cd/E11882_01/server.112/e41084/functions149.htm#SQLRF06302

The docs will explain the parameters as well as what they mean.

The important one is the 'n'.  Once you read the docs I hope you see why.  If not, just post back and I'll explain it.

>>Sorry for making this so complicated.

Not complicated but you need to try and post all the requirements up front.

Then do you also want the space character in front of "Happy Customer" since there is one after the ":20:"?

Keeping the CR/LF before the :30: is easy.  Just get rid of the '-1'.

from:
return trim(substr(myClob, instr(myClob,p_start_delimiter)+length(p_start_delimiter)+1, instr(myClob, p_end_delimiter)-(instr(myClob,p_start_delimiter)+length(p_start_delimiter)+1)-1));

to:
return trim(substr(myClob, instr(myClob,p_start_delimiter)+length(p_start_delimiter)+1, instr(myClob, p_end_delimiter)-(instr(myClob,p_start_delimiter)+length(p_start_delimiter)+1)));

>>For example, :10: can be changed to :10A:, so I need to add something to detect whether there's a character after :10:

Are the sure it can only be JUST ONE?  or is ':10ABC DEF GHI:' possible?

It is now looking good for a regexp call...  I'll see what I can do.
0
 
LVL 76

Accepted Solution

by:
slightwv (䄆 Netminder) earned 500 total points
ID: 39914004
Here is the updated version.

Note that I now allow the leading space character after the starting delimiter since the data has it.

If it still has issues, please add additional test data to tab1 and post the expected results.

drop table tab1 purge;
create table tab1(col1 blob);

insert into tab1 values(utl_raw.cast_to_raw('
:10: ABCDEF
:20: Happy Customer
:30: \646-798-2312
'));

--A new row based on the new requirement that has a character AFTER the 10
insert into tab1 values(utl_raw.cast_to_raw('
:10A: ABCDEF
:20: Unhappy Customer
:30: \646-798-2312
'));

--A new row based on the possibility of anything after the ':10' and before the ':'
insert into tab1 values(utl_raw.cast_to_raw('
:10 Just in case there is a TON of stuff: ABCDEF
:20: Another Customer
:30: \646-798-2312
'));
commit;

exec ctx_ddl.drop_policy(policy_name => 'MyPolicy');
exec ctx_ddl.create_policy(policy_name => 'MyPolicy');

create or replace function blobSearch (
	p_blob in blob,
	p_start_delimiter in varchar2,
	p_end_delimiter in varchar2 ) return varchar2
is
	myClob clob;
begin
	ctx_doc.policy_filter('MyPolicy', p_blob, myClob, TRUE, null, null, null);

	--original:  return '-' || trim(substr(myClob, instr(myClob,p_start_delimiter)+length(p_start_delimiter)+1, instr(myClob, p_end_delimiter)-(instr(myClob,p_start_delimiter)+length(p_start_delimiter)+1))) || '-';

	return regexp_replace(myClob,'(.*' || regexp_replace(p_start_delimiter,':$','.*:') || ')(.*)([[:space:]]*' || p_end_delimiter || '.*)','\2',1,1,'n');

end;
/

show errors

--I added the '-' to the front and end of the return value to make sure I have the correct values
select '-' || blobSearch(col1,':20:',':30:') || '-' from tab1;
select '-' || blobSearch(col1,':10:',':20:') || '-' from tab1;

Open in new window

0
 

Author Closing Comment

by:iamnamja
ID: 39920460
This works great. Thank you so much for your help!
0

Featured Post

PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

Join & Write a Comment

Working with Network Access Control Lists in Oracle 11g (part 2) Part 1: http://www.e-e.com/A_8429.html Previously, I introduced the basics of network ACL's including how to create, delete and modify entries to allow and deny access.  For many…
Configuring and using Oracle Database Gateway for ODBC Introduction First, a brief summary of what a Database Gateway is.  A Gateway is a set of driver agents and configurations that allow an Oracle database to communicate with other platforms…
This video shows how to copy a database user from one database to another user DBMS_METADATA.  It also shows how to copy a user's permissions and discusses password hash differences between Oracle 10g and 11g.
This video shows how to set up a shell script to accept a positional parameter when called, pass that to a SQL script, accept the output from the statement back and then manipulate it in the Shell.

746 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

10 Experts available now in Live!

Get 1:1 Help Now