Oracle PL/SQL - Repetitive Special Character Search

Hi,

I have a table with a text field and I need to search for records with repetitive characters/special characters.

For example, I need to find names that have 'XXX XXX XXXX' or '1111111'.
I also need to find repetitive string with special characters such as '%%$$$$$@@'.

Could anyone help me how I can construct the query to find such patterns?

Thank you.
iamnamjaAsked:
Who is Participating?
 
awking00Connect With a Mentor Commented:
SQL> select * from rpt;

STR
--------------------
aaaaaaaaa
bb
cccc
ddddddd
11
22222
@@@@@
123456
%%
eee%%$$$$@@

SQL> select * from rpt
  2  where regexp_like(translate(str,'#@$%','####'),'(.)\1{4}');

STR
--------------------
aaaaaaaaa
ddddddd
22222
@@@@@
eee%%$$$$@@
0
 
awking00Commented:
Can you provide more descriptive examples of what your search should fine and the kind of data that would contain such characters? I suspect we can use regular expressions to find repeating and special characters.
0
 
iamnamjaAuthor Commented:
Hi,

For example, in my customer name field, I have strings such as:

John Brown
Paul Smith
Mary Cohen
Susan XXXXX
11111111

My search should find 'Susan XXXXX', and '11111111'.

I tried using regular expressions in my where clause : regexp_like(Cust_Name, '[X]'), but this finds all the names that contain 'X'. I need to find strings that have any characters that show up repetitively whether it's alphanumeric or special characters.
0
Introducing Cloud Class® training courses

Tech changes fast. You can learn faster. That’s why we’re bringing professional training courses to Experts Exchange. With a subscription, you can access all the Cloud Class® courses to expand your education, prep for certifications, and get top-notch instructions.

 
slightwv (䄆 Netminder) Commented:
See if this works for you.  The 'flaw' is legitimate repeating characters.   Note in the following test case I added 'Allen Jackson' and it is returned because of the two 'l's in Allen.

drop table tab1 purge;
create table tab1(col1 varchar2(40));

insert into tab1 values('John Brown');
insert into tab1 values('Paul Smith');
insert into tab1 values('Mary Cohen');
insert into tab1 values('Susan XXXXX');
insert into tab1 values('11111');
insert into tab1 values('Allen Jackson');
commit;


select col1 from tab1 
where regexp_like(col1,'(.)\1')
/

Open in new window

0
 
iamnamjaAuthor Commented:
Thank you for the comment. However, my search shouldn't catch 'Allen Jackson' because that is a legitimate customer name. I'm trying to find any strings that contain data quality issues.
Is there any way to increase the number of occurance in the regexp_like search? For example, to return only the names with more than 5+ repeating characters.
0
 
awking00Commented:
To avoid also getting rows like Allen or Miller, you can modify the regexp to
select * from rpt
where regexp_like(str,'(.)\1{3}');
which should only retrieve where characters repeat at least 3 times.
0
 
awking00Commented:
Sorry  iposted after your comment, to retrieve characters repeating at least 5 times -
select * from rpt
where regexp_like(str,'(.)\1{5}');
0
 
slightwv (䄆 Netminder) Commented:
>>to retrieve characters repeating at least 5 times

Slight correction:
regexp_like(str,'(.)\1{4}')

The backreference counts as one.  So to get 5 repeating, you need the number 4.

If you run it against my test case with a 5, you won't get 'Susan XXXXX'
0
 
awking00Commented:
I just discovered the qualifier inside the braces is >. I always thought it was >= so for 5-time repeats
select * from rpt
where regexp_like(str,'(.)\1{4}');
0
 
awking00Commented:
Thanks, slightwv. I forgot about the backreference having to be counted.
0
 
iamnamjaAuthor Commented:
Thank you so much. I think 'select * from rpt where regexp_like(str,'(.)\1{4}'); ' will work for my search. I have one more related question:
Is there a way to find a string with combination of repeating special characters such as '%%$$$$$@@'? Not treating %, $, @ differently and to catch any kind of special non-alphanumeric characters if they are repeating more than 5+ occurances?
0
 
slightwv (䄆 Netminder)Connect With a Mentor Commented:
Off the top of my head, the way I can think of:
replace all non-alpha, non-number or a space with the same character, then do the check as already posted.

drop table tab1 purge;
create table tab1(col1 varchar2(40));

insert into tab1 values('John Brown');
insert into tab1 values('Paul Smith');
insert into tab1 values('Mary Cohen');
insert into tab1 values('Susan XXXXX');
insert into tab1 values('111111111111111111');
insert into tab1 values('Allen Jackson');
insert into tab1 values('%%$$$$@@');
commit;


with mydata as
(
select col1, regexp_replace(col1,'[^0-9a-zA-Z ]','#') fudged from tab1 
)
select col1 from mydata
where regexp_like(fudged,'(.)\1{4,}')
/

Open in new window

0
 
awking00Commented:
You might use the translate function on your string first
where regexp_like(translate(str,'#@$%','####'),'(.)\1{4}');

This example changes # to #. @ to #, $ to #, and % to #. You can expand the list to include other special characters that may exist by adding a '#' to the third translate parameter for each special character added.
0
 
iamnamjaAuthor Commented:
Thank you both awking00 and slightwv for your comments. This returns exactly what I need to search. Thanks!
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.