Link to home
Start Free TrialLog in
Avatar of partyon
partyon

asked on

I need to find credit card numbers hidden in text columns, using Oracle 11g and SQLPlus

During an audit it was discovered that users where hiding customers credit card info in text columns like a description or notes column vs. where it belongs. We can't filter it on the front-end because it's 3rd party software and we are unable to make changes and the vender refuses to make changes based on our needs.
So I need to use some SQL commands to scan columns for matches. Been looking at the REGEXP_SUBSTR commands but I have to write something that will scan all the columns in a particular schema with applying the expression.

Has anyone accomplished this task already ? If so any direction or examples are greatly appreciated.
Avatar of slightwv (䄆 Netminder)
slightwv (䄆 Netminder)

How much data are you talking about?

I would look towards regexp_instr but it is the same principle.

The problems you'll have is how to search for the number.  Off the top of my head, I can think of at least 3 ways:  series of numbers with spaces, without spaces and with dashes.

Depending on how much data you are talking, I would probably create a pl/sql loop, take each row and convert it to XML using DBMS_XMLGEN.GETXML and perform the regexp_instr on the row.
I agree with the recommendation to use a PL\SQL procedure for this with a cursor loop to retrieve the possible varchar2 (and/or number?) columns that may contain credit card numbers.  Then exactly which evaluation technique you use for each value retrieved may depend a bit on your data volumes and on which technique(s) you may have some experience with.  I don't see how converting the rows to XML would help (but then I'm no expert in XML, so I don't know what operators might be available in XML).  Regexp_instr may be the most efficient operator to use for searching through varchar2 data in PL\SQL, but I'm no expert with this either.  It does look like a good idea but I can't help you with the syntax for that.
>>I don't see how converting the rows to XML would help

It would convert the entire row to a 'string'.  Then one call to regexp_instr would check ALL columns at once.

That way you don't have to worry about conversions, string concatenation, individual columns, etc...

Here's an example:
drop table tab1 purge;

create table tab1(id number primary key, col1 varchar2(20), col2 varchar2(20));


insert into tab1 values(1,'No card here','Hello');
insert into tab1 values(2,'1111222233334444','Hello');
insert into tab1 values(3,'Hello','1111 2222 3333 4444');
insert into tab1 values(4,'Hello','1111-2222-3333-4444');
commit;


declare
	myrow clob;
begin
	for i in (select id from tab1) loop
		select dbms_xmlgen.getxml('select * from tab1 where id=' || i.id) into myrow from dual;
		if regexp_instr(myrow,'[0-9]{4}[ -]?[0-9]{4}[ -]?[0-9]{4}[ -]?[0-9]{4}') > 0 then
			dbms_output.put_line('May have found one: ' || to_char(i.id));
		end if;
	end loop;
end;
/

Open in new window

Avatar of partyon

ASKER

I'm first searching all tables and columns in a particular schema that has data types of 'CHAR', 'VARCHAR2', 'NCHAR' 'NVARCHAR2', 'CLOB' and "NCLOB'  these would be columns that could hold a credit card number in them.

So, If I use the regexp_instr(myrow,'[0-9]{4}[ -]?[0-9]{4}[ -]?[0-9]{4}[ -]?[0-9]{4}') > 0  will it return credit card numbers for all the variations ? meaning Visa, MC, Amex, Discover ..... ?  So my output should be something like this as a pattern 1234-1234-1234-1234 correct ?
>>If I use the regexp_instr(myrow,'[0-9]{4}[ -]?[0-9]{4}[ -]?[0-9]{4}[ -]?[0-9]{4}') > 0  will it return credit card numbers for all the variations ? meaning Visa, MC, Amex, Discover ..... ?  

It looks for 4 numbers followed by an optional space or dash then another set of 4 numbers... etc...

It matches the patterns:
1112223334444
1111 2222 3333 4444
1111-2222-3333-4444

I cannot say ALL credit card numbers are 12 digits but all mine are...  Memory tells me that there are some that are not.

It also doesn't guarantee the number it finds is a credit card number.  It could be just a number that fits the 'pattern'.
SOLUTION
Avatar of Mark Geerlings
Mark Geerlings
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
This question has been classified as abandoned and is closed as part of the Cleanup Program. See the recommendation for more details.