Avatar of steveoskh

Strip Credit Card numbers from Text files

We have historical POS transaction files that contain credit card information.  We would like to strip or truncate the credit card numbers from these files.  I am looking for programs or scripts that would accomplish this.
All files are in a single directory structure, so the tool should look at all files and folders beneath a starting level.
The files contain other numbers, so it needs to know the difference between a credit card and a UPC or other string of numbers.

Sample file data:

Sample 2:
Term2,03/05/06,09:35:23,139,*VISA*                      100.00
Term2,03/05/06,09:35:23,139,S 4400000000000001 EXP. 0105
Term2,03/05/06,09:35:23,139,LOC: 05123402 INV: 00001117
Term2,03/05/06,09:35:23,139,ATH:   01231B APPRVD    01231B
Term2,03/05/06,09:35:23,139,CHANGE DUE                    0.00
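
For the sample above, truncation could be sketched roughly as follows. This is a minimal Python sketch, not a tested tool: it assumes every card line has the shape "S <digits> EXP. <MMYY>" shown in the sample, and the names CARD_LINE and mask_line are made up for illustration. It keeps the last four digits and blanks the expiry, so the line length does not change.

```python
import re

# Assumed line shape, taken from the sample above: "S <card digits> EXP. <MMYY>".
# Other POS formats would need their own pattern.
CARD_LINE = re.compile(r"\bS (\d{13,16}) EXP\. (\d{4})")

def mask_line(line):
    """Replace all but the last four card digits with '*' and blank the expiry."""
    def _mask(match):
        digits = match.group(1)
        masked = "*" * (len(digits) - 4) + digits[-4:]
        return "S " + masked + " EXP. ****"
    return CARD_LINE.sub(_mask, line)

sample = "Term2,03/05/06,09:35:23,139,S 4400000000000001 EXP. 0105"
print(mask_line(sample))
```

Lines that do not match the assumed shape (the LOC/INV and ATH lines, for instance) pass through unchanged.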

I would also like to scan all files on our network for potential card numbers.  I need a program that would look at the contents of every file for CC info and list the file name, its location, and the line of text or cell with the card number.  This would allow me to find "forgotten" files with this information.
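
A scan like this could be sketched in Python along the following lines. It is only a rough sketch under stated assumptions: the files are plain text, a candidate is any standalone run of 13 to 16 digits, and the names scan_tree, luhn_ok and require_luhn are hypothetical. Binary formats such as Word or Excel would most likely need their text extracted first.

```python
import os
import re

# A standalone run of 13 to 16 digits (not part of a longer digit string).
CANDIDATE = re.compile(r"(?<!\d)\d{13,16}(?!\d)")

def luhn_ok(number):
    """Return True if the digit string passes the Luhn (mod 10) check."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:      # every second digit, counted from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def scan_tree(root, require_luhn=True):
    """Yield (path, line_number, line) for each line holding a candidate number."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                # latin-1 maps every byte to a character, so decoding never fails.
                with open(path, "r", encoding="latin-1") as fh:
                    for lineno, line in enumerate(fh, start=1):
                        hits = CANDIDATE.findall(line)
                        if any(not require_luhn or luhn_ok(h) for h in hits):
                            yield path, lineno, line.rstrip("\r\n")
            except OSError:
                continue    # unreadable file: skip it
```

Looping over scan_tree(start_folder) and printing each tuple gives the file location, line number and line text. Passing require_luhn=False lists every 13 to 16 digit run, which catches placeholder numbers at the cost of more false positives.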
Avatar of bbao

it seems that you need a utility that supports Regular Expressions to search text files in a specific folder.

You may consider choosing a text editor from here:

Free Programmers' Editors, IDEs, ASCII Text Editors

as for the pattern to determine a credit card number, it could be the following one, as a CC number commonly consists of 16 digits.


hope it helps,
Avatar of steveoskh


I am looking for more than a text utility that I can program.  This should be a common problem for many companies.  Someone has to have done the work of adding the intelligence to know the proper starting numbers and string lengths, so that only credit card numbers are removed.
True but they may be proprietary programs they are not allowed to share.
I am willing to pay for a program and will certainly award points for pointing out commercial programs that will do this.  What I don't need is a $5,000 PCI-DSS security compliance package or consulting that covers more than just removal of old stored numbers.
We are addressing PCI-DSS security by not storing credit card numbers in any electronic form.  We don't need to secure what we do not have.  We just need to remove the numbers from old files.
I understand that.  
American Express cards are 15 digits.  Some gift cards follow the format of the parent service (Visa, MC, Disc, Amex), so you'll have to include those transaction types also.
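
The "starting numbers and length" rules can be sketched as a small lookup. This is a simplified Python sketch based on commonly published prefix ranges (Visa 4, MasterCard 51-55, American Express 34/37, Discover 6011/65); newer ranges such as the MasterCard 2-series are left out, the names BRAND_PATTERNS and card_brand are hypothetical, and the ranges should be checked against the card networks' current documentation before relying on them.

```python
import re

# Simplified, commonly published prefix and length rules (an assumption,
# not an authoritative list).
BRAND_PATTERNS = {
    "Visa": re.compile(r"4\d{12}(?:\d{3})?"),        # 13 or 16 digits
    "MasterCard": re.compile(r"5[1-5]\d{14}"),       # 16 digits
    "American Express": re.compile(r"3[47]\d{13}"),  # 15 digits
    "Discover": re.compile(r"6(?:011|5\d{2})\d{12}"),  # 16 digits
}

def card_brand(number):
    """Return the brand whose prefix and length rule matches, else None."""
    for brand, pattern in BRAND_PATTERNS.items():
        if pattern.fullmatch(number):
            return brand
    return None

print(card_brand("4400000000000001"))  # sample number from the question
print(card_brand("05123402"))          # LOC value from the sample, too short
```

A prefix and length match only narrows the candidates; other 15 or 16 digit strings can still match by coincidence.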

You may try contacting the provider of your POS systems.  The data storage was _not_ PCI compliant (CC numbers should not be stored without the individual customer's explicit permission).  They should be assisting in your PCI compliance project.

If the tools exist, the POS software suppliers should have them.
> I am looking for more than a text utility that I can program.

by using a Regular Expressions enabled text editor, you don't need to do any programming. just search the files with the specific CC number patterns, get a list of the suspected numbers, and remove the numbers using the editor's Replace function with regular expressions.

for example, EditPlus is a good choice. it supports the functions i mentioned above.

hope it helps,
could you please give an example of how to detect the lines with credit card numbers? are these the lines with UPC only?
Are there other "kinds" of such numbers like Luhn, IBM, EFT?
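
For reference, the Luhn check named here is the mod 10 checksum that payment card numbers are generally issued to satisfy, which is one way to tell a card number from an arbitrary digit string of the same length. A minimal Python sketch (the function name luhn_ok is just illustrative):

```python
def luhn_ok(number):
    """Return True if the digit string passes the Luhn (mod 10) check."""
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return total % 10 == 0

print(luhn_ok("4111111111111111"))  # widely published Visa test number
print(luhn_ok("4400000000000001"))  # sample number from the question
```

The first call prints True and the second prints False: the sample number in the question does not pass the check, presumably because it is a sanitized placeholder, so a Luhn filter alone would not flag it.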
Sorry for the delay in responding.
ahoffmann, I am looking for something that can identify the credit card number without having to know any type of leading identifier.  In addition to the samples I provided, which do have identifying code, I want to be able to find CC numbers inside of Word and XLS documents.
My need is for something that already knows this information.

> .. want to be able to find CC numbers inside of Word and XLS documents.
for that I can't give regex suggestions, as I cannot 'script' proprietary formats :-/
waiting for unique lines ...
A quick and easy way to solve this problem could just be to encrypt the files?  As they're historical, is there a real need for any real-time access, or are they just records you need to keep to cover 6 years' worth of trading data under some sort of financial regulation?
I am speaking from a QSA perspective.  If we found tlogs in the clear, that would not be compliant, but if the tlogs were encrypted and the keys managed securely, then they would be compliant.
Hope this helps.  This tool may help you find the data (but not change it):