Solved

RegEx/Postini CC & SSN Filtering

Posted on 2011-09-29
Last Modified: 2012-05-12
Hello again. I am again having problems with Postini catching false positives, or not catching CC numbers.

Postini's built-in filters are not robust enough for us to meet PCI requirements, so I've been tasked with creating custom RegEx's that will block emails. The problem, though, is that Postini's RegEx engine doesn't seem to conform to most standards.

Through multiple variations, I've finally come up with the following RegEx for Visa, MasterCard and JCB credit card filtering:

(^|\s|:)(3|4|5)\d{3}(-|_|,|;|:|'|\.|\s|\\|/)\d{4}(-|_|,|;|:|'|\.|\s|\\|/)\d{4}(-|_|,|;|:|'|\.|\s|\\|/)\d{4}($|\s|\.)

It basically states that the number should be at the beginning of a line, or preceded by a space, or preceded by a colon, and contain a set of four numbers starting with a 3,4, or 5, and 3 blocks of four digits, separated by a variety of separators. I also have a similar RegEx looking for a string of 16 digits starting with a 3,4, or 5, and various other similar RegEx's for other cards, or SSN's.

For the most part, it seems to work fine, however there are two problems:

1. It catches false positives in the form of "5000" followed by NO other blocks of digits.

So, for example, an email like this:

"Hello, here are the short codes I'd like to order:

5123
5327
7347
2236
3456

Thanks"

It will flag it.

2. If a CC number is sent in this format: "5XXX XXXXXXXXXXXX" it will get by. Presumably, any variation on that will get by as well.

Any help would be appreciated.

Question by:bradyhummel
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 36815622
According to this site, ContentManager, which I assume is related to Postini since the page is hosted at Postini.com, supports PCRE. PCRE is a very rich set of regular expressions. Please forgive me if I am incorrect about ContentManager. I haven't used Postini before.

As to your issue, your pattern seems a bit difficult to read, and consequently a bit difficult to debug. What if we try cleaning it up a tad?

^[\s:]?[345]\d{3}[-_,;:'. \\/]?\d{4}[-_,;:'. \\/]?\d{4}[-_,;:'. \\/]?\d{4}[\s.]?$


Also, I believe the reason you get a false positive in your example is that you are using "\s" to refer to spaces, but "\s" refers to any whitespace, including newlines. Since you have at least 4 groups of digits, each on its own line, the pattern still matches. The pattern I proposed accounts for this in the separators, but you can just as easily modify yours by replacing the "\s" occurrences with literal space characters.
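
As a rough illustration of the whitespace point, here is a minimal Python sketch. It uses Python's `re` module as a stand-in, which is an assumption: Postini's engine may behave differently. The variable names are made up for the example. It runs the original pattern and a literal-space variant against the short-codes email from the question.

```python
import re

# Original separator group from the question: \s also matches newlines.
SEP_WS = r"(-|_|,|;|:|'|\.|\s|\\|/)"
ORIGINAL = re.compile(
    r"(^|\s|:)(3|4|5)\d{3}" + SEP_WS + r"\d{4}" + SEP_WS + r"\d{4}" + SEP_WS + r"\d{4}($|\s|\.)"
)

# Hypothetical rewrite with every \s replaced by a literal space.
SEP_SPACE = r"(-|_|,|;|:|'|\.| |\\|/)"
SPACE_ONLY = re.compile(
    r"(^| |:)(3|4|5)\d{3}" + SEP_SPACE + r"\d{4}" + SEP_SPACE + r"\d{4}" + SEP_SPACE + r"\d{4}($| |\.)"
)

short_codes = "Hello, here are the short codes I'd like to order:\n\n5123\n5327\n7347\n2236\n3456\n\nThanks"
card_line = "Here's the credit card info: 4555 6631 3445 2304."

# The newlines between the four-digit codes satisfy \s, so this is flagged.
print(bool(ORIGINAL.search(short_codes)))    # True
# With literal spaces only, the short codes no longer match...
print(bool(SPACE_ONLY.search(short_codes)))  # False
# ...while a space-separated card number still does.
print(bool(SPACE_ONLY.search(card_line)))    # True
```

The trade-off is that the literal-space variant would presumably also miss a card number that is wrapped across lines or separated by tabs.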
 

Author Comment

by:bradyhummel
ID: 36815933
Thanks for the response. I've tested this against the email that was brought to my attention regarding false positives. However, when I test the RegEx in Postini (Content Manager is correct) against this sample email (this is a fake CC number):

"Hey guys, thanks for the placing the order.

Here's the credit card info: 4555 6631 3445 2304.

Thanks"

It doesn't catch what is blatantly a CC number.
 
LVL 75

Assisted Solution

by:käµfm³d 👽
käµfm³d   👽 earned 500 total points
ID: 36815979
I'm wondering if you really need the start-of-line ( ^ ) and end-of-line anchors ( $ ). What happens if you do:

You
(3|4|5)\d{3}(-|_|,|;|:|'|\.|\s|\\|/)\d{4}(-|_|,|;|:|'|\.|\s|\\|/)\d{4}(-|_|,|;|:|'|\.|\s|\\|/)\d{4}


Me
[345]\d{3}[-_,;:'. \\/]?\d{4}[-_,;:'. \\/]?\d{4}[-_,;:'. \\/]?\d{4}
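
To see why the anchors might matter, here is a small Python sketch (Python's `re` is an assumption here, not Postini's engine, and the names are hypothetical). It compares the anchored pattern from the earlier comment with the unanchored one against the sample email.

```python
import re

SEP = r"[-_,;:'. \\/]?"
CORE = r"[345]\d{3}" + SEP + r"\d{4}" + SEP + r"\d{4}" + SEP + r"\d{4}"

# Anchored form: the number must be (almost) the only thing on the line.
ANCHORED = re.compile(r"^[\s:]?" + CORE + r"[\s.]?$", re.MULTILINE)
# Unanchored form: the number may appear anywhere in the text.
UNANCHORED = re.compile(CORE)

email = (
    "Hey guys, thanks for the placing the order.\n\n"
    "Here's the credit card info: 4555 6631 3445 2304.\n\n"
    "Thanks"
)

# No line consists solely of the card number, so the anchored form finds nothing.
print(ANCHORED.search(email))             # None
# The unanchored form finds the number inside the sentence.
print(UNANCHORED.search(email).group(0))  # 4555 6631 3445 2304
```

Even with multi-line mode enabled, `^` and `$` only help when the number sits alone on its line, which is rarely the case in the body of an email.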


 

Author Comment

by:bradyhummel
ID: 36816281
Thanks, I think that that may have done it. A few more questions, though.

1. My current RegEx for 16 straight digits for Visa, Mastercard, JCB is (^|\s|:)(3|4|5)\d{15}($|\s|\.), but following your example, it should be [345]\d{15}, is that correct?

2. Is there any way to prevent these combinations (X's being numbers) using one RegEx?

4XXX XXXXXXXXXXXX
4XXXXXXX XXXXXXXX
4XXXXXXXXXXX XXXX

 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 36816774
You could probably do something like this:

[345](?:\D*\d){15}


which equates to:

[345]      - 3, 4, or 5
(?: ... )  - non-capturing group
\D*        - zero-or-more ( * ) of any character NOT a digit ( \D )
\d         - single digit
{15}       - 15 of the thing to the left, which in this case is the entire group


This should catch even occurrences such as:

There were 4 books on 3 bookshelves. Each book had 88 pages...

Essentially, it matches the digits even when they are split by any number of arbitrary characters. The only issue you could run into is legitimate text that contains numeric values which are not actually credit card numbers. I would expect such an occurrence to be rare, though.
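
A quick way to try this out is a Python sketch like the one below. Python's `re` is used as an assumption, and the sample strings are invented for illustration. It checks the three split formats from the question and one piece of ordinary prose that happens to contain enough digits.

```python
import re

# 3, 4, or 5, then 15 more digits, each optionally preceded by non-digits.
SPLIT_DIGITS = re.compile(r"[345](?:\D*\d){15}")

card_like = [
    "4111 111111111111",
    "41111111 11111111",
    "411111111111 1111",
    "4111-1111-1111-1111",
]

# Prose with 17 digits in total, the first of which is a 4 (made-up example).
busy_prose = "Order 4 books, 3 shelves, 88 pages, call 555 1234 ext 5678 90"
# Prose with only four digits.
light_prose = "There were 4 books on 3 bookshelves. Each book had 88 pages"

for sample in card_like:
    print(sample, bool(SPLIT_DIGITS.search(sample)))

print(bool(SPLIT_DIGITS.search(busy_prose)))   # True: a false positive
print(bool(SPLIT_DIGITS.search(light_prose)))  # False: too few digits
```

The `busy_prose` case shows the false-positive risk: any text with 16 digits after a leading 3, 4, or 5 is flagged, regardless of what lies between them.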
 

Author Comment

by:bradyhummel
ID: 36818280
Alright... I think I'm making headway. Your second response seems to do the trick for the most part. If I gave you a few more RegEx's that I've created, would you mind reviewing them as you did the given example? I have a separate RegEx for Discover Card, another for AmEx and Diners Club, and another for SSNs. They all follow the same format, but I'm very much a novice, and you seem to be very much an expert in this. Would that be OK?
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 36818661
Certainly, no problem. Even if I'm not the one to look at them, there are several experts who participate on the regex boards who are extremely knowledgeable in the area  = )
 

Author Comment

by:bradyhummel
ID: 36891848
Thanks! Here's what I have in addition to the one you already assisted me with:

DiscoverCard
(^|\s|:)(6011)(-|_|,|;|:|'|\.|\s|\\|/)\d{4}(-|_|,|;|:|'|\.|\s|\\|/)\d{4}(-|_|,|;|:|'|\.|\s|\\|/)\d{4}($|\s|\.)  OR
(^|\s|:)(6011)\d{12}($|\s|\.)

AmExSinersClub
(^|\s|:)(3\d{3})(-|_|,|;|:|'|\.|\s|\\|/)\d{6}(-|_|,|;|:|'|\.|\s|\\|/)\d{4}($|\s|\.) OR
(^|\s|:)(3\d{13})($|\s|\.)

SSNFilter
(^|\s|:)([0-9]{3})(-|_|,|;|:|'|\.|\s|\\|/)([0-9]{2})(-|_|,|;|:|'|\.|\s|\\|/)([0-9]{4})($|\s|\.)


 
LVL 35

Expert Comment

by:Terry Woods
ID: 36900670
DiscoverCard
(^|\s|:)(6011)(-|_|,|;|:|'|\.|\s|\\|/)\d{4}(-|_|,|;|:|'|\.|\s|\\|/)\d{4}(-|_|,|;|:|'|\.|\s|\\|/)\d{4}($|\s|\.)  OR

Can be more concisely written as:
(^|[\s:])(6011)([-_,;:'.\s\\/])\d{4}([-_,;:'.\s\\/])\d{4}([-_,;:'.\s\\/])\d{4}($|[\s.])
or maybe even:
(^|[\s:])(6011)(([-_,;:'.\s\\/])\d{4}){3}($|[\s.])
though in the 2nd case, the capturing groups will be different (and may cause problems if you're making use of them)

(^|\s|:)(6011)\d{12}($|\s|\.)

Can be more neatly (debatable maybe!) written as:
(^|[\s:])(6011)\d{12}($|[\s.])

You could potentially combine the 2 different expressions to:
(^|[\s:])(6011)(([-_,;:'.\s\\/])?\d{4}){3}($|[\s.])
though it would also accept values missing some (but not all) of the delimiters like
6011-12345678.1234
and
601112345678:1234
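
The behaviour of the combined expression can be checked with a short Python sketch (assuming Python's `re` as a stand-in for the Postini engine; the sample values are made up):

```python
import re

# Combined Discover pattern: each block of four digits has an optional delimiter.
DISCOVER_COMBINED = re.compile(
    r"(^|[\s:])(6011)(([-_,;:'.\s\\/])?\d{4}){3}($|[\s.])"
)

samples = {
    "6011 1234 1234 1234": True,   # fully delimited
    "6011123412341234": True,      # no delimiters
    "6011-12345678.1234": True,    # some delimiters missing
    "601112345678:1234": True,     # some delimiters missing
    "6011 1234 1234": False,       # only 12 digits
}

for text, expected in samples.items():
    found = DISCOVER_COMBINED.search(text) is not None
    print(f"{text!r}: {found} (expected {expected})")
```

Whether accepting the partially delimited forms is a problem depends on the goal; for a filter that blocks card numbers, the looser match is arguably a benefit.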

AmExSinersClub
(^|\s|:)(3\d{3})(-|_|,|;|:|'|\.|\s|\\|/)\d{6}(-|_|,|;|:|'|\.|\s|\\|/)\d{4}($|\s|\.) OR

Can be more concisely written as:
(^|[\s:])(3\d{3})([-_,;:'.\s\\/])\d{6}([-_,;:'.\s\\/])\d{4}($|[\s.])

(^|\s|:)(3\d{13})($|\s|\.)

Can be more neatly (debatable maybe!) written as:
(^|[\s:])(3\d{13})($|[\s.])

SSNFilter
(^|\s|:)([0-9]{3})(-|_|,|;|:|'|\.|\s|\\|/)([0-9]{2})(-|_|,|;|:|'|\.|\s|\\|/)([0-9]{4})($|\s|\.)

Can be more concisely written as:
(^|[\s:])([0-9]{3})([-_,;:'.\s\\/])([0-9]{2})([-_,;:'.\s\\/])([0-9]{4})($|[\s.])
 
LVL 35

Expert Comment

by:Terry Woods
ID: 36900686
Note also that by allowing : and . as characters for indicating the start and end of a value, you'll get a match on values like this:
6011.1234.1234.1234.5678
1234:6011:1234:1234:1234.5678
1234:6011:1234:1234:1234
the number matched in all above cases for the DiscoverCard pattern would be
6011 1234 1234 1234

And bear in mind that the - character within [] brackets is a special character unless it's the first or last one listed, so don't change:
[-_,;:'.\s\\/]
to
[=-_,;:'.\s\\/]
as it would match any character between = and _

Instead, you'd change it to:
[=\-_,;:'.\s\\/]
or
[-=_,;:'.\s\\/]
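
The range pitfall is easy to reproduce in Python (used here as an assumption; most engines treat `-` inside brackets the same way). The constant names are hypothetical.

```python
import re

# "=-_" is read as a range from "=" (0x3D) to "_" (0x5F), which includes A-Z.
ACCIDENTAL_RANGE = re.compile(r"[=-_,;:'.\s\\/]")
# Escaping the hyphen, or listing it first, keeps it a literal character.
ESCAPED_HYPHEN = re.compile(r"[=\-_,;:'.\s\\/]")
HYPHEN_FIRST = re.compile(r"[-=_,;:'.\s\\/]")

# The accidental range matches an uppercase letter but not a hyphen.
print(bool(ACCIDENTAL_RANGE.fullmatch("A")))  # True
print(bool(ACCIDENTAL_RANGE.fullmatch("-")))  # False

# Both corrected classes match "-" and "=" and reject letters.
for cls in (ESCAPED_HYPHEN, HYPHEN_FIRST):
    print(bool(cls.fullmatch("-")), bool(cls.fullmatch("=")), bool(cls.fullmatch("A")))
```

Note that the broken class does not just match extra characters; it also stops matching the hyphen itself, so a number like 6011-1234-1234-1234 would slip through.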
 

Author Comment

by:bradyhummel
ID: 36913310
Thanks everyone, for your help. It seems with kaufmed's example:

[345]\d{3}[-_,;:'. \\/]?\d{4}[-_,;:'. \\/]?\d{4}[-_,;:'. \\/]?\d{4}

Works, but it seems it catches the following:

5111 1111 1111 11111

It doesn't seem to matter where the extra digit is, it will catch it, so 51111 1111 1111 1111 flags as a CC too.

Any suggestions?
 
LVL 35

Expert Comment

by:Terry Woods
ID: 36913352
Are you trying to validate these in a single field value, or capture them from text?

For validation, you'd generally use start-of-line and end-of-line placeholders as part of the validation like this:
^[345]\d{3}[-_,;:'. \\/]?\d{4}[-_,;:'. \\/]?\d{4}[-_,;:'. \\/]?\d{4}$

Capturing them from text, you need to consider what you'll allow at either end of the pattern, eg using a negative lookbehind and negative lookahead:
(?<!\d)[345]\d{3}[-_,;:'. \\/]?\d{4}[-_,;:'. \\/]?\d{4}[-_,;:'. \\/]?\d{4}(?!\d)
disallows:
5111 1111 1111 11111
but will match the 5111 1111 1111 1111 from:
5111 1111 1111 1111 1
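
In an engine that supports lookbehind and lookahead (Python's `re` does, which is the assumption in this sketch), the behaviour can be checked as follows. The helper name `find_card` is made up for the example.

```python
import re

SEP = r"[-_,;:'. \\/]?"
CORE = r"[345]\d{3}" + SEP + r"\d{4}" + SEP + r"\d{4}" + SEP + r"\d{4}"

# Reject a match that has a digit directly before or after it.
GUARDED = re.compile(r"(?<!\d)" + CORE + r"(?!\d)")


def find_card(text):
    """Return the matched number, or None if nothing matches."""
    m = GUARDED.search(text)
    return m.group(0) if m else None


print(find_card("5111 1111 1111 11111"))   # None (17 digits, extra at the end)
print(find_card("51111 1111 1111 1111"))   # None (17 digits, extra at the start)
print(find_card("5111 1111 1111 1111 1"))  # 5111 1111 1111 1111
```

Because the lookarounds are zero-width, the match contains only the card number itself and none of the surrounding characters.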
 

Author Comment

by:bradyhummel
ID: 36913371
To answer your question, we are trying to prevent customers from using credit card numbers (and socials) in emails to us due to PCI requirements.

We're using Postini's Content Manager to scan incoming email.

When I check the syntax of the pattern you suggested (the second one, since we need to scan the text of a document) in Postini, I get this error: "Assertions starting with (? are not supported in (?<!\d), (?!\d)"
 
LVL 35

Accepted Solution

by:
Terry Woods earned 500 total points
ID: 36913388
Some regular expression engines don't support lookahead and lookbehind - sounds like that's one of them.

This might do it. It requires the CC number to be preceded and followed by either a non-digit or the start/end of the line (i.e. nothing).
(^|\D)[345]\d{3}[-_,;:'. \\/]?\d{4}[-_,;:'. \\/]?\d{4}[-_,;:'. \\/]?\d{4}(\D|$)
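
A short Python sketch of this lookaround-free version, again assuming Python's `re` rather than Postini's engine, with a hypothetical helper `contains_card`:

```python
import re

SEP = r"[-_,;:'. \\/]?"
CORE = r"[345]\d{3}" + SEP + r"\d{4}" + SEP + r"\d{4}" + SEP + r"\d{4}"

# A non-digit (or the start/end of the text) must surround the number.
BOUNDED = re.compile(r"(^|\D)" + CORE + r"(\D|$)")


def contains_card(text):
    """True if the text contains something shaped like a 16-digit card number."""
    return BOUNDED.search(text) is not None


print(contains_card("5111 1111 1111 11111"))   # False: extra trailing digit
print(contains_card("51111 1111 1111 1111"))   # False: extra leading digit
print(contains_card("Here's the credit card info: 4555 6631 3445 2304."))  # True
print(contains_card("4555663134452304"))       # True: bounded by start and end
```

Unlike the lookaround version, the match here includes the surrounding non-digit characters. That should not matter for a filter that only needs a yes/no answer. Since `\D` also matches a newline, a number at the start or end of a line inside a longer message is still caught without multi-line mode.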
 

Author Comment

by:bradyhummel
ID: 36913499
That may have done it. Let me do some testing with this and I'll let you know.
 

Author Closing Comment

by:bradyhummel
ID: 36930895
Thanks for your help, guys. It seems to be working now. Appreciate the input. I'm clearly a novice when it comes to writing these.