Want to protect your cyber security and still get fast solutions? Ask a secure question today.Go Premium

x
?
Solved

Regex pattern for SSN (OCR'd)

Posted on 2008-10-28
10
Medium Priority
?
485 Views
Last Modified: 2013-11-26
Hi,

I'm working on a small application where I've to read TIF files, OCR them and look for SSNs in the OCR'd text. I use the following regular expression, but the issue is that depending on the quality of the TIF file, the SSNs get OCR'd in few different patterns. I would like to know if there is any single pattern that can help me identify all such patterns in one pass. I greatly appreciate your help.

Pattern that I use - \b\S{3}\-\S{2}\-\S{4}\b

Patterns I found so far
1. 99-99(space)-9999
2. 999-99-9999 (straight forward)
3. x99-99-9999
4. xx9-99-x9999
5. x9x-x9(space)-9999
where x could be any alphabet or special character. If the OCR software cannot identify the exact character, it injects some character in that position.

Thank you for looking into my problem.

Mohan
0
Comment
Question by:mohan_sekar
  • 5
  • 3
  • 2
10 Comments
 
LVL 82

Expert Comment

by:hielo
ID: 22822657
>>If the OCR software cannot identify the exact character, it injects some character in that position.
And what do you want to do in said situation? Keep the non-numeric character or not?
Try:
 \b\d+\-\d+\s*\-\d{4}\b
0
 
LVL 15

Author Comment

by:mohan_sekar
ID: 22823040
Thanks Hielo, but it didn't work.
Yes, I want to keep the non-numeric character as is.
For example, iii-99-9999 or i99-99-9999 will not match.
0
 
LVL 18

Accepted Solution

by:
Pawel Witkowski earned 1500 total points
ID: 22823080
try this one:

\b.{2,3}\-.{2,3}\-.{4,5}\b

0
Concerto Cloud for Software Providers & ISVs

Can Concerto Cloud Services help you focus on evolving your application offerings, while delivering the best cloud experience to your customers? From DevOps to revenue models and customer support, the answer is yes!

Learn how Concerto can help you.

 
LVL 15

Author Comment

by:mohan_sekar
ID: 22823434
Thanks, Wilg32. Your expression covers most of my cases, but I have issues with the following ones

1. 999-99x9999 (instead of hyphen I get characters like ~ or i)
2. 999x-99-9999 (I get an extra character before the hyphen here)
0
 
LVL 82

Expert Comment

by:hielo
ID: 22823642
try:
\b.{11,12}\b
0
 
LVL 15

Author Comment

by:mohan_sekar
ID: 22823736
Heilo,
Your expression is too generic. It might match with any 11 or 12 character strings and not just SSNs. Example phone numbers.
0
 
LVL 82

Expert Comment

by:hielo
ID: 22823929
but what you are describing is also a "Generic" pattern. You don't always get a hyphen, but you also don't know what you are going to get instead of the hyphen. IF you are always getting a "~" and an "i" as alternatives, then try:
\b.{2,3}[~i\-].{2,3}[~u\-].{4,5}\b
0
 
LVL 15

Author Comment

by:mohan_sekar
ID: 22824264
I've slightly modified Wilg32's expression to suit my requirements. Thanks, Wilg32.
.{2,3}(\-|.(?=\d)).{2,3}(\-|.(?=\d)).{4,5}
Thanks for your help, Hielo.
0
 
LVL 15

Author Closing Comment

by:mohan_sekar
ID: 31510766
Thanks, Wilg32
0
 
LVL 18

Expert Comment

by:Pawel Witkowski
ID: 22826692
I just glad that I can help, sorry that I couldnt help further but I was playing volleyball ^^
0

Featured Post

Become an Android App Developer

Ready to kick start your career in 2018? Learn how to build an Android app in January’s Course of the Month and open the door to new opportunities.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Nothing in an HTTP request can be trusted, including HTTP headers and form data.  A form token is a tool that can be used to guard against request forgeries (CSRF).  This article shows an improved approach to form tokens, making it more difficult to…
This article discusses how to create an extensible mechanism for linked drop downs.
Viewers learn about the “for” loop and how it works in Java. By comparing it to the while loop learned before, viewers can make the transition easily. You will learn about the formatting of the for loop as we write a program that prints even numbers…
Viewers will learn about the different types of variables in Java and how to declare them. Decide the type of variable desired: Put the keyword corresponding to the type of variable in front of the variable name: Use the equal sign to assign a v…
Suggested Courses
Course of the Month14 days, 13 hours left to enroll

577 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question