Pau Lo

asked on

regex assistance

I need to create a regex for a custom content filter in some eDiscovery software. The filter will scan a 100 GB directory of data for any documents containing text that matches this pattern: 9 characters long, where the first two characters are any alpha (a-z), characters 3-8 are any numeric characters, and the final character is any alpha (a-z). The alpha characters are likely to be uppercase, but not always. Any sort of starting point will help, as I am very new to regex.
ASKER CERTIFIED SOLUTION
oBdA

"Any sort of starting point will help as very new to regex"
You can use an online service such as RegEx101 as a live playground for experimenting with regex patterns.

For example, the above
https://regex101.com/r/nFCgE6/1
You shouldn't use the ^ and $ anchors unless the entire document consists of nothing but your pattern:
[A-Za-z]{2}[0-9]{6}[A-Za-z]
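
To see the difference in practice, here is a minimal Python sketch. It is only an illustration: the eDiscovery tool's own regex engine may behave differently, the sample text is made up, and the anchored variant is an assumption about what a ^...$ version of the pattern would look like.

```python
import re

# Unanchored pattern from the post: 2 letters, 6 digits, 1 letter.
PATTERN = re.compile(r"[A-Za-z]{2}[0-9]{6}[A-Za-z]")

# Assumed anchored variant, shown only for comparison.
anchored = re.compile(r"^[A-Za-z]{2}[0-9]{6}[A-Za-z]$")


def find_ids(text):
    """Return every substring of text that matches the unanchored pattern."""
    return PATTERN.findall(text)


# Hypothetical document content containing two matching strings.
sample = "Ref AB123456C was issued; see also xy654321z."

print(find_ids(sample))         # the matches found anywhere in the text
print(anchored.search(sample))  # None: the text holds more than the pattern
```

The anchored version only succeeds when the whole input is exactly the 9-character string, which is why it suits file names better than document content.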


oBdA

Somehow I had the impression you were filtering the document names, but if you're filtering on content (and the documents contain more than just the pattern you're searching for), then aikimark is correct: the pattern shouldn't be anchored like that.
But since you're looking for a string with very specific properties, you should probably anchor it with "\b" (word boundary); otherwise, the pattern will also match strings that start with more than two letters and/or end with more than one letter.
\b[a-zA-Z]{2}[0-9]{6}[a-zA-Z]\b
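
A short Python sketch comparing the two patterns side by side. The sample strings are invented for illustration, and the filter software's engine may treat "\b" slightly differently from Python's re module.

```python
import re

# Pattern without and with word boundaries, as given in the thread.
LOOSE = re.compile(r"[a-zA-Z]{2}[0-9]{6}[a-zA-Z]")
BOUNDED = re.compile(r"\b[a-zA-Z]{2}[0-9]{6}[a-zA-Z]\b")

# Hypothetical test strings: an exact hit, a hit buried in a longer
# run of letters, a lowercase hit with punctuation, and 7 digits.
samples = ["AB123456C", "XAB123456CD", "id: ab123456c.", "AB1234567C"]

for s in samples:
    # Print whether each pattern finds a match somewhere in the string.
    print(s, bool(LOOSE.search(s)), bool(BOUNDED.search(s)))
```

Note that in Python "\b" counts digits and the underscore as word characters too, so a string such as "_AB123456C" would not match the bounded pattern.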
