Pau Lo
regex assistance
I need to create a regex for a custom content filter in some eDiscovery software, to filter a 100 GB directory of data for any documents that contain text matching this pattern: 9 characters long; the first two characters any letter (a-z); characters 3-8 any digits; the final character any letter (a-z). The letters are likely to be uppercase, but not always. Any sort of starting point will help, as I am very new to regex.
You shouldn't use the ^ and $ anchors unless the entire document is supposed to match your pattern:
[A-Za-z]{2}[0-9]{6}[A-Za-z]
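To see why the anchors matter for content searches, here is a minimal sketch in Python (an assumption on my part; the eDiscovery tool's own regex engine may behave a little differently). The sample text and the identifier AB123456C are made up for illustration.

```python
import re

# Hypothetical document content; AB123456C is a made-up identifier.
text = "Invoice ref AB123456C was sent on Monday."

# Anchored: only matches when the whole text is the 9-character pattern.
anchored = re.compile(r"^[A-Za-z]{2}[0-9]{6}[A-Za-z]$")
# Unanchored: matches the pattern anywhere inside the text.
unanchored = re.compile(r"[A-Za-z]{2}[0-9]{6}[A-Za-z]")

anchored_hit = anchored.search(text)      # no match: there is other text around it
unanchored_hit = unanchored.search(text)  # finds the identifier inside the sentence

print(anchored_hit)
print(unanchored_hit.group())
```

The anchored version would only be useful if each thing being tested (a file name, say) were nothing but the 9-character string.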
Somehow I had the impression you were filtering the document names, but if you're going for content (and there's more than just the pattern you're searching for), then aikimark is correct, the pattern shouldn't be anchored like that.
But since you're looking for a string with very specific properties, you should probably anchor it with "\b" (word boundary); otherwise the pattern will also match inside longer strings that start with more than two letters and/or end with more than one letter.
\b[a-zA-Z]{2}[0-9]{6}[a-zA-Z]\b
For example, the above
https://regex101.com/r/nFCgE6/1
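If you want to try the difference locally, here is a rough sketch in Python (again an assumption; check which regex flavour your filter uses). The sample strings are invented test cases, not real data.

```python
import re

# Without word boundaries: can match inside a longer run of letters.
loose = re.compile(r"[a-zA-Z]{2}[0-9]{6}[a-zA-Z]")
# With word boundaries: the 9 characters must stand alone as a "word".
bounded = re.compile(r"\b[a-zA-Z]{2}[0-9]{6}[a-zA-Z]\b")

# Hypothetical test strings.
samples = [
    "ref ab123456c here",  # lowercase, surrounded by spaces
    "XYZAB123456CD",       # extra letters on both sides
    "AB1234567C",          # seven digits instead of six
    "(QQ000000z)",         # surrounded by punctuation
]

results = {s: (loose.findall(s), bounded.findall(s)) for s in samples}
for s, (loose_hits, bounded_hits) in results.items():
    print(s, loose_hits, bounded_hits)
```

Only the second sample differs: the loose pattern pulls AB123456C out of the middle of XYZAB123456CD, while the bounded one rejects it. Keep in mind that "\b" treats digits and the underscore as word characters too, so something like _AB123456C would not match the bounded pattern.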