Slimshaneey
asked on
Help amend this regex to only include part numbers that have both letters and numbers
I have the following regex to parse a string looking for likely candidates for part numbers/serial numbers.
$regexForModelNo = "@\b((?=[A-Za-z/ -]{0,13}\d)[A-Za-z0-9/ -]{3,14})\b@";
It works fine, but occasionally it returns some random matches along with the good ones; for example, it sometimes includes parts of words that aren't likely part numbers at all. How would I modify the above so that every matched part must contain at least one number?
It would help to know the input values, and the values that don't match correctly...
ASKER
An example would be "Stringstringstring part rb212x stringstringstring", which would return "part rb212x", but I want to ignore the "part" bit as it contains no numbers.
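For anyone wanting to reproduce this, here is a minimal sketch that runs the pattern from the opening post against the example string. The variable names are only illustrative. Because the character class allows spaces, the lookahead can reach the digit in "rb212x" from before "part", so the whole phrase is captured.

```php
<?php
// Pattern copied from the opening post (single-quoted so the backslashes stay literal).
$regexForModelNo = '@\b((?=[A-Za-z/ -]{0,13}\d)[A-Za-z0-9/ -]{3,14})\b@';
$input = 'Stringstringstring part rb212x stringstringstring';

// $m[1] holds the first captured candidate, if any.
$found = preg_match($regexForModelNo, $input, $m);

// The space in the class lets "part" ride along with "rb212x".
// trim() removes any surrounding spaces the match carries.
$candidate = $found ? trim($m[1]) : null;
var_dump($candidate);
```

The unwanted word comes from the space inside `[A-Za-z0-9/ -]`, not from the lookahead itself.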
Thank you. I just realized that it would help to know where you are using this regular expression (PHP, .NET, Perl, ...)?
ASKER
It's PHP, sorry; I should have mentioned that in the opening post.
Can you provide the rules for a valid serial number? You've got spaces in your regex, but I would think that a serial number wouldn't contain spaces.
If you can define what is being looked for, it is a lot easier to create a regex (or any function) to find it. TheLearnedOne asked about positive and negative test cases. kaufmed asked for the definition of what you are looking for. Either of these would help in coming up with the proper regex (or function, if you want).
Based on what has been written so far, it sounds like a serial number is defined as a sequence of characters with the following attributes:
1. 3-14 characters in length;
2. Contains a combination of letters, numbers, dashes, slashes (/), and spaces;
3. Is not touching other words (i.e., contained between word boundaries);
4. Contains at least one number; and
5. (I assume) Contains at least one number prior to any spaces.
Adding more description to the definition (or correcting errors in it) will allow for more refining of the regex.
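As a rough illustration of how a definition like this turns into a pattern, here is a sketch in PHP. It makes two assumptions that are not confirmed by the asker: spaces are not allowed inside a serial number (as suggested earlier in the thread), and a candidate must contain at least one letter as well as one digit (taken from the question title). The function name `findPartNumbers` is hypothetical.

```php
<?php
// Hypothetical helper; the exact rules are assumptions, not the asker's confirmed spec.
function findPartNumbers(string $text): array
{
    $pattern = '~(?<![A-Za-z0-9/-])'   // not preceded by another token character
             . '(?=[A-Za-z/-]*\d)'     // the token contains at least one digit
             . '(?=[0-9/-]*[A-Za-z])'  // the token contains at least one letter
             . '[A-Za-z0-9/-]{3,14}'   // 3-14 letters, digits, dashes or slashes
             . '(?![A-Za-z0-9/-])~';   // not followed by another token character

    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

print_r(findPartNumbers('Stringstringstring part rb212x stringstringstring'));
print_r(findPartNumbers('order 12345 for AB-12/9 now'));
```

The lookarounds on either side take the place of `\b`, so a token that starts or ends with a dash or slash is still treated as one unit. If spaces really are valid inside a serial number, the character classes would need to be widened and rule 5 handled explicitly.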
ASKER
Thanks guys. I looked at many options on this, and it ended up being a much simpler regex (I've included it below; it basically finds groups of clustered characters that contain at least one number). Your advice and insight set me on a very productive learning curve though. Sorry for getting back to this so late. Finally getting round to some cleanup.
[A-Za-z0-9-.]{4,}(?<=\d)
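One thing worth checking before reusing this: the lookbehind is tested at the end of the run, so the engine backtracks until the match ends in a digit, and any trailing letters are dropped. The sketch below compares the posted pattern with a lookahead variant. The variant is my own assumption about the intent (keep the whole cluster if it contains a digit), not something the asker posted.

```php
<?php
$text = 'Stringstringstring part rb212x stringstringstring';

// Pattern as posted: a run of 4+ characters that must END in a digit.
preg_match_all('/[A-Za-z0-9-.]{4,}(?<=\d)/', $text, $posted);

// Assumed variant: start at the beginning of a cluster, require a digit
// somewhere inside it, then take the whole cluster.
preg_match_all('/(?<![A-Za-z0-9.-])(?=[A-Za-z.-]*\d)[A-Za-z0-9.-]{4,}/', $text, $variant);

print_r($posted[0]);   // contains "rb212" (trailing "x" is cut off)
print_r($variant[0]);  // contains "rb212x"
```

If part numbers in the real data always end in a digit, the posted pattern is enough. Otherwise the lookahead form may be closer to "contains at least one number".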