Solved

Help amend this regex to only include part numbers that have both letters and numbers

Posted on 2012-03-20
Last Modified: 2012-07-27
I have the following regex to parse a string looking for likely candidates for part numbers/serial numbers.
$regexForModelNo = "@\b((?=[A-Za-z/ -]{0,13}\d)[A-Za-z0-9/ -]{3,14})\b@";


It works fine, but every so often it spits out some random matches along with the good ones; for example, it sometimes includes parts of words that aren't really likely to be part numbers. How would I modify the above so that every match must contain at least one number?
Question by:Slimshaneey
 
LVL 96

Expert Comment

by:Bob Learned
It would help to know the input values, and the values that don't match correctly...
 
LVL 11

Author Comment

by:Slimshaneey
An example would be "Stringstringstring  part rb212x stringstringstring": it would return "part rb212x", but I want to ignore the "part" bit, as it contains no numbers.
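To see where "part" comes from, it can help to spell the question's pattern out with PHP's x modifier, which ignores layout whitespace and allows # comments. The sketch below is meant to be equivalent to the original regex (the space inside each character class is written as "\ " for clarity) and runs it on the sample string above; the variable names are only illustrative. The look-ahead's class includes a space, so the digit it looks for may sit after "part ", and because the main class also allows spaces, the captured text keeps the surrounding blanks.

```php
<?php
// The original pattern, rewritten with the x (extended) modifier so that each
// piece can carry a comment. Whitespace outside character classes is ignored
// in this mode; the space inside each class is escaped as "\ " for clarity.
$regexForModelNo = '@
    \b                          # start at a word boundary
    (
      (?=[A-Za-z/\ -]{0,13}\d)  # look-ahead: a digit must follow 0-13 letters,
                                # slashes, SPACES or dashes
      [A-Za-z0-9/\ -]{3,14}     # then 3-14 letters, digits, slashes, spaces or dashes
    )
    \b                          # end at a word boundary
@x';

$input = "Stringstringstring  part rb212x stringstringstring";
preg_match_all($regexForModelNo, $input, $matches);

// Because spaces are allowed before the first digit, the match starts at the
// blanks after "Stringstringstring" and takes in "part" as well.
var_dump($matches[1]); // one element: "  part rb212x " (spaces included)
```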
 
LVL 96

Expert Comment

by:Bob Learned
Thank you.  I just realized that it would help to know where you are using this regular expression (PHP, .NET, Perl, ...)?
 
LVL 11

Author Comment

by:Slimshaneey
It's PHP, sorry; I should have mentioned that in the opening post.
 
LVL 74

Expert Comment

by:käµfm³d 👽
Can you provide the rules for a valid serial number? You've got spaces in your regex, but I would think that a serial number wouldn't contain spaces.
 
LVL 13

Assisted Solution

by:Carl Bohman
Carl Bohman earned 250 total points
Based on what you've said so far, perhaps just removing the space from the look-ahead assertion would give you the results you're looking for:
$regexForModelNo = "@\b((?=[A-Za-z/-]{0,13}\d)[A-Za-z0-9/ -]{3,14})\b@";


With that change, the first part of the serial number (before any space) must contain a number, but can also contain letters, slashes, or dashes.
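A quick way to check the suggested change is to run it on the sample string from the thread. In this sketch (variable names illustrative), the look-ahead no longer accepts spaces before the first digit, so it fails at "part" and the match starts at "rb212x". The main character class still allows spaces, so a match can end with a trailing blank; the trim() step is an addition of this sketch, not part of the suggestion.

```php
<?php
// Suggested pattern: the space is removed from the look-ahead only, so a digit
// must appear before any space. The main class still allows spaces.
$regexForModelNo = '@\b((?=[A-Za-z/-]{0,13}\d)[A-Za-z0-9/ -]{3,14})\b@';

$input = "Stringstringstring  part rb212x stringstringstring";
preg_match_all($regexForModelNo, $input, $matches);

// A match can still end with a blank (here the space after "rb212x"),
// so trim each one before using it.
$candidates = array_map('trim', $matches[1]);
print_r($candidates); // a single entry: rb212x
```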
 
LVL 108

Accepted Solution

by:Ray Paseur
Ray Paseur earned 250 total points
You might be asking too much of a single regular expression.  A function like this may be easier to write and debug.
http://www.laprbass.com/RAY_temp_slimshaneey.php

You might also enjoy reading about test-driven development.  The article looks like it is about building regular expressions, but it is really about how to think about programming problems in a practical and structured way.
http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/PHP/A_7830-A-Quick-Tour-of-Test-Driven-Development.html
<?php // RAY_temp_slimshaneey.php
error_reporting(E_ALL);
echo "<pre>";

// THE TEST DATA
$txt = "Stringstringstring  part rb212x stringstringstring or A212x";

// A FUNCTION TO FIND SERIAL NUMBERS
function find_sn($str)
{
    // A REGULAR EXPRESSION TO ISOLATE SUBSTRINGS
    $rgx
    = '#'             // REGEX DELIMITER
    . '\b'            // WORD BOUNDARY
    . '('             // GROUP
    . '.*?'           // OF ANYTHING
    . ')'             // END GROUP
    . '\b'            // WORD BOUNDARY
    . '#'             // END REGEX
    ;

    // A REGULAR EXPRESSION TO MATCH LETTERS
    $ltr = '#[A-Z]#i';

    // A REGULAR EXPRESSION TO MATCH NUMBERS
    $num = '#[0-9]#';

    if (!preg_match_all($rgx, $str, $mat)) return FALSE;

    $out = array();

    // LOOK FOR BOTH LETTERS AND NUMBERS IN EACH OF THE SUBSTRINGS
    foreach ($mat[0] as $txt)
    {
        if (preg_match($ltr, $txt))
        {
            if (preg_match($num, $txt))
            {
                $out[] = $txt;
            }
        }
    }

    // ANY FINDINGS?
    if (empty($out)) return FALSE;
    return $out;
}

// TEST THE FUNCTION
print_r( find_sn($txt) );
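The letter-and-digit check in find_sn() can also be expressed as one pattern with two look-ahead assertions, one demanding a letter and one demanding a digit somewhere in the token. This is a sketch under the assumption that a part number is an unbroken run of ASCII letters and digits (no dashes, slashes or spaces); find_sn_single_regex is a hypothetical name, not part of the linked script. Like find_sn(), it returns FALSE when nothing qualifies.

```php
<?php
// Hypothetical one-pattern variant of find_sn(): a token qualifies only if it
// contains at least one letter AND at least one digit.
function find_sn_single_regex(string $str)
{
    $rgx = '#\b'
         . '(?=[A-Za-z0-9]*[A-Za-z])'   // look-ahead: a letter somewhere in the token
         . '(?=[A-Za-z0-9]*[0-9])'      // look-ahead: a digit somewhere in the token
         . '[A-Za-z0-9]+'               // consume the whole token
         . '\b#';

    if (!preg_match_all($rgx, $str, $mat)) return FALSE;
    return $mat[0];
}

print_r(find_sn_single_regex("Stringstringstring  part rb212x stringstringstring or A212x"));
```

On the test string used above this should give rb212x and A212x, the same tokens find_sn() keeps; a purely numeric token such as 2012 is rejected by the letter look-ahead.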


 
LVL 13

Expert Comment

by:Carl Bohman
If you can define what is being looked for, it is a lot easier to create a regex (or any function) to find it.  TheLearnedOne asked about positive and negative test cases.  kaufmed asked for the definition of what you are looking for.  Either of these would help in coming up with the proper regex (or function, if you want).

Based on what has been written so far, it sounds like a serial number is defined as a sequence of characters with the following attributes:
1. 3-14 characters in length;
2. Contains a combination of letters, numbers, dashes, slashes (/), and spaces;
3. Is not touching other words (i.e., contained between word boundaries);
4. Contains at least one number; and
5. (I assume) Contains at least one number prior to any spaces.

Adding more description to the definition (or correcting errors in it) will allow for more refining of the regex.
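The definition above can be turned into a small checker, with a table of positive and negative cases of the kind requested earlier in the thread. This is a sketch: is_serial_candidate is a hypothetical helper, and rule 3 (word boundaries) is left out because it depends on the text around a candidate rather than on the candidate itself.

```php
<?php
// Hypothetical helper that checks one candidate string against rules 1, 2, 4
// and 5 of the definition above. Rule 3 (word boundaries) depends on the text
// around the candidate, so it is left to the \b anchors of a search pattern.
function is_serial_candidate(string $s): bool
{
    $len = strlen($s);
    if ($len < 3 || $len > 14) return false;                  // rule 1: 3-14 characters
    if (!preg_match('#^[A-Za-z0-9/ -]+$#', $s)) return false; // rule 2: allowed characters only
    if (!preg_match('#[0-9]#', $s)) return false;             // rule 4: at least one digit
    if (!preg_match('#^[^ ]*[0-9]#', $s)) return false;       // rule 5: a digit before any space
    return true;                                              // (rule 5 implies rule 4)
}

// Positive and negative test cases.
$cases = array(
    'rb212x'      => true,
    'A212x'       => true,
    'rb212x part' => true,
    'part rb212x' => false, // first digit comes after a space
    'part'        => false, // no digit
    'r2'          => false, // too short
    'rb_212'      => false, // underscore is not an allowed character
);

foreach ($cases as $input => $expected) {
    $actual = is_serial_candidate($input);
    printf("%-14s %s\n", "'$input'", $actual === $expected ? 'ok' : 'FAIL');
}
```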
 
LVL 11

Author Closing Comment

by:Slimshaneey
Thanks, guys. I looked at many options for this, and it ended up being a much simpler regex (I've included it below; it basically finds groups of clustered characters that contain at least one number). Your advice and insight set me on a very productive learning curve, though. Sorry for getting back to this so late; I'm finally getting round to some cleanup.

[A-Za-z0-9-.]{4,}(?<=\d)
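For reference, this is how the closing pattern behaves when run with preg_match_all() on the thread's sample text (a PHP pattern needs delimiters, so # is added here). The look-behind (?<=\d) sits at the end of the pattern, so it only checks the character just before the end of the match: each match must end with a digit, and on "rb212x" the engine backtracks and returns "rb212". The second pattern is a hedged variant, not the author's, that uses a look-ahead at the start instead, so a run only has to contain a digit somewhere. In both, PCRE reads a hyphen that directly follows a range (as in 0-9-.) as a literal hyphen; the variant puts it last in the class to make that obvious.

```php
<?php
$txt = "Stringstringstring  part rb212x stringstringstring or A212x";

// The author's final pattern, with # delimiters added for preg_match_all().
// The look-behind (?<=\d) only checks the character just before the end of the
// match, so each match must END with a digit.
preg_match_all('#[A-Za-z0-9-.]{4,}(?<=\d)#', $txt, $endsWithDigit);
print_r($endsWithDigit[0]); // rb212, A212 (the trailing "x" is dropped)

// Variant (an assumption, not the author's pattern): a look-ahead at the start
// requires a digit somewhere in the run, and the greedy class then takes the
// whole run of 4 or more characters.
preg_match_all('#(?=[A-Za-z0-9.-]*\d)[A-Za-z0-9.-]{4,}#', $txt, $containsDigit);
print_r($containsDigit[0]); // rb212x, A212x
```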
