Parsing codes out of a string

I would like to extract codes from a string. A string may contain zero, one, or more codes. Each code has the format [A-Za-z]+\s*[0-9]+.

Here are some example strings:

1. You must complete Art 101 before enrolling
2. You must complete ART 101 and ART 201 or have consent
3. You must complete ART 101, 201, and MUS 101 before enrolling

So far, my regex is:
/^(.*?)\s*.*?([A-Za-z]+\s*[0-9]+\s*(?:and)?\s)+(.*)$/g

That satisfies #1 above.

I've tried other variations:

/^(.*?)\s*.*?([A-Za-z]+\s*[0-9]+(?:\s*and\s*|\s*,\s*)*\s)+(.*)$/g
^(.*?)\s*.*?([A-Za-z]+\s*[0-9]+\s*(?:and|,)*\s*)+(.*)$

I'm really only getting one code returned, usually the last one.
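
One likely reason for seeing only the last code: in most regex engines, a capturing group that sits inside a repetition keeps only its final iteration. The sketch below is in Python (an assumption, since the thread does not say which language is in use) and uses a simplified version of the pattern rather than the exact regex above. It contrasts a single match with a repeated group against a global search for the bare code pattern.

```python
import re

text = "You must complete ART 101 and ART 201 or have consent"

# One match with a repeated capturing group: the group runs twice,
# but group(1) holds only the final repetition.
repeated = re.search(r"(?:([A-Za-z]+\s*[0-9]+)(?:\s+and\s+)?)+", text)
last_only = repeated.group(1)

# Global matching of the bare code pattern: one result per code.
all_codes = re.findall(r"[A-Za-z]+\s*[0-9]+", text)

print(last_only)   # ART 201
print(all_codes)   # ['ART 101', 'ART 201']
```

Note that the bare pattern does not pick up a number with no letters in front of it, such as the 201 in example 3.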
mock5c Asked:
damey Commented:
^.*<Art>[\d\w\s,-]{101,*}?
jmcg (Owner) Commented:
In your example 3, how would you want to handle the "201" appearing after "ART 101"?

Do the numerical parts of your codes always consist of 3 digits?

In your examples, the subject part of the code is always 3 alpha characters. Is this true in general or can they be longer?
Shahan Ayyub (Senior Software Engineer) Commented:
Try like this:

([A-Za-z]+\s*\d+(\s*,\s*\d+)*)

Or:

([A-Za-z]+)?\s*((?:\s*,?)\d+)
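
As a rough illustration of how the first pattern behaves on the three example strings, here is a minimal Python sketch (Python is an assumption; the thread does not name a language). The inner group is written as non-capturing so that re.findall returns whole matches, and the helper name find_code_runs is hypothetical.

```python
import re

# First suggested pattern, with the inner group made non-capturing so
# that findall returns each whole match instead of the group contents.
PATTERN = re.compile(r"[A-Za-z]+\s*\d+(?:\s*,\s*\d+)*")

def find_code_runs(line):
    """Return each subject code together with any comma-separated numbers after it."""
    return PATTERN.findall(line)

examples = [
    "You must complete Art 101 before enrolling",
    "You must complete ART 101 and ART 201 or have consent",
    "You must complete ART 101, 201, and MUS 101 before enrolling",
]

for line in examples:
    print(find_code_runs(line))
# ['Art 101']
# ['ART 101', 'ART 201']
# ['ART 101, 201', 'MUS 101']
```

In example 3 the bare 201 stays attached to the ART match, so it still has to be split off afterwards.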

mock5c (Author) Commented:
To answer jmcg's questions:

In your example 3, how would you want to handle the "201" appearing after "ART 101"?

A 201 that follows ART 101 should be treated as if ART preceded it. I can handle this programmatically, but a regex that inserts the prefix would also be acceptable. Another example (example #4) might be:

You must complete ART 101, ART 201, and MUS 101 before enrolling

There is a lot of variation, but I think the 4 examples cover almost all of the cases.


Do the numerical parts of your codes always consist of 3 digits?

Not necessarily.

In your examples, the subject part of the code is always 3 alpha characters. Is this true in general or can they be longer?

It could be variable but generally 3 or 4 alpha characters.

The bottom line is that I need to extract these codes and discard the rest of the string. Once they are extracted, I can handle the codes programmatically to prepend the alpha characters where necessary.
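
The extract-then-prepend step described here could look something like the following Python sketch. Python, the function name extract_codes, and the choice to skip a bare number that appears before any subject are all assumptions for illustration, not something stated in the thread. The pattern makes the alpha part optional and carries the most recent subject forward.

```python
import re

# Optional alpha subject followed by digits; group(1) is None for a bare number.
CODE_RE = re.compile(r"(?:([A-Za-z]+)\s*)?(\d+)")

def extract_codes(text):
    """Return codes as 'SUBJECT NUMBER', reusing the last subject for bare numbers."""
    codes = []
    subject = None
    for match in CODE_RE.finditer(text):
        if match.group(1):
            subject = match.group(1)
        if subject is None:
            continue  # bare number before any subject was seen: skipped (assumption)
        codes.append(f"{subject} {match.group(2)}")
    return codes

print(extract_codes("You must complete ART 101, 201, and MUS 101 before enrolling"))
# ['ART 101', 'ART 201', 'MUS 101']
```

One caveat: any word that directly precedes a number is taken as a subject, so a phrase such as "at least 12 credits" would produce "least 12". Filtering against a list of known subjects would be one way to guard against that.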
Shahan Ayyub (Senior Software Engineer) Commented:
@mock5c

Did you try my suggestion?
mock5c (Author) Commented:
Shahan,

I have tried your suggestion. On regexr.com it highlights the codes, but I'm still trying to work out how to apply it to the complete example. regexr.com might be a bad site for this.
jmcg (Owner) Commented:
Something like the following expression might work for you:

/((?:\w+)?\s+\d+)/g

I tested this with Perl against your sample data:
 perl -ne '@matches = m/((?:\w+)?\s+\d+)/g; printf "%s\n", join "|", @matches' <testinput.txt

and got results like:

Art 101
ART 101|ART 201
ART 101| 201|MUS 101

Now, you may be using a slightly different regexp engine, so the (?: notation for grouping-but-not-capturing parentheses might not work for you, or it may need to be expressed differently.
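
For comparison, here is a rough Python equivalent of the Perl one-liner (Python is only an example of another engine; the thread does not say which one is in use). Python's re module accepts the (?: notation unchanged, and re.findall returns the contents of the single capturing group for every match.

```python
import re

# Same expression as in the Perl test; (?:...) groups without capturing.
CODE_RE = re.compile(r"((?:\w+)?\s+\d+)")

lines = [
    "You must complete Art 101 before enrolling",
    "You must complete ART 101 and ART 201 or have consent",
    "You must complete ART 101, 201, and MUS 101 before enrolling",
]

for line in lines:
    # findall yields one string per match because there is one capturing group.
    print("|".join(CODE_RE.findall(line)))
# Art 101
# ART 101|ART 201
# ART 101| 201|MUS 101
```

The bare 201 comes back with its leading space, so calling strip() on each result is probably worthwhile. Also note that \w matches digits and underscores as well as letters, which is looser than [A-Za-z].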