proper RegEx to parse these strings


I have a file with lines like this:

CBOEA11/08/13iShares MSCI Brazil Capped ETF1EWZ  Jan15201600047000  P 11/08/13
CBOEA11/08/13Cincinnati Financial Corp (EUR2CINF Jan14201400050000C   11/08/13
CBOEA11/08/13HONEYWELL INTL INC (NEW)      HON   Jan15201600077050C P 11/11/13
CBOEA11/08/13iShares iBoxx $ High Yield CorHYG   Jan15201600084000C P 11/11/13
CBOEA11/08/13Oil States International, Inc.OIS   Jan15201600100000C P 11/11/13


(note the even spacing).

I'm trying to extract a couple of pieces of information from each line:
Stock Ticker (EWZ, CINF, HON, HYG, OIS)
Date (Jan152016, Jan142014 etc.)

I can do this in one or more steps; it doesn't matter. My regex knowledge is limited, but I started off with this:


I did r =, and r.groups() returns (u'Jan',)

Shouldn't this regex match the entire number after "Jan"? Of course, with this date I need to match the 3-letter month and then grab the next 6 digits.

The ticker looks to be rather hard.  Ideas on that too?
Terry Woods (IT Guru) Commented:
Perhaps try:
I tested it here, and it seems to work ok.

I put a ?: into the brackets that determine the options for month, which makes them non-capturing, and added some extra brackets that capture the 6 digits following the month.

I've assumed the ticker is always caps, and is the last 3 or 4 characters before the date.
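The pattern itself is not reproduced above. A minimal sketch consistent with that description might look like the following. It assumes Python 3, English 3-letter month abbreviations, and whitespace between the ticker and the date; the names MONTHS, PATTERN and parse_line are illustrative only and are not taken from the original answer.

```python
import re

# Assumed reconstruction from the description above, not the original pattern.
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Group 1: ticker, assumed to be 3 or 4 capital letters right before the date.
# (?:...) keeps the month alternation non-capturing.
# Group 2: the month plus the 6 digits that follow it.
PATTERN = re.compile(r"([A-Z]{3,4})\s+((?:" + MONTHS + r")\d{6})")


def parse_line(line):
    """Return (ticker, date) for one line, or None if nothing matches."""
    m = PATTERN.search(line)
    return m.groups() if m else None


sample = "CBOEA11/08/13Cincinnati Financial Corp (EUR2CINF Jan14201400050000C   11/08/13"
print(parse_line(sample))  # ('CINF', 'Jan142014')
```

Because the ticker group only accepts capital letters and must be followed by whitespace and a month, company-name fragments such as "EUR2" or "Cor" are skipped. Tickers shorter than 3 or longer than 4 letters would need the {3,4} quantifier adjusted.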
käµfm³d 👽 Commented:
The groups method only returns the text captured by the capture groups you specified in your pattern, starting with group 1; it never includes the overall match. The group method, when called without an argument, defaults to the zeroth group, which is the entire match, regardless of how many capture groups there are.
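A small illustration of the difference, assuming Python 3 and a hypothetical pattern in which only the month alternation is parenthesised (the asker's actual pattern is not shown in this thread):

```python
import re

line = "CBOEA11/08/13HONEYWELL INTL INC (NEW)      HON   Jan15201600077050C P 11/11/13"

# Hypothetical pattern: the digits are matched but sit outside the parentheses,
# so they are part of the overall match without being captured.
m = re.search(r"(Jan|Feb|Mar)\d{6}", line)

print(m.groups())  # ('Jan',)   -> capture groups only, starting at group 1
print(m.group())   # Jan152016  -> group 0, the entire match
print(m.group(1))  # Jan        -> the first capture group
```

To have the digits show up in groups(), they need their own pair of parentheses, or the whole date needs to be wrapped in one group.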



Derek Jensen Commented:
Okay. Let's make the regex a tad simpler.

First, start out with what you're looking for [first]:


followed by what doesn't matter:


and by this point we know what's coming next, so just grab it:


Slap our case-insensitive flag on there, and we have a resulting regex of:
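The individual pieces and the final pattern are not preserved in this thread. A rough sketch of the three steps, assuming Python 3 and assuming the "what you're looking for" part is a literal list of the five tickers from the question, could be assembled like this (the names TICKERS, FILLER, DATE and extract are illustrative only):

```python
import re

# Assumed reconstruction, not the original pattern.
TICKERS = r"(EWZ|CINF|HON|HYG|OIS)"  # step 1: what we are looking for first
FILLER = r"\s+"                      # step 2: what doesn't matter
DATE = r"([a-z]{3}\d{6})"            # step 3: 3-letter month plus 6 digits

# re.IGNORECASE is the case-insensitive flag mentioned above.
pattern = re.compile(TICKERS + FILLER + DATE, re.IGNORECASE)


def extract(line):
    """Return (ticker, date) for one line, or None if nothing matches."""
    m = pattern.search(line)
    return m.groups() if m else None


line = "CBOEA11/08/13Oil States International, Inc.OIS   Jan15201600100000C P 11/11/13"
print(extract(line))  # ('OIS', 'Jan152016')
```

A sketch built this way only recognises the tickers spelled out in the alternation; any other ticker yields None.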


Terry Woods (IT Guru) Commented:
@bigdogdman, you're assuming those values are the only possible ticker codes, which is unlikely to be the case, given the nature of stock ticker data.
ugeb (Author) Commented:
That works perfectly, thank you!
Derek Jensen Commented:

Ah, indeed you are right. I must've misread the question. :-)
Question has a verified solution.
