Solved

proper RegEx to parse these strings

Posted on 2013-11-10
Last Modified: 2013-11-11
Hi,

I have a file with lines like this:

CBOEA11/08/13iShares MSCI Brazil Capped ETF1EWZ  Jan15201600047000  P 11/08/13
CBOEA11/08/13Cincinnati Financial Corp (EUR2CINF Jan14201400050000C   11/08/13
CBOEA11/08/13HONEYWELL INTL INC (NEW)      HON   Jan15201600077050C P 11/11/13
CBOEA11/08/13iShares iBoxx $ High Yield CorHYG   Jan15201600084000C P 11/11/13
CBOEA11/08/13Oil States International, Inc.OIS   Jan15201600100000C P 11/11/13


(note the even spacing).

I'm trying to extract a couple of pieces of information from each line:
Stock Ticker (EWZ, CINF, HON, HYG, OIS)
Date (Jan152016, Jan142014 etc.)

I can do this in one or more steps; it doesn't matter. My regex knowledge is limited, but I started off with this:

(Feb|Jan|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec|Cap){1}[0-9]*

I did r = regex.search(string), and r.groups() returns (u'Jan',)

Shouldn't this regex match the entire number after "Jan"? For the date, I need to match the 3-letter month and then grab the next 6 digits.

The ticker looks rather hard. Any ideas on that too?
Thanks!
Question by:ugeb
 
LVL 35

Accepted Solution

by:
Terry Woods earned 500 total points
ID: 39637872
Perhaps try:
([A-Z]{3,4})\s*((?:Feb|Jan|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec|Cap){1}[0-9]{6})
I tested it here, and it seems to work ok.

I put a ?: into the brackets that determine the options for month, which makes them non-capturing, and added some extra brackets that capture the 6 digits following the month.

I've assumed the ticker is always caps, and is the last 3 or 4 characters before the date.
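
As a rough check, the pattern above can be run against the five sample lines from the question. This is only a sketch, assuming Python 3 and the standard `re` module; the names `PATTERN`, `LINES` and `parse_line` are made up for illustration. The `{1}` quantifier is redundant but is kept so the pattern matches the answer exactly.

```python
import re

# Pattern from the answer: group 1 is the ticker, group 2 is the month plus 6 digits.
PATTERN = re.compile(
    r"([A-Z]{3,4})\s*"
    r"((?:Feb|Jan|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec|Cap){1}[0-9]{6})"
)

# The sample lines from the question, copied verbatim.
LINES = [
    "CBOEA11/08/13iShares MSCI Brazil Capped ETF1EWZ  Jan15201600047000  P 11/08/13",
    "CBOEA11/08/13Cincinnati Financial Corp (EUR2CINF Jan14201400050000C   11/08/13",
    "CBOEA11/08/13HONEYWELL INTL INC (NEW)      HON   Jan15201600077050C P 11/11/13",
    "CBOEA11/08/13iShares iBoxx $ High Yield CorHYG   Jan15201600084000C P 11/11/13",
    "CBOEA11/08/13Oil States International, Inc.OIS   Jan15201600100000C P 11/11/13",
]


def parse_line(line):
    """Return (ticker, date) for one line, or None if the pattern does not match."""
    match = PATTERN.search(line)
    return match.groups() if match else None


for line in LINES:
    print(parse_line(line))
```

On these five lines the tickers come out as EWZ, CINF, HON, HYG and OIS. A ticker shorter than 3 or longer than 4 characters, or one containing digits, would not be handled by this pattern as written.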
 
LVL 74

Expert Comment

by:käµfm³d 👽
ID: 39637913
The groups method returns only the text captured by the capture groups you specified in your pattern, starting with group 1; it never includes the overall match. Your pattern has a single capture group, around the month names, which is why you only get 'Jan' back. The group method, when called with no argument, defaults to the zeroth group, which is the entire match, regardless of how many capture groups there are.

e.g.

r.group()
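
To make the difference concrete, here is a small sketch using the original pattern from the question on one of the sample lines. It assumes Python 3 and the standard `re` module (the question's `u'Jan'` output suggests Python 2, where the result would carry a `u` prefix); the variable names are illustrative only.

```python
import re

# One of the sample lines from the question.
line = "CBOEA11/08/13HONEYWELL INTL INC (NEW)      HON   Jan15201600077050C P 11/11/13"

# The asker's original pattern: the only capture group is the month alternation.
pattern = re.compile(r"(Feb|Jan|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec|Cap){1}[0-9]*")

r = pattern.search(line)

captured = r.groups()  # tuple of capture groups 1..n, here just the month
whole = r.group()      # group 0: the entire matched text, digits included

print(captured)
print(whole)
```

Because `[0-9]*` allows zero digits, this pattern can also stop at a bare month-like word earlier in a line, such as the "Cap" in "Capped" on the first sample line, which is one more reason to require exactly six digits.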

 
LVL 9

Expert Comment

by:Derek Jensen
ID: 39639838
Okay. Let's make the regex a tad simpler.

First, start out with what you're looking for [first]:

(ewz|hon|hyg|ois|cinf)

followed by what doesn't matter:

\s*

and by this point we know what's coming next, so just grab it:

(\w{3}\d{6})

Slap our case-insensitive flag on there, and we have a resulting regex of:

(?i)(ewz|hon|hyg|ois|cinf)\s*(\w{3}\d{6})
 
LVL 35

Expert Comment

by:Terry Woods
ID: 39639849
@bigdogdman, you're assuming those values are the only possible ticker codes, which is unlikely to be the case, given the nature of stock ticker data.
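
A quick way to see the difference is to run both patterns on a line whose ticker is not in the hard-coded list. This is a sketch only, assuming Python 3's `re` module; the `unknown` line and its ticker `XYZ` are invented for illustration and do not come from the asker's file.

```python
import re

# Pattern with a fixed list of tickers, as posted above.
fixed_list = re.compile(r"(?i)(ewz|hon|hyg|ois|cinf)\s*(\w{3}\d{6})")

# General pattern from the accepted solution.
general = re.compile(
    r"([A-Z]{3,4})\s*"
    r"((?:Feb|Jan|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec|Cap){1}[0-9]{6})"
)

# A real sample line from the question.
known = "CBOEA11/08/13HONEYWELL INTL INC (NEW)      HON   Jan15201600077050C P 11/11/13"

# Hypothetical line with a made-up ticker (XYZ) that is not in the fixed list.
unknown = "CBOEA11/08/13Example Corp (hypothetical)     XYZ   Jan15201600050000C P 11/11/13"

for label, line in (("known", known), ("unknown", unknown)):
    fixed_match = fixed_list.search(line)
    general_match = general.search(line)
    print(label,
          fixed_match.groups() if fixed_match else None,
          general_match.groups() if general_match else None)
```

Both patterns find HON on the real line, but only the general pattern finds anything on the hypothetical one.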
 
LVL 11

Author Closing Comment

by:ugeb
ID: 39639854
That works perfectly, thank you!
 
LVL 9

Expert Comment

by:Derek Jensen
ID: 39639865
@Terry,

Ah, indeed you are right. I must've misread the question. :-)