regular expressions in Python

FrancisSong asked:
Last Modified: 2012-06-22

I would like to write a regular expression to extract the blood string:

UUID=ee27bf12-d43e     /      /level/city           temp              def           1            1


D3444=ee27bf12-d43e      max       /sudol/city           glop              def           0            1

CERTIFIED EXPERT

Commented:
Let's start with the following code.  It prints the following result:

C:\tmp\___python\FrancisSong\Q_26311335>a.py
['UUID', 'ee27bf12-d43e', '/', '/level/city', 'temp', 'def', '1', '1']
['D3444', 'ee27bf12-d43e', 'max', '/sudol/city', 'glop', 'def', '0', '1']

You may want to process it further.

The parse() function may be overkill.  On the other hand, it keeps the parsing code separate, so you can enhance or replace it later.

import re

# every run of characters other than '=', space, or tab is one token
def parse(s, rex=re.compile(r'([^= \t]+)')):
    return rex.findall(s)

print(parse('UUID=ee27bf12-d43e      /       /level/city           temp              def           1            1'))
print(parse('D3444=ee27bf12-d43e      max       /sudol/city           glop              def           0            1'))

CERTIFIED EXPERT

Commented:
Hi Francis,

I'm not sure what you mean by "blood string".

If the blood string were 'max',
then I would NOT use a regular expression, but just

def get_blood_string(s):
    return s.split()[1]
    # or, probably slightly faster because splitting stops early:
    # return s.split(None, 3)[1]


If you just want to split the words, then I'd use split.

In my opinion, regular expressions are great for complex cases.
For trivial cases they tend to obfuscate your code.

Whether there is any difference in execution speed depends on your test patterns.

If you wanted to extract the value of the first assignment in a line, then you could do this with two split calls:

uuid = s.split()[0].split('=')[1]

On the other hand:
If you don't know at which position in the string the assignment to a UUID occurs, then you could
probably do something like:

import re

def get_uuid(s, rex=re.compile(r'UUID=([^ \t]+)')):
    # collect every value assigned to UUID and return the first one, if any
    uuids = rex.findall(s)
    if not uuids:
        return None
    return uuids[0]
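
If only the first match is needed, re.search() is an alternative to findall(). The following is a minimal sketch under that assumption; the name get_first_uuid is made up for illustration, and the pattern is the same one used in get_uuid() above.

```python
import re

# Same pattern as in get_uuid(); compiled once at module level.
UUID_RE = re.compile(r'UUID=([^ \t]+)')

def get_first_uuid(s):
    # search() stops at the first match instead of collecting all of them
    m = UUID_RE.search(s)
    return m.group(1) if m else None

print(get_first_uuid('UUID=ee27bf12-d43e      /       /level/city'))  # ee27bf12-d43e
print(get_first_uuid('D3444=ee27bf12-d43e     max     /sudol/city'))  # None
```

The second line contains no "UUID=" assignment, so the function returns None, just as get_uuid() would.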


Author

Commented:
UUID=ee27bf12-d43e                 /              /level/city                     temp       def               1                1
D3444=ee27bf12-d43e            max           /sudol/city                    glop                     def               0                1

I mean extracting the bold words in each line. I tried using the split() function, but it did not work.

Notice that there are several spaces between the fields.

CERTIFIED EXPERT

Commented:
Then try the following modification:
import re

def parse(s, rex=re.compile(r'([^= \t]+)')):
    lst = rex.findall(s)
    return lst[2]    # tokens 0 and 1 are the key and its value; token 2 is the next column

print(parse('UUID=ee27bf12-d43e      /       /level/city           temp              def           1            1'))
print(parse('D3444=ee27bf12-d43e      max       /sudol/city           glop              def           0            1'))

CERTIFIED EXPERT

Commented:
Or you can use the .split() method of the string type (gelonida is right to suggest split over the regular expression in this case).  Anyway, notice that the function allows you to change its body to prevent the interface.  See the code below, which is called the same way as the parse() that uses the regular expression.  Only the function was given a more appropriate name.
def getBloodString(s):
    k, v = s.split('=', 1)  # split to key/value by the first '=' (i.e. once only)
    lst = v.split()         # split the value by the whitespace sequences
    return lst[1]           # return the second element of the value (zero based)


print(getBloodString('UUID=ee27bf12-d43e      /       /level/city           temp              def           1            1'))
print(getBloodString('D3444=ee27bf12-d43e      max       /sudol/city           glop              def           0            1'))
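
The point about keeping the same interface can be checked by running both implementations on the same input. This is only a sketch: the names parse_with_regex and parse_with_split are invented here so that the two versions can live side by side, and the sample lines are the ones from the question.

```python
import re

LINES = [
    'UUID=ee27bf12-d43e      /       /level/city           temp              def           1            1',
    'D3444=ee27bf12-d43e      max       /sudol/city           glop              def           0            1',
]

def parse_with_regex(s, rex=re.compile(r'([^= \t]+)')):
    # tokens 0 and 1 are the key and its value; token 2 is the wanted column
    return rex.findall(s)[2]

def parse_with_split(s):
    _key, rest = s.split('=', 1)   # drop the key in front of the first '='
    return rest.split()[1]         # second whitespace-separated field

for line in LINES:
    # both bodies give the same answer for the same call
    assert parse_with_regex(line) == parse_with_split(line)
    print(parse_with_split(line))
```

For the two sample lines this prints "/" and "max". Code that calls either function does not need to know which body is behind it.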

CERTIFIED EXPERT

Commented:
Correction: "notice that the function allows you to change its body to prevent the interface."
should be "notice that the function allows you to change its body to PRESERVE the interface."
CERTIFIED EXPERT

Commented:
Francis,

I'm not sure I understand what's wrong with my first suggestion to use split()[1]
(except that it is not a regular expression, of course).

If you pass nothing or None as the first argument to split, it splits on any whitespace. Multiple consecutive blank characters are treated as one separator.

This is different from split(' '), which treats every single space as a separator.
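
To illustrate the difference, here is a minimal sketch. The sample line is a shortened version of one from the question; the variable names are only for demonstration.

```python
line = 'D3444=ee27bf12-d43e      max       /sudol/city'

# No argument: a run of whitespace counts as a single separator.
by_whitespace = line.split()

# Explicit ' ': every single space is a separator, so a run of spaces
# produces empty strings in the result.
by_single_space = line.split(' ')

print(by_whitespace)             # ['D3444=ee27bf12-d43e', 'max', '/sudol/city']
print(by_whitespace[1])          # max
print(repr(by_single_space[1]))  # ''
```

With split(' '), index 1 is an empty string rather than the second word, so indexing into the result does not give the expected field.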

I copied and pasted your example lines, and it seems to work.
If you mean that you might have blank LINES between the lines to be parsed, then you might vary my code in the following way:

for l in lines:
    fields = l.split(None, 3)   # at most 3 splits: the first three words plus the rest
    if len(fields) < 2:         # skip blank or too-short lines
        continue
    bloodstring = fields[1]
    print("<%s>" % bloodstring)

lines = [
    "UUID=ee27bf12-d43e      /       /level/city           temp              def           1            1",
    "D3444=ee27bf12-d43e      max       /sudol/city           glop              def           0            1",
]

for l in lines:
    bloodstring = l.split()[1]
    print("<%s>" % bloodstring)
