Python regular expression

I need to convert the following Perl regex to Python.

input:
my $string = '      <span class="name"><i class="genus">Corysanthes</i> <i class="species">grumula</i> <span class="authorship">D.L.Jones</span></span>';

I want to extract the value of genus (in this case, "Corysanthes") into a variable.
In Perl, I would write something like:

           $string =~ /class=\"genus\">([^<]+)<\/i>/;  
            my $genus = $1;


How do you write this in Python?
Thanks.


Python
cpeters5 Asked:

käµfm³d 👽Commented:
One approach:

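import re

# Raw string, so the pattern reaches the regex engine unchanged.
regex = re.compile(r'class="genus">([^<]+)</i>')
string = '      <span class="name"><i class="genus">Corysanthes</i> <i class="species">grumula</i> <span class="authorship">D.L.Jones</span></span>'
match = regex.search(string)
# search() returns None when nothing matches, so check before using the group.
genus = match.group(1) if match else None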
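
If the same approach is needed for the other fields (species, authorship), the pattern can be built from the class name. This is only a sketch: `extract_by_class` is a hypothetical helper name, not something from the answer above, and it assumes the attribute is written exactly as `class="..."` with the text following the tag directly.

```python
import re

def extract_by_class(html, class_name):
    """Return the text right after the first tag with this class, or None."""
    # re.escape guards against regex metacharacters in the class name.
    pattern = re.compile(r'class="' + re.escape(class_name) + r'">([^<]+)<')
    match = pattern.search(html)
    return match.group(1) if match else None

string = '<span class="name"><i class="genus">Corysanthes</i> <i class="species">grumula</i> <span class="authorship">D.L.Jones</span></span>'

genus = extract_by_class(string, "genus")
species = extract_by_class(string, "species")
missing = extract_by_class(string, "family")   # no such class in the input

print(genus, species, missing)
```

A class that is absent from the input gives `None` rather than an exception, which mirrors checking whether the Perl match succeeded before reading `$1`.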

pepr Commented:
It may be better to use an XML parser that is part of the Python standard distribution.
cpeters5 Author Commented:
Thanks pepr. I will take a look. (Still very green, haven't gotten to the XML section yet.) The files I am parsing are just HTML, and they are not well formed. Would this be a problem?
käµfm³d 👽Commented:
Take a look at BeautifulSoup. It deals with sloppy HTML well.
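
BeautifulSoup is a third-party package. If installing it is not an option, the standard library's `html.parser` also tolerates markup that is not well formed. The following is a rough sketch of that alternative, not BeautifulSoup itself: `ClassTextParser` is a made-up class name, and it assumes the `class` attribute holds exactly one class and that the wanted text comes right after the opening tag.

```python
from html.parser import HTMLParser

class ClassTextParser(HTMLParser):
    """Collect the first piece of text inside the first tag whose class matches."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.capturing = False
        self.text = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; compare the class exactly.
        if self.text is None and dict(attrs).get("class") == self.class_name:
            self.capturing = True

    def handle_data(self, data):
        # Keep only the first text chunk after the matching start tag.
        if self.capturing:
            self.text = data
            self.capturing = False

# Deliberately sloppy input: the outer <span> is never closed.
sloppy = '<span class="name"><i class="genus">Corysanthes</i> <i class="species">grumula</i>'

parser = ClassTextParser("genus")
parser.feed(sloppy)
parser.close()
print(parser.text)
```

The parser does not complain about the unclosed `<span>`; it simply reports the tags and text it sees, in order.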
pepr Commented:
+1 for BeautifulSoup. Anyway, if you can isolate a well-formed HTML fragment, you can use the standard xml.etree.ElementTree:
#!python3

import xml.etree.ElementTree as ET

s = '      <span class="name"><i class="genus">Corysanthes</i> <i class="species">grumula</i> <span class="authorship">D.L.Jones</span></span>'

element = ET.fromstring(s)
ET.dump(element)

print('------------')

# To find the specific <i > element wherever it is.
genus = element.find('.//i[@class="genus"]')
print(genus.text)

# Similarly for the species.
species = element.find('./i[@class="species"]')
print(species.text)

print('------------')

# Looping through the structure. `.attrib` is a dictionary of the
# element's attributes; iterating over `element` yields its direct
# children.
print(element.tag, element.attrib)
for e in element:
    print(e.tag, e.attrib.get('class'), e.text)

It prints on console:
c:\__Python\cpeters5\Q_28532295>a.py
<span class="name"><i class="genus">Corysanthes</i> <i class="species">grumula</i> <span class="authorship">D.L.Jones</span></span>
------------
Corysanthes
grumula
------------
span {'class': 'name'}
i genus Corysanthes
i species grumula
span authorship D.L.Jones
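
On the earlier question about files that are not well formed: ElementTree is a strict XML parser, so a fragment with an unclosed tag raises `ET.ParseError` instead of returning a tree. A small sketch of guarding against that follows; `find_genus` is a hypothetical helper name used only for illustration.

```python
import xml.etree.ElementTree as ET

def find_genus(fragment):
    """Return the genus text, or None if the fragment cannot be parsed as XML."""
    try:
        element = ET.fromstring(fragment)
    except ET.ParseError:
        # Not well-formed XML (for example, an unclosed tag).
        return None
    genus = element.find('.//i[@class="genus"]')
    return genus.text if genus is not None else None

good = '<span class="name"><i class="genus">Corysanthes</i></span>'
bad = '<span class="name"><i class="genus">Corysanthes</i>'   # <span> never closed

print(find_genus(good))
print(find_genus(bad))
```

So ElementTree is fine for fragments you know are clean, while a lenient HTML parser is the safer choice for whole pages of sloppy markup.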


Question has a verified solution.
