Python regular expression

Posted on 2014-10-06
Last Modified: 2014-10-06
I need to convert the following Perl regex to Python.

input:
my $string = '      <span class="name"><i class="genus">Corysanthes</i> <i class="species">grumula</i> <span class="authorship">D.L.Jones</span></span>';

I want to extract the value of genus (in this case, "Corysanthes") into a variable.
In Perl, I would write something like:

    $string =~ /class=\"genus\">([^<]+)<\/i>/;
    my $genus = $1;


How do you write this in Python?
Thanks.

Question by:cpeters5
5 Comments
 
LVL 75

Accepted Solution

by:
käµfm³d   👽 earned 500 total points
ID: 40364537
One approach:

import re

regex = re.compile(r'class="genus">([^<]+)</i>')
string = '      <span class="name"><i class="genus">Corysanthes</i> <i class="species">grumula</i> <span class="authorship">D.L.Jones</span></span>'
match = regex.search(string)
genus = match.group(1)
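
One caveat with the snippet above: `regex.search()` returns `None` when the pattern is not found, so `match.group(1)` would raise `AttributeError` on a line without a genus. A minimal sketch of a guarded version follows; the names `GENUS_RE` and `extract_genus` are made up for illustration.

```python
import re

# Compile once; the raw string keeps the pattern free of Python-level escapes.
GENUS_RE = re.compile(r'class="genus">([^<]+)</i>')

def extract_genus(html):
    """Return the genus text, or None when the pattern is absent."""
    match = GENUS_RE.search(html)
    return match.group(1) if match else None

line = '<span class="name"><i class="genus">Corysanthes</i> <i class="species">grumula</i></span>'
print(extract_genus(line))                              # Corysanthes
print(extract_genus('<i class="species">grumula</i>'))  # None
```

Returning `None` mirrors what `re.search()` itself does, so the caller decides how to treat lines that have no genus.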

 
LVL 28

Expert Comment

by:pepr
ID: 40364734
It may be better to use an XML parser that is part of the Python distribution.
 

Author Comment

by:cpeters5
ID: 40364813
Thanks pepr. I will take a look. (Still very green, haven't gotten to the XML section yet.) The files I am parsing are just HTML, and they are not well formed. Would this be a problem?
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 40365080
Take a look at BeautifulSoup. It deals with sloppy HTML well.
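
If installing a third-party package is not an option, the standard library's `html.parser` is also tolerant of sloppy markup. To be clear, this is a different technique from BeautifulSoup, shown only as a rough sketch. The names `ClassTextCollector` and `texts_with_class` are made up, the sample input is invented, and the code assumes the wanted elements contain plain text only (any end tag stops the capture).

```python
from html.parser import HTMLParser

class ClassTextCollector(HTMLParser):
    """Collect the text of every element whose class attribute matches."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.capturing = False
        self.results = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        if dict(attrs).get("class") == self.wanted_class:
            self.capturing = True
            self.results.append("")

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current capture.
        self.capturing = False

    def handle_data(self, data):
        if self.capturing:
            self.results[-1] += data

def texts_with_class(html, wanted_class):
    parser = ClassTextCollector(wanted_class)
    parser.feed(html)
    parser.close()
    return parser.results

# Invented sloppy input: unclosed <p> and <span>, bare <br>, unquoted attribute.
sloppy = '<p><i class="genus">Corysanthes</i> <i class="species">grumula</i><br><span class=authorship>D.L.Jones'
print(texts_with_class(sloppy, "genus"))  # ['Corysanthes']
```

For anything beyond simple extraction like this, BeautifulSoup will likely be less work.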
 
LVL 28

Expert Comment

by:pepr
ID: 40365316
+1 for BeautifulSoup. Anyway, if you can isolate a well-formed HTML fragment, you can use the standard xml.etree.ElementTree:
#!python3

import xml.etree.ElementTree as ET

s = '      <span class="name"><i class="genus">Corysanthes</i> <i class="species">grumula</i> <span class="authorship">D.L.Jones</span></span>'

element = ET.fromstring(s)
ET.dump(element)

print('------------')

# Find the <i class="genus"> element wherever it is (.// searches all descendants).
genus = element.find('.//i[@class="genus"]')
print(genus.text)

# Similarly for the species (./ searches only the direct children).
species = element.find('./i[@class="species"]')
print(species.text)

print('------------')

# Loop through the structure. `.attrib` is a dictionary of the
# element's attributes; iterating over `element` yields its
# children.
print(element.tag, element.attrib)
for e in element:
    print(e.tag, e.attrib['class'], e.text)


It prints the following on the console:
c:\__Python\cpeters5\Q_28532295>a.py
<span class="name"><i class="genus">Corysanthes</i> <i class="species">grumula</i> <span class="authorship">D.L.Jones</span></span>
------------
Corysanthes
grumula
------------
span {'class': 'name'}
i genus Corysanthes
i species grumula
span authorship D.L.Jones
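
Two things to watch for in the code above: `find()` returns `None` when nothing matches, and `e.attrib['class']` raises `KeyError` for a child without a `class` attribute. A minimal sketch that avoids the second problem by using `.get()` and gathers the parts into a dictionary; `name_parts` is a made-up helper name, and the fragment is still assumed to be well formed.

```python
import xml.etree.ElementTree as ET

def name_parts(fragment):
    """Map each child's class attribute to its text, e.g. {'genus': ...}."""
    root = ET.fromstring(fragment)
    # .get() yields None instead of raising when the attribute is missing.
    return {child.get("class"): child.text for child in root}

s = '<span class="name"><i class="genus">Corysanthes</i> <i class="species">grumula</i> <span class="authorship">D.L.Jones</span></span>'
parts = name_parts(s)
print(parts["genus"], parts["species"])  # Corysanthes grumula
```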
