Solved

Python regular expression

Posted on 2014-10-06
Last Modified: 2014-10-06
I need to convert the following Perl regex to Python.

input:
my $string = '      <span class="name"><i class="genus">Corysanthes</i> <i class="species">grumula</i> <span class="authorship">D.L.Jones</span></span>';

I want to extract the value of genus (in this case, "Corysanthes") into a variable.
In Perl, I would write something like:

           $string =~ /class=\"genus\">([^<]+)<\/i>/;  
            my $genus = $1;


How do you write this in Python?
Thanks.


Python
Question by:cpeters5
LVL 75

Accepted Solution

by:
käµfm³d   👽 earned 500 total points
ID: 40364537
One approach:

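import re

# Raw string: the quotes need no escaping, and backslashes stay literal.
regex = re.compile(r'class="genus">([^<]+)</i>')
string = '      <span class="name"><i class="genus">Corysanthes</i> <i class="species">grumula</i> <span class="authorship">D.L.Jones</span></span>'
match = regex.search(string)  # returns None if the pattern is not found
genus = match.group(1)        # text captured by the first group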
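
One thing the Perl version hides: when nothing matches, `search` returns `None`, so calling `.group(1)` on the result raises an `AttributeError`. A minimal sketch of a guarded version follows. The helper name `extract_genus` and the sample strings are made up for illustration and are not part of the original answer.

```python
import re

# Compile once and reuse; the pattern is the same one used in the answer.
GENUS_RE = re.compile(r'class="genus">([^<]+)</i>')

def extract_genus(html):
    """Hypothetical helper: return the captured genus, or None if absent."""
    match = GENUS_RE.search(html)
    return match.group(1) if match else None

line = '<span class="name"><i class="genus">Corysanthes</i> <i class="species">grumula</i></span>'
print(extract_genus(line))
print(extract_genus('<p>no taxon here</p>'))
```

Returning `None` keeps the caller in control of how to treat lines without a genus, which is probably what you want when looping over many files.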

LVL 29

Expert Comment

by:pepr
ID: 40364734
It may be better to use an XML parser that is part of the Python distribution.

Author Comment

by:cpeters5
ID: 40364813
Thanks, pepr. I will take a look. (I'm still very green and haven't gotten to the XML section yet.) The files I am parsing are just HTML, and they are not well formed. Would this be a problem?
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 40365080
Take a look at BeautifulSoup. It deals with sloppy HTML well.
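
BeautifulSoup is a third-party package and has to be installed separately. With it, the lookup would be roughly `BeautifulSoup(html, 'html.parser').find('i', class_='genus').get_text()`. As a dependency-free sketch of the same idea (a parser that tolerates sloppy HTML), the standard library's `html.parser` can be used instead. This swaps in a different tool from the one recommended above. The class name `ClassTextExtractor` and the sample input are assumptions made for illustration.

```python
from html.parser import HTMLParser

class ClassTextExtractor(HTMLParser):
    """Collect the text of every <i> element, keyed by its class attribute."""

    def __init__(self):
        super().__init__()
        self.current_class = None  # class of the <i> we are inside, if any
        self.found = {}            # e.g. {'genus': 'Corysanthes'}

    def handle_starttag(self, tag, attrs):
        if tag == 'i':
            self.current_class = dict(attrs).get('class')

    def handle_endtag(self, tag):
        if tag == 'i':
            self.current_class = None

    def handle_data(self, data):
        # Only keep text that appears inside an <i class="..."> element.
        if self.current_class is not None:
            previous = self.found.get(self.current_class, '')
            self.found[self.current_class] = previous + data

# Deliberately sloppy input: the outer <span> is never closed.
html = '<span class="name"><i class="genus">Corysanthes</i> <i class="species">grumula</i>'

parser = ClassTextExtractor()
parser.feed(html)
parser.close()
genus = parser.found.get('genus')
print(genus)
```

The event-based parser does not build a tree, so it does not complain about the missing closing tag; the price is that you track the "current element" state yourself.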
LVL 29

Expert Comment

by:pepr
ID: 40365316
+1 for BeautifulSoup. Anyway, if you can isolate a well-formed HTML fragment, you can use the standard xml.etree.ElementTree:
#!python3

import xml.etree.ElementTree as ET

s = '      <span class="name"><i class="genus">Corysanthes</i> <i class="species">grumula</i> <span class="authorship">D.L.Jones</span></span>'

element = ET.fromstring(s)
ET.dump(element)

print('------------')

# To find the specific <i > element wherever it is.
genus = element.find('.//i[@class="genus"]')
print(genus.text)

# Similarly for the species.
species = element.find('./i[@class="species"]')
print(species.text)

print('------------')

# Looping through the structure. The `.attrib` is a dictionary
# of the element attributes; the `element` behaves as the list 
# of children
print(element.tag, element.attrib)
for e in element:
    print(e.tag, e.attrib['class'], e.text)

It prints the following on the console:
c:\__Python\cpeters5\Q_28532295>a.py
<span class="name"><i class="genus">Corysanthes</i> <i class="species">grumula</
i> <span class="authorship">D.L.Jones</span></span>
------------
Corysanthes
grumula
------------
span {'class': 'name'}
i genus Corysanthes
i species grumula
span authorship D.L.Jones
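
The "well-formed fragment" condition matters: `ET.fromstring` raises `ET.ParseError` when, for example, a tag is left unclosed. A minimal sketch of guarding against that is below. The helper name `find_genus` and both sample strings are hypothetical and only illustrate the failure mode.

```python
import xml.etree.ElementTree as ET

def find_genus(fragment):
    """Return the genus text, or None if the fragment cannot be parsed as XML."""
    try:
        root = ET.fromstring(fragment)
    except ET.ParseError:
        # Not well-formed; a tolerant HTML parser would be needed instead.
        return None
    node = root.find('.//i[@class="genus"]')
    return node.text if node is not None else None

good = '<span class="name"><i class="genus">Corysanthes</i></span>'
bad = '<span class="name"><i class="genus">Corysanthes</i>'  # <span> never closed

print(find_genus(good))
print(find_genus(bad))
```

For files that regularly fail this way, falling back to BeautifulSoup as suggested earlier in the thread is likely the simpler route.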
