Solved

Python TypeError

Posted on 2014-10-31
Last Modified: 2014-10-31
The following code snippet fetches the content of a URL and searches for strings of the form "abc-1234".
In this example, the URL http://www.theplantlist.org/1.1/browse/A/Orchidaceae/Aa/ does contain a line such as
<a href="/tpl1.1/record/kew-34">

When I match against this string directly (lines 8 - 10 of the code), the pattern works, but when I match inside the for loop, I get a TypeError. It seems Python is not happy with a type mismatch. How do I fix this?


from urllib.request import Request, urlopen
from urllib.error import  URLError
import re

root = 'http://www.theplantlist.org/'
genurl = '{0}1.1/browse/A/Orchidaceae/Aa/'.format(root)

line = '<a href="/tpl1.1/record/kew-456080">'
p = re.search('<a href="(.*?)record/(.*?)-(\d+)".*?>', line)
print(p.group(2), p.group(3))

try:
	greq = Request(genurl)
except:
	x = 'Error connecting to {0}'.format(genurl)
	sys.exit(x)
		
gresponse = urlopen(greq)

for gline in gresponse:
	#--Example     <a href="/tpl1.1/record/kew-456080">
	p = re.search('<a href="(.*?)record/(.*?)-(\d+)".*?>', gline)
	print("\t",p.group(2),p.group(3))

Question by:cpeters5
Accepted Solution

by:
clockwatcher earned 2000 total points
ID: 40416800
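Your gline is a bytes object, but your regular expression pattern is a str, so re.search raises a TypeError. You'll need to convert each line to a string before matching. To do that properly you really need to know the page's encoding, but utf-8 is probably a good guess. Also note that re.search returns None for lines that don't match, so check the result before calling group(). E.g.,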

import re
import sys
from urllib.request import Request, urlopen
from urllib.error import URLError

root = 'http://www.theplantlist.org/'
genurl = '{0}1.1/browse/A/Orchidaceae/Aa/'.format(root)

# Sanity check: the pattern works against a plain str.
line = '<a href="/tpl1.1/record/kew-456080">'
p = re.search(r'<a href="(.*?)record/(.*?)-(\d+)".*?>', line)
print(p.group(2), p.group(3))

greq = Request(genurl)
try:
    # urlopen, not Request(), is the call that actually connects.
    gresponse = urlopen(greq)
except URLError:
    x = 'Error connecting to {0}'.format(genurl)
    sys.exit(x)

for gline in gresponse:
    #--Example     <a href="/tpl1.1/record/kew-456080">
    # Each line is bytes; decode it to str before matching.
    p = re.search(r'<a href="(.*?)record/(.*?)-(\d+)".*?>', gline.decode('utf-8'))
    if p:
        print("\t", p.group(2), p.group(3))
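If you want to try the fix without hitting the network, here is a minimal sketch that runs the same pattern over a few hard-coded bytes lines. The names find_records, RECORD_LINK, RECORD_LINK_BYTES and sample are made up for illustration, and utf-8 is still only an assumed encoding. It also shows the other way around the mismatch: keep the input as bytes and use a bytes pattern, in which case the captured groups come back as bytes too.

```python
import re

# Same pattern as in the question, written as a raw string so \d is not
# treated as a string escape.
RECORD_LINK = re.compile(r'<a href="(.*?)record/(.*?)-(\d+)".*?>')

# Alternative: a bytes pattern can be matched against bytes directly.
RECORD_LINK_BYTES = re.compile(rb'<a href="(.*?)record/(.*?)-(\d+)".*?>')


def find_records(lines, encoding="utf-8"):
    """Return (prefix, number) pairs found in an iterable of bytes lines.

    `encoding` is an assumption; errors="replace" keeps a stray bad byte
    from aborting the whole loop.
    """
    results = []
    for raw in lines:
        text = raw.decode(encoding, errors="replace")
        match = RECORD_LINK.search(text)
        if match:  # search() returns None when a line has no link
            results.append((match.group(2), match.group(3)))
    return results


# Hard-coded stand-in for the lines that iterating over a response yields.
sample = [
    b"<html>\n",
    b'<a href="/tpl1.1/record/kew-456080">\n',
    b"<p>no link here</p>\n",
]

print(find_records(sample))

# Bytes pattern against bytes input: the groups are bytes objects.
m = RECORD_LINK_BYTES.search(sample[1])
print(m.group(2), m.group(3))
```

With a real response you would pass gresponse in place of sample. If the server declares a charset in its Content-Type header, gresponse.headers.get_content_charset() should return it (or None if it does not), which is a better source for the encoding than guessing.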

Author Closing Comment

by:cpeters5
ID: 40416817
Thank you