Solved

Python TypeError

Posted on 2014-10-31
3
243 Views
Last Modified: 2014-10-31
The following code snippet fetches content of a url and searches for string of the form "abc-1234"
In this example, the "url http://www.theplantlist.org/1.1/browse/A/Orchidaceae/Aa/" does contain a line
<a href="/tpl1.1/record/kew-34">

When matching against this string directly (8 - 10)  the pattern works, but when matching within the for loop, I got TypeError.  Seems like Python is not happy with type mismatch.  How do I fix this?


from urllib.request import Request, urlopen
from urllib.error import  URLError
import re

root = 'http://www.theplantlist.org/'
genurl = '{0}1.1/browse/A/Orchidaceae/Aa/'.format(root)

line = '<a href="/tpl1.1/record/kew-456080">'
p = re.search('<a href="(.*?)record/(.*?)-(\d+)".*?>', line)
print(p.group(2), p.group(3))

try:
	greq = Request(genurl)
except:
	x = 'Error connecting to {0}'.format(genurl)
	sys.exit(x)
		
gresponse = urlopen(greq)

for gline in gresponse:
	#--Example     <a href="/tpl1.1/record/kew-456080">
	p = re.search('<a href="(.*?)record/(.*?)-(\d+)".*?>', gline)
	print("\t",p.group(2),p.group(3))

Open in new window

0
Comment
Question by:cpeters5
  • 2
3 Comments
 
LVL 25

Accepted Solution

by:
clockwatcher earned 500 total points
ID: 40416800
Your gline is an array of bytes but your regular expression search is expecting a string.  So you'll need to convert it to a string in some way.  You really need to know the page encoding to do it properly but utf-8 is probably a good guess.  E.g.,

from urllib.request import Request, urlopen
from urllib.error import  URLError
import re

root = 'http://www.theplantlist.org/'
genurl = '{0}1.1/browse/A/Orchidaceae/Aa/'.format(root)

line = '<a href="/tpl1.1/record/kew-456080">'
p = re.search('<a href="(.*?)record/(.*?)-(\d+)".*?>', line)
print(p.group(2), p.group(3))

try:
    greq = Request(genurl)
except:
    x = 'Error connecting to {0}'.format(genurl)
    sys.exit(x)
        
gresponse = urlopen(greq)

for gline in gresponse:
    #--Example     <a href="/tpl1.1/record/kew-456080">
    p = re.search('<a href="(.*?)record/(.*?)-(\d+)".*?>', gline.decode('utf-8'))
    if p:
            print("\t",p.group(2),p.group(3))

Open in new window

0
 

Author Closing Comment

by:cpeters5
ID: 40416817
Thank you
0
 

Author Comment

by:cpeters5
ID: 40416891
0

Featured Post

IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

Here I am using Python IDLE(GUI) to write a simple program and save it, so that we can just execute it in future. Because when we write any program and exit from Python then program that we have written will be lost. So for not losing our program we…
Article by: Swadhin
Introduction of Lists in Python: There are six built-in types of sequences. Lists and tuples are the most common one. In this article we will see how to use Lists in python and how we can utilize it while doing our own program. In general we can al…
Learn the basics of lists in Python. Lists, as their name suggests, are a means for ordering and storing values. : Lists are declared using brackets; for example: t = [1, 2, 3]: Lists may contain a mix of data types; for example: t = ['string', 1, T…
Learn the basics of if, else, and elif statements in Python 2.7. Use "if" statements to test a specified condition.: The structure of an if statement is as follows: (CODE) Use "else" statements to allow the execution of an alternative, if the …

747 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now