Solved

Python TypeError

Posted on 2014-10-31
Last Modified: 2014-10-31
The following code snippet fetches the content of a URL and searches for strings of the form "abc-1234".
In this example, the URL "http://www.theplantlist.org/1.1/browse/A/Orchidaceae/Aa/" does contain a line like
<a href="/tpl1.1/record/kew-34">

When I match against this string directly (lines 8-10 of the code), the pattern works, but when I match inside the for loop, I get a TypeError. It seems Python is unhappy about a type mismatch. How do I fix this?


from urllib.request import Request, urlopen
from urllib.error import  URLError
import re

root = 'http://www.theplantlist.org/'
genurl = '{0}1.1/browse/A/Orchidaceae/Aa/'.format(root)

line = '<a href="/tpl1.1/record/kew-456080">'
p = re.search('<a href="(.*?)record/(.*?)-(\d+)".*?>', line)
print(p.group(2), p.group(3))

try:
	greq = Request(genurl)
except:
	x = 'Error connecting to {0}'.format(genurl)
	sys.exit(x)
		
gresponse = urlopen(greq)

for gline in gresponse:
	#--Example     <a href="/tpl1.1/record/kew-456080">
	p = re.search('<a href="(.*?)record/(.*?)-(\d+)".*?>', gline)
	print("\t",p.group(2),p.group(3))

Question by:cpeters5
Accepted Solution

by:
clockwatcher earned 500 total points
ID: 40416800
Your gline is a bytes object, but your regular expression pattern is a string, so re.search expects a string to search as well. You'll need to convert gline to a string in some way. To do it properly you really need to know the page encoding, but utf-8 is probably a good guess. E.g.,

import re
import sys
from urllib.request import Request, urlopen
from urllib.error import URLError

root = 'http://www.theplantlist.org/'
genurl = '{0}1.1/browse/A/Orchidaceae/Aa/'.format(root)

# Raw string so that \d reaches the regex engine unchanged.
pattern = r'<a href="(.*?)record/(.*?)-(\d+)".*?>'

line = '<a href="/tpl1.1/record/kew-456080">'
p = re.search(pattern, line)
print(p.group(2), p.group(3))

try:
    # urlopen is the call that actually connects, so it belongs inside the try.
    gresponse = urlopen(Request(genurl))
except URLError:
    sys.exit('Error connecting to {0}'.format(genurl))

for gline in gresponse:
    #--Example     <a href="/tpl1.1/record/kew-456080">
    # Each line read from the response is bytes; decode it before matching.
    p = re.search(pattern, gline.decode('utf-8'))
    if p:
        print("\t", p.group(2), p.group(3))
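For anyone who wants to try the str/bytes mismatch without hitting the site, here is a minimal offline sketch. SAMPLE_LINES is a made-up stand-in for the lines a response would yield, and the function names are my own, not part of the code above. It shows the decode approach and, as an alternative, a bytes pattern that matches the raw lines directly. The utf-8 fallback is an assumption, same as in the answer.

```python
import re

# Hypothetical stand-in for the lines read from urlopen(); each item is bytes.
SAMPLE_LINES = [
    b'<html>\n',
    b'<a href="/tpl1.1/record/kew-456080">\n',
    b'<a href="/tpl1.1/record/kew-34">\n',
]

# Same pattern twice: once as str, once as bytes (note the rb prefix).
STR_PATTERN = re.compile(r'<a href="(.*?)record/(.*?)-(\d+)".*?>')
BYTES_PATTERN = re.compile(rb'<a href="(.*?)record/(.*?)-(\d+)".*?>')


def find_ids_decoded(lines, charset=None):
    """Decode each bytes line, then match it with the str pattern."""
    encoding = charset or 'utf-8'  # assumed fallback when no charset is known
    found = []
    for raw in lines:
        m = STR_PATTERN.search(raw.decode(encoding, errors='replace'))
        if m:
            found.append((m.group(2), m.group(3)))
    return found


def find_ids_bytes(lines):
    """Match the raw bytes directly; the groups come back as bytes."""
    found = []
    for raw in lines:
        m = BYTES_PATTERN.search(raw)
        if m:
            found.append((m.group(2), m.group(3)))
    return found


def reproduces_type_error(raw):
    """Return True if searching bytes with the str pattern raises TypeError."""
    try:
        STR_PATTERN.search(raw)
    except TypeError:
        return True
    return False


if __name__ == '__main__':
    print(reproduces_type_error(SAMPLE_LINES[1]))
    print(find_ids_decoded(SAMPLE_LINES))
    print(find_ids_bytes(SAMPLE_LINES))
```

With a live response you could try passing gresponse.headers.get_content_charset() as the charset argument instead of guessing. It returns None when the server does not declare a charset, in which case the sketch falls back to utf-8.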

Author Closing Comment

by:cpeters5
ID: 40416817
Thank you

Author Comment

by:cpeters5
ID: 40416891