Want to protect your cyber security and still get fast solutions? Ask a secure question today.Go Premium

x
?
Solved

Python TypeError

Posted on 2014-10-31
3
Medium Priority
?
298 Views
Last Modified: 2014-10-31
The following code snippet fetches content of a url and searches for string of the form "abc-1234"
In this example, the "url http://www.theplantlist.org/1.1/browse/A/Orchidaceae/Aa/" does contain a line
<a href="/tpl1.1/record/kew-34">

When matching against this string directly (8 - 10)  the pattern works, but when matching within the for loop, I got TypeError.  Seems like Python is not happy with type mismatch.  How do I fix this?


from urllib.request import Request, urlopen
from urllib.error import  URLError
import re

root = 'http://www.theplantlist.org/'
genurl = '{0}1.1/browse/A/Orchidaceae/Aa/'.format(root)

line = '<a href="/tpl1.1/record/kew-456080">'
p = re.search('<a href="(.*?)record/(.*?)-(\d+)".*?>', line)
print(p.group(2), p.group(3))

try:
	greq = Request(genurl)
except:
	x = 'Error connecting to {0}'.format(genurl)
	sys.exit(x)
		
gresponse = urlopen(greq)

for gline in gresponse:
	#--Example     <a href="/tpl1.1/record/kew-456080">
	p = re.search('<a href="(.*?)record/(.*?)-(\d+)".*?>', gline)
	print("\t",p.group(2),p.group(3))

Open in new window

0
Comment
Question by:cpeters5
  • 2
3 Comments
 
LVL 25

Accepted Solution

by:
clockwatcher earned 2000 total points
ID: 40416800
Your gline is an array of bytes but your regular expression search is expecting a string.  So you'll need to convert it to a string in some way.  You really need to know the page encoding to do it properly but utf-8 is probably a good guess.  E.g.,

from urllib.request import Request, urlopen
from urllib.error import  URLError
import re

root = 'http://www.theplantlist.org/'
genurl = '{0}1.1/browse/A/Orchidaceae/Aa/'.format(root)

line = '<a href="/tpl1.1/record/kew-456080">'
p = re.search('<a href="(.*?)record/(.*?)-(\d+)".*?>', line)
print(p.group(2), p.group(3))

try:
    greq = Request(genurl)
except:
    x = 'Error connecting to {0}'.format(genurl)
    sys.exit(x)
        
gresponse = urlopen(greq)

for gline in gresponse:
    #--Example     <a href="/tpl1.1/record/kew-456080">
    p = re.search('<a href="(.*?)record/(.*?)-(\d+)".*?>', gline.decode('utf-8'))
    if p:
            print("\t",p.group(2),p.group(3))

Open in new window

0
 

Author Closing Comment

by:cpeters5
ID: 40416817
Thank you
0

Featured Post

Free Tool: Subnet Calculator

The subnet calculator helps you design networks by taking an IP address and network mask and returning information such as network, broadcast address, and host range.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Plenty of writing has gone on the web trying to compare Python with other competitive programming languages and vice versa. However, not much has been put into a wholistic perspective. This article should help you decide whether to adopt Python as a…
Flask is a microframework for Python based on Werkzeug and Jinja 2. This requires you to have a good understanding of Python 2.7. Lets install Flask! To install Flask you can use a python repository for libraries tool called pip. Download this f…
Learn the basics of if, else, and elif statements in Python 2.7. Use "if" statements to test a specified condition.: The structure of an if statement is as follows: (CODE) Use "else" statements to allow the execution of an alternative, if the …
Learn the basics of modules and packages in Python. Every Python file is a module, ending in the suffix: .py: Modules are a collection of functions and variables.: Packages are a collection of modules.: Module functions and variables are accessed us…
Suggested Courses
Course of the Month11 days, 20 hours left to enroll

564 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question