Solved

Python TypeError

Posted on 2014-10-31
Last Modified: 2014-10-31
The following code snippet fetches the content of a URL and searches for strings of the form "abc-1234".
In this example, the URL "http://www.theplantlist.org/1.1/browse/A/Orchidaceae/Aa/" does contain a line
<a href="/tpl1.1/record/kew-34">

When matching against a string of this form directly (lines 8 - 10 of the code), the pattern works, but when matching inside the for loop, I get a TypeError. It seems Python is not happy with a type mismatch. How do I fix this?


from urllib.request import Request, urlopen
from urllib.error import  URLError
import re

root = 'http://www.theplantlist.org/'
genurl = '{0}1.1/browse/A/Orchidaceae/Aa/'.format(root)

line = '<a href="/tpl1.1/record/kew-456080">'
p = re.search('<a href="(.*?)record/(.*?)-(\d+)".*?>', line)
print(p.group(2), p.group(3))

try:
	greq = Request(genurl)
except:
	x = 'Error connecting to {0}'.format(genurl)
	sys.exit(x)
		
gresponse = urlopen(greq)

for gline in gresponse:
	#--Example     <a href="/tpl1.1/record/kew-456080">
	p = re.search('<a href="(.*?)record/(.*?)-(\d+)".*?>', gline)
	print("\t",p.group(2),p.group(3))
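A minimal sketch of the mismatch, reproduced without the network. It assumes only that iterating over the response yields bytes lines; the names text_line, bytes_line and error are illustrative and not part of the code above.

```python
import re

pattern = r'<a href="(.*?)record/(.*?)-(\d+)".*?>'
text_line = '<a href="/tpl1.1/record/kew-456080">'
bytes_line = text_line.encode('utf-8')  # stands in for one line of the response

# str pattern against str data: matches as expected
print(re.search(pattern, text_line).groups())

# str pattern against bytes data: re.search raises TypeError
try:
    re.search(pattern, bytes_line)
    error = None
except TypeError as exc:
    error = exc
print(type(error).__name__)
```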

Question by:cpeters5

Accepted Solution

by:
clockwatcher earned 500 total points
ID: 40416800
Your gline is a bytes object, but your regular expression pattern is a str, and re.search raises a TypeError when the two types are mixed.  So you'll need to decode each line to a string first.  To do it properly you need to know the page's encoding, but utf-8 is probably a good guess.  E.g.,

import re
import sys
from urllib.request import Request, urlopen
from urllib.error import URLError

root = 'http://www.theplantlist.org/'
genurl = '{0}1.1/browse/A/Orchidaceae/Aa/'.format(root)

# Raw string so that \d reaches the regex engine unchanged
pattern = r'<a href="(.*?)record/(.*?)-(\d+)".*?>'

line = '<a href="/tpl1.1/record/kew-456080">'
p = re.search(pattern, line)
print(p.group(2), p.group(3))

try:
    # urlopen is the call that actually connects, so it belongs inside the try
    gresponse = urlopen(Request(genurl))
except URLError:
    x = 'Error connecting to {0}'.format(genurl)
    sys.exit(x)

for gline in gresponse:
    #--Example     <a href="/tpl1.1/record/kew-456080">
    # Each line arrives as bytes; decode it before matching a str pattern
    p = re.search(pattern, gline.decode('utf-8'))
    if p:
        print("\t", p.group(2), p.group(3))
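If you would rather not guess the encoding, one possible approach is to read the charset from the response's Content-Type header and fall back to utf-8 only when the server does not declare one. The sketch below is a suggestion under that assumption, not a tested fix for this particular site; extract_records, fetch_records and RECORD_RE are hypothetical names introduced for illustration.

```python
import re
from urllib.request import urlopen

# Same pattern as in the answer, compiled once as a raw string.
RECORD_RE = re.compile(r'<a href="(.*?)record/(.*?)-(\d+)".*?>')


def extract_records(lines, charset='utf-8'):
    """Yield (prefix, number) pairs from an iterable of bytes lines."""
    for raw in lines:
        # errors='replace' keeps a single undecodable byte from aborting the scan.
        match = RECORD_RE.search(raw.decode(charset, errors='replace'))
        if match:
            yield match.group(2), match.group(3)


def fetch_records(url):
    """Hypothetical helper: use the declared charset, else assume utf-8."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or 'utf-8'
        return list(extract_records(response, charset))
```

Keeping the decoding in extract_records means it can be tried on a plain list of bytes lines without touching the network, while fetch_records(genurl) would run the same logic against the live page.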

Author Closing Comment

by:cpeters5
ID: 40416817
Thank you

