BrianGEFF719
asked on
Strings, Binary Data - Regular Expressions
Hi, I'm new to Python. I tried to pick a project that would help me learn the language, so I decided on a script that would allow me to search Google.
For the most part I've got the basics; however, I'm having some confusion with strings. My strings are prefixed with b, as in b'string'. What does that mean? I had to do that to my regular expression to get it to work properly. What does it mean exactly, and is it really necessary?
import http.client
import re

class GoogleQuery:
    def query(self, q):
        conn = http.client.HTTPConnection("www.google.com")
        conn.request("GET", "/search?q=" + q)
        r1 = conn.getresponse()
        self.status = r1.status
        self.reason = r1.reason
        if self.status == 200:
            self.data = r1.read()
            return True
        else:
            return False

    def parse(self):
        reobj = re.compile(b'<a href="([^"]*)" class=l>(.*?)</a>')
        result = reobj.findall(self.data)
        for res in result:
            print(res)

c = GoogleQuery()
if c.query("blah"):
    c.parse()
else:
    print("Unable to query google, got error: ", c.status, " -- ", c.reason)
... and yes. The associative array (or hash table in other languages) is the Python dictionary.
To correct my statement above: if you get some data as bytes, you cannot apply a regular expression compiled from a string pattern. The pattern must also be of the bytes type. I have no deep experience with regular expressions on bytes in Python 3; however, you can probably use br'raw bytes', i.e. the br prefix, for the patterns.
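As a minimal sketch of the point about matching types, the snippet below applies a str pattern and a bytes pattern to the same bytes data. The HTML fragment is made-up sample data, not real Google output.

```python
import re

# Made-up sample data standing in for what r1.read() returns (bytes).
data = b'<a href="http://example.com/" class=l>Example</a>'

# A pattern compiled from a str cannot be applied to bytes data.
str_pattern = re.compile(r'<a href="([^"]*)" class=l>(.*?)</a>')
try:
    str_pattern.findall(data)
    mixed_ok = True
except TypeError:
    mixed_ok = False

# A pattern compiled from bytes (br'...' is a raw bytes literal) works.
bytes_pattern = re.compile(br'<a href="([^"]*)" class=l>(.*?)</a>')
matches = bytes_pattern.findall(data)

print(mixed_ok)   # False: mixing str pattern and bytes data raised TypeError
print(matches)    # the captured groups are bytes objects as well
```

Note that the matched groups come back as bytes too, so they print with the b prefix.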
For the last part of your question: because your regular expression defines two groups, findall() returns a list of tuples of size 2. The first element is the URL, the second is the displayed text. Try the following snippet:
import http.client
import re

class GoogleQuery:
    def query(self, q):
        conn = http.client.HTTPConnection("www.google.com")
        conn.request("GET", "/search?q=" + q)
        r1 = conn.getresponse()
        self.status = r1.status
        self.reason = r1.reason
        if self.status == 200:
            self.data = r1.read()
            return True
        else:
            return False

    def parse(self):
        reobj = re.compile(br'<a href="([^"]*)" class=l>(.*?)</a>')
        result = reobj.findall(self.data)
        d = {}  # empty dictionary
        for res in result:
            d[res[0]] = res[1]  # insert the value for the key
        return d

c = GoogleQuery()
if c.query("blah"):
    d = c.parse()
    for k in d:
        print(k, ' --> ', d[k])
else:
    print("Unable to query google, got error: ", c.status, " -- ", c.reason)
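To see the shape of what findall() returns without going over the network, here is a small sketch that runs the same pattern on a hand-written bytes fragment. The two links are invented for illustration. Because each match is a (URL, text) tuple, dict() can build the dictionary directly, which is an alternative to the explicit loop above.

```python
import re

# Invented sample data with two links in the format the pattern expects.
sample = (b'<a href="http://a.example/" class=l>First</a>'
          b'<a href="http://b.example/" class=l>Second</a>')

pattern = re.compile(br'<a href="([^"]*)" class=l>(.*?)</a>')

# Two groups in the pattern, so each match is a tuple of size 2.
pairs = pattern.findall(sample)

# A list of (key, value) tuples converts straight into a dictionary.
links = dict(pairs)

for url, text in links.items():
    print(url, ' --> ', text)
```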
ASKER
Excellent answer, thank you.
ASKER
Oh, one last thing: is there any way to convert the bytes to a string and resolve that whole issue?
There is a built-in function str() in Python that converts an object to a string. Because a string in Python 3 must be unambiguous (concerning its interpretation), you must also supply the encoding when converting an object of the bytes type (see http://docs.python.org/3.1/library/functions.html#str). This means that you must know the encoding of the downloaded data.
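A short sketch of the conversion, assuming the data is UTF-8 encoded. That encoding is an assumption for the example; for a real response you would have to find out the actual encoding, for instance from the Content-Type header.

```python
# Sample bytes; in the script above this would be self.data.
raw = 'caf\u00e9'.encode('utf-8')

# Two equivalent ways to decode bytes into a str.
text1 = str(raw, 'utf-8')
text2 = raw.decode('utf-8')

# Without an encoding, str() does not decode; it returns the
# printable representation of the bytes object, b prefix included.
repr_text = str(raw)

print(text1 == text2)
print(repr_text)
```

Once the data is decoded, the regular expression can be written as an ordinary string pattern without the b prefix.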
For your GoogleQuery class (and the like), you may want to implement the special method named __str__ (see http://docs.python.org/3.1/reference/datamodel.html#basic-customization and http://docs.python.org/3.1/reference/datamodel.html#object.__str__). This method of the object is called by the built-in function str() when the object is passed as its argument. It is also used when you print() the object.
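As a rough illustration of __str__, the class below is a hypothetical, stripped-down stand-in for GoogleQuery that only stores the status fields. The class name and the format of the returned text are choices made for this example, not something prescribed by Python.

```python
class QueryStatus:
    """Hypothetical stand-in for GoogleQuery; holds only status and reason."""

    def __init__(self, status, reason):
        self.status = status
        self.reason = reason

    def __str__(self):
        # Called by str(obj) and by print(obj).
        return "status {} ({})".format(self.status, self.reason)

s = QueryStatus(200, "OK")
text = str(s)   # calls s.__str__()
print(s)        # print() uses the same method
```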
ASKER