URLs and normal strings

Hi, I have a string that I would like to pass to a URL, and I want to parse the page that the URL returns into a result string.
Here is a snippet of my code:


import getopt
import os
import re
import string
import sys
import urllib

### Compiled regular expressions

firstLineRE = re.compile(r'<td bgcolor=white class=s><div style=padding:10px;>(?P<mean>.*?)</div></td>')

if __name__ == "__main__":
    # process command-line args
    try:
        opts, args = getopt.getopt(sys.argv[1:], "l")
    except getopt.GetoptError:
        # handler body was missing from the post; exiting here is an assumption
        sys.exit(2)

    pico = string.replace(args[1]," ","%20")
    # Submit the query and open a file handle to the results page.
    class AppURLopener(urllib.FancyURLopener):
         def __init__(self, *args):
             self.version = "Mozilla/4.0"
             urllib.FancyURLopener.__init__(self, *args)
    urllib._urlopener = AppURLopener()
    f = urllib.urlopen("http://babel.altavista.com/translate.dyn?enc=utf8&doit=done&BabelFishFrontPage=yes&bblType=urltext&trtext=%s&lp=en_es" % pico)    
    linenum = 0    
    for line in f.readlines():        

        # check this line against the appropriate RE.        
        line = line.strip()
        linenum += 1
        match = firstLineRE.search(line)
        if match is not None:
            print match.group("mean")        

The script takes a word (and some operators) and returns its Spanish equivalent. The problem I'm having is that it won't translate sentences, or words containing characters such as !@#$%^&*)(_+\ etc.
I know there are two problems here. First, I have to convert all spaces (and other characters) to their URL-readable form (spaces to %20, and so on). Unfortunately, apart from the space, I don't know which character corresponds to which code. Please help.
Second, whenever the output contains a character that can't be printed, it throws a Unicode decode error. I've been trying to catch it by putting a try statement inside the regex code, but that doesn't work. Please help me here as well.
Thanks for your help!
mish33 commented:
1st problem: urllib.quote percent-encodes spaces and the other special characters for you, so replace
    pico = string.replace(args[1]," ","%20")
with
    pico = urllib.quote(args[1])
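
To see what the quoting does to the kind of input mentioned in the question, here is a minimal sketch. The sample string is made up for illustration, and the try/except import is only there so the snippet also runs on Python 3, where the same functions live in urllib.parse. quote_plus is shown as an alternative that writes spaces as "+"; whether the translation site prefers one form over the other is not something the thread confirms.

```python
try:
    from urllib import quote, quote_plus        # Python 2, as in the question
except ImportError:
    from urllib.parse import quote, quote_plus  # Python 3 equivalent

# Hypothetical input containing a space and several special characters.
text = "hello world! 50% & more"

# quote() percent-encodes everything except letters, digits and a few safe characters.
encoded = quote(text)

# quote_plus() does the same but writes spaces as '+', the usual form-data style.
encoded_plus = quote_plus(text)

print(encoded)       # hello%20world%21%2050%25%20%26%20more
print(encoded_plus)  # hello+world%21+50%25+%26+more
```

Either result can be dropped into the trtext=%s slot of the query URL in place of the hand-made replacement of spaces.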

2nd problem: encode the matched text explicitly before printing it, so replace
    print match.group("mean")
with
    print match.group("mean").encode('latin1')
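
If the matched text is still a raw byte string from the page, an explicit decode step may be needed before the encode. The following is only a sketch under assumptions: the helper name to_printable is hypothetical, the page is assumed to be UTF-8 (the enc=utf8 parameter in the query URL suggests this, but the thread does not confirm it), and characters the output encoding cannot represent are replaced with "?" instead of raising an error.

```python
import sys

def to_printable(raw, page_encoding="utf-8", out_encoding=None):
    """Return text that can be printed safely in out_encoding (hypothetical helper)."""
    if out_encoding is None:
        # Fall back to ASCII when stdout has no known encoding (e.g. when piped).
        out_encoding = sys.stdout.encoding or "ascii"
    # Bytes from the page -> unicode text; undecodable bytes become U+FFFD.
    text = raw.decode(page_encoding, "replace")
    # Round-trip through the output encoding so unprintable characters become '?'.
    return text.encode(out_encoding, "replace").decode(out_encoding)

# Example: the UTF-8 bytes for a Spanish word containing n-with-tilde.
print(to_printable(b"ma\xc3\xb1ana", out_encoding="ascii"))  # ma?ana
```

In the script above this would be applied to match.group("mean") before printing, which avoids wrapping the print in a try statement.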
Question has a verified solution.
