  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 277

urls and normal strings

Hi, I have a string which I would like to pass to a URL, and then parse the results returned from that URL into a resulting string.
Here is a snippet of my code:


import getopt
import os
import re
import string
import sys
import urllib

### Compiled regular expressions

firstLineRE = re.compile(r'<td bgcolor=white class=s><div style=padding:10px;>(?P<mean>.*?)</div></td>')

if __name__ == "__main__":
    # process command-line args
    try:
        opts, args = getopt.getopt(sys.argv[1:], "l")
    except getopt.GetoptError:
        # unrecognized option: exit with an error status
        sys.exit(2)

    pico = string.replace(args[1]," ","%20")
    # Submit the query and open a file handle to the results page.
    class AppURLopener(urllib.FancyURLopener):
         def __init__(self, *args):
             self.version = "Mozilla/4.0"
             urllib.FancyURLopener.__init__(self, *args)
    urllib._urlopener = AppURLopener()
    f = urllib.urlopen("http://babel.altavista.com/translate.dyn?enc=utf8&doit=done&BabelFishFrontPage=yes&bblType=urltext&trtext=%s&lp=en_es" % pico)    
    linenum = 0    
    for line in f.readlines():        

        # check this line against the appropriate RE.        
        line = line.strip()
        linenum += 1
        match = firstLineRE.search(line)
        if match is not None:
            print match.group("mean")        

Now the script takes a word (and some operators) and returns the Spanish equivalent of it. The current problem I'm having is that it won't translate sentences or words with characters such as !@#$%^&*)(_+\ etc.
Anyway, I know that there are 2 problems here. First, I have to convert all spaces (and other characters) to their URL-readable form (such as spaces to %20). Unfortunately, other than spaces, I don't know what corresponds to what. Please help.
The second problem is that whenever the output has a character which can't be printed, it throws a Unicode decode error. I've been trying to catch it by putting a try statement inside the regex function, but it doesn't work. Please help me here as well.
Thanks for your help!
1 Solution
1st problem: replace
    pico = string.replace(args[1]," ","%20")
with
    pico = urllib.quote(args[1])
urllib.quote percent-encodes every character that is not URL-safe, not just spaces.
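
To illustrate what quoting does to the characters mentioned in the question, here is a minimal sketch. The helper name build_query is hypothetical, and passing safe="" is my own assumption (by default quote leaves "/" unescaped). The import fallback is there only because quote lives in urllib on Python 2, as in this thread, and in urllib.parse on Python 3.

```python
try:
    from urllib.parse import quote  # Python 3 location
except ImportError:
    from urllib import quote        # Python 2 location, as used in this thread


def build_query(text):
    """Percent-encode text so it can be used as a URL query value.

    build_query is a hypothetical helper, not part of the original script.
    """
    # safe="" (an assumption) also escapes "/", which quote() keeps by default
    return quote(text, safe="")


print(build_query("hello world!"))  # hello%20world%21
print(build_query("a&b=c+d"))       # a%26b%3Dc%2Bd
```

The result can then be substituted into the trtext=%s part of the URL in place of the hand-replaced string.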

2nd problem: replace
    print match.group("mean")
with
    print match.group("mean").encode('latin1')
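
One caveat worth sketching: on Python 2 the matched group is a byte string, and calling .encode on a byte string first decodes it as ASCII, which can itself raise a UnicodeDecodeError on accented characters. A possible variant, assuming the page really is UTF-8 (the URL passes enc=utf8, but that is an assumption about the response), decodes explicitly before re-encoding. The helper name printable and the "replace" error handling are my own choices, not part of the accepted answer.

```python
def printable(raw, source_encoding="utf-8", target_encoding="latin-1"):
    """Decode raw page bytes, then re-encode them for output.

    printable is a hypothetical helper; both encodings are assumptions.
    """
    # Decode explicitly instead of relying on an implicit ASCII decode
    text = raw.decode(source_encoding, "replace")
    # Characters the target encoding cannot represent become "?"
    return text.encode(target_encoding, "replace")


# Sample bytes standing in for match.group("mean")
raw = u"ma\u00f1ana \u20ac".encode("utf-8")
print(repr(printable(raw)))
```

With "replace", an unencodable character such as the euro sign is turned into "?" rather than raising an exception, so the loop keeps going.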
