Gaaara

asked on

How to edit a name stored in a variable with Python

Hello,

I use bs4 to get the title tag of a website:
import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537)'}
session = requests.Session()
response = session.get("http://animedigitalnetwork.fr/video/assassination-classroom/5886-episode-2-lecon-de-base-ball", headers=headers)
soup = BeautifulSoup(response.content, "html.parser")
oname = soup.find("title")
print(oname.text)



Result:

Assassination Classroom - Saison 1 Épisode 2 - VOSTFR


I would like to transform this into a variable containing:

Assassination Classroom - S 01 Ep 2

and then use it to rename my files.

Thanks in advance.
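The file-renaming step is not shown later in the thread. Here is a minimal sketch, assuming the cleaned-up title is already in a string and that each file should keep its original extension; the function name rename_with_title and the file name episode.mp4 are made up for illustration. It is written for Python 3, but os.rename, os.path and tempfile behave the same way in Python 2.7.

```python
import os
import tempfile


def rename_with_title(old_path, clean_name):
    """Rename old_path to clean_name + its original extension, in the same folder."""
    folder = os.path.dirname(old_path)
    extension = os.path.splitext(old_path)[1]  # e.g. ".mp4"; "" if there is none
    new_path = os.path.join(folder, clean_name + extension)
    os.rename(old_path, new_path)
    return new_path


# Usage demo in a throwaway folder; "episode.mp4" is a made-up file name
demo_dir = tempfile.mkdtemp()
old_file = os.path.join(demo_dir, "episode.mp4")
open(old_file, "w").close()
new_file = rename_with_title(old_file, "Assassination Classroom - S 01 Ep 2")
print(os.path.basename(new_file))  # Assassination Classroom - S 01 Ep 2.mp4
```

On Windows, os.rename raises an error if the target name already exists, and characters such as : or ? are not allowed in file names, so titles containing them would need extra cleanup.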
ASKER CERTIFIED SOLUTION
clockwatcher

Gaaara

ASKER

I have a syntax problem.

My code:
http://pastebin.com/t5r5tFgN

Error:

 File "start.py", line 37
    elif menu=="3":
    ^


Python cares about indentation.  Lines 33-40 in your pastebin need to be indented so they are part of the elif menu =="2" block.
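To see the rule in isolation, the sketch below (Python 3, using a made-up three-line menu rather than the real start.py) compiles two versions of the same source text. In the broken one, a statement is dedented back to the level of the if, which ends the if/elif chain, so the following elif has nothing to attach to and Python reports a SyntaxError, just like the error on line 37.

```python
def compiles(source):
    """Return True if source is valid Python syntax, False on SyntaxError."""
    try:
        compile(source, "start.py", "exec")
        return True
    except SyntaxError:
        return False


broken = (
    "menu = '3'\n"
    "if menu == '2':\n"
    "    olinks = 'x'\n"
    "session = None\n"      # back at column 0: this closes the if/elif chain
    "elif menu == '3':\n"   # ...so this elif has no matching if
    "    pass\n"
)
# Indenting the stray line puts it back inside the menu == '2' block
fixed = broken.replace("\nsession = None\n", "\n    session = None\n")

print(compiles(broken))  # False
print(compiles(fixed))   # True
```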

ASKER

I don't understand, sorry?
This:
    elif menu=="2":
      #demande de liens 
      olinks=raw_input("Entrer votre liens ")
      #récupération du fichier png & smil
      subprocess.call(["php", "files/adn.php" , olinks])
      #decryption du fichier png
      subprocess.call(["php", "files/AES.class.php" , "tmp/adn.png"])
      #récupération du nom de l'animation
      headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537)'}
session = requests.Session()
response = session.get("http://animedigitalnetwork.fr/video/assassination-classroom/5886-episode-2-lecon-de-base-ball", headers=headers)
soup = BeautifulSoup(response.content)
oname = soup.find("title")
oname_cleanedup = re.sub(r'(.*?\s+-\s+S)aison\s+(\d+)\s+É.*?(\d+)(.*)', 
                         lambda m: "{title} {season:02d} Ep {episode}".format(title=m.group(1), season=int(m.group(2)), episode=m.group(3)), 
                         oname.text)
print(oname_cleanedup)
      ##fin
    elif menu=="3":


Needs to be this:
    elif menu=="2":
      #demande de liens 
      olinks=raw_input("Entrer votre liens ")
      #récupération du fichier png & smil
      subprocess.call(["php", "files/adn.php" , olinks])
      #decryption du fichier png
      subprocess.call(["php", "files/AES.class.php" , "tmp/adn.png"])
      #récupération du nom de l'animation
      headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537)'}
      session = requests.Session()
      response = session.get("http://animedigitalnetwork.fr/video/assassination-classroom/5886-episode-2-lecon-de-base-ball", headers=headers)
      soup = BeautifulSoup(response.content)
      oname = soup.find("title")
      oname_cleanedup = re.sub(r'(.*?\s+-\s+S)aison\s+(\d+)\s+É.*?(\d+)(.*)', 
                         lambda m: "{title} {season:02d} Ep {episode}".format(title=m.group(1), season=int(m.group(2)), episode=m.group(3)), 
                         oname.text)
      print(oname_cleanedup)
      ##fin
    elif menu=="3":



The indentation needs to line up.

ASKER

Thanks for your help.

ASKER

Sorry, it doesn't work: the name isn't edited. I was too quick.

ASKER

It only works when I run it directly in Python, not in my script...
You'll need to give me a bit more to go on. What is it doing? Just not matching? Try the following and paste back what it prints:

      oname = soup.find("title")
      print("Searching: {0}".format(oname.text))
      match = re.search(r'(.*?\s+-\s+S)aison\s+(\d+)\s+É.*?(\d+)(.*)'):
      if match:
          print "Found match"
          oname_cleanedup = re.sub(r'(.*?\s+-\s+S)aison\s+(\d+)\s+É.*?(\d+)(.*)', 
                         lambda m: "{title} {season:02d} Ep {episode}".format(title=m.group(1), season=int(m.group(2)), episode=m.group(3)), 
                         oname.text)
          print(oname_cleanedup)
      else:
           print("Match not found")


I went back to your pastebin and I'm not seeing where you're importing re.  You need to make sure to import the regular expression module.
import re



ASKER

Yes, re is imported :)

I get a syntax error:
File "start.py", line 35
    match = re.search(r'(.*?\s+-\s+S)aison\s+(\d+)\s+É.*?(\d+)(.*)'):



I deleted the trailing colon (:), and now I get this error:

Traceback (most recent call last):
  File "start.py", line 34, in <module>
    print("Searching: {0}".format(oname.text))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xc9' in position 35: ordinal not in range(128)


Change:
  print("Searching: {0}".format(oname.text))

To:
   print("Searching: {0}".format(oname.text.encode('ascii', 'xmlcharrefreplace')))

And:
   print(oname_cleanedup)

To:
   print(oname_cleanedup.encode('ascii', 'xmlcharrefreplace'))
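Why this helps: print has to encode the unicode title for the console, and with an ASCII console encoding the É (code point U+00C9, decimal 201) cannot be represented, which raises UnicodeEncodeError. The xmlcharrefreplace error handler substitutes the character reference &#201; instead of raising. A small Python 3 sketch of both behaviours; in Python 3, .encode returns bytes, hence the extra .decode to get text back.

```python
title = u"Assassination Classroom - Saison 1 \u00c9pisode 2 - VOSTFR"

# Strict ASCII encoding fails on the É, like the print in start.py did
try:
    title.encode("ascii")
except UnicodeEncodeError as exc:
    print("strict ascii failed:", exc.reason)

# xmlcharrefreplace swaps each non-ASCII character for an XML character reference
safe = title.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(safe)  # Assassination Classroom - Saison 1 &#201;pisode 2 - VOSTFR
```

This is only a debugging aid: the &#201; appears literally in the output rather than as É.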

ASKER

I have another error :)

 Searching: Assassination Classroom - Saison 1 &#201;pisode 2 - VOSTFR
Traceback (most recent call last):
  File "start.py", line 35, in <module>
    match = re.search(r'(.*?\s+-\s+S)aison\s+(\d+)\s+É.*?(\d+)(.*)')
TypeError: search() takes at least 2 arguments (1 given)


re.search also needs the string to search as its second argument:

match = re.search(r'(.*?\s+-\s+S)aison\s+(\d+)\s+É.*?(\d+)(.*)', oname.text)
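re.search takes the pattern first and the string to search second. The Python 3 sketch below runs the thread's pattern on the title (written with the \xc9 escape standing for É) and prints what each capture group holds: group 1 is the show title plus the "S" of "Saison", group 2 the season number, group 3 the episode number, and group 4 the leftover " - VOSTFR" that the substitution later drops.

```python
import re

title = u"Assassination Classroom - Saison 1 \u00c9pisode 2 - VOSTFR"
pattern = r'(.*?\s+-\s+S)aison\s+(\d+)\s+\xc9.*?(\d+)(.*)'

# re.search(pattern, string): both arguments are required
match = re.search(pattern, title)
if match:
    for number, text in enumerate(match.groups(), start=1):
        print(number, repr(text))
else:
    print("Match not found")
```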

ASKER

It doesn't work :)

Searching: Assassination Classroom - Saison 1 &#201;pisode 2 - VOSTFR
Match not found


Then I'm guessing you may have encoding problems (e.g., your file/editor may not really be using UTF-8, and your É may not be the same É), because that regular expression should match. Try changing the regular expression to:

r'(.*?\s+-\s+S)aison\s+(\d+)\s+\xc9.*?(\d+)(.*)'
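One plausible way this mismatch arises in Python 2 (an assumption about this script, not something the thread confirms): a literal É in a plain str pattern is stored as its two UTF-8 bytes \xc3\x89, while oname.text is a unicode string holding the single character U+00C9, so the two never compare equal. The regex escape \xc9 sidesteps the file's encoding entirely. The Python 3 sketch below simulates the broken case by decoding the UTF-8 bytes one byte per character.

```python
import re

title = u"Assassination Classroom - Saison 1 \u00c9pisode 2 - VOSTFR"
template = r'(.*?\s+-\s+S)aison\s+(\d+)\s+{0}.*?(\d+)(.*)'

# The regex escape \xc9 always means code point U+00C9 (É), whatever the file encoding
escaped = template.format(r'\xc9')

# Simulated mismatch: É stored as UTF-8 bytes, then read back one byte per character
mojibake = u"\u00c9".encode("utf-8").decode("latin-1")  # two characters: U+00C3 U+0089
literal = template.format(mojibake)

print(bool(re.search(escaped, title)))  # True
print(bool(re.search(literal, title)))  # False
```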

ASKER

It doesn't work ^^
Searching: Assassination Classroom - Saison 1 &#201;pisode 2 - VOSTFR
Found match
Assassination Classroom - Saison 1 &#201;pisode 2 - VOSTFR



Is it possible to create a function in a separate Python file that returns the modified name to the main script?
If it printed "found match", then the regular expression matched.  And the substitution would have worked too (if you had changed it to use the \xc9).  But if you're saying it didn't work, I'm guessing you didn't modify the second regular expression (the one in the sub call) to use the \xc9.  

Can you post your entire code?  This back and forth is getting us nowhere fast.

ASKER

OK:

http://pastebin.com/EF0umjd9

I thought of creating a new Python script that gets and modifies the name and generates a variable for the main script.
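Splitting the renaming into its own file is possible. A minimal sketch, assuming a hypothetical module named title_utils.py placed next to start.py and reusing the regular expression from this thread (with \xc9 for É, so the file itself stays pure ASCII and needs no coding declaration in Python 2). The if __name__ == "__main__" block runs only when the file is executed directly, not when start.py imports it.

```python
# title_utils.py  (hypothetical module name)
import re

SEASON_RE = re.compile(r'(.*?\s+-\s+S)aison\s+(\d+)\s+\xc9.*?(\d+)(.*)')


def clean_title(raw_title):
    """Rewrite 'X - Saison N ...pisode M - VOSTFR' as 'X - S 0N Ep M'.

    Titles that do not match the pattern are returned unchanged.
    """
    return SEASON_RE.sub(
        lambda m: "{0} {1:02d} Ep {2}".format(m.group(1), int(m.group(2)), m.group(3)),
        raw_title)


if __name__ == "__main__":
    # Quick self-test when running "python title_utils.py"
    print(clean_title(u"Assassination Classroom - Saison 1 \u00c9pisode 2 - VOSTFR"))
```

In start.py you would then add from title_utils import clean_title near the other imports and write oname_cleanedup = clean_title(oname.text) inside the menu branch.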
You didn't change the regular expression in the sub call to use \xc9 rather than É. The problem is that whatever editor you're using isn't using the right encoding: your É isn't the same É that is in the web page. To get around that, use \xc9, the escape for the Unicode code point U+00C9 (É). (Its UTF-8 encoding is actually the two bytes \xc3\x89.)

In other words, change line 37 to:

oname_cleanedup = re.sub(r'(.*?\s+-\s+S)aison\s+(\d+)\s+\xc9.*?(\d+)(.*)',

And you can also probably change line 40 back to just:

 print(oname_cleanedup)

Whether you really can or not depends on the codepage that your console is using.  You'll know when you try to print it.  If it bombs on that print line it's because whatever codepage your console is in, it doesn't have a translation for that character.  

If you want to get rid of the debug output altogether, you can go back to my original post from much earlier in the day and just use \xc9 rather than É:
    elif menu=="2":
      #demande de liens 
      olinks=raw_input("Entrer votre liens ")
      #récupération du fichier png & smil
      subprocess.call(["php", "files/adn.php" , olinks])
      #decryption du fichier png
      subprocess.call(["php", "files/AES.class.php" , "tmp/adn.png"])
      #récupération du nom de l'animation
      headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537)'}
      session = requests.Session()
      response = session.get("http://animedigitalnetwork.fr/video/assassination-classroom/5886-episode-2-lecon-de-base-ball", headers=headers)
      soup = BeautifulSoup(response.content)
      oname = soup.find("title")
      oname_cleanedup = re.sub(r'(.*?\s+-\s+S)aison\s+(\d+)\s+\xc9.*?(\d+)(.*)', 
                         lambda m: "{title} {season:02d} Ep {episode}".format(title=m.group(1), season=int(m.group(2)), episode=m.group(3)), 
                         oname.text)
      print(oname_cleanedup)
      ##fin
    elif menu=="3":



ASKER

arr /$$%$%? :(

I have some difficulty with indentation. Do you have a tool to help me?

Thanks for your post.
You might want to try using a Python IDE. PyCharm is a good choice (https://www.jetbrains.com/pycharm/). There's a free community edition, and on occasion they've offered the full version free to students (https://www.jetbrains.com/student/).

ASKER

Um, I have a problem.

The modification works; there's just one more bug.

How do I reset the variable at the end of the script?

The information is kept until the end of the script, and when I start the process again it keeps this information and gives the wrong result.

Also, on another link the result is:

Naruto Shippuden -  Épisode 392 - VOSTFR



It's supposed to be:

Naruto Shippuden - Ep 392 


I don't understand what you mean by your first question.

As for the second question, you should be able to use the following:

    elif menu=="2":
      #demande de liens 
      olinks=raw_input("Entrer votre liens ")
      #récupération du fichier png & smil
      subprocess.call(["php", "files/adn.php" , olinks])
      #decryption du fichier png
      subprocess.call(["php", "files/AES.class.php" , "tmp/adn.png"])
      #récupération du nom de l'animation
      headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537)'}
      session = requests.Session()
      response = session.get("http://animedigitalnetwork.fr/video/assassination-classroom/5886-episode-2-lecon-de-base-ball", headers=headers)
      soup = BeautifulSoup(response.content)
      oname = soup.find("title")
      if oname.text.find('Saison') >= 0:
          regexp = r'(.*?\s+-\s+S)aison\s+(\d+)\s+\xc9.*?(\d+)(.*)'
          subst = "{title} {season:02d} Ep {episode}"
      else:
          regexp = r'(.*?\s+-)(\s+)\xc9.*?(\d+)(.*)'
          subst = "{title} Ep {episode}"
      oname_cleanedup = re.sub(regexp, 
                         lambda m: subst.format(title=m.group(1), season=int(m.group(2)), episode=m.group(3)), 
                         oname.text)
      print(oname_cleanedup)



ASKER

Error:
Traceback (most recent call last):
  File "start.py", line 41, in <module>
    oname.text)
  File "/usr/lib64/python2.7/re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "start.py", line 40, in <lambda>
    lambda m: subst.format(title=m.group(1), season=int(m.group(2)), episode=m.group(3)), 
ValueError: invalid literal for int() with base 10: ''



ASKER

For the first question:

My menu is a loop: the script ends, the choice menu is displayed again, and when I use the same option I get the same title name.

Example:

title 1 naruto
title 2 bleach

Run the script with the naruto link:

result: naruto Ep 01
Back to the menu, I choose the same option with the bleach title:

result: naruto Ep 01
For the ValueError, change the lambda line to:

                         lambda m: subst.format(title=m.group(1), season=int(m.group(2)) if m.group(2).find(" ")==-1 else "", episode=m.group(3)),

As for the first question, it shouldn't do that the second time through. The oname_cleanedup variable is computed from oname.text, which should be reset by each new request.

So, I'd need to see the real code where you're making your request.  Because this line:

response = session.get("http://animedigitalnetwork.fr/video/assassination-classroom/5886-episode-2-lecon-de-base-ball", headers=headers)
     
Can't be your real code because that request is hardcoded and would always return that assassination classroom episode 2 page.
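As an alternative to switching between two patterns (which is where the int('') error comes from), here is a sketch using one pattern whose "Saison N" part is optional, so the season group is None when a title has no season. The pattern, the function name clean_title and the two output formats are assumptions based on the two titles quoted in this thread. Calling it on oname.text right after each new request, inside the menu loop, computes a fresh name for every link.

```python
import re

# Optional "Saison N " part: the "season" group is None when the title has none
TITLE_RE = re.compile(
    r'^(?P<title>.*?)\s+-\s+(?:Saison\s+(?P<season>\d+)\s+)?\xc9pisode\s+(?P<episode>\d+)'
)


def clean_title(raw_title):
    m = TITLE_RE.search(raw_title)
    if m is None:
        return raw_title  # unknown format: leave it as-is
    if m.group("season") is not None:
        return "{0} - S {1:02d} Ep {2}".format(
            m.group("title"), int(m.group("season")), m.group("episode"))
    return "{0} - Ep {1}".format(m.group("title"), m.group("episode"))


for raw in (u"Assassination Classroom - Saison 1 \u00c9pisode 2 - VOSTFR",
            u"Naruto Shippuden -  \u00c9pisode 392 - VOSTFR"):
    print(clean_title(raw))
```

Using None rather than an empty string for the missing season means the code checks for the group explicitly instead of calling int() on whatever whitespace happened to be captured.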

ASKER

Can't be your real code because that request is hardcoded and would always return that assassination classroom episode 2 page.

Yes, I replaced that with olinks :)

Error:
 File "/home/gaaara/adn/test2.py", line 21
    oname.text)
        ^
SyntaxError: invalid syntax


Based on the error it looks like your line 21 is probably just:
 
   oname.text)

I'm guessing you put it on a line by itself and it's supposed to be up on line 20.  

And I'm sorry, but I think I'm done with this question. It's eaten up way, way too much of my time. I think you might want to learn a little bit more about Python syntax. You need some basic Python knowledge here that I'm guessing might be missing: indentation, and where you can and can't wrap lines. Sorry, but I gotta give this question up.

ASKER

OK, thanks for your help :) ^^ I really appreciated it.
Looked at it again and I'm guessing what you did is that you got rid of the ending comma "," on line 20.  Anyway... now I really do have to give this one up and good luck to you with your program.
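A quick way to check this kind of line-wrapping mistake: inside parentheses Python lets a call span several lines, but the arguments still need their separating commas. The Python 3 sketch below (with the call shortened for illustration) parses the re.sub call from this thread with and without the comma at the end of the lambda line; ast.parse only checks syntax, so the undefined names don't matter.

```python
import ast


def parses(source):
    """Return True if ast.parse accepts source, False on SyntaxError."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


with_comma = (
    "oname_cleanedup = re.sub(regexp,\n"
    "    lambda m: subst.format(title=m.group(1)),\n"
    "    oname.text)\n"
)
# Same call with the comma after the lambda line removed
without_comma = with_comma.replace("m.group(1)),\n", "m.group(1))\n")

print(parses(with_comma))     # True
print(parses(without_comma))  # False
```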

ASKER

It works :)