  • Status: Solved
  • Priority: Medium
  • Security: Public

How to edit a name stored in a variable with Python

Hello,

I use bs4 (BeautifulSoup) to get the title tag of a web page:
import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537)'}
session = requests.Session()
response = session.get("http://animedigitalnetwork.fr/video/assassination-classroom/5886-episode-2-lecon-de-base-ball", headers=headers)
soup = BeautifulSoup(response.content, "html.parser")
oname = soup.find("title")  # the page's <title> tag
print(oname.text)



Result:

Assassination Classroom - Saison 1 Épisode 2 - VOSTFR


I would like to edit this into a variable containing:
Assassination Classroom - S 01 Ep 2


and then use it to rename my files.

Thanks in advance.

Asked by: Gaaara
clockwatcher commented:
Here's one way:
import re
import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537)'}
session = requests.Session()
response = session.get("http://animedigitalnetwork.fr/video/assassination-classroom/5886-episode-2-lecon-de-base-ball", headers=headers)
soup = BeautifulSoup(response.content, "html.parser")
oname = soup.find("title")
# Groups: 1 = "<title> - S", 2 = season number, 3 = episode number, 4 = rest of the title (dropped)
oname_cleanedup = re.sub(r'(.*?\s+-\s+S)aison\s+(\d+)\s+É.*?(\d+)(.*)',
                         lambda m: "{title} {season:02d} Ep {episode}".format(title=m.group(1), season=int(m.group(2)), episode=m.group(3)),
                         oname.text)
print(oname_cleanedup)
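
To try the substitution without fetching the page, the same re.sub call can be wrapped in a small function and run on the title text directly. A minimal sketch, assuming Python 3 (where string literals are Unicode, so the literal É in the pattern is the same character as the one in the page title); the name shorten_title is hypothetical:

```python
import re

# Same pattern as above: 1 = "<title> - S", 2 = season, 3 = episode, 4 = rest (dropped)
PATTERN = r'(.*?\s+-\s+S)aison\s+(\d+)\s+É.*?(\d+)(.*)'


def shorten_title(text):
    """Hypothetical helper: 'X - Saison 1 Épisode 2 - VOSTFR' -> 'X - S 01 Ep 2'."""
    return re.sub(PATTERN,
                  lambda m: "{title} {season:02d} Ep {episode}".format(
                      title=m.group(1), season=int(m.group(2)), episode=m.group(3)),
                  text)


print(shorten_title("Assassination Classroom - Saison 1 Épisode 2 - VOSTFR"))
# Assassination Classroom - S 01 Ep 2
```

Titles that do not match the pattern are returned unchanged, because re.sub only replaces what it matches.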

Gaaara (author) commented:
I have a syntax problem.

My code:
http://pastebin.com/t5r5tFgN

Error:

 File "start.py", line 37
    elif menu=="3":
    ^

clockwatcher commented:
Python cares about indentation.  Lines 33-40 in your pastebin need to be indented so they are part of the elif menu =="2" block.

Gaaara (author) commented:
I don't understand, sorry?

clockwatcher commented:
This:
    elif menu=="2":
      #demande de liens 
      olinks=raw_input("Entrer votre liens ")
      #récupération du fichier png & smil
      subprocess.call(["php", "files/adn.php" , olinks])
      #decryption du fichier png
      subprocess.call(["php", "files/AES.class.php" , "tmp/adn.png"])
      #récupération du nom de l'animation
      headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537)'}
session = requests.Session()
response = session.get("http://animedigitalnetwork.fr/video/assassination-classroom/5886-episode-2-lecon-de-base-ball", headers=headers)
soup = BeautifulSoup(response.content)
oname = soup.find("title")
oname_cleanedup = re.sub(r'(.*?\s+-\s+S)aison\s+(\d+)\s+É.*?(\d+)(.*)', 
                         lambda m: "{title} {season:02d} Ep {episode}".format(title=m.group(1), season=int(m.group(2)), episode=m.group(3)), 
                         oname.text)
print(oname_cleanedup)
      ##fin
    elif menu=="3":

Needs to be this:
    elif menu=="2":
      #demande de liens 
      olinks=raw_input("Entrer votre liens ")
      #récupération du fichier png & smil
      subprocess.call(["php", "files/adn.php" , olinks])
      #decryption du fichier png
      subprocess.call(["php", "files/AES.class.php" , "tmp/adn.png"])
      #récupération du nom de l'animation
      headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537)'}
      session = requests.Session()
      response = session.get("http://animedigitalnetwork.fr/video/assassination-classroom/5886-episode-2-lecon-de-base-ball", headers=headers)
      soup = BeautifulSoup(response.content)
      oname = soup.find("title")
      oname_cleanedup = re.sub(r'(.*?\s+-\s+S)aison\s+(\d+)\s+É.*?(\d+)(.*)', 
                         lambda m: "{title} {season:02d} Ep {episode}".format(title=m.group(1), season=int(m.group(2)), episode=m.group(3)), 
                         oname.text)
      print(oname_cleanedup)
      ##fin
    elif menu=="3":


The indentation needs to line up.

Gaaara (author) commented:
Thanks for your help.

Gaaara (author) commented:
Hmm, it doesn't work, sorry. The name isn't edited; I spoke too fast.

Gaaara (author) commented:
It only works when I run it in Python directly, not in my script....

clockwatcher commented:
You'll need to give me a bit more to go on. What is it doing? Is it just not matching? Try the following and paste back what it prints:

      oname = soup.find("title")
      print("Searching: {0}".format(oname.text))
      match = re.search(r'(.*?\s+-\s+S)aison\s+(\d+)\s+É.*?(\d+)(.*)'):
      if match:
          print "Found match"
          oname_cleanedup = re.sub(r'(.*?\s+-\s+S)aison\s+(\d+)\s+É.*?(\d+)(.*)', 
                         lambda m: "{title} {season:02d} Ep {episode}".format(title=m.group(1), season=int(m.group(2)), episode=m.group(3)), 
                         oname.text)
          print(oname_cleanedup)
      else:
           print("Match not found")

clockwatcher commented:
I went back to your pastebin and I'm not seeing where you're importing re.  You need to make sure to import the regular expression module.
import re

Gaaara (author) commented:
Yes, re is imported :)

I get a syntax error:
File "start.py", line 35
    match = re.search(r'(.*?\s+-\s+S)aison\s+(\d+)\s+É.*?(\d+)(.*)'):

I deleted the trailing colon (:), and now I get this error:

Traceback (most recent call last):
  File "start.py", line 34, in <module>
    print("Searching: {0}".format(oname.text))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xc9' in position 35: ordinal not in range(128)

clockwatcher commented:
Change:
  print("Searching: {0}".format(oname.text))

To:
   print("Searching: {0}".format(oname.text.encode('ascii', 'xmlcharrefreplace')))

And:
   print(oname_cleanedup)

To:
   print(oname_cleanedup.encode('ascii', 'xmlcharrefreplace'))

Gaaara (author) commented:
I get another error :)

 Searching: Assassination Classroom - Saison 1 &#201;pisode 2 - VOSTFR
Traceback (most recent call last):
  File "start.py", line 35, in <module>
    match = re.search(r'(.*?\s+-\s+S)aison\s+(\d+)\s+É.*?(\d+)(.*)')
TypeError: search() takes at least 2 arguments (1 given)

clockwatcher commented:
re.search() also needs the string to search as its second argument:
match = re.search(r'(.*?\s+-\s+S)aison\s+(\d+)\s+É.*?(\d+)(.*)', oname.text)
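
re.search(pattern, string) returns a match object when the pattern is found anywhere in the string, or None otherwise, and the captured groups can be printed directly, which makes it easier to see which part of a pattern is failing. A small offline sketch using the same pattern on the example title (Python 3 assumed):

```python
import re

pattern = r'(.*?\s+-\s+S)aison\s+(\d+)\s+É.*?(\d+)(.*)'
text = "Assassination Classroom - Saison 1 Épisode 2 - VOSTFR"

match = re.search(pattern, text)  # second argument: the string to search
if match:
    print(match.groups())
    # ('Assassination Classroom - S', '1', '2', ' - VOSTFR')
else:
    print("Match not found")
```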

Gaaara (author) commented:
It doesn't work :)

Searching: Assassination Classroom - Saison 1 &#201;pisode 2 - VOSTFR
Match not found

clockwatcher commented:
Then I'm guessing you may have encoding problems (e.g., your file/editor may not really be using UTF-8, so your É may not be the same character as the É in the page), because that regular expression should match. Try changing the regular expression to:

r'(.*?\s+-\s+S)aison\s+(\d+)\s+\xc9.*?(\d+)(.*)'

Gaaara (author) commented:
Still not working ^^
Searching: Assassination Classroom - Saison 1 &#201;pisode 2 - VOSTFR
Found match
Assassination Classroom - Saison 1 &#201;pisode 2 - VOSTFR


Is it possible to create a function in a separate Python file that produces the modified name for the main script?

clockwatcher commented:
If it printed "found match", then the regular expression matched.  And the substitution would have worked too (if you had changed it to use the \xc9).  But if you're saying it didn't work, I'm guessing you didn't modify the second regular expression (the one in the sub call) to use the \xc9.  

Can you post your entire code?  This back and forth is getting us nowhere fast.

Gaaara (author) commented:
OK, here it is:

http://pastebin.com/EF0umjd9

I was thinking of creating a separate Python script that gets and modifies the name and generates a variable for the main script.
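
For the original goal of renaming files with the cleaned-up name, a helper like the one below could live in its own module (say, a hypothetical episode_name.py next to start.py) and be imported with `from episode_name import rename_with_title`. It is a minimal sketch under stated assumptions: it keeps the file's directory and extension, and replaces '/' because that character cannot appear in a file name on Linux. The function name and the file name adn.mp4 in the demo are made up.

```python
import os
import tempfile


def rename_with_title(path, new_name):
    """Hypothetical helper: rename `path` to `new_name`, keeping directory and extension."""
    directory, filename = os.path.split(path)
    extension = os.path.splitext(filename)[1]
    # '/' is not allowed in a Linux file name, so swap it for '-'
    safe_name = new_name.replace("/", "-").strip()
    new_path = os.path.join(directory, safe_name + extension)
    os.rename(path, new_path)
    return new_path


# Demo in a throwaway directory with a made-up file name
with tempfile.TemporaryDirectory() as tmp:
    old = os.path.join(tmp, "adn.mp4")
    open(old, "w").close()  # create an empty file to rename
    new = rename_with_title(old, "Assassination Classroom - S 01 Ep 2")
    print(os.path.basename(new))  # Assassination Classroom - S 01 Ep 2.mp4
```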

clockwatcher commented:
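You didn't change the regular expression in the sub call to use \xc9 rather than É. The problem is that whatever editor you're using isn't using the right encoding: your É isn't the same character as the É in the webpage title. To get around that, use \xc9, the escape for the Unicode code point of É (U+00C9; its UTF-8 encoding is the two bytes \xc3\x89). Because the escape is plain ASCII, the regular expression engine turns it into the right character no matter how the file is encoded.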

In other words, change line 37 to:

oname_cleanedup = re.sub(r'(.*?\s+-\s+S)aison\s+(\d+)\s+\xc9.*?(\d+)(.*)',

And you can also probably change line 40 back to just:

 print(oname_cleanedup)

Whether you really can or not depends on the codepage that your console is using.  You'll know when you try to print it.  If it bombs on that print line it's because whatever codepage your console is in, it doesn't have a translation for that character.  
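
One way to see why the literal É and the \xc9 escape can behave differently: É is code point U+00C9, but in UTF-8 it is stored as the two bytes C3 89. If the pattern ends up holding those two bytes (as can happen with a Python 2 byte-string pattern) while the page title holds the single character, the pattern cannot match. The sketch below only simulates that mismatch in Python 3, by reading the UTF-8 bytes back one byte per character; it illustrates the likely cause and is not a reproduction of the original script.

```python
import re

title = "Saison 1 \u00c9pisode 2"

print("\u00c9" == "\xc9")        # True: \xc9 is the code point of É
print("\u00c9".encode("utf-8"))  # b'\xc3\x89': two bytes in UTF-8

# Simulated mismatch: the UTF-8 bytes of É reinterpreted as two Latin-1 characters
wrong_e = "\u00c9".encode("utf-8").decode("latin-1")
print(re.search(wrong_e + "pisode", title))  # None

# The \xc9 escape is resolved by the regex engine to the single character É
print(re.search(r"\xc9pisode", title) is not None)  # True
```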

If you want to get rid of the debug output altogether, you can go back to my original post from earlier in the day and just use \xc9 rather than É:
    elif menu=="2":
      #demande de liens 
      olinks=raw_input("Entrer votre liens ")
      #récupération du fichier png & smil
      subprocess.call(["php", "files/adn.php" , olinks])
      #decryption du fichier png
      subprocess.call(["php", "files/AES.class.php" , "tmp/adn.png"])
      #récupération du nom de l'animation
      headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537)'}
      session = requests.Session()
      response = session.get("http://animedigitalnetwork.fr/video/assassination-classroom/5886-episode-2-lecon-de-base-ball", headers=headers)
      soup = BeautifulSoup(response.content)
      oname = soup.find("title")
      oname_cleanedup = re.sub(r'(.*?\s+-\s+S)aison\s+(\d+)\s+\xc9.*?(\d+)(.*)', 
                         lambda m: "{title} {season:02d} Ep {episode}".format(title=m.group(1), season=int(m.group(2)), episode=m.group(3)), 
                         oname.text)
      print(oname_cleanedup)
      ##fin
    elif menu=="3":

Gaaara (author) commented:
Arr /$$%$%? :(

I'm having some difficulty with indentation. Do you know a tool that could help me?

Thanks for your post.

clockwatcher commented:
You might want to try using a Python IDE. PyCharm is a good choice (https://www.jetbrains.com/pycharm/). There's a free community edition, and on occasion they've offered the full version free to students (https://www.jetbrains.com/student/).

Gaaara (author) commented:
Hmm, I have a problem.

The modification works; there's just one more bug.

How do I reset the variable at the end of the script?

The information is kept until the end of the script, and when I start the process again it keeps that information and gives the wrong result.

Also, on another link the result is:

Naruto Shippuden -  Épisode 392 - VOSTFR

but it's supposed to be:

Naruto Shippuden - Ep 392 

clockwatcher commented:
I don't understand what you mean by your first question.

As far as the second question, you should be able to use the following:

    elif menu=="2":
      #demande de liens 
      olinks=raw_input("Entrer votre liens ")
      #récupération du fichier png & smil
      subprocess.call(["php", "files/adn.php" , olinks])
      #decryption du fichier png
      subprocess.call(["php", "files/AES.class.php" , "tmp/adn.png"])
      #récupération du nom de l'animation
      headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537)'}
      session = requests.Session()
      response = session.get("http://animedigitalnetwork.fr/video/assassination-classroom/5886-episode-2-lecon-de-base-ball", headers=headers)
      soup = BeautifulSoup(response.content)
      oname = soup.find("title")
      if oname.text.find('Saison') >= 0:
          regexp = r'(.*?\s+-\s+S)aison\s+(\d+)\s+\xc9.*?(\d+)(.*)'
          subst = "{title} {season:02d} Ep {episode}"
      else:
          regexp = r'(.*?\s+-)(\s+)\xc9.*?(\d+)(.*)'
          subst = "{title} Ep {episode}"
      oname_cleanedup = re.sub(regexp, 
                         lambda m: subst.format(title=m.group(1), season=int(m.group(2)), episode=m.group(3)), 
                         oname.text)
      print(oname_cleanedup)
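
Another way to cover both title formats is a single pattern in which the season part is optional, using named groups. When a title has no season, the season group is simply None, so int() is never called on an empty string. A sketch assuming Python 3; the names TITLE_RE and clean_title are hypothetical:

```python
import re

# Optional "Saison N " part; \xc9 is É, so the pattern does not depend on the file's encoding
TITLE_RE = re.compile(
    r"^(?P<title>.*?)\s+-\s+(?:Saison\s+(?P<season>\d+)\s+)?\xc9pisode\s+(?P<episode>\d+)")


def clean_title(text):
    """Hypothetical helper returning 'Title - S 01 Ep 2' or 'Title - Ep 392'."""
    m = TITLE_RE.search(text)
    if m is None:
        return text  # leave unknown formats untouched
    if m.group("season"):
        return "{0} - S {1:02d} Ep {2}".format(
            m.group("title"), int(m.group("season")), m.group("episode"))
    return "{0} - Ep {1}".format(m.group("title"), m.group("episode"))


print(clean_title("Assassination Classroom - Saison 1 \u00c9pisode 2 - VOSTFR"))
# Assassination Classroom - S 01 Ep 2
print(clean_title("Naruto Shippuden -  \u00c9pisode 392 - VOSTFR"))
# Naruto Shippuden - Ep 392
```

Keeping the season optional inside one pattern avoids choosing between two regular expressions with a separate `find('Saison')` test.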

Gaaara (author) commented:
Error:
Traceback (most recent call last):
  File "start.py", line 41, in <module>
    oname.text)
  File "/usr/lib64/python2.7/re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "start.py", line 40, in <lambda>
    lambda m: subst.format(title=m.group(1), season=int(m.group(2)), episode=m.group(3)), 
ValueError: invalid literal for int() with base 10: ''

Gaaara (author) commented:
About the first question:

My menu is a loop: the script finishes, shows the choice menu again, and when I use the same option again I get the same title name.

Example:

Title 1: naruto
Title 2: bleach

I run the script with the naruto link.

Result: naruto Ep 01
Then I go back to the menu, choose the same option, and give the bleach link.

Result: naruto Ep 01
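
clockwatcher commented:
For the ValueError, change the lambda line in the re.sub call to the following, which only calls int() on group 2 when it contains no space (in the no-season pattern, group 2 only captures the whitespace before the É):

lambda m: subst.format(title=m.group(1), season=int(m.group(2)) if m.group(2).find(" ")==-1 else "", episode=m.group(3))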

And it shouldn't do that the second time through... The oname_cleanedup variable is set based on the oname.text variable which should be reset by your request.

So, I'd need to see the real code where you're making your request.  Because this line:

response = session.get("http://animedigitalnetwork.fr/video/assassination-classroom/5886-episode-2-lecon-de-base-ball", headers=headers)
     
Can't be your real code because that request is hardcoded and would always return that assassination classroom episode 2 page.
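
The point about the hardcoded request can be sketched as a loop in which the title is fetched from whatever link is passed in, once per pass, so a result cannot carry over from the previous link. fetch_title and process_links are hypothetical names; the fetch step is passed in as a parameter so the demo below runs offline with made-up titles, and the User-Agent is shortened for brevity.

```python
def fetch_title(url):
    """Real fetcher (not called in the offline demo)."""
    # Imported here so the demo runs even where requests/bs4 are not installed
    import requests
    from bs4 import BeautifulSoup
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
    return BeautifulSoup(response.content, "html.parser").find("title").text


def process_links(links, fetch, clean):
    """Fetch and clean a fresh title for every link (one menu pass per link)."""
    names = []
    for link in links:
        title = fetch(link)         # request made with *this* link, like olinks
        names.append(clean(title))  # recomputed on every pass, nothing reused
    return names


# Offline demo: a dict stands in for the website (made-up titles)
fake_site = {
    "link-naruto": "Naruto Shippuden - Episode 1",
    "link-bleach": "Bleach - Episode 1",
}
# Stand-in cleanup: keep only the part before the first " - "
print(process_links(["link-naruto", "link-bleach"], fake_site.get,
                    lambda t: t.split(" - ")[0]))
# ['Naruto Shippuden', 'Bleach']
```

In the real script, `fetch` would be fetch_title and `clean` would be the re.sub-based cleanup.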

Gaaara (author) commented:
"Can't be your real code because that request is hardcoded and would always return that assassination classroom episode 2 page."

Yes, I replaced that with olinks :)

Error:
 File "/home/gaaara/adn/test2.py", line 21
    oname.text)
        ^
SyntaxError: invalid syntax

clockwatcher commented:
Based on the error it looks like your line 21 is probably just:
 
   oname.text)

I'm guessing you put it on a line by itself and it's supposed to be up on line 20.  

And I'm sorry, but I think I'm done with this question. It's eaten up way, way too much of my time. I think you might want to learn a little bit more about Python syntax. You need some basic Python knowledge here that I'm guessing might be missing: indentation, and where you can and can't wrap lines. Sorry, but I gotta give this question up.

Gaaara (author) commented:
OK, thanks for your help :) ^^ I really appreciated it.

clockwatcher commented:
Looked at it again and I'm guessing what you did is that you got rid of the ending comma "," on line 20.  Anyway... now I really do have to give this one up and good luck to you with your program.
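
Inside parentheses Python lets a call continue over several lines, but the commas between the arguments are still required. Without the comma after the lambda, Python sees the lambda expression and oname.text side by side with nothing joining them, which is a syntax error. A quick way to check this with the built-in compile(); the snippets are only compiled, never run, so the names in them do not need to exist:

```python
def compiles(source):
    """Return True if `source` is syntactically valid Python."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


with_comma = (
    "oname_cleanedup = re.sub(regexp,\n"
    "                         lambda m: subst.format(title=m.group(1)),\n"
    "                         oname.text)\n")
# Same call with the comma after the lambda removed
without_comma = with_comma.replace("m.group(1)),\n", "m.group(1))\n")

print(compiles(with_comma))     # True
print(compiles(without_comma))  # False
```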

Gaaara (author) commented:
It works :)
Question has a verified solution.
