Solved

How to edit a name stored in a variable with Python

Posted on 2015-01-17
Last Modified: 2015-01-18
Hello,

I use bs4 to get the title tag of a web page:
import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537)'}
session = requests.Session()
response = session.get("http://animedigitalnetwork.fr/video/assassination-classroom/5886-episode-2-lecon-de-base-ball", headers=headers)
soup = BeautifulSoup(response.content)
oname = soup.find("title")
print oname.text

Result:

Assassination Classroom - Saison 1 Épisode 2 - VOSTFR


I would like to edit this and store the following in a variable:
Assassination Classroom - S 01 Ep 2


and then use it to rename my files.

Thanks in advance.
Question by:Gaaara
 
LVL 25

Accepted Solution

by: clockwatcher (earned 500 total points)
Here's one way:
import re
import requests
from bs4 import BeautifulSoup
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537)'}
session = requests.Session()
response = session.get("http://animedigitalnetwork.fr/video/assassination-classroom/5886-episode-2-lecon-de-base-ball", headers=headers)
soup = BeautifulSoup(response.content)
oname = soup.find("title")
oname_cleanedup = re.sub(r'(.*?\s+-\s+S)aison\s+(\d+)\s+É.*?(\d+)(.*)', 
                         lambda m: "{title} {season:02d} Ep {episode}".format(title=m.group(1), season=int(m.group(2)), episode=m.group(3)), 
                         oname.text)
print(oname_cleanedup)
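
The substitution can be tried without any network access. The following is a minimal sketch, assuming the title text is exactly the string shown in the question; the helper name `shorten` is made up for illustration, and É is written as an escape so the result does not depend on how the file is saved.

```python
import re

# Sample title copied from the question, so no network request is needed.
# "\u00c9" is the escape for É; it keeps the example independent of file encoding.
title = u"Assassination Classroom - Saison 1 \u00c9pisode 2 - VOSTFR"

# Same pattern as above, with É written as the regex escape \xc9.
pattern = r'(.*?\s+-\s+S)aison\s+(\d+)\s+\xc9.*?(\d+)(.*)'


def shorten(match):
    # group(1) is the show name up to and including the "S" of "Saison",
    # group(2) the season number, group(3) the episode number.
    return u"{title} {season:02d} Ep {episode}".format(
        title=match.group(1),
        season=int(match.group(2)),
        episode=match.group(3))


cleaned = re.sub(pattern, shorten, title)
print(cleaned)
```

Because the pattern ends with (.*), the whole title is consumed and replaced, which is why the trailing "- VOSTFR" disappears.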

Author Comment

by:Gaaara
I have a syntax problem.

My code:
http://pastebin.com/t5r5tFgN

Error:

 File "start.py", line 37
    elif menu=="3":
    ^

Expert Comment

by:clockwatcher
Python cares about indentation.  Lines 33-40 in your pastebin need to be indented so they are part of the elif menu =="2" block.

Author Comment

by:Gaaara
Sorry, I don't understand.

Expert Comment

by:clockwatcher
This:
    elif menu=="2":
      #demande de liens 
      olinks=raw_input("Entrer votre liens ")
      #récupération du fichier png & smil
      subprocess.call(["php", "files/adn.php" , olinks])
      #decryption du fichier png
      subprocess.call(["php", "files/AES.class.php" , "tmp/adn.png"])
      #récupération du nom de l'animation
      headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537)'}
session = requests.Session()
response = session.get("http://animedigitalnetwork.fr/video/assassination-classroom/5886-episode-2-lecon-de-base-ball", headers=headers)
soup = BeautifulSoup(response.content)
oname = soup.find("title")
oname_cleanedup = re.sub(r'(.*?\s+-\s+S)aison\s+(\d+)\s+É.*?(\d+)(.*)', 
                         lambda m: "{title} {season:02d} Ep {episode}".format(title=m.group(1), season=int(m.group(2)), episode=m.group(3)), 
                         oname.text)
print(oname_cleanedup)
      ##fin
    elif menu=="3":

Needs to be this:
    elif menu=="2":
      #demande de liens 
      olinks=raw_input("Entrer votre liens ")
      #récupération du fichier png & smil
      subprocess.call(["php", "files/adn.php" , olinks])
      #decryption du fichier png
      subprocess.call(["php", "files/AES.class.php" , "tmp/adn.png"])
      #récupération du nom de l'animation
      headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537)'}
      session = requests.Session()
      response = session.get("http://animedigitalnetwork.fr/video/assassination-classroom/5886-episode-2-lecon-de-base-ball", headers=headers)
      soup = BeautifulSoup(response.content)
      oname = soup.find("title")
      oname_cleanedup = re.sub(r'(.*?\s+-\s+S)aison\s+(\d+)\s+É.*?(\d+)(.*)', 
                         lambda m: "{title} {season:02d} Ep {episode}".format(title=m.group(1), season=int(m.group(2)), episode=m.group(3)), 
                         oname.text)
      print(oname_cleanedup)
      ##fin
    elif menu=="3":

The indentation needs to line up.
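
As a rough illustration of the rule, here is a small self-contained sketch. The function name `handle` and the strings are hypothetical stand-ins for the real menu code; the point is only that every statement belonging to a branch sits at the same depth under its if/elif line.

```python
def handle(menu):
    """Return a label for the chosen menu entry (hypothetical helper)."""
    if menu == "1":
        result = "first"
    elif menu == "2":
        # Both statements share the same indentation, so Python treats
        # them as part of this elif block.
        step = "fetch title"
        result = step + " done"
    elif menu == "3":
        result = "third"
    else:
        result = "unknown"
    return result


print(handle("2"))
```

If the two statements under menu == "2" were moved back to the left margin, the following elif would no longer have an if to attach to, which is the kind of syntax error reported at the elif menu=="3" line.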

Author Comment

by:Gaaara
Thank you for your help.

Author Comment

by:Gaaara
Sorry, I spoke too soon: it does not work, the name is not edited.

Author Comment

by:Gaaara
It works when I run it directly in Python, but not in my script.

Expert Comment

by:clockwatcher
You'll need to give me a bit more to go on.  What is it doing?  Just not matching?  Try the following and paste back what it spits out...

      oname = soup.find("title")
      print("Searching: {0}".format(oname.text))
      match = re.search(r'(.*?\s+-\s+S)aison\s+(\d+)\s+É.*?(\d+)(.*)'):
      if match:
          print "Found match"
          oname_cleanedup = re.sub(r'(.*?\s+-\s+S)aison\s+(\d+)\s+É.*?(\d+)(.*)', 
                         lambda m: "{title} {season:02d} Ep {episode}".format(title=m.group(1), season=int(m.group(2)), episode=m.group(3)), 
                         oname.text)
          print(oname_cleanedup)
      else:
           print("Match not found")

Expert Comment

by:clockwatcher
I went back to your pastebin and I'm not seeing where you're importing re.  You need to make sure to import the regular expression module.
import re

Author Comment

by:Gaaara
Yes, re is imported :)

I have a syntax error:
File "start.py", line 35
    match = re.search(r'(.*?\s+-\s+S)aison\s+(\d+)\s+É.*?(\d+)(.*)'):

I deleted the trailing ":" and now I get this error:

Traceback (most recent call last):
  File "start.py", line 34, in <module>
    print("Searching: {0}".format(oname.text))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xc9' in position 35: ordinal not in range(128)

Expert Comment

by:clockwatcher
Change:
  print("Searching: {0}".format(oname.text))

To:
   print("Searching: {0}".format(oname.text.encode('ascii', 'xmlcharrefreplace')))

And:
   print(oname_cleanedup)

To:
   print(oname_cleanedup.encode('ascii', 'xmlcharrefreplace'))
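
To see what that encode call does in isolation, here is a minimal sketch, assuming the title from the question. Characters that ASCII cannot represent are replaced by numeric XML character references, so printing can no longer raise UnicodeEncodeError.

```python
# Title from the question; "\xc9" is the escape for É.
title = u"Assassination Classroom - Saison 1 \xc9pisode 2 - VOSTFR"

# Any character outside ASCII becomes an &#NNN; reference (É is code point 201).
safe = title.encode("ascii", "xmlcharrefreplace")
print(safe)
```

In Python 2 the result is a plain str; in Python 3 it is a bytes object, so it prints with a b'' prefix. This only changes what is displayed; the original unicode title is untouched.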

Author Comment

by:Gaaara
I have another error :)

 Searching: Assassination Classroom - Saison 1 &#201;pisode 2 - VOSTFR
Traceback (most recent call last):
  File "start.py", line 35, in <module>
    match = re.search(r'(.*?\s+-\s+S)aison\s+(\d+)\s+É.*?(\d+)(.*)')
TypeError: search() takes at least 2 arguments (1 given)

Expert Comment

by:clockwatcher
match = re.search(r'(.*?\s+-\s+S)aison\s+(\d+)\s+É.*?(\d+)(.*)', oname.text)
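
re.search takes the pattern first and the string to scan second; the TypeError above came from passing only the pattern. A minimal sketch, using a shortened ASCII-only title and a simplified pattern as assumptions so that encoding plays no part:

```python
import re

# Shortened, ASCII-only sample (assumption) to keep encoding out of the picture.
text = "Assassination Classroom - Saison 1"

# First argument: the pattern. Second argument: the string to search.
match = re.search(r'Saison\s+(\d+)', text)

if match:
    season = match.group(1)
    print("Found match, season " + season)
else:
    season = None
    print("Match not found")
```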

Author Comment

by:Gaaara
It does not work :)

Searching: Assassination Classroom - Saison 1 &#201;pisode 2 - VOSTFR
Match not found

Expert Comment

by:clockwatcher
Then I'm guessing you may have encoding problems (e.g., your file/editor may not really be using utf-8 and your É may not be the same É), because that regular expression should match.  Try changing the regular expression to:

r'(.*?\s+-\s+S)aison\s+(\d+)\s+\xc9.*?(\d+)(.*)'

Author Comment

by:Gaaara
It does not work ^^
Searching: Assassination Classroom - Saison 1 &#201;pisode 2 - VOSTFR
Found match
Assassination Classroom - Saison 1 &#201;pisode 2 - VOSTFR

Is it possible to create a function in a separate Python file that returns the modified name to the main script?

Expert Comment

by:clockwatcher
If it printed "found match", then the regular expression matched.  And the substitution would have worked too (if you had changed it to use the \xc9).  But if you're saying it didn't work, I'm guessing you didn't modify the second regular expression (the one in the sub call) to use the \xc9.  

Can you post your entire code?  This back and forth is getting us nowhere fast.

Author Comment

by:Gaaara
OK.

http://pastebin.com/EF0umjd9

I was thinking of creating a new Python script that gets and modifies the name and provides it as a variable to the main script.

Expert Comment

by:clockwatcher
You didn't change the regular expression in the sub call to use \xc9 rather than the É.   The problem is that whatever editor you're using isn't using the right encoding.  Your É isn't the same É that is in the webpage.  To get around that, use \xc9 (the escape for U+00C9, the Unicode code point of É; in UTF-8 the character itself is stored as the two bytes 0xC3 0x89).

In other words, change line 37 to:

oname_cleanedup = re.sub(r'(.*?\s+-\s+S)aison\s+(\d+)\s+\xc9.*?(\d+)(.*)',

And you can also probably change line 40 back to just:

 print(oname_cleanedup)

Whether you really can or not depends on the codepage that your console is using.  You'll know when you try to print it.  If it bombs on that print line it's because whatever codepage your console is in, it doesn't have a translation for that character.  

If you want to get rid of the debug output altogether, you can go back to my original post from way earlier in the day, and just use \xc9 rather than the É:
    elif menu=="2":
      #demande de liens 
      olinks=raw_input("Entrer votre liens ")
      #récupération du fichier png & smil
      subprocess.call(["php", "files/adn.php" , olinks])
      #decryption du fichier png
      subprocess.call(["php", "files/AES.class.php" , "tmp/adn.png"])
      #récupération du nom de l'animation
      headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537)'}
      session = requests.Session()
      response = session.get("http://animedigitalnetwork.fr/video/assassination-classroom/5886-episode-2-lecon-de-base-ball", headers=headers)
      soup = BeautifulSoup(response.content)
      oname = soup.find("title")
      oname_cleanedup = re.sub(r'(.*?\s+-\s+S)aison\s+(\d+)\s+\xc9.*?(\d+)(.*)', 
                         lambda m: "{title} {season:02d} Ep {episode}".format(title=m.group(1), season=int(m.group(2)), episode=m.group(3)), 
                         oname.text)
      print(oname_cleanedup)
      ##fin
    elif menu=="3":
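
Why the escape helps can be checked with a short experiment. This is a sketch under the assumption that the page title decodes to the unicode string below: É is a single code point (U+00C9), but its UTF-8 form is two bytes, so a pattern holding those raw bytes (or a differently encoded É typed in an editor) does not line up with the decoded text, while the \xc9 escape always refers to the code point itself.

```python
import re

# Decoded title, as BeautifulSoup would hand it back (assumption).
title = u"Assassination Classroom - Saison 1 \xc9pisode 2 - VOSTFR"

# One code point, but two bytes once encoded as UTF-8.
code_points = len(u"\xc9")
utf8_bytes = len(u"\xc9".encode("utf-8"))
print(code_points)
print(utf8_bytes)

# The escape is plain ASCII in the source file, so the editor's encoding
# cannot change what the pattern means.
pattern = r'(.*?\s+-\s+S)aison\s+(\d+)\s+\xc9.*?(\d+)(.*)'
found = re.search(pattern, title) is not None
print(found)
```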

Author Comment

by:Gaaara
arr /$$%$%? :(

I have some difficulty with indentation. Do you have a tool to help me?

Thanks for your post.

Expert Comment

by:clockwatcher
You might want to try using a Python IDE.   PyCharm is a good choice (https://www.jetbrains.com/pycharm/).  There's a free community edition, and on occasion they've offered the full version free to students (https://www.jetbrains.com/student/).

Author Comment

by:Gaaara
Hmm, I have a problem.

The modification works, but there is one more bug.

How do I reset the variable at the end of the script?

The information is kept until the end of the script, and when I start the process again it keeps the old information and returns the wrong result.

Also, on another link the result is

Naruto Shippuden -  Épisode 392 - VOSTFR

but it is supposed to be

Naruto Shippuden - Ep 392 

Expert Comment

by:clockwatcher
I'm not understanding what you mean with your first question.  

As far as the second question, you should be able to use the following:

    elif menu=="2":
      #demande de liens 
      olinks=raw_input("Entrer votre liens ")
      #récupération du fichier png & smil
      subprocess.call(["php", "files/adn.php" , olinks])
      #decryption du fichier png
      subprocess.call(["php", "files/AES.class.php" , "tmp/adn.png"])
      #récupération du nom de l'animation
      headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537)'}
      session = requests.Session()
      response = session.get("http://animedigitalnetwork.fr/video/assassination-classroom/5886-episode-2-lecon-de-base-ball", headers=headers)
      soup = BeautifulSoup(response.content)
      oname = soup.find("title")
      if oname.text.find('Saison') >= 0:
          regexp = r'(.*?\s+-\s+S)aison\s+(\d+)\s+\xc9.*?(\d+)(.*)'
          subst = "{title} {season:02d} Ep {episode}"
      else:
          regexp = r'(.*?\s+-)(\s+)\xc9.*?(\d+)(.*)'
          subst = "{title} Ep {episode}"
      oname_cleanedup = re.sub(regexp, 
                         lambda m: subst.format(title=m.group(1), season=int(m.group(2)), episode=m.group(3)), 
                         oname.text)
      print(oname_cleanedup)

Author Comment

by:Gaaara
Error:
Traceback (most recent call last):
  File "start.py", line 41, in <module>
    oname.text)
  File "/usr/lib64/python2.7/re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "start.py", line 40, in <lambda>
    lambda m: subst.format(title=m.group(1), season=int(m.group(2)), episode=m.group(3)), 
ValueError: invalid literal for int() with base 10: ''

Author Comment

by:Gaaara
For the first question:

My menu is a loop. When the script finishes, it displays the menu again; if I choose the same option, I get the same title name.

Example:

title 1: naruto
title 2: bleach

Run the script with the naruto link:

result: naruto Ep 01

Back at the menu, choose the same option with the bleach link:

result: naruto Ep 01

Expert Comment

by:clockwatcher
lambda m: subst.format(title=m.group(1), season=int(m.group(2)) if m.group(2).find(" ")==-1 else "", episode=m.group(3))

And it shouldn't do that the second time through... The oname_cleanedup variable is set based on the oname.text variable which should be reset by your request.

So, I'd need to see the real code where you're making your request.  Because this line:

response = session.get("http://animedigitalnetwork.fr/video/assassination-classroom/5886-episode-2-lecon-de-base-ball", headers=headers)
     
Can't be your real code because that request is hardcoded and would always return that assassination classroom episode 2 page.
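
An alternative to switching between two patterns is a single pattern with an optional season part. The sketch below is not the code from this thread: the names `TITLE_RE` and `clean_title` are made up for illustration, and it assumes titles always look like "Show - [Saison N] Épisode N - ...". Because the function only depends on the string passed in, each call starts fresh, so a second title cannot pick up the result of the first.

```python
import re

# "Saison N" is optional; \xc9 stands for É (hypothetical pattern, see lead-in).
TITLE_RE = re.compile(
    r'^(?P<show>.*?)\s+-\s+'
    r'(?:Saison\s+(?P<season>\d+)\s+)?'
    r'\xc9pisode\s+(?P<episode>\d+)')


def clean_title(raw):
    """Return a short file-name-style title, or raw unchanged if no match."""
    m = TITLE_RE.search(raw)
    if m is None:
        return raw
    if m.group("season"):
        # int() is only called when a season number was actually captured.
        return u"{0} - S {1:02d} Ep {2}".format(
            m.group("show"), int(m.group("season")), m.group("episode"))
    return u"{0} - Ep {1}".format(m.group("show"), m.group("episode"))


titles = [
    u"Assassination Classroom - Saison 1 \xc9pisode 2 - VOSTFR",
    u"Naruto Shippuden -  \xc9pisode 392 - VOSTFR",
]
results = [clean_title(t) for t in titles]
for r in results:
    print(r)
```

Note that this sketch drops the trailing space that the original substitution leaves after the episode number.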

Author Comment

by:Gaaara
Can't be your real code because that request is hardcoded and would always return that assassination classroom episode 2 page.

Yes, I replaced this with olinks :)

Error:
 File "/home/gaaara/adn/test2.py", line 21
    oname.text)
        ^
SyntaxError: invalid syntax

Expert Comment

by:clockwatcher
Based on the error it looks like your line 21 is probably just:
 
   oname.text)

I'm guessing you put it on a line by itself and it's supposed to be up on line 20.  

And I'm sorry but I think I'm done with this question.  It's eaten up way way too much of my time.  I think you might want to learn a little bit more about python syntax.  You need some basic python knowledge here that I'm guessing might be missing-- indentation, where you can and can't wrap lines.  Sorry but I gotta give this question up.

Author Closing Comment

by:Gaaara
OK, thank you for your help :) ^^ It was really appreciated.

Expert Comment

by:clockwatcher
Looked at it again and I'm guessing what you did is that you got rid of the ending comma "," on line 20.  Anyway... now I really do have to give this one up and good luck to you with your program.
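
For reference, this is how a call can legally be wrapped over several lines. It is a simplified sketch with an ASCII-only sample title and pattern (both assumptions, not the real page data): inside the parentheses the arguments may sit on separate lines, but each one still needs its separating comma.

```python
import re

# Simplified ASCII sample, not the real page title.
title = "Show - Saison 3 Episode 7"

# Three arguments on three lines: pattern, replacement function, input string.
# Removing the comma after the lambda line would make the next line a SyntaxError.
cleaned = re.sub(
    r'Saison\s+(\d+)\s+Episode\s+(\d+)',
    lambda m: "S {0:02d} Ep {1}".format(int(m.group(1)), m.group(2)),
    title)

print(cleaned)
```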

Author Comment

by:Gaaara
it works :)