Solved

How to edit a name stored in a variable with Python

Posted on 2015-01-17
Last Modified: 2015-01-18
Hello,

I use bs4 to get the title tag from a website:
import requests
from bs4 import BeautifulSoup
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537)'}
session = requests.Session()
response = session.get("http://animedigitalnetwork.fr/video/assassination-classroom/5886-episode-2-lecon-de-base-ball", headers=headers)
soup = BeautifulSoup(response.content)
oname = soup.find("title")
print oname.text


Result:

Assassination Classroom - Saison 1 Épisode 2 - VOSTFR


I would like to edit this so that a variable holds:
Assassination Classroom - S 01 Ep 2


and then use it to rename my files.

Thanks in advance.
0
Question by:Gaaara
LVL 25

Accepted Solution

by:
clockwatcher earned 2000 total points
ID: 40554928
Here's one way:
import re
import requests
from bs4 import BeautifulSoup
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537)'}
session = requests.Session()
response = session.get("http://animedigitalnetwork.fr/video/assassination-classroom/5886-episode-2-lecon-de-base-ball", headers=headers)
soup = BeautifulSoup(response.content)
oname = soup.find("title")
oname_cleanedup = re.sub(r'(.*?\s+-\s+S)aison\s+(\d+)\s+É.*?(\d+)(.*)', 
                         lambda m: "{title} {season:02d} Ep {episode}".format(title=m.group(1), season=int(m.group(2)), episode=m.group(3)), 
                         oname.text)
print(oname_cleanedup)

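To see why the substitution produces the requested name, it can help to look at what each capture group holds. The following is only a sketch: it reuses the pattern from the answer, writes É as the escape \xc9, and uses a named function (the hypothetical `rebuild`) in place of the lambda.

```python
import re

# Same pattern as in the answer, except that É is written as the escape \xc9.
PATTERN = r'(.*?\s+-\s+S)aison\s+(\d+)\s+\xc9.*?(\d+)(.*)'
raw = u"Assassination Classroom - Saison 1 \xc9pisode 2 - VOSTFR"

def rebuild(m):
    # group(1) already ends in " - S", so the season number follows it directly.
    return "{title} {season:02d} Ep {episode}".format(
        title=m.group(1), season=int(m.group(2)), episode=m.group(3))

# The four groups: title up to the "S", season, episode, trailing text.
print(re.match(PATTERN, raw).groups())
# The whole title is matched, so the whole title is replaced.
print(re.sub(PATTERN, rebuild, raw))
```

Because the final group `(.*)` swallows the " - VOSTFR" suffix and is not used in the replacement, the suffix disappears from the result.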

0
 

Author Comment

by:Gaaara
ID: 40555233
I have a syntax problem.

My code:
http://pastebin.com/t5r5tFgN

Error:

 File "start.py", line 37
    elif menu=="3":
    ^


0
 
LVL 25

Expert Comment

by:clockwatcher
ID: 40555247
Python cares about indentation.  Lines 33-40 in your pastebin need to be indented so they are part of the elif menu =="2" block.
0
 

Author Comment

by:Gaaara
ID: 40555250
Sorry, I don't understand.
0
 
LVL 25

Expert Comment

by:clockwatcher
ID: 40555263
This:
    elif menu=="2":
      #demande de liens 
      olinks=raw_input("Entrer votre liens ")
      #récupération du fichier png & smil
      subprocess.call(["php", "files/adn.php" , olinks])
      #decryption du fichier png
      subprocess.call(["php", "files/AES.class.php" , "tmp/adn.png"])
      #récupération du nom de l'animation
      headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537)'}
session = requests.Session()
response = session.get("http://animedigitalnetwork.fr/video/assassination-classroom/5886-episode-2-lecon-de-base-ball", headers=headers)
soup = BeautifulSoup(response.content)
oname = soup.find("title")
oname_cleanedup = re.sub(r'(.*?\s+-\s+S)aison\s+(\d+)\s+É.*?(\d+)(.*)', 
                         lambda m: "{title} {season:02d} Ep {episode}".format(title=m.group(1), season=int(m.group(2)), episode=m.group(3)), 
                         oname.text)
print(oname_cleanedup)
      ##fin
    elif menu=="3":


Needs to be this:
    elif menu=="2":
      #demande de liens 
      olinks=raw_input("Entrer votre liens ")
      #récupération du fichier png & smil
      subprocess.call(["php", "files/adn.php" , olinks])
      #decryption du fichier png
      subprocess.call(["php", "files/AES.class.php" , "tmp/adn.png"])
      #récupération du nom de l'animation
      headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537)'}
      session = requests.Session()
      response = session.get("http://animedigitalnetwork.fr/video/assassination-classroom/5886-episode-2-lecon-de-base-ball", headers=headers)
      soup = BeautifulSoup(response.content)
      oname = soup.find("title")
      oname_cleanedup = re.sub(r'(.*?\s+-\s+S)aison\s+(\d+)\s+É.*?(\d+)(.*)', 
                         lambda m: "{title} {season:02d} Ep {episode}".format(title=m.group(1), season=int(m.group(2)), episode=m.group(3)), 
                         oname.text)
      print(oname_cleanedup)
      ##fin
    elif menu=="3":



The indentation needs to line up.
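A quick way to check this kind of problem without running the whole script is to hand the source to the built-in `compile()`. The sketch below uses two tiny made-up menu snippets (not the pastebin code): in `BAD` one statement of the `"2"` branch has slipped back to column 0, which ends the `if`/`elif` chain, so the following `elif` is rejected.

```python
import textwrap

# Two tiny, made-up menu scripts. In BAD the second statement of the
# "2" branch sits at column 0, like the unindented lines in the thread.
BAD = textwrap.dedent('''\
    menu = "2"
    if menu == "1":
        print("one")
    elif menu == "2":
        print("two")
    print("still two?")
    elif menu == "3":
        print("three")
''')

# Indenting that one line puts it back inside the elif block.
GOOD = BAD.replace('print("still two?")', '    print("still two?")')

def check(source):
    """Return 'ok' if the source compiles, else the error's class name."""
    try:
        compile(source, "<menu>", "exec")
        return "ok"
    except SyntaxError as exc:  # IndentationError is a subclass
        return type(exc).__name__

print(check(BAD))
print(check(GOOD))
```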
0
 

Author Comment

by:Gaaara
ID: 40555281
Thank you for your help.
0
 

Author Comment

by:Gaaara
ID: 40555288
Sorry, I spoke too soon: it does not work, the name is not edited.
0
 

Author Comment

by:Gaaara
ID: 40555341
It works when I run it directly in Python, but not in my script.
0
 
LVL 25

Expert Comment

by:clockwatcher
ID: 40555681
You'll need to give me a bit more to go on.  What is it doing?  Just not matching?  Try the following and paste back what it spits out...

      oname = soup.find("title")
      print("Searching: {0}".format(oname.text))
      match = re.search(r'(.*?\s+-\s+S)aison\s+(\d+)\s+É.*?(\d+)(.*)'):
      if match:
          print "Found match"
          oname_cleanedup = re.sub(r'(.*?\s+-\s+S)aison\s+(\d+)\s+É.*?(\d+)(.*)', 
                         lambda m: "{title} {season:02d} Ep {episode}".format(title=m.group(1), season=int(m.group(2)), episode=m.group(3)), 
                         oname.text)
          print(oname_cleanedup)
      else:
           print("Match not found")


0
 
LVL 25

Expert Comment

by:clockwatcher
ID: 40555684
I went back to your pastebin and I'm not seeing where you're importing re.  You need to make sure to import the regular expression module.
import re


0
 

Author Comment

by:Gaaara
ID: 40555933
Yes, re is imported :)

I have a syntax error:
File "start.py", line 35
    match = re.search(r'(.*?\s+-\s+S)aison\s+(\d+)\s+É.*?(\d+)(.*)'):

I deleted the trailing
 :

and now I have this error:

Traceback (most recent call last):
  File "start.py", line 34, in <module>
    print("Searching: {0}".format(oname.text))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xc9' in position 35: ordinal not in range(128)


0
 
LVL 25

Expert Comment

by:clockwatcher
ID: 40556349
Change:
  print("Searching: {0}".format(oname.text))

To:
   print("Searching: {0}".format(oname.text.encode('ascii', 'xmlcharrefreplace')))

And:
   print(oname_cleanedup)

To:
   print(oname_cleanedup.encode('ascii', 'xmlcharrefreplace'))
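For reference, here is a small sketch of what the `'xmlcharrefreplace'` error handler does. The sample string is an assumption standing in for the page title; the point is only that strict ASCII encoding fails on É while this handler substitutes a numeric character reference.

```python
title = u"Saison 1 \xc9pisode 2"

# Strict ASCII encoding fails on the accented capital E (U+00C9)...
try:
    title.encode("ascii")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# ...while 'xmlcharrefreplace' swaps it for a numeric character reference.
safe = title.encode("ascii", "xmlcharrefreplace")

print(strict_ok)
print(safe)
```

This only changes what gets printed for debugging. The original text in `oname.text` is untouched, so the regular expression still has to deal with the real É.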
0
 

Author Comment

by:Gaaara
ID: 40556372
I have another error :)

 Searching: Assassination Classroom - Saison 1 &#201;pisode 2 - VOSTFR
Traceback (most recent call last):
  File "start.py", line 35, in <module>
    match = re.search(r'(.*?\s+-\s+S)aison\s+(\d+)\s+É.*?(\d+)(.*)')
TypeError: search() takes at least 2 arguments (1 given)


0
 
LVL 25

Expert Comment

by:clockwatcher
ID: 40556515
match = re.search(r'(.*?\s+-\s+S)aison\s+(\d+)\s+É.*?(\d+)(.*)', oname.text)
0
 

Author Comment

by:Gaaara
ID: 40556668
It does not work :)

Searching: Assassination Classroom - Saison 1 &#201;pisode 2 - VOSTFR
Match not found


0
 
LVL 25

Expert Comment

by:clockwatcher
ID: 40556692
Then I'm guessing you may have encoding problems (e.g., your file/editor may not really be using utf-8 and your É may not be the same É), because that regular expression should match.  Try changing the regular expression to:

r'(.*?\s+-\s+S)aison\s+(\d+)\s+\xc9.*?(\d+)(.*)'
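One plausible reading of the mismatch, assuming Python 2 and a script saved as UTF-8, is that a plain string literal containing É holds two bytes, while the page title holds one Unicode character. The sketch below imitates that situation; the variable names are made up for illustration.

```python
import re

e_acute = u"\xc9"                     # the single character É, code point U+00C9
utf8_bytes = e_acute.encode("utf-8")  # two bytes: 0xC3 0x89

# Imitation of a byte-string 'É' from a UTF-8 file: each byte is treated
# as a separate character when compared against Unicode text.
mismatched = utf8_bytes.decode("latin-1")

text = u"Saison 1 \xc9pisode 2"
print(len(utf8_bytes))
print(re.search(mismatched, text) is None)   # the two-character form is not found
print(re.search(r"\xc9", text) is not None)  # the escaped code point is found
```

Writing the escape `\xc9` in the pattern sidesteps the question of how the source file is encoded.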
0
 

Author Comment

by:Gaaara
ID: 40556915
It does not work ^^
Searching: Assassination Classroom - Saison 1 &#201;pisode 2 - VOSTFR
Found match
Assassination Classroom - Saison 1 &#201;pisode 2 - VOSTFR


Is it possible to create a function in a separate Python file that returns the modified name to the main script?
0
 
LVL 25

Expert Comment

by:clockwatcher
ID: 40556926
If it printed "found match", then the regular expression matched.  And the substitution would have worked too (if you had changed it to use the \xc9).  But if you're saying it didn't work, I'm guessing you didn't modify the second regular expression (the one in the sub call) to use the \xc9.  

Can you post your entire code?  This back and forth is getting us nowhere fast.
0
 

Author Comment

by:Gaaara
ID: 40556934
ok

http://pastebin.com/EF0umjd9

I thought of creating a new Python script that fetches and modifies the name and produces a variable for the main script.
0
 
LVL 25

Expert Comment

by:clockwatcher
ID: 40556945
You didn't change the regular expression in the sub call to use a \xc9 rather than the É.   The problem is that whatever editor you're using isn't using the right encoding.  Your É isn't the same É that is in the webpage.  To get around that, use the \xc9 (which is the Unicode code point for É, U+00C9; its utf-8 encoding is the two bytes \xc3\x89).

In other words, change line 37 to:

oname_cleanedup = re.sub(r'(.*?\s+-\s+S)aison\s+(\d+)\s+\xc9.*?(\d+)(.*)',

And you can also probably change line 40 back to just:

 print(oname_cleanedup)

Whether you really can or not depends on the codepage that your console is using.  You'll know when you try to print it.  If it bombs on that print line it's because whatever codepage your console is in, it doesn't have a translation for that character.  

If you want to get rid of the debug output altogether, you can go back to my original post from way earlier in the day, and just use \xc9 rather than the É:
    elif menu=="2":
      #demande de liens 
      olinks=raw_input("Entrer votre liens ")
      #récupération du fichier png & smil
      subprocess.call(["php", "files/adn.php" , olinks])
      #decryption du fichier png
      subprocess.call(["php", "files/AES.class.php" , "tmp/adn.png"])
      #récupération du nom de l'animation
      headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537)'}
      session = requests.Session()
      response = session.get("http://animedigitalnetwork.fr/video/assassination-classroom/5886-episode-2-lecon-de-base-ball", headers=headers)
      soup = BeautifulSoup(response.content)
      oname = soup.find("title")
      oname_cleanedup = re.sub(r'(.*?\s+-\s+S)aison\s+(\d+)\s+\xc9.*?(\d+)(.*)', 
                         lambda m: "{title} {season:02d} Ep {episode}".format(title=m.group(1), season=int(m.group(2)), episode=m.group(3)), 
                         oname.text)
      print(oname_cleanedup)
      ##fin
    elif menu=="3":


0
 

Author Comment

by:Gaaara
ID: 40556968
arr /$$%$%? :(

I have some difficulty with indentation. Do you have a tool to help me?

Thank you for your post.
0
 
LVL 25

Expert Comment

by:clockwatcher
ID: 40556978
You might want to try using a Python IDE.   PyCharm is a good choice (https://www.jetbrains.com/pycharm/).  There's a free community edition, and on occasion they've offered the full version free to students (https://www.jetbrains.com/student/).
0
 

Author Comment

by:Gaaara
ID: 40556981
Hmm, I have a problem.

The modification works; there is just one more bug.

How do I reset the variable at the end of the script?

The information is kept until the end of the script, so when I start the process again it still holds the old value and gives the wrong result.

Also, on another link the result is

Naruto Shippuden -  Épisode 392 - VOSTFR

but it is supposed to be

Naruto Shippuden - Ep 392

0
 
LVL 25

Expert Comment

by:clockwatcher
ID: 40556991
I'm not understanding what you mean with your first question.  

As far as the second question, you should be able to use the following:

    elif menu=="2":
      #demande de liens 
      olinks=raw_input("Entrer votre liens ")
      #récupération du fichier png & smil
      subprocess.call(["php", "files/adn.php" , olinks])
      #decryption du fichier png
      subprocess.call(["php", "files/AES.class.php" , "tmp/adn.png"])
      #récupération du nom de l'animation
      headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537)'}
      session = requests.Session()
      response = session.get("http://animedigitalnetwork.fr/video/assassination-classroom/5886-episode-2-lecon-de-base-ball", headers=headers)
      soup = BeautifulSoup(response.content)
      oname = soup.find("title")
      if oname.text.find('Saison') >= 0:
          regexp = r'(.*?\s+-\s+S)aison\s+(\d+)\s+\xc9.*?(\d+)(.*)'
          subst = "{title} {season:02d} Ep {episode}"
      else:
          regexp = r'(.*?\s+-)(\s+)\xc9.*?(\d+)(.*)'
          subst = "{title} Ep {episode}"
      oname_cleanedup = re.sub(regexp, 
                         lambda m: subst.format(title=m.group(1), season=int(m.group(2)), episode=m.group(3)), 
                         oname.text)
      print(oname_cleanedup)


0
 

Author Comment

by:Gaaara
ID: 40557000
Error:
Traceback (most recent call last):
  File "start.py", line 41, in <module>
    oname.text)
  File "/usr/lib64/python2.7/re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
  File "start.py", line 40, in <lambda>
    lambda m: subst.format(title=m.group(1), season=int(m.group(2)), episode=m.group(3)), 
ValueError: invalid literal for int() with base 10: ''


0
 

Author Comment

by:Gaaara
ID: 40557004
For the first question:

My menu is a loop. When the script finishes, it displays the menu again, and if I choose the same option I get the same title name.

Example:

title 1: naruto
title 2: bleach

Run the script with the naruto link:

result: naruto Ep 01

Then back to the menu, choose the same option with the bleach link:

result: naruto Ep 01
0
 
LVL 25

Expert Comment

by:clockwatcher
ID: 40557017
lambda m: subst.format(title=m.group(1), season=int(m.group(2)) if m.group(2).find(" ")==-1 else "", episode=m.group(3))

And it shouldn't do that the second time through... The oname_cleanedup variable is set based on the oname.text variable which should be reset by your request.

So, I'd need to see the real code where you're making your request.  Because this line:

response = session.get("http://animedigitalnetwork.fr/video/assassination-classroom/5886-episode-2-lecon-de-base-ball", headers=headers)
     
Can't be your real code because that request is hardcoded and would always return that assassination classroom episode 2 page.
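As an alternative to switching between two patterns, both title shapes can be handled with one pattern in which the "Saison N" part is optional. This is only a sketch under that assumption; `TITLE_RE` and `clean_title` are hypothetical names, and titles that do not match are returned unchanged.

```python
import re

# Hypothetical pattern: the "Saison N" part is optional, \xc9 stands for É.
TITLE_RE = re.compile(
    r'(?P<title>.*?)\s+-\s+(?:Saison\s+(?P<season>\d+)\s+)?\xc9pisode\s+(?P<episode>\d+)'
)

def clean_title(raw):
    """Build 'Show - S 01 Ep 2' or 'Show - Ep 392' from a page title."""
    match = TITLE_RE.search(raw)
    if match is None:
        return raw  # leave unrecognized titles untouched
    parts = [match.group("title"), "-"]
    if match.group("season") is not None:
        # Only convert the season to int when the group actually matched.
        parts.append("S {0:02d}".format(int(match.group("season"))))
    parts.append("Ep {0}".format(match.group("episode")))
    return " ".join(parts)

print(clean_title(u"Assassination Classroom - Saison 1 \xc9pisode 2 - VOSTFR"))
print(clean_title(u"Naruto Shippuden -  \xc9pisode 392 - VOSTFR"))
```

Calling such a function with the text of each freshly fetched page, rather than reusing a value from an earlier pass through the menu loop, also avoids getting the previous title back.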
0
 

Author Comment

by:Gaaara
ID: 40557034
Can't be your real code because that request is hardcoded and would always return that assassination classroom episode 2 page.

Yes, I replaced this with olinks :)

Error:
 File "/home/gaaara/adn/test2.py", line 21
    oname.text)
        ^
SyntaxError: invalid syntax


0
 
LVL 25

Expert Comment

by:clockwatcher
ID: 40557042
Based on the error it looks like your line 21 is probably just:
 
   oname.text)

I'm guessing you put it on a line by itself and it's supposed to be up on line 20.  

And I'm sorry but I think I'm done with this question.  It's eaten up way way too much of my time.  I think you might want to learn a little bit more about python syntax.  You need some basic python knowledge here that I'm guessing might be missing-- indentation, where you can and can't wrap lines.  Sorry but I gotta give this question up.
0
 

Author Closing Comment

by:Gaaara
ID: 40557048
OK, thank you for your help :) ^^ It was really appreciated.
0
 
LVL 25

Expert Comment

by:clockwatcher
ID: 40557057
Looked at it again and I'm guessing what you did is that you got rid of the ending comma "," on line 20.  Anyway... now I really do have to give this one up and good luck to you with your program.
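The line-wrapping rule at issue can be checked in isolation. The sketch below uses a made-up `max(...)` call rather than the `re.sub` call from the thread: inside parentheses a call may span several lines, but dropping the comma between two arguments makes it a syntax error.

```python
# Inside parentheses a call may span several lines, but every argument
# still needs its separating comma.
GOOD = '''result = max(1,
             2,
             3)
'''

# Same call with the comma after the second argument removed.
BAD = '''result = max(1,
             2
             3)
'''

def compiles(source):
    """Return True if the source is syntactically valid Python."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False

print(compiles(GOOD))
print(compiles(BAD))
```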
0
 

Author Comment

by:Gaaara
ID: 40557076
it works :)
0
