jonberg
asked on
In Python, how do I .find the '#'
I am trying to write code that can handle html anchor tags. I need to test a string to see if it contains the '#'. The code is returning URLs without '#' in them.
Thanks in advance
Thanks in advance
currTestIndex = currURL.find("#")
if(currTestIndex == -1 or currTestIndex >= len(currURL)):
print currURL
return currURL
Your code works for me. Can you post a larger section of your program, or show an example that doesn't work?
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
You may be interested in the standard module urlparse (http://docs.python.org/release/2.4.3/lib/module-urlparse.html; it was renamed/moved to urllib.parse in Python 3). Try the snippet below. It writes:
C:\tmp\___python\jonberg\Q _25855993> a.py
http
docs.python.org
/release/2.4.3/lib/module- urlparse.h tml
xyz
ParseResult(scheme='http', netloc='docs.python.org', path='/release/2.4.3/lib/m odule-urlp arse.html' , params='', query='', fragment='xyz')
('http', 'docs.python.org', '/release/2.4.3/lib/module -urlparse. html', '', '', 'xyz')
http://docs.python.org/release/2.4.3/lib/module-urlparse.html
C:\tmp\___python\jonberg\Q
http
docs.python.org
/release/2.4.3/lib/module-
xyz
ParseResult(scheme='http',
('http', 'docs.python.org', '/release/2.4.3/lib/module
http://docs.python.org/release/2.4.3/lib/module-urlparse.html
import urlparse
url = 'http://docs.python.org/release/2.4.3/lib/module-urlparse.html#xyz'
x = urlparse.urlparse(url)
print x.scheme
print x.netloc
print x.path
print x.fragment
print
print x
# The ParseResult class is actually derived from the tuple class.
print tuple(x)
# Construct the new URL from the parts.
nu = urlparse.urlunparse((x.scheme, x.netloc, x.path, '', '', ''))
print nu