In Python, how do I find the '#' in a string?

I am trying to write code that handles HTML anchor tags. I need to test whether a string contains a '#'. The code below is returning URLs without a '#' in them.

Thanks in advance

currTestIndex = currURL.find("#")
if(currTestIndex == -1 or currTestIndex >= len(currURL)):
    print currURL
    return currURL

Superdave

Your code works for me. Can you post a larger section of your program, or show an example that doesn't work?
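
For reference, here is a minimal sketch of the same check wrapped in a function. The name has_fragment and the example.com URLs are made up for illustration and are not from the original post. str.find returns -1 when the substring is absent, so the test behaves as expected for URLs with and without '#':

```python
def has_fragment(url):
    # str.find returns -1 when "#" is absent,
    # otherwise the index of its first occurrence.
    return url.find("#") != -1


# Hypothetical sample URLs, used only to exercise the check.
print(has_fragment("http://example.com/page.html"))
print(has_fragment("http://example.com/page.html#top"))
```

The expression "#" in url gives the same answer and is probably the more common spelling when the index itself is not needed.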

pepr

You may be interested in the standard module urlparse (http://docs.python.org/release/2.4.3/lib/module-urlparse.html), which was renamed to urllib.parse in Python 3. Try the snippet below. It prints:

C:\tmp\___python\jonberg\Q_25855993>a.py
http
docs.python.org
/release/2.4.3/lib/module-urlparse.html
xyz
ParseResult(scheme='http', netloc='docs.python.org', path='/release/2.4.3/lib/module-urlparse.html', params='', query='', fragment='xyz')
('http', 'docs.python.org', '/release/2.4.3/lib/module-urlparse.html', '', '', 'xyz')
http://docs.python.org/release/2.4.3/lib/module-urlparse.html

import urlparse

url = 'http://docs.python.org/release/2.4.3/lib/module-urlparse.html#xyz'

x = urlparse.urlparse(url)

print x.scheme
print x.netloc
print x.path
print x.fragment
print 
print x

# The ParseResult class is actually derived from the tuple class.
print tuple(x)

# Construct the new URL from the parts.
nu = urlparse.urlunparse((x.scheme, x.netloc, x.path, '', '', ''))
print nu
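
If you are on Python 3, the same idea might look roughly like the sketch below. It assumes the urllib.parse module mentioned above and uses its urldefrag and urlsplit functions. The second URL (example.com) is made up for illustration:

```python
from urllib.parse import urldefrag, urlsplit

url = 'http://docs.python.org/release/2.4.3/lib/module-urlparse.html#xyz'

# urldefrag splits the URL into the part before '#' and the fragment.
base, fragment = urldefrag(url)
print(base)
print(fragment)

# urlsplit exposes the fragment as an attribute; an empty string
# means the URL has no '#...' part.
parts = urlsplit(url)
print(bool(parts.fragment))

# Hypothetical URL without a fragment, for comparison.
print(bool(urlsplit('http://example.com/index.html').fragment))
```

Compared with a plain find("#"), this leaves the splitting rules to the standard library, which may be easier to maintain if you later need the query string or the path as well.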
