Python, HTML parse error due to malformed code. Help!

Posted on 2009-04-25
Last Modified: 2012-05-06
As the code below shows, the page contains an error in the line:
 '<option selected">Conference</option>', it should read:
 '<option selected="selected">Conference</option>'

The problem is that when I parse the webpage using the HTMLParser module, it falls over saying "HTMLParser.HTMLParseError: malformed start tag, at line 107, column 17" and I agree!

Has anyone any suggestions as to how I overcome the error?

Many many thanks,

Actual from extract:|Visual&term=&submit=Show+matching+events

Amusingly, their motto is "best of the web".

<select name="type">

<option selected">Conference</option>

<option>All events</option>



Firefox correction as per html standard:

<select name="type">

<option selected="selected">Conference</option>

<option>All events</option>




Question by:ScriberUK
    LVL 28

    Expert Comment

    It may also depend on your intention. If you just want to extract some information from the page, then the BeautifulSoup parser may be a better choice for you. It was designed to cope with malformed pages, and it has very nice features for searching the extracted information.

    If this is the case, reformulate your wish here.
    LVL 28

    Expert Comment

    Taking back my promises ;) I have just tried to store the page to b.html and to use BeautifulSoup. The truth is that it internally uses HTMLParser and reports the same error when trying to parse it. The snippet below produces:

    Traceback (most recent call last):
      File "C:\tmp\_Python\ScriberUK\", line 9, in <module>
        soup = BeautifulSoup(page)
      File "C:\Python26\lib\site-packages\", line 1499, in __init__
        BeautifulStoneSoup.__init__(self, *args, **kwargs)
      File "C:\Python26\lib\site-packages\", line 1230, in __init__
      File "C:\Python26\lib\site-packages\", line 1263, in _feed
      File "C:\Python26\lib\", line 108, in feed
      File "C:\Python26\lib\", line 148, in goahead
        k = self.parse_starttag(i)
      File "C:\Python26\lib\", line 226, in parse_starttag
        endpos = self.check_for_whole_start_tag(i)
      File "C:\Python26\lib\", line 301, in check_for_whole_start_tag
        self.error("malformed start tag")
      File "C:\Python26\lib\", line 115, in error
        raise HTMLParseError(message, self.getpos())
    HTMLParser.HTMLParseError: malformed start tag, at line 113, column 17

    from BeautifulSoup import BeautifulSoup
    # Get the content of your document (somehow) into one string.
    f = open('b.html')
    page = f.read()
    # Parse the string.
    soup = BeautifulSoup(page)
    for opt in soup.findAll('option', ['selected']):
        print opt.string


    LVL 28

    Accepted Solution

    However, you may be interested in the "HTML Tidy" application, which is capable of fixing the page content. There is even a Python wrapper, but I have no experience with it.

    (By the way, my code above fails at line 11: there is a bug in the second argument to .findAll(), which did not manifest because of the earlier failure at line 9.)
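
For a page with a single known defect, the same idea of fixing the content before parsing can be sketched with a regular expression. This is only an illustration and not what HTML Tidy does: the pattern, the `repair_stray_quotes` name, and the sample markup are assumptions based on the broken `<option selected">` line from the question, and the pattern only handles a stray quote after the first attribute of a tag.

```python
import re

# Matches a bare attribute name that is immediately followed by a stray
# double quote and the end of the tag, e.g. `<option selected">`.
# Assumption: this is the only kind of damage in the page.
STRAY_QUOTE = re.compile(r'(<\w+\s+)(\w+)"(\s*>)')


def repair_stray_quotes(html):
    """Rewrite `<tag name">` as `<tag name="name">`; other markup is left alone."""
    return STRAY_QUOTE.sub(r'\1\2="\2"\3', html)


# Sample markup modelled on the extract in the question (hypothetical test data).
broken = (
    '<select name="type">\n'
    '<option selected">Conference</option>\n'
    '<option>All events</option>\n'
    '</select>'
)

fixed = repair_stray_quotes(broken)
print(fixed)
```

The repaired string can then be handed to whichever parser you are using in place of the original page. A general-purpose cleaner is the safer choice when the page may contain other kinds of damage.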
    LVL 8

    Assisted Solution

    BeautifulSoup is an excellent tool and I highly recommend it. The reason for the failure is explained in great detail here:

    To summarize, the 3.1.x series of BeautifulSoup was released for compatibility with Python 3+, and as such uses HTMLParser instead of SGMLParser, which was removed from the Python standard library starting with Python 3.0. Unfortunately, HTMLParser is not very good at handling malformed HTML.

    So, as suggested in the article, if you're still using Python <= 2.6 you can continue with the 3.0.x series of BeautifulSoup (3.0.7a). Otherwise, you can try one of the other options listed in the article, of which the front runner seems to be html5lib.
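
One further option for readers on a newer interpreter: the complaint about HTMLParser applies to the versions discussed in this thread. In Python 3.5 and later, the standard library's `html.parser.HTMLParser` no longer has a strict mode or an `HTMLParseError`, and it tries to carry on past bad markup. The sketch below assumes such an interpreter. The `SelectedOptions` class and the sample page are made up for illustration, and the prefix match on the attribute name is a defensive guess, since the stray quote may end up as part of the name.

```python
from html.parser import HTMLParser  # Python 3 standard library


class SelectedOptions(HTMLParser):
    """Collect the text of <option> elements that carry a 'selected' attribute."""

    def __init__(self):
        super().__init__()
        self.selected = []        # text of each selected option
        self._capturing = False   # True while inside a selected <option>

    def handle_starttag(self, tag, attrs):
        # Match on the prefix because the malformed attribute may be
        # reported with the stray quote attached to its name.
        if tag == 'option' and any(name.startswith('selected') for name, _ in attrs):
            self._capturing = True
            self.selected.append('')

    def handle_data(self, data):
        if self._capturing:
            self.selected[-1] += data

    def handle_endtag(self, tag):
        if tag == 'option':
            self._capturing = False


# Sample markup modelled on the extract in the question (hypothetical test data).
page = '''<select name="type">
<option selected">Conference</option>
<option>All events</option>
</select>'''

parser = SelectedOptions()
parser.feed(page)
parser.close()
print(parser.selected)
```

This does not help on Python 2.6, where the options discussed in this thread still apply.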

    Good luck!

    Author Comment

    Thank you for your answers but I appear to be having a nightmare here...

    BeautifulSoup 3.0.x and 3.1.x both fall over, and I cannot get html5lib to install under Windows. Has anyone installed html5lib?
    LVL 8

    Expert Comment

    I can imagine how 3.1.x fails, but what error messages do you receive with 3.0.x?

    I have not installed html5lib, but could your problems on Windows be related to this issue?

    The html5lib page recommends using the 0.12 version from Subversion; I am not sure whether you have tried that.

    Author Comment

    Thank you all. I've still not had any luck with BeautifulSoup or html5lib; however, µTidylib does appear to fix the problem!

    However, now I have another problem: how do I pass the resulting document object back into my script? Please see new question:
