Parsing HTML tags within a text field in an Access table

Posted on 2005-03-09
Medium Priority
Last Modified: 2006-11-17
In my mdb, the description field of the products table contains HTML tags mixed in with the data.  I need to remove the HTML tags but keep all the data, and I can't figure out an easy way to do it.
Question by: crowegreg

Expert Comment

ID: 13503961
Something that might work:
1. select all the data from the table
2. paste it into the code pane of a wysiwyg html editor
3. switch to the browser (display) pane
4. copy the resulting output and paste it into notepad to strip the formatting
5. copy and paste the unformatted text back into the table (or import it).

The chances of this working are slim, especially if you have line breaks <br />, paragraphs <p />, lists <ul /> or <ol />, or similar.

The only other option I can think of is to write a function to:
1. loop through the table, returning one description field at a time.
2. for each field, store the contents in a string variable
3. go to the first character.
4. look for the next < character.
5. from this < character, go on to the first > character.
6. look at the data between the < and >. if it looks like an HTML tag, delete the string, including the < and >; if not then goto step 7.
7. repeat from step 4 until no more < characters are found.
8. repeat from step 2 until you reach the end of the table.
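Steps 3 to 7 could be sketched in VBA roughly as follows. This is only a sketch: the names RemoveTagLikeSpans and LooksLikeTag are made up for illustration, and the "looks like an HTML tag" test (first character is a letter, "/" or "!") is an assumed heuristic, not a full HTML check.

```vba
' Hypothetical helper: removes only the <...> spans that look like HTML tags.
Public Function RemoveTagLikeSpans(ByVal s As String) As String
    Dim lt As Long, gt As Long
    Dim inner As String

    lt = InStr(1, s, "<")                       ' step 4: next "<"
    Do While lt > 0
        gt = InStr(lt + 1, s, ">")              ' step 5: first ">" after it
        If gt = 0 Then Exit Do                  ' no closing ">": nothing left to remove

        inner = Mid$(s, lt + 1, gt - lt - 1)    ' text between "<" and ">"
        If LooksLikeTag(inner) Then
            ' step 6: delete the span, including "<" and ">"
            s = Left$(s, lt - 1) & Mid$(s, gt + 1)
            lt = InStr(lt, s, "<")              ' keep searching from the same position
        Else
            lt = InStr(lt + 1, s, "<")          ' not a tag: skip this "<"
        End If
    Loop

    RemoveTagLikeSpans = s
End Function

' Assumed heuristic: a tag starts with a letter, "/" (closing tag) or "!" (comment).
Private Function LooksLikeTag(ByVal inner As String) As Boolean
    Dim first As String
    If Len(inner) = 0 Then Exit Function        ' "<>" is left alone
    first = Left$(inner, 1)
    LooksLikeTag = (first Like "[A-Za-z/]") Or (first = "!")
End Function
```

With this heuristic, a description such as "a < b, <b>bold</b> text" keeps its literal "<" and only loses the two tags.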

Hope this helps,


Accepted Solution

rvooijs earned 750 total points
ID: 13504120

Since literal '<' and '>' characters in html are stored as '&lt;' and '&gt;', you can treat everything between
a '<' and the next '>' as an html tag. After removing the tags, translate the '&lt;' etc. codes back to the original characters.

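    s = "string with <b>html</b> tags and &lt; and &gt; codes."

    ' Remove everything from each "<" up to and including the next ">".
    lt = InStr(s, "<")
    While (lt > 0)
        gt = InStr(lt, s, ">")
        If (gt > 0) Then
            s = Left(s, lt - 1) & Mid(s, gt + 1)
            lt = InStr(lt, s, "<")
        Else
            lt = 0    ' "<" without a closing ">": stop instead of looping forever
        End If
    Wend

    ' Translate the &-codes back to the original characters.
    s = Replace(s, "&lt;", "<")
    s = Replace(s, "&gt;", ">")
    s = Replace(s, "&nbsp;", " ")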

I think there are some more &-codes that you should check, but otherwise it should work.
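As a rough illustration of the extra &-codes, the replacements could be collected in one function. The name DecodeBasicEntities is hypothetical, and the list only covers a handful of common entities; whatever other codes appear in the actual data would need to be added.

```vba
' Hypothetical helper: translates a few common HTML entities back to characters.
Public Function DecodeBasicEntities(ByVal s As String) As String
    s = Replace(s, "&nbsp;", " ")
    s = Replace(s, "&quot;", """")
    s = Replace(s, "&lt;", "<")
    s = Replace(s, "&gt;", ">")
    ' Decode &amp; last, so that "&amp;lt;" ends up as the text "&lt;" and not "<".
    s = Replace(s, "&amp;", "&")
    DecodeBasicEntities = s
End Function
```

Doing the &amp; replacement last avoids decoding the same text twice.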


Expert Comment

ID: 13504154
cheers Robert, I'd forgotten about the & codes!


Author Comment

ID: 13510522
This table gets recreated on a daily basis.  So I need to write a procedure to handle this.  I'll start working using the code above.
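A daily clean-up procedure might look something like the sketch below. It assumes the table and field names from the question (products, description), a DAO reference (the Access default), and that the VBScript regular expression library is available on the machine. CleanProductDescriptions and StripTags are made-up names, and the regular expression is a swapped-in alternative to the InStr loop above, not the code from the accepted answer.

```vba
' Hypothetical procedure: strips tags from products.description in place.
Public Sub CleanProductDescriptions()
    Dim rs As DAO.Recordset

    ' Only open rows that contain a "<"; this also skips Null descriptions.
    Set rs = CurrentDb.OpenRecordset( _
        "SELECT description FROM products WHERE description LIKE '*<*'", _
        dbOpenDynaset)

    Do While Not rs.EOF
        rs.Edit
        rs!description = StripTags(rs!description)
        rs.Update
        rs.MoveNext
    Loop

    rs.Close
    Set rs = Nothing
End Sub

' Hypothetical helper: removes every "<...>" span using a regular expression.
Public Function StripTags(ByVal s As String) As String
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")   ' late binding, no extra reference needed
    re.Global = True                           ' replace all matches, not just the first
    re.Pattern = "<[^>]*>"                     ' "<", anything except ">", then ">"
    StripTags = re.Replace(s, "")
End Function
```

The procedure could then be run from a macro or button after each daily import; the &-codes would still need translating afterwards.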
