parsing HTML tags within text field in an Access table

In my .mdb file, the description field of the products table contains HTML tags mixed in with the data. I need to remove the HTML tags but keep all of the data, and I can't figure out an easy way to do it.

solution46

Something that might work:
1. select all the data from the table
2. paste it into the code pane of a wysiwyg html editor
3. switch to the browser (display) pane
4. copy the resulting output and paste it into notepad to strip the formatting
5. copy and paste the unformatted text back into the table (or import it).

The chances of this working are slim, especially if the data contains line breaks <br />, paragraphs <p />, lists <ul /> or <ol />, or similar.

The only other option I can think of is to write a function to:
1. loop through the table, returning one description field at a time.
2. for each field, store the contents in a string variable
3. go to the first character.
4. look for the next < character.
5. from this < character, go on to the first > character.
6. look at the data between the < and >. if it looks like an HTML tag, delete the string, including the < and >; if not, leave it in place and go on to step 7.
7. repeat from step 4 until no more < characters are found.
8. repeat from step 2 until you reach the end of the table.
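
As a rough illustration of steps 4 to 7, here is a minimal sketch in Python (not Access VBA, so it would need porting before it could run inside the mdb). The name strip_tags and the TAG_LIKE pattern are made up for this example, and "looks like an HTML tag" is assumed to mean an optional slash, a tag name starting with a letter, then optional attributes. Text such as "a <b and c> d" would be wrongly treated as a tag under that assumption.

```python
import re

# Assumption: a "tag" is <name>, </name>, <name attr="...">, or <name />.
TAG_LIKE = re.compile(r"/?[A-Za-z][A-Za-z0-9]*(\s[^<>]*)?/?")


def strip_tags(text: str) -> str:
    """Remove tag-like <...> spans from text, keeping everything else."""
    out = []
    pos = 0
    while True:
        start = text.find("<", pos)        # step 4: next '<'
        if start == -1:
            out.append(text[pos:])         # no more '<': keep the rest
            break
        end = text.find(">", start + 1)    # step 5: first '>' after it
        if end == -1:
            out.append(text[pos:])         # unmatched '<': keep the rest
            break
        inner = text[start + 1:end]
        if TAG_LIKE.fullmatch(inner):      # step 6: looks like a tag
            out.append(text[pos:start])    # keep text before it, drop the tag
            pos = end + 1
        else:
            out.append(text[pos:start + 1])  # keep the literal '<'
            pos = start + 1                # step 7: carry on from here
    return "".join(out)


# Steps 1, 2 and 8: a plain list stands in for the description fields.
descriptions = ["<p>Price <b>under</b> 5 < 10</p>", "Plain text"]
cleaned = [strip_tags(d) for d in descriptions]
```

In this sketch the stray "<" in "5 < 10" survives because the text after it does not start with a tag name.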

Hope this helps,

s46.

ASKER CERTIFIED SOLUTION

rvooijs

This solution is only available to members.

solution46

cheers Robert, I'd forgotten about the & codes!
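
For anyone following along: the "& codes" are presumably HTML character entities such as &amp; and &nbsp;, which stay in the data after the tags are gone. A minimal Python sketch of decoding them (again not VBA; decode_entities is a made-up name, while html.unescape is the standard library call):

```python
import html


def decode_entities(text: str) -> str:
    """Turn entities like &amp; or &lt; back into the characters they stand for."""
    return html.unescape(text)


# Decode only after the tags have been stripped, so that a decoded
# "&lt;" is not mistaken for the start of a tag.
example = decode_entities("Nuts &amp; bolts &lt;5mm&gt;")
```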

s46.

crowegreg

ASKER

This table gets recreated daily, so I need to write a procedure to handle this. I'll start working from the code above.