Solved

parsing HTML tags within text field in an Access table

Posted on 2005-03-09
Last Modified: 2006-11-17
In my .mdb file, the description field of the products table contains HTML tags mixed in with the data.  I need to remove the HTML tags but keep all the data.  I can't figure out an easy way to do it.
Question by:crowegreg
LVL 9

Expert Comment

by:solution46
ID: 13503961
Something that might work:
1. select all the data from the table
2. paste it into the code pane of a WYSIWYG HTML editor
3. switch to the browser (display) pane
4. copy the resulting output and paste it into Notepad to strip the formatting
5. copy and paste the unformatted text back into the table (or import it).

The chances of this working are slim, especially if the data contains line breaks (<br />), paragraphs (<p>), lists (<ul> or <ol>) or similar.

The only other option I can think of is to write a function that will:
1. loop through the table, returning one description field at a time.
2. for each field, store the contents in a string variable
3. go to the first character.
4. look for the next < character.
5. from this < character, go on to the first > character.
6. look at the data between the < and >. if it looks like an HTML tag, delete the string, including the < and >; if not then goto step 7.
7. repeat from step 4 until no more < characters are found.
8. repeat from step 2 until you reach the end of the table.
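
Steps 3 to 7 above could be sketched in VBA roughly as follows. This is only a sketch: the names StripTags and LooksLikeTag are made up for illustration, and "looks like an HTML tag" is assumed here to mean that the text after the < starts with a letter, with a / followed by a letter, or with a ! (comments and doctypes).

```vba
' Hypothetical helper: True when the text between "<" and ">" starts
' like a tag, e.g. "b", "/p", "br /", "a href=..." or "!-- note --".
Private Function LooksLikeTag(ByVal inner As String) As Boolean
    Dim body As String
    body = inner
    If Left$(body, 1) = "!" Then            ' comment or doctype
        LooksLikeTag = True
        Exit Function
    End If
    If Left$(body, 1) = "/" Then body = Mid$(body, 2)   ' closing tag
    ' A tag name must begin with a letter
    LooksLikeTag = (Len(body) > 0) And (Left$(body, 1) Like "[A-Za-z]")
End Function

' Hypothetical function: removes everything that looks like a tag,
' leaving other "<" and ">" characters (such as "5 < 10") in place.
Public Function StripTags(ByVal s As String) As String
    Dim lt As Long, gt As Long
    lt = InStr(1, s, "<")
    Do While lt > 0
        gt = InStr(lt + 1, s, ">")
        If gt = 0 Then Exit Do              ' no closing ">" left
        If LooksLikeTag(Mid$(s, lt + 1, gt - lt - 1)) Then
            s = Left$(s, lt - 1) & Mid$(s, gt + 1)
            lt = InStr(lt, s, "<")          ' resume where the tag was
        Else
            lt = InStr(lt + 1, s, "<")      ' keep this "<", look further on
        End If
    Loop
    StripTags = s
End Function
```

In the Immediate window, Debug.Print StripTags("Price <b>now</b> 5 < 10") should print: Price now 5 < 10. A comment that itself contains a > character would be cut short by this simple scan.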

Hope this helps,

s46.
 
LVL 6

Accepted Solution

by:
rvooijs earned 750 total points
ID: 13504120
Hi,

Since literal '<' and '>' characters in HTML text are stored as '&lt;' and '&gt;', you can treat everything between
a '<' and the next '>' as an HTML tag. After removing the tags, translate '&lt;', '&gt;' and similar codes back to the original characters.

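    Dim s As String, lt As Long, gt As Long
    s = "string with <b>html</b> tags and &lt; and &gt; codes."
    lt = InStr(s, "<")
    Do While lt > 0
        gt = InStr(lt, s, ">")
        ' An unmatched "<" has no closing ">"; stop here to avoid looping forever
        If gt = 0 Then Exit Do
        ' Cut out everything from "<" through ">"
        s = Left(s, lt - 1) & Mid(s, gt + 1)
        lt = InStr(s, "<")
    Loop

    ' Decode the & codes only after the tags are gone
    s = Replace(s, "&lt;", "<")
    s = Replace(s, "&gt;", ">")
    s = Replace(s, "&nbsp;", " ")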


There are more & codes (character entities) that you should check for, such as &amp; and &quot;, but otherwise it should work.
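
As a rough sketch of handling a few more of those codes, something like the function below might do. DecodeEntities is a made-up name, and the list only covers a handful of common entities; which ones actually occur in your data is something you would have to check.

```vba
' Hypothetical helper: translates a few common HTML character
' entities back into plain characters. Run it after the tags
' have been removed, not before.
Public Function DecodeEntities(ByVal s As String) As String
    s = Replace(s, "&nbsp;", " ")
    s = Replace(s, "&quot;", """")
    s = Replace(s, "&#39;", "'")
    s = Replace(s, "&lt;", "<")
    s = Replace(s, "&gt;", ">")
    ' &amp; goes last so that "&amp;lt;" ends up as the text "&lt;"
    ' instead of being decoded twice into "<"
    s = Replace(s, "&amp;", "&")
    DecodeEntities = s
End Function
```

For example, Debug.Print DecodeEntities("Fish &amp; chips &lt; &quot;5&quot;") should print: Fish & chips < "5"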

Success,
Robert
 
LVL 9

Expert Comment

by:solution46
ID: 13504154
Cheers Robert, I'd forgotten about the & codes!

s46.
 

Author Comment

by:crowegreg
ID: 13510522
This table is recreated daily, so I need to write a procedure to handle this.  I'll start from the code above.
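
A procedure along these lines might be a starting point. It is a sketch under several assumptions: the table and field names (products, description) are taken from the question, the names StripHtml and CleanProductDescriptions are made up, and the project needs a reference to the Microsoft DAO library. StripHtml simply wraps the loop from the accepted answer.

```vba
' Hypothetical wrapper around the tag-stripping loop from the accepted answer.
Private Function StripHtml(ByVal s As String) As String
    Dim lt As Long, gt As Long
    lt = InStr(1, s, "<")
    Do While lt > 0
        gt = InStr(lt, s, ">")
        If gt = 0 Then Exit Do              ' unmatched "<": leave the rest alone
        s = Left$(s, lt - 1) & Mid$(s, gt + 1)
        lt = InStr(1, s, "<")
    Loop
    s = Replace(s, "&nbsp;", " ")
    s = Replace(s, "&lt;", "<")
    s = Replace(s, "&gt;", ">")
    StripHtml = s
End Function

' Walks the products table and rewrites each description that changes.
Public Sub CleanProductDescriptions()
    Dim rs As DAO.Recordset
    Dim original As String, cleaned As String

    Set rs = CurrentDb.OpenRecordset( _
        "SELECT [description] FROM [products]", dbOpenDynaset)
    Do Until rs.EOF
        original = rs!description & ""      ' treat Null as an empty string
        cleaned = StripHtml(original)
        ' Only write back rows whose text actually changed
        If StrComp(cleaned, original, vbBinaryCompare) <> 0 Then
            rs.Edit
            rs!description = cleaned
            rs.Update
        End If
        rs.MoveNext
    Loop
    rs.Close
    Set rs = Nothing
End Sub
```

The idea would be to call CleanProductDescriptions once after each daily rebuild of the table. Running it twice on the same data is not safe, because a decoded < from the first pass would be read as the start of a tag on the second.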