How do I format simple HTML tags inside a Word document?

Posted on 2004-04-23
Last Modified: 2012-06-27

I have a Word Document with a Macro which queries a SQL Server database, returning text fields with HTML tags. I would like for the HTML formatting to appear in the Word Document.  It is simple tags, <b> and <i> and &nbsp; and &reg; and that's about it.

How do I get <b>asdf</b> to show up as a boldfaced "asdf" in the Word document?  So far, Word is displaying the HTML tag text, "<b>asdf</b>", with no formatting applied.

Thanks for any ideas.
Question by:jasonwisdom
LVL 21

Expert Comment

by:Eric Fletcher
ID: 10918661
Jason: Turn on the Use wildcards checkbox in Find and Replace. In the Find what box, type "(\<b\>)(*)(\</b\>)", and in the Replace with box type "\2" (don't include the quotes). Then, with the cursor still in the Replace with box, press Ctrl-B to set the replacement format to bold. When you click Replace All, it will find any string starting with <b> and ending with </b> and replace it with the same string minus the HTML tags, but in bold. The parentheses define groups; \n in Replace with refers to the nth group in Find what.

If you have other html tags, create a macro to do it all at once. Be sure to turn off the wildcard option at the end so you don't inadvertently leave it set and mess up subsequent finds.

Expert Comment

ID: 10918705
I don't think there's any easy way to do what you want. Could you save the text fields as HTML files? Then you could open them in Word and it would do the conversion. Otherwise, some sort of parsing macro like this:

Sub q()
Dim frmName As String
frmName = "<B>" 'the bold character
Selection.HomeKey unit:=wdStory, Extend:=wdMove
Selection.Find.Execute findtext:=frmName, Forward:=True, MatchWholeWord:=False, Wrap:=wdFindStop
Do Until Selection.Find.Found = False   'find all the bolds
    Selection.Delete                    'replace format mark with bookmark
    Selection.Bookmarks.Add Name:="boldStart"
    frmName = "</" & Right(frmName, 2)  'find the end of formatting
    Selection.Find.Execute findtext:=frmName, Forward:=True, MatchWholeWord:=False, Wrap:=wdFindStop
    Selection.Delete                    'replace end format mark with bookmark
    Selection.Bookmarks.Add Name:="boldEnd"
    Selection.GoTo What:=wdGoToBookmark, Name:="boldStart"
    With Selection          'select the text between the bookmarks
        .Collapse Direction:=wdCollapseStart
        .ExtendMode = True
        Selection.GoTo What:=wdGoToBookmark, Name:="boldEnd"
        .ExtendMode = False
    End With
    Selection.Font.Bold = True          'bold it
    Selection.HomeKey unit:=wdStory
    frmName = "<B>"                     'find if there is another bold
    Selection.Find.Execute findtext:=frmName, Forward:=True, MatchWholeWord:=False, Wrap:=wdFindStop
Loop                                    'closes the Do Until above
End Sub

would work for simple ones like <b> and <i> (I don't remember what &reg; and &nbsp; do anymore).

LVL 21

Expert Comment

by:Eric Fletcher
ID: 10918711
A bit of clarification to my earlier comment...

The "\" character is necessary before the "<" and ">" because both these characters have special meanings in a wildcard search (beginning and end of words). See Word's help for a more complete rundown on how to use wildcards. /Eric

Author Comment

ID: 10920204
Eric -

What does the "\2" mean?  When I tried it on <i>..</i>, it just removed the <i> and </i> tags.  It did not make the selection italics or boldface.

I am about to try the save as HTML file and then reload into Word.  I am thinking of something like this:

FSO - save file as textfile with extension .htm
Create a new Word document in my VBA
Open the .htm file into the new Word Document
Make the Selection All
Copy and Paste into my original Word Document
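
A minimal sketch of the first of those steps, assuming the field text is already in a string and that a scratch file in the user's temp folder is acceptable. The function name WriteHtmlFragment is hypothetical, and the <html><body> wrapper is an assumption made so that Word treats the file as HTML when it opens it.

```vba
' Hypothetical helper: write an HTML fragment to a temporary .htm file
' and return the full path of that file.
Function WriteHtmlFragment(ByVal fragment As String) As String
    Dim fso As Object
    Dim ts As Object
    Dim filePath As String

    Set fso = CreateObject("Scripting.FileSystemObject")

    ' 2 = TemporaryFolder; the file name matches the one used in this thread.
    filePath = fso.BuildPath(fso.GetSpecialFolder(2).Path, "WordTemp.htm")

    ' True = overwrite an existing file of the same name.
    Set ts = fso.CreateTextFile(filePath, True)
    ts.Write "<html><body>" & fragment & "</body></html>"
    ts.Close

    WriteHtmlFragment = filePath
End Function
```

Passing the returned path to Documents.Open would then cover the next two steps.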

Something like that?

Thank you both for your help.

Author Comment

ID: 10922961
Gilbar -

I am looking at saving the file as an .htm file then opening it in Word and having Word do the conversion.  Here is what I came up with so far...

                Documents.Open ("WordTemp.htm")
                i = 1
                While i <= Documents.Count
                    If Len(Documents.Item(i).Content.Text) < 100 Then
                        strProduct = Documents.Item(i).Content.Text
                    End If
                    i = i + 1
                Wend
                rowNew.Range.Cells.Item(2).Range.Text = strProduct

And the HTML tags are stripped out.  However, the formatting is lost as my <i>..</i> text is not italicized.

Any ideas?


Author Comment

ID: 10923224
Try this code block instead:

                Documents.Open ("WordTemp.htm")
                strProduct = Documents("WordTemp.htm").Content
                Documents("WordTemp.htm").Close SaveChanges:=wdDoNotSaveChanges
                rowNew.Range.Cells.Item(2).Range.Text = strProduct

The code is simpler, but the result is the same:  no italicized text, although the HTML tags have been removed.
LVL 21

Assisted Solution

by:Eric Fletcher
Eric Fletcher earned 50 total points
ID: 10927146
Jason: The "\2" in the Replace with represents the 2nd group from the Find what part of the F&R dialog -- in this case, the "(*)" part which is whatever is between the html tags.

If you lost the html tags but didn't see any bold, you probably didn't have the format set to bold in the Replace with part of the dialog (be sure "Font: Bold" appears). If you do it manually per my instructions, it will definitely work.

However, if you recorded a macro to do it, you'll need to add some extra lines to manage the formatting part of the replace. For some reason, Word doesn't seem to record that part of the dialog! Here is what gets recorded with my added lines as indicated:

Sub Macro2()
    With Selection.Find
        .Text = "(\<b\>)(*)(\</b\>)"
'-- set bold for the find part to false (not essential but good practice)
        .Font.Bold = False
        .Replacement.Text = "\2"
'-- set the format for the replacement text to bold
        .Replacement.Font.Bold = True
        .Forward = True
        .Wrap = wdFindContinue
        .Format = True
        .MatchCase = False
        .MatchWholeWord = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchWildcards = True
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
End Sub

Note that this leaves the F&R box 'loaded' with the options you've set up: if you are doing a macro, it would be a good idea to end it by resetting the F&R options (clear formatting, turn off wildcards...)
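
A rough sketch of how the same replace could be extended to <i> and finished with that reset. The sub names ConvertSimpleTags and ApplyTagFormat are hypothetical, and the [bB]-style character class is an addition, there because wildcard searches in Word are case-sensitive and the tags might arrive in either case.

```vba
' Hypothetical wrapper: convert <b> and <i> pairs, then reset Find and Replace.
Sub ConvertSimpleTags()
    ApplyTagFormat "b"
    ApplyTagFormat "i"

    ' Leave the Find and Replace dialog in a neutral state.
    With Selection.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = ""
        .Replacement.Text = ""
        .Format = False
        .MatchWildcards = False
    End With
End Sub

' Replace <tag>text</tag> with the text alone, formatted to match the tag.
Private Sub ApplyTagFormat(ByVal tag As String)
    Dim letter As String
    letter = "[" & LCase(tag) & UCase(tag) & "]"    ' e.g. [bB]

    With Selection.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        ' Only the text between the tags is grouped, so it is group 1 here.
        .Text = "\<" & letter & "\>(*)\</" & letter & "\>"
        .Replacement.Text = "\1"
        Select Case LCase(tag)
            Case "b": .Replacement.Font.Bold = True
            Case "i": .Replacement.Font.Italic = True
        End Select
        .Forward = True
        .Wrap = wdFindContinue
        .Format = True
        .MatchWildcards = True
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

Like the macro above, this only handles simple, non-nested tag pairs.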

Some time ago, I set up a somewhat similar macro to clean up HTML text for a very specific project. It used the approach above to convert internal formatting html tags (bold, italic, strong) as well as change paragraphs with bullets and heading level styles into Word format. I could probably dredge it up from some backup but it would have to wait until after tax time (Apr 30 in Canada!). This should give you a good start on getting the same thing.

Accepted Solution

gilbar earned 100 total points
ID: 10928057
Jason, you're losing your formatting when you put it into the string. Instead try

then Selection.Paste after selecting where you want it (the cell or whatever)
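
The code that sat between those two lines is not preserved in this copy of the thread. A minimal sketch of the copy-and-paste idea, assuming the HTML has already been saved to a file; the sub name PasteHtmlInto and its parameters are hypothetical, and this is not necessarily what was originally posted.

```vba
' Hypothetical helper: copy the formatted content of an HTML file into target.
' Assumes the file is not empty.
Sub PasteHtmlInto(ByVal htmlPath As String, ByVal target As Range)
    Dim src As Document
    Dim srcRange As Range

    ' Word converts the HTML to formatted text as it opens the file.
    Set src = Documents.Open(FileName:=htmlPath, ConfirmConversions:=False, _
                             ReadOnly:=True, AddToRecentFiles:=False)

    ' Leave out the document's final paragraph mark so that no empty
    ' paragraph is carried into the destination.
    Set srcRange = src.Content
    srcRange.MoveEnd Unit:=wdCharacter, Count:=-1

    srcRange.Copy       ' copies the text together with its formatting
    target.Paste        ' pastes the clipboard content over the target range

    src.Close SaveChanges:=wdDoNotSaveChanges
End Sub
```

Called with a cell range such as rowNew.Range.Cells.Item(2).Range from the code earlier in the thread, this should leave the formatted text in that cell, because the content never passes through a plain string.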

Author Comment

ID: 10931454
Thank you, both Eric and Gilbar.

Creating a new document and mixing the Selection.WholeStory, .Cut and .Paste worked.  It converted <b>, <i>, &reg; and <img src="http://"> into display in my original Word Document.  And if any new tags (<u> for example) appear later, I won't have to recode in order to get it to work.

The other 2 ideas were very, very helpful, and through them I feel confident I could have "hacked" everything except for the <img src> tags through that.  But this works much better.

I GREATLY appreciate your time!!!


Expert Comment

ID: 10931544
you're welcome jason!
now humor a person who is obviously going senile and remind me what &reg; and &nbsp; do.  I used to know, back in the twentieth century :)

Author Comment

ID: 10932083
me too, I haven't done HTML since 1999!

&nbsp; is non-breaking space.  it's a " ".
&reg; is a Registered symbol.  The ® symbol.
There's one for TM as well...
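
For the find-and-replace route, the entities could be swapped for their characters directly. A small sketch, assuming the entity text is sitting literally in the active document; the sub name ReplaceSimpleEntities is hypothetical, and &trade; is the entity for the TM symbol.

```vba
' Hypothetical helper: replace a few literal HTML entities with their characters.
Sub ReplaceSimpleEntities()
    Dim entities As Variant
    Dim chars As Variant
    Dim rng As Range
    Dim i As Long

    entities = Array("&nbsp;", "&reg;", "&trade;")
    ' U+00A0 non-breaking space, U+00AE registered sign, U+2122 trade mark sign
    chars = Array(ChrW(160), ChrW(174), ChrW(8482))

    For i = LBound(entities) To UBound(entities)
        Set rng = ActiveDocument.Content    ' fresh range over the main text
        With rng.Find
            .ClearFormatting
            .Replacement.ClearFormatting
            .MatchWildcards = False
            .MatchCase = True
            .Format = False
            .Forward = True
            .Wrap = wdFindStop
            .Execute FindText:=entities(i), ReplaceWith:=chars(i), _
                     Replace:=wdReplaceAll
        End With
    Next i
End Sub
```

Opening the text as an HTML file, as in the accepted solution, makes this unnecessary because Word converts the entities itself.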

Expert Comment

ID: 10932142
