How do I format simple HTML tags inside a Word document?

Posted on 2004-04-23
Last Modified: 2012-06-27

I have a Word Document with a Macro which queries a SQL Server database, returning text fields with HTML tags. I would like for the HTML formatting to appear in the Word Document.  It is simple tags, <b> and <i> and &nbsp; and &reg; and that's about it.

How do I get <b>asdf</b> to show up as a boldfaced "asdf" in the Word document?  So far, Word is displaying the HTML tag text, "<b>asdf</b>", with no formatting applied.

Thanks for any ideas.
Question by:jasonwisdom
LVL 21

Expert Comment

ID: 10918661
Jason: Turn on the checkbox for Use wildcards in Find and Replace. In the Find what box, type "(\<b\>)(*)(\</b\>)" and in the Replace with box, type "\2" (don't include the quotes). Then, with the cursor still in the Replace with box, press Ctrl+B to set the replacement format to bold. When you click Replace All, it will find any string starting with <b> and ending with </b> and replace it with the same string minus the HTML tags, but in bold. The parentheses define groups; a \n in Replace with refers to the nth group in Find what.

If you have other html tags, create a macro to do it all at once. Be sure to turn off the wildcard option at the end so you don't inadvertently leave it set and mess up subsequent finds.
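
A macro along those lines might look like the sketch below. It is only an illustration, not tested against your document: the macro name and the two-tag list are made up for the example, and it assumes the tags are lowercase and not nested, since wildcard searches in Word are case-sensitive as far as I know.

```vba
' Hypothetical macro: converts <b>...</b> and <i>...</i> in one pass each.
Sub ConvertSimpleHtmlTags()
    Dim tags As Variant
    Dim i As Long

    tags = Array("b", "i")   ' assumed tag list; extend as needed

    For i = LBound(tags) To UBound(tags)
        With Selection.Find
            .ClearFormatting
            .Replacement.ClearFormatting
            ' \< and \> escape the wildcard word-boundary characters
            .Text = "(\<" & tags(i) & "\>)(*)(\</" & tags(i) & "\>)"
            .Replacement.Text = "\2"        ' keep only the text between the tags
            If tags(i) = "b" Then
                .Replacement.Font.Bold = True
            Else
                .Replacement.Font.Italic = True
            End If
            .Forward = True
            .Wrap = wdFindContinue
            .Format = True
            .MatchWildcards = True
            .Execute Replace:=wdReplaceAll
        End With
    Next i

    ' Turn the wildcard option back off so later finds behave normally
    With Selection.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .MatchWildcards = False
    End With
End Sub
```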

Expert Comment

ID: 10918705
I don't think there's any easy way to do what you want. Could you save the text fields as HTML files? Then you could open them in Word and it'd do the conversions. Otherwise, some sort of parsing macro like this:

Sub q()
Dim frmName As String
frmName = "<B>" 'the opening bold tag
Selection.HomeKey unit:=wdStory, Extend:=wdMove
Selection.Find.Execute findtext:=frmName, Forward:=True, MatchWholeWord:=False, Wrap:=wdFindStop
Do Until Selection.Find.Found = False   'find all the bolds
    Selection.Delete                    'replace format mark with bookmark
    Selection.Bookmarks.Add Name:="boldStart"
    frmName = "</" & Right(frmName, 2)  'builds "</B>" to find the end of formatting
    Selection.Find.Execute findtext:=frmName, Forward:=True, MatchWholeWord:=False, Wrap:=wdFindStop
    If Selection.Find.Found = False Then Exit Do   'no closing tag, so stop here
    Selection.Delete                    'replace end format mark with bookmark
    Selection.Bookmarks.Add Name:="boldEnd"
    Selection.GoTo What:=wdGoToBookmark, Name:="boldStart"
    With Selection          'select the text between the bookmarks
        .Collapse Direction:=wdCollapseStart
        .ExtendMode = True
        Selection.GoTo What:=wdGoToBookmark, Name:="boldEnd"
        .ExtendMode = False
    End With
    Selection.Font.Bold = True          'bold it
    Selection.HomeKey unit:=wdStory
    frmName = "<B>"                     'find if there is another bold
    Selection.Find.Execute findtext:=frmName, Forward:=True, MatchWholeWord:=False, Wrap:=wdFindStop
Loop
End Sub

This would work for simple ones like <b> and <i> (I don't remember what &reg; and &nbsp; do anymore).

LVL 21

Expert Comment

ID: 10918711
A bit of clarification to my earlier comment...

The "\" character is necessary before the "<" and ">" because both these characters have special meanings in a wildcard search (beginning and end of words). See Word's help for a more complete rundown on how to use wildcards. /Eric

Author Comment

ID: 10920204
Eric -

What does the "\2" mean?  When I tried it on <i>..</i>, it just removed the <i> and </i> tags.  It did not make the selection italics or boldface.

I am about to try the save as HTML file and then reload into Word.  I am thinking of something like this:

FSO - save file as textfile with extension .htm
Create a new Word document in my VBA
Open the .htm file into the new Word Document
Make the Selection All
Copy and Paste into my original Word Document

Something like that?

Thank you both for your help.
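
For anyone following along, the first two steps of that plan (write the HTML out with FSO, then open it in Word) could be sketched roughly like this. The macro name, the temp-folder location and the sample HTML string are assumptions for illustration; in the real macro the string would come from the SQL Server query.

```vba
' Hypothetical sketch: save an HTML fragment to disk and open it in Word.
Sub OpenHtmlFragmentInWord()
    Dim fso As Object
    Dim ts As Object
    Dim htmlPath As String
    Dim htmlDoc As Document

    htmlPath = Environ("TEMP") & "\WordTemp.htm"    ' assumed location

    ' Write the fragment with the FileSystemObject (late bound, no reference needed)
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set ts = fso.CreateTextFile(htmlPath, True)     ' True = overwrite existing file
    ts.Write "<html><body><b>asdf</b> and <i>qwer</i></body></html>"   ' sample text only
    ts.Close

    ' Word converts the HTML on open, so the tags become real formatting
    Set htmlDoc = Documents.Open(FileName:=htmlPath, _
                                 ConfirmConversions:=False, ReadOnly:=True)
End Sub
```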

Author Comment

ID: 10922961
Gilbar -

I am looking at saving the file as an .htm file then opening it in Word and having Word do the conversion.  Here is what I came up with so far...

                Documents.Open ("WordTemp.htm")
                i = 1
                While i <= Documents.Count
                    If Len(Documents.Item(i).Content.Text) < 100 Then
                        strProduct = Documents.Item(i).Content.Text
                    End If
                    i = i + 1
                Wend
                rowNew.Range.Cells.Item(2).Range.Text = strProduct

And the HTML tags are stripped out.  However, the formatting is lost as well...so my <i>..</i> text is not italicized.

Any ideas?


Author Comment

ID: 10923224
Try this code block instead:

                Documents.Open ("WordTemp.htm")
                strProduct = Documents("WordTemp.htm").Content
                Documents("WordTemp.htm").Close SaveChanges:=wdDoNotSaveChanges
                rowNew.Range.Cells.Item(2).Range.Text = strProduct

The code is simpler, but the result is the same:  no italicized text, although the HTML tags have been removed.
LVL 21

Assisted Solution

EricFletcher earned 50 total points
ID: 10927146
Jason: The "\2" in the Replace with represents the 2nd group from the Find what part of the F&R dialog -- in this case, the "(*)" part which is whatever is between the html tags.

If you lost the html tags but didn't see any bold, you probably didn't have the format set to bold in the Replace with part of the dialog (be sure "Font: Bold" appears). If you do it manually per my instructions, it will definitely work.

However, if you recorded a macro to do it, you'll need to add some extra lines to manage the formatting part of the replace. For some reason, Word doesn't seem to record that part of the dialog! Here is what gets recorded with my added lines as indicated:

Sub Macro2()
    With Selection.Find
        .Text = "(\<b\>)(*)(\</b\>)"
'-- set bold for the find part to false (not essential but good practice)
        .Font.Bold = False
        .Replacement.Text = "\2"
'-- set the format for the replacement text to bold
        .Replacement.Font.Bold = True
        .Forward = True
        .Wrap = wdFindContinue
        .Format = True
        .MatchCase = False
        .MatchWholeWord = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchWildcards = True
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
End Sub

Note that this leaves the F&R box 'loaded' with the options you've set up: if you are doing a macro, it would be a good idea to end it by resetting the F&R options (clear formatting, turn off wildcards...)
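
A reset routine could be as small as the sketch below. The Sub name is made up, and the values are just ordinary defaults I am assuming you would want back; call it as the last line of any macro that sets up a wildcard or formatted replace.

```vba
' Hypothetical helper: put the Find and Replace options back to plain defaults.
Sub ResetFindReplaceOptions()
    With Selection.Find
        .ClearFormatting                 ' drop "Font: Bold" etc. from Find what
        .Replacement.ClearFormatting     ' and from Replace with
        .Text = ""
        .Replacement.Text = ""
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False          ' the option most likely to trip up later finds
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
End Sub
```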

Some time ago, I set up a somewhat similar macro to clean up HTML text for a very specific project. It used the approach above to convert internal formatting html tags (bold, italic, strong) as well as change paragraphs with bullets and heading level styles into Word format. I could probably dredge it up from some backup but it would have to wait until after tax time (Apr 30 in Canada!). This should give you a good start on getting the same thing.

Accepted Solution

gilbar earned 100 total points
ID: 10928057
Jason, you're losing your formatting when you put it into the string. Instead, try selecting and copying the text in the opened document, then Selection.Paste after selecting where you want it (the cell or whatever).
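
A minimal sketch of that idea follows; it is not the exact code from this thread. It copies through Range objects rather than Selection (a swapped-in variation that avoids moving the cursor), and the Sub name and both parameters are hypothetical.

```vba
' Hypothetical sketch: copy the converted HTML, formatting included, into a target range.
Sub PasteFormattedHtml(ByVal htmlPath As String, ByVal target As Range)
    Dim htmlDoc As Document

    ' Word converts the tags to real formatting when it opens the file
    Set htmlDoc = Documents.Open(FileName:=htmlPath, _
                                 ConfirmConversions:=False, ReadOnly:=True)

    htmlDoc.Content.Copy     ' copy as formatted text, not as a String
    target.Paste             ' paste while the source document is still open

    htmlDoc.Close SaveChanges:=wdDoNotSaveChanges
End Sub
```

With the names used earlier in the thread, a call might look like PasteFormattedHtml "WordTemp.htm", rowNew.Range.Cells.Item(2).Range, though the target range may need adjusting if the paste leaves an extra paragraph mark in the cell.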

Author Comment

ID: 10931454
Thank you, both Eric and Gilbar.

Creating a new document and mixing the Selection.WholeStory, .Cut and .Paste worked.  It converted <b>, <i>, &reg; and <img src="http://"> into display in my original Word Document.  And if any new tags (<u> for example) appear later, I won't have to recode in order to get it to work.

The other 2 ideas were very, very helpful, and through them I feel confident I could have "hacked" everything except for the <img src> tags through that.  But this works much better.

I GREATLY appreciate your time!!!


Expert Comment

ID: 10931544
you're welcome jason!
now humor a person who is obviously going senile and remind me what &reg; and &nbsp; do.  I used to know, back in the twentieth century :)

Author Comment

ID: 10932083
me too, I haven't done HTML since 1999!

&nbsp; is non-breaking space.  it's a " ".
&reg; is a Registered symbol.  The ® symbol.
There's one for TM as well...
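
If you go the Find and Replace route instead of letting Word open the HTML, the entities have to be swapped by hand. Something like this sketch might do it, assuming the TM entity meant here is &trade;; the Sub name is made up, and the list covers only the entities mentioned in this thread.

```vba
' Hypothetical sketch: replace a few HTML entities with the characters they stand for.
Sub ReplaceSimpleEntities()
    Dim entities As Variant
    Dim chars As Variant
    Dim i As Long

    entities = Array("&nbsp;", "&reg;", "&trade;")
    chars = Array(ChrW(160), ChrW(174), ChrW(8482))   ' non-breaking space, registered sign, trademark sign

    For i = LBound(entities) To UBound(entities)
        With ActiveDocument.Content.Find
            .ClearFormatting
            .Replacement.ClearFormatting
            .Text = entities(i)
            .Replacement.Text = chars(i)
            .Forward = True
            .Wrap = wdFindStop
            .Format = False
            .MatchWildcards = False      ' plain text search, no wildcards
            .Execute Replace:=wdReplaceAll
        End With
    Next i
End Sub
```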

Expert Comment

ID: 10932142
