Problems importing HTML from a browser into Word

Posted on 2012-09-20
Last Modified: 2012-09-22

For various reasons, I need to import HTML into a Word document.  I display the HTML in Chrome, select the entire page, then paste it into Word. It works great, except for two problems I need help with.

First, if a word is surrounded by bold tags, when it is pasted into Word, it is accurately bolded, but it is surrounded now by non-breaking spaces, which sometimes make a small mess of the layout in Word.

For example, this page has two short sentences, each with a word bolded, one with <b> and the other with CSS:

If you load that page in Chrome, select all, and paste into Word, you'll see the non-breaking spaces surrounding the bolded words.  Is there a way to prevent the non-breaking spaces?  Or identify specifically the non-breaking spaces that were wrongly created like that? I know I could search/replace a non-breaking space with a regular space, but that would interfere with other places where I *want* non-breaking spaces.

If I show the html page in Internet Explorer and paste into Word, the problem does not exist, but for a multitude of reasons I need to use Chrome.

The other problem is that sometimes two words that are separate in the HTML will run together without a space in Word.  It happens only infrequently, and always one of the words is bolded and the other is not bolded.  In a 50-page document, this might happen 5 or 6 times.

I have only minimal skills in VBA, but I thought someone might be able to write a tiny program that would find instances where a non-bold character is immediately adjacent to a bold character.  That would allow me to do a quick search and fix all of the instances rather than scouring the entire document visually.

Thanks in advance for any help on these issues.
Question by: StevenMiles

    Expert Comment

    You can do this manually.
    1. Find/replace all ^s with ^s (font: bold).
    2. Position the cursor at the start of the document.
    3. Find ^$^s (font: bold).

    You can repeat step three by clicking the Find Next button on the dialog.
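
    The same kind of search can also be sketched as a macro. This is only an illustration and not part of the manual steps above: the macro name FindNextNbspAfterBold is made up, and it assumes the non-breaking spaces of interest directly follow a bold character. It starts at the cursor and selects the next match, so it can be run repeatedly.

    ```vba
    Option Explicit

    ' Hypothetical helper: select the next non-breaking space that
    ' immediately follows a bold character, starting at the cursor.
    Sub FindNextNbspAfterBold()
        Dim rng As Range
        Dim prev As Range

        Set rng = ActiveDocument.Range(Selection.End, ActiveDocument.Content.End)
        With rng.Find
            .ClearFormatting
            .Text = "^s"              ' ^s is Word's code for a non-breaking space
            .Forward = True
            .Wrap = wdFindStop
            .Format = False
            .MatchWildcards = False
        End With

        Do While rng.Find.Execute
            If rng.Start > 0 Then
                ' Look at the single character just before the match.
                Set prev = ActiveDocument.Range(rng.Start - 1, rng.Start)
                If prev.Font.Bold = True Then
                    rng.Select
                    Exit Sub
                End If
            End If
            rng.Collapse wdCollapseEnd   ' continue after this match
        Loop

        MsgBox "No further non-breaking space follows a bold character."
    End Sub
    ```

    Each hit still has to be judged by eye, since the macro cannot tell a wanted non-breaking space from an unwanted one.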

    Author Comment

    Hi, aikimark,
    Thank you for responding.  Maybe I'm not understanding, but I don't think this helps me.  If I replace all of the nbsp's with bold-nbsp's, and then search for a bold character adjacent to a bold nbsp, I've just located every place where Word wrongly imported nbsp's around a bold word.  But there are nbsp's that I *want* that are adjacent to bold words, too.

    And that doesn't address the more aggravating problem: that sometimes I'll have two words together, like appleorange, and "apple" is not bold but "orange" is bold.  Do you have a way to find those instances?

    Expert Comment

    The first part of the problem you described was finding non-breaking space characters that follow a bold-faced word or character.  I'm providing a means to find those in your document.  You would skip non-breaking space characters that follow a non-bold character.

    The find text is <*>
    with wildcards enabled.

    I did not address the second part of your problem. That will require some code.  This code will find and select mixed formatted words. It starts with the current cursor position, so you'll need to move the cursor after each invocation.
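    Option Explicit
    Sub FindMixed()
        Dim oWd As Range
        ' Scan word by word from the cursor to the end of the document.
        For Each oWd In ActiveDocument.Range(Selection.Start, ActiveDocument.Range.End).Words
            ' Font.Bold is wdUndefined when a word mixes bold and non-bold characters.
            If oWd.Font.Bold = wdUndefined Then
                Debug.Print oWd.Text, oWd.Font.Bold
                oWd.Select   ' select the hit so it can be fixed by hand
                Exit For
            End If
        Next oWd
    End Sub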
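
    As a variation, the sketch below marks every mixed-format word in one pass instead of stopping at the first one, so the cursor does not have to be moved between runs. The macro name HighlightMixedBoldWords and the yellow highlight are illustrative choices, not part of the code above. It also trims trailing spaces from each word before testing, on the assumption that a bold word followed by a non-bold space should not count as a hit.

    ```vba
    Option Explicit

    ' Hypothetical variation: highlight every word that mixes bold and
    ' non-bold characters, then report how many were found.
    Sub HighlightMixedBoldWords()
        Dim oWd As Range
        Dim r As Range
        Dim hits As Long

        For Each oWd In ActiveDocument.Words
            ' Work on a copy so the loop variable is left untouched.
            Set r = oWd.Duplicate
            ' Items in Words include trailing spaces; drop them before testing.
            r.MoveEndWhile Cset:=" " & vbTab & ChrW(160), Count:=wdBackward
            If r.End > r.Start Then
                ' wdUndefined means the range mixes bold and non-bold text.
                If r.Font.Bold = wdUndefined Then
                    r.HighlightColorIndex = wdYellow
                    hits = hits + 1
                End If
            End If
        Next oWd

        MsgBox hits & " mixed-format word(s) highlighted."
    End Sub
    ```

    The highlighting can be cleared afterwards by selecting the document and choosing "No Color" for the highlight.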


    Author Comment

    Hi again,
    Yeah, I think part of the problem was that I didn't explain myself well enough.

    Your code for finding mixed formatted words is exactly right.

    Now I think I can describe the issue of the nbsp's better, and I'll bet some short code segment will fix that, too. See if the following makes sense:

    The imported document puts nbsp's next to bold words, but there are also bold words next to *nbsp's that I want to keep*, so the find operation we have been discussing will find lots of instances that I don't want to change.

    But: I do know exactly what the text is for all of the nbsp's that I want to *keep*, so I could just replace EVERY nbsp with a regular space, and then do a search/replace of, for example, "Bill Smith" with "BillnbspSmith".

    The problem with the blanket replacement is that there are nbsp's in *tables* that are needed to create layout spacing, so I can't just do a blanket replacement of all nbsp's with regular spaces.  BUT, I *can* replace all nbsp's *that are adjacent to a character* with regular spaces.  See?  If a nbsp is adjacent to a letter, I can freely replace it with a regular space, and that would solve the problem.  Can you write a snippet that will find and replace those instances?

    Accepted Solution

    enable wildcards and do a find/replace all of
    with a single space character
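
    The search pattern for this step is not shown above, so the following is only a sketch of one possible macro, not the pattern from the accepted answer. It replaces a non-breaking space with a regular space only when the character directly before or after it is a letter. The names ReplaceNbspNextToLetters and IsAsciiLetter are made up for illustration, and "letter" is assumed to mean A-Z or a-z; accented letters would need a wider test.

    ```vba
    Option Explicit

    ' Hypothetical sketch: swap non-breaking spaces for regular spaces,
    ' but only where a letter sits directly before or after them.
    Sub ReplaceNbspNextToLetters()
        Dim rng As Range
        Dim chBefore As String
        Dim chAfter As String
        Dim changed As Long

        Set rng = ActiveDocument.Content
        With rng.Find
            .ClearFormatting
            .Text = "^s"              ' ^s is Word's code for a non-breaking space
            .Forward = True
            .Wrap = wdFindStop
            .Format = False
            .MatchWildcards = False
        End With

        Do While rng.Find.Execute
            chBefore = ""
            chAfter = ""
            If rng.Start > 0 Then
                chBefore = ActiveDocument.Range(rng.Start - 1, rng.Start).Text
            End If
            If rng.End < ActiveDocument.Content.End Then
                chAfter = ActiveDocument.Range(rng.End, rng.End + 1).Text
            End If

            ' Only touch a genuine non-breaking space (character code 160).
            If AscW(rng.Text) = 160 Then
                If IsAsciiLetter(chBefore) Or IsAsciiLetter(chAfter) Then
                    rng.Text = " "
                    changed = changed + 1
                End If
            End If
            rng.Collapse wdCollapseEnd   ' continue after this position
        Loop

        MsgBox changed & " non-breaking space(s) replaced."
    End Sub

    ' Assumption: a "letter" is a single ASCII letter.
    Private Function IsAsciiLetter(ch As String) As Boolean
        IsAsciiLetter = (ch Like "[A-Za-z]")
    End Function
    ```

    Non-breaking spaces that only touch other spaces, digits, or cell boundaries, such as the ones used for spacing in tables, are left alone. Names like "Bill Smith" can then be restored with a normal find/replace, as described in the question.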
