
Solved

Problems importing HTML from a browser into Word

Posted on 2012-09-20
Last Modified: 2012-09-22
Hi,

For various reasons, I need to import HTML into a Word document.  I display the HTML in Chrome, select the entire page, then paste it into Word. It works great, except for two problems I need help with.

First, if a word is surrounded by bold tags, when it is pasted into Word, it is accurately bolded, but it is surrounded now by non-breaking spaces, which sometimes make a small mess of the layout in Word.

For example, this page has two short sentences, each with a word bolded, one with <b> and the other with CSS:
http://www.App33.com/bold.html

If you load that page in Chrome, select all, and paste into Word, you'll see the non-breaking spaces surrounding the bolded words.  Is there a way to prevent the non-breaking spaces?  Or identify specifically the non-breaking spaces that were wrongly created like that? I know I could search/replace a non-breaking space with a regular space, but that would interfere with other places where I *want* non-breaking spaces.

If I show the html page in Internet Explorer and paste into Word, the problem does not exist, but for a multitude of reasons I need to use Chrome.


The other problem is that sometimes two words that are separate in the HTML will run together without a space in Word.  It happens only infrequently, and always one of the words is bolded and the other is not bolded.  In a 50-page document, this might happen 5 or 6 times.

I have only minimal skills in VBA, but I thought someone might be able to write a tiny program that would find instances where a non-bold character is immediately adjacent to a bold character.  That would allow me to do a quick search and fix all of the instances rather than scouring the entire document visually.

Thanks in advance for any help on these issues.
Question by:StevenMiles
 
LVL 46

Expert Comment

by:aikimark
ID: 38421413
You can do this manually.
1. Find/replace all ^s with ^s (font: bold).
2. Position the cursor at the start of the document.
3. Find ^$^s (font: bold).

You can repeat step 3 by clicking the Find Next button in the dialog.
 

Author Comment

by:StevenMiles
ID: 38422219
Hi, aikimark,
Thank you for responding.  Maybe I'm not understanding, but I don't think this helps me.  If I replace all of the nbsp's with bold-nbsp's, and then search for a bold character adjacent to a bold nbsp, I've just located every place where Word wrongly imported nbsp's around a bold word.  But there are nbsp's that I *want* that are adjacent to bold words, too.

And that doesn't address the more aggravating problem: that sometimes I'll have two words together, like appleorange, and "apple" is not bold but "orange" is bold.  Do you have a way to find those instances?
 
LVL 46

Expert Comment

by:aikimark
ID: 38422462
The first part of the problem you described involves finding non-breaking space characters that follow a bold-faced word or character.  I'm providing a means to find those in your document.  You would skip non-breaking space characters that follow a non-bold character.

The find text is <*>
with wildcards enabled.


I did not address the second part of your problem. That will require some code.  This code will find and select mixed formatted words. It starts with the current cursor position, so you'll need to move the cursor after each invocation.
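Option Explicit

' Selects the next word, from the cursor onward, that mixes bold and non-bold text.
Sub FindMixed()
    Dim oWd As Range
    ' Scan the words from the current cursor position to the end of the document.
    For Each oWd In ActiveDocument.Range(Selection.Start, ActiveDocument.Range.End).Words
        ' Font.Bold returns wdUndefined when the range is only partly bold.
        If oWd.Font.Bold = wdUndefined Then
            Debug.Print oWd.Text, oWd.Font.Bold
            oWd.Select
            Exit For
        End If
    Next
End Sub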
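
A variation on the macro above, offered as a minimal sketch rather than a tested solution: it marks every mixed-format word in one pass, so the cursor does not have to be moved between runs. The macro name HighlightMixedBoldWords and the yellow highlight are illustrative choices, not part of the original code. It also trims trailing spaces from each word before testing, on the assumption that a bold word followed by a non-bold space should not count as mixed.

```vba
Option Explicit

' Sketch: highlight every word in the document that is only partly bold.
Sub HighlightMixedBoldWords()
    Dim oWd As Range
    Dim oText As Range
    Dim nFound As Long

    For Each oWd In ActiveDocument.Words
        ' Work on a copy so the loop variable is left untouched.
        Set oText = oWd.Duplicate
        ' Drop trailing regular and non-breaking spaces from the copy.
        oText.MoveEndWhile Cset:=" " & ChrW(160), Count:=wdBackward
        ' Skip items that consisted of spaces only.
        If oText.End > oText.Start Then
            ' Font.Bold is wdUndefined when the range mixes bold and non-bold.
            If oText.Font.Bold = wdUndefined Then
                oText.HighlightColorIndex = wdYellow
                nFound = nFound + 1
            End If
        End If
    Next oWd

    MsgBox nFound & " mixed-format word(s) highlighted."
End Sub
```

After running it, the highlighted words can be stepped through with Find (Format > Highlight) and the highlight cleared once each one is fixed.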

 

Author Comment

by:StevenMiles
ID: 38423029
Hi again,
Yeah, I think part of the problem was that I didn't explain myself well enough.

Your code for finding mixed formatted words is exactly right.

Now I think I can describe the issue of the nbsp's better, and I'll bet some short code segment will fix that, too. See if the following makes sense:

The imported document puts nbsp's next to bold words, but there are also bold words next to *nbsp's that I want to keep*, so the find operation we have been discussing will find lots of instances that I don't want to change.

But: I do know exactly what the text is for all of the nbsp's that I want to *keep*, so I could just replace EVERY nbsp with a regular space, and then do a search/replace of, for example, "Bill Smith" with "BillnbspSmith".

The problem with the blanket replacement is that there are nbsp's in *tables* that are needed to create layout spacing, so I can't just do a blanket replacement of all nbsp's with regular spaces.  BUT, I *can* replace all nbsp's *that are adjacent to a character* with regular spaces.  See?  If a nbsp is adjacent to a letter, I can freely replace it with a regular space, and that would solve the problem.  Can you write a snippet that will find and replace those instances?
 
LVL 46

Accepted Solution

by:aikimark (earned 2000 total points)
ID: 38423394
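Enable wildcards and do a Find/Replace All of
>(^s)<
with a single space character.  In wildcard mode, > matches the end of a word and < matches the start of a word, so the pattern only matches a non-breaking space that sits directly between two words.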
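
The same replacement can be scripted. The following is a minimal sketch, assuming the find pattern behaves in a macro as it does in the dialog; the macro names are hypothetical, and "Bill Smith" is simply the example phrase from the question. The second routine puts non-breaking spaces back into a phrase that is known to need them.

```vba
Option Explicit

' Sketch: replace a non-breaking space that sits between two words
' with a regular space, using the wildcard pattern from the reply above.
Sub ReplaceNbspBetweenWords()
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = ">(^s)<"
        .Replacement.Text = " "
        .MatchWildcards = True
        .Forward = True
        .Wrap = wdFindStop
        .Execute Replace:=wdReplaceAll
    End With
End Sub

' Sketch: restore non-breaking spaces inside one known phrase.
Sub RestoreNbspInPhrase(ByVal sPhrase As String)
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = sPhrase
        ' ^s in the replacement text inserts a non-breaking space.
        .Replacement.Text = VBA.Replace(sPhrase, " ", "^s")
        .MatchWildcards = False
        .MatchCase = True
        .Forward = True
        .Wrap = wdFindStop
        .Execute Replace:=wdReplaceAll
    End With
End Sub

' Runs both steps; add one RestoreNbspInPhrase call per phrase to keep.
Sub FixImportedNbsp()
    ReplaceNbspBetweenWords
    RestoreNbspInPhrase "Bill Smith"
End Sub
```

It is safest to try this on a copy of the document first, since Replace All changes every match at once.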