Convert a Word doc to HTML and store it in a database column using late binding

Hi All,

I am trying to convert a Word document to HTML, then read the HTML source and store it in a SQL database column (ntext datatype).

I had a few problems trying to do this, so I purchased the code from EE. I believe the accepted code was from "Richie_Simonetti", dated 08/03/2001 10:29AM.

The problem is as follows:

1) First of all, the code works fine if I add a reference to the Word object library, then use the Word object to open a doc file and "SaveAs" an HTML file.

After I save the doc as an .htm file, I use the FileSystemObject and a TextStream to read the HTML file and insert it into my database 'ntext' column. I am stressing the 'ntext' datatype because anything other than 'ntext' does not hold the formatted doc file (bold, colors, tables, etc.). That's why 'ntext'.

But my machine runs Windows 2000 Professional, so in my source code I referenced the Word 10.0 library.

My clients are all using Word 97, so there is a 'version conflict' problem.

So I thought of using 'late binding', which took care of the version conflict, but it was not converting the doc to HTML properly (all I could see was junk and small boxes).

2) Second, my clients may have images in their doc files, such as charts and pie diagrams, so when I convert the doc to HTML it creates a separate folder with the images (as it always does when you save as HTML).

I was wondering if the SQL database has a datatype which will hold text, formatted text and 'images', as in this case.

Am I on the right track? Any info, help, or workaround would be appreciated.

Here's the synopsis:
-how to late bind because of version conflict,
-convert it into proper html format without junk,
-and third if at all images are there in the doc file, what is the workaround.

Thanks all,


Guy Hengel [angelIII / a3] (Billing Engineer) commented:
-how to late bind:
 use code like this
 Dim objWord As Object
 Set objWord = CreateObject("Word.Application")

-convert without junk:
 I fear this could be a Word 97 problem (checking...)
-other datatype:
 IMAGE is used much like NTEXT, but it stores raw binary data instead of interpreting character strings, which could otherwise lead to junk data.
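
A minimal sketch of the late-bound conversion, assuming Word 2000 or later on the target machine. One thing to watch: without a reference to the Word object library, named constants such as wdFormatHTML are not defined, so (without Option Explicit) VB treats the name as an empty variable, and Word most likely writes its native binary format into the .htm file. The value 8 below is the wdFormatHTML value from the Word 2000+ object library; the procedure name DocToHtml and its parameters are hypothetical.

```vb
Option Explicit

' wdFormatHTML as defined in the Word 2000+ object library.
' Declared locally because late binding gives no access to Word's constants.
Private Const WD_FORMAT_HTML As Long = 8

' Hypothetical helper: opens docPath in Word and saves it as HTML to htmlPath.
Public Sub DocToHtml(ByVal docPath As String, ByVal htmlPath As String)
    Dim objWord As Object
    Dim objDoc As Object

    Set objWord = CreateObject("Word.Application")
    Set objDoc = objWord.Documents.Open(docPath)

    ' Pass the numeric format value, not the undefined constant name.
    objDoc.SaveAs FileName:=htmlPath, FileFormat:=WD_FORMAT_HTML

    objDoc.Close False      ' close without prompting to save changes
    objWord.Quit
    Set objDoc = Nothing
    Set objWord = Nothing
End Sub
```

Putting Option Explicit at the top of the module makes an undefined wdFormatHTML a compile error instead of a silent 0.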

TimCottee (Head of Software Services) commented:
The only workaround for the images would be to create another table which references the original one and store the images as separate records in an IMAGE column in this table. This way you can preserve the structure with the HTML.

Alternatively, you can skip the conversion to HTML. If you have embedded images etc. and really must store them in your SQL database, why not simply stream the original .doc into an IMAGE column?
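
A rough sketch of streaming the original .doc into an IMAGE column, assuming ADO 2.5 or later (which provides ADODB.Stream) is installed. The table StockDocs, its columns DocName and DocBody, and the procedure name SaveDocToDb are hypothetical names for illustration only.

```vb
Option Explicit

' ADO enum values, declared locally so the code also works late-bound.
Private Const adTypeBinary As Long = 1
Private Const adOpenKeyset As Long = 1
Private Const adLockOptimistic As Long = 3

' Hypothetical helper: stores the file at docPath as one row in StockDocs.
Public Sub SaveDocToDb(ByVal connStr As String, ByVal docPath As String)
    Dim stm As Object
    Dim rs As Object

    ' Read the whole .doc file as raw bytes.
    Set stm = CreateObject("ADODB.Stream")
    stm.Type = adTypeBinary
    stm.Open
    stm.LoadFromFile docPath

    ' Open an empty, updatable recordset and add one row.
    Set rs = CreateObject("ADODB.Recordset")
    rs.Open "SELECT DocName, DocBody FROM StockDocs WHERE 1 = 0", _
            connStr, adOpenKeyset, adLockOptimistic
    rs.AddNew
    rs.Fields("DocName").Value = docPath
    rs.Fields("DocBody").Value = stm.Read   ' byte array into the IMAGE column
    rs.Update

    rs.Close
    stm.Close
End Sub
```
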
priya_pbk (Author) commented:
Yes, I wrote the late binding syntax that way too. But with late binding and the SaveAs syntax, it gives me junk.
I tried it both ways:

1) Windows 2000 Professional (my PC):
 a) Early binding (reference to Word 10.0 library) + SaveAs --> "Success"

2) Different machine (PC with Word 97 installed), opened the source code and referenced the Word 97 object library:
 b) Late binding --> did not check at all


priya_pbk (Author) commented:
I think the images problem can wait for a while. What I am stuck on is the conversion of the doc to HTML.

If I can get this working, I can proceed with the images part. Why am I not able to convert a Word 97 doc into HTML?

I guess the problem is with late binding. Am I doing this right? This is what I am doing:
Dim wapp As Object
Set wapp = CreateObject("Word.Application")
Set Doc = wapp.Documents.Open(CStr(txtWFileToOpen), True)

Doc.SaveAs FileName:="C:\TmpstockIdeaFiles\toShow.htm", FileFormat:= wdFormatHTML

So where am I going wrong?
I don't think your problem is with the binding. Your junk HTML is being created by Word '97. The HTML converter in Word '97 is notoriously poor at generating clean HTML. Later versions of the Office products generate XML documents which are much cleaner. (In fact, the only Office '97 product that created clean HTML was Excel; no wonder there.)
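
For Word 97 specifically, HTML export is, as far as I know, provided by an installable file converter rather than a built-in format constant, so the save format has to be looked up at run time. The following is only a sketch under that assumption: it presumes the HTML converter is installed on the client and registers under the class name "HTML", which should be verified on a Word 97 machine. The function name FindHtmlSaveFormat is hypothetical.

```vb
Option Explicit

' Hypothetical helper: returns the SaveFormat value of the installed HTML
' converter, or -1 if no converter with class name "HTML" can save files.
Public Function FindHtmlSaveFormat(ByVal objWord As Object) As Long
    Dim conv As Object

    FindHtmlSaveFormat = -1
    For Each conv In objWord.FileConverters
        ' SaveFormat is only valid for converters that can save.
        If conv.CanSave Then
            If UCase$(conv.ClassName) = "HTML" Then
                FindHtmlSaveFormat = conv.SaveFormat
                Exit Function
            End If
        End If
    Next conv
End Function
```

The returned value would then be passed as the FileFormat argument of SaveAs; if the function returns -1, the converter is probably not installed on that machine.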
priya_pbk (Author) commented:
This question is for TimCottee:

You said:
"Alternatively you can skip the formatting as html, if you have embedded images etc and really must store
them in your sql database then why don't you simply stream the original .doc into an IMAGE datatype? "

If this is so, then:

-1) How can I do that? I mean the syntax for converting the doc to an image and then storing that image in the SQL database. Is it similar to SaveAs with a different parameter in place of the HTML format?

-2) What about the size? Won't the image be heavy and take time to load in the web page? (I guess the image will be large, because converting 4-5 pages of a doc will surely yield a huge image.)

Just curious, I wanted to know.


TimCottee is not saying to convert the Word doc to an image, but to store the document itself in the database using an 'image' datatype. You can then query the DB for the document, and it will have all your images embedded in the binary file stored in the field.
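
Getting the document back out could look roughly like this, again assuming ADO 2.5 or later. The table StockDocs, its columns DocName and DocBody, and the procedure name LoadDocFromDb are hypothetical.

```vb
Option Explicit

Private Const adTypeBinary As Long = 1
Private Const adSaveCreateOverWrite As Long = 2

' Hypothetical helper: writes the stored document named docName to outPath.
' Returns True if a matching row was found.
Public Function LoadDocFromDb(ByVal connStr As String, _
                              ByVal docName As String, _
                              ByVal outPath As String) As Boolean
    Dim rs As Object
    Dim stm As Object

    Set rs = CreateObject("ADODB.Recordset")
    ' Single quotes are doubled so the name cannot break the SQL string.
    rs.Open "SELECT DocBody FROM StockDocs WHERE DocName = '" & _
            Replace(docName, "'", "''") & "'", connStr

    If Not rs.EOF Then
        ' Write the bytes from the IMAGE column back to a .doc file.
        Set stm = CreateObject("ADODB.Stream")
        stm.Type = adTypeBinary
        stm.Open
        stm.Write rs.Fields("DocBody").Value
        stm.SaveToFile outPath, adSaveCreateOverWrite
        stm.Close
        LoadDocFromDb = True
    End If

    rs.Close
End Function
```

The saved file is the original .doc byte for byte, so any embedded charts and images come back with it.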
priya_pbk (Author) commented:
I think I will grant angelIII the points, because angelIII was closer to the probable solution and also the first to answer.

Thanks everyone for the input and suggestions.


Question has a verified solution.
