Solved

Convert HTML to doc, txt and rtf formats

Posted on 2003-10-23
Last Modified: 2013-11-20

I need to convert an HTML file to doc, txt and rtf formats. How should I approach this problem?
Question by:jntu_hareesh
11 Comments
 
LVL 32

Expert Comment

by:jhance
ID: 9613334
Are you looking to program such a solution yourself? (Non-trivial and far beyond the scope of a 50 pt question here.)

Are you looking for a 3rd party solution you can buy? (Off topic!)
 
LVL 8

Expert Comment

by:martynjpearson
ID: 9613357
If you have MS Word on your machine, you can use Word automation to open the HTML file and save it as a Word document, RTF document or text file. To do this, follow these steps :

1 Open ClassWizard, press Add Class and select From a Type Library from the drop down menu
2 Locate the type library for MS Word, called something like MSWORDx.OLB, where x is a number that represents the version
3 ClassWizard will list the classes that can be generated from the olb - select them all and hit OK
4 At the top of your source file, include the generated header file, MSWordx.h
5 Add these #defines to your source file

#define wdFormatDocument  0
#define wdFormatTemplate  1
#define wdFormatText  2
#define wdFormatTextLineBreaks  3
#define wdFormatDOSText  4
#define wdFormatDOSTextLineBreaks  5
#define wdFormatRTF  6
#define wdFormatUnicodeText  7

6 Use this code to load in and convert the HTML file to a Word document :

    // AfxOleInit() must have been called once before this code runs
    // Optional arguments that are not supplied are passed as DISP_E_PARAMNOTFOUND
    COleVariant varEmpty(DISP_E_PARAMNOTFOUND, VT_ERROR);
    COleVariant varTrue((short)VARIANT_TRUE, VT_BOOL), varFalse((short)VARIANT_FALSE, VT_BOOL);
    _Application * pWordApp = new _Application;

    if (pWordApp->CreateDispatch("Word.Application", NULL))
    {
        Documents oDocs(pWordApp->GetDocuments());

        // Open the existing HTML file
        COleVariant varFilename("E:\\martyn\\HTMLTest\\Test.htm");
        _Document doc(oDocs.Open(&varFilename, &varFalse, &varEmpty, &varFalse, &varEmpty, &varEmpty, &varEmpty, &varEmpty, &varEmpty, &varEmpty));

        // Save it as a Word document
        COleVariant varNewFilename("E:\\martyn\\HTMLTest\\Test.doc");
        COleVariant varDocFormat((long)wdFormatDocument);
        doc.SaveAs(&varNewFilename, &varDocFormat, &varEmpty, &varEmpty, &varEmpty, &varEmpty, &varEmpty, &varEmpty, &varEmpty, &varEmpty, &varEmpty);

        // Shut Word down without saving again, so that no WINWORD.EXE instance is left running
        // (check the parameter list of Quit in your generated header)
        pWordApp->Quit(&varFalse, &varEmpty, &varEmpty);

        delete pWordApp;
    }
    else
    {
        delete pWordApp;
        AfxMessageBox("Failed to invoke Microsoft Word");
    }

To save to another format, pass the appropriate value from the #defines (for example wdFormatRTF or wdFormatText) as the second argument to SaveAs.
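As a rough illustration of picking the format value, here is a minimal sketch that chooses it from the target file's extension. The enum names and the helper `SaveFormatForFile` are hypothetical and not part of the generated Word wrappers; only the numeric values come from the #defines above.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Numeric values copied from the #defines listed above.
// The k-prefixed names are illustrative so they cannot clash with the macros.
enum WdSaveFormatValue {
    kWdFormatDocument = 0,
    kWdFormatText = 2,
    kWdFormatRTF = 6
};

// Hypothetical helper: returns the save-format value for a target file name
// based on its extension, or -1 when the extension is not recognised.
inline int SaveFormatForFile(const std::string& fileName)
{
    const std::string::size_type dot = fileName.find_last_of('.');
    if (dot == std::string::npos)
        return -1;

    // Lower-case the extension so that "Test.RTF" and "Test.rtf" match.
    std::string ext = fileName.substr(dot + 1);
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    if (ext == "doc") return kWdFormatDocument;
    if (ext == "txt") return kWdFormatText;
    if (ext == "rtf") return kWdFormatRTF;
    return -1;
}
```

The returned value could then be wrapped in a COleVariant, in the same way as varDocFormat, before calling SaveAs.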

Hope this helps
Martyn
 

Author Comment

by:jntu_hareesh
ID: 9648908
Hi Martyn,
   Thank you for your help.

   I have MSWORD9.OLB on my system and followed the procedure you sent. The code compiles without errors, but when I run the application it shows the error message "Failed to invoke Microsoft Word". When I debug the code, it fails at
   if( pWordApp->CreateDispatch("Word.Application", NULL))
  and goes straight to the else part. Can you help me understand why this function is failing? Is it a problem with the newer MSWORD9.OLB, or is there some other reason?

   I look forward to your reply.

thanks,
  hareesh.
 

Author Comment

by:jntu_hareesh
ID: 9649008
Hi Martyn,
   Sorry, I forgot to mention that I added two more parameters to Open. The last two parameters, &varEmpty, &varEmpty, are the only change I made to the code.

_Document doc(oDocs.Open(&varFilename, &varFalse, &varEmpty, &varFalse, &varEmpty, &varEmpty, &varEmpty, &varEmpty, &varEmpty, &varEmpty,&varEmpty, &varEmpty));

bye,
  hareesh
 
LVL 8

Expert Comment

by:martynjpearson
ID: 9649041
Do you have Word installed on the machine on which you are running your application? Automation effectively runs the server application (in this case MS Word) in the background, so it must be installed on your machine.

You probably needed to change the call to Open() because my code was put together using version 8 of the OLB file, not version 9.

All the best
Martyn
 

Author Comment

by:jntu_hareesh
ID: 9656649
Thank you, Martyn.
 
  I am now able to export the HTML file to doc, rtf and txt formats. The only mistake I had made previously was not selecting the Container option under Compound Document Support, which is why it was giving the error.

  It works fine with VC++ 6, but in VC++ .NET, when I add the class from the type library, it converts all the interfaces to wrapper classes, for example
_Application as CApplication, Documents as CDocuments, _Document as CDocument0, and so on.

I have converted the code you sent into this form:

COleVariant varEmpty(DISP_E_PARAMNOTFOUND, VT_ERROR);
    COleVariant varTrue((short)VARIANT_TRUE, VT_BOOL), varFalse((short)VARIANT_FALSE, VT_BOOL);
    CApplication   objWordApp;

   if (objWordApp.CreateDispatch("Word.Document"))
   {
       COleVariant varOpt((long)DISP_E_PARAMNOTFOUND, VT_ERROR);

       //Create a new document
       CDocuments oDocs(objWordApp.get_Documents());

       COleVariant varFilename("C:\\Documents and Settings\\venkatahareesh\\Desktop\\aaaa.html");
       CDocument0 doc(oDocs.Open(&varFilename, &varFalse, &varEmpty, &varFalse, &varEmpty, &varEmpty, &varEmpty, &varEmpty, &varEmpty, &varEmpty, &varEmpty, &varEmpty));
       COleVariant varNewFilename("C:\\Documents and Settings\\venkatahareesh\\Desktop\\TestExport.doc");
       COleVariant varDocFormat((long)wdFormatDocument);
       doc.SaveAs(&varNewFilename, &varDocFormat, &varEmpty, &varEmpty, &varEmpty, &varEmpty, &varEmpty, &varEmpty, &varEmpty, &varEmpty, &varEmpty);

       //delete pWordApp;
   }
   else
   {
       //delete pWordApp;
       AfxMessageBox("Failed to invoke Microsoft Word");

   }


It compiles and runs, but it fails to invoke Microsoft Word. When I debug it, the failure is at
(objWordApp.CreateDispatch("Word.Document"))


Can you please suggest a solution, in particular how to do this using VC++ .NET?
Waiting for a positive reply,
hareesh.
 
LVL 8

Expert Comment

by:martynjpearson
ID: 9656686
Ah, I think the problem may have been my fault in the initial code - I didn't mention that you need to call AfxOleInit() to initialise the OLE DLLs before you start. Sorry about that!!

I'm afraid I've not had the chance to do much with .NET yet, so I'm no expert, but I would imagine that there could be a similar sort of problem - you may need to call AfxOleInit() before you can call CreateDispatch(). I usually call it in the constructor of the application object.

Hope this helps
Martyn
 

Author Comment

by:jntu_hareesh
ID: 9676530
Hi Martyn,
    The same code now works with VC++ .NET as well. I have one final question: the document is saved, but when I open it a "File in Use" dialog appears with the following message.

export.doc is locked for editing by 'another user'. Click Notify to open a read-only copy of the document and receive notification when the document is no longer in use.

The dialog has three buttons: Read Only, Notify and Cancel. If I click Cancel, the document does not open. If I click Read Only or Notify, it opens, and the dialog is not shown the next time.

If I want the document to open directly without this dialog, do I need to change any parameters in the code?

Waiting for your reply,
 hareesh.
 
LVL 8

Accepted Solution

by:
martynjpearson earned 50 total points
ID: 9676983
I don't think this is anything to do with the code as such - Word is reporting that the document is open somewhere. It could be that there is another instance of Word running that has it open (it could be an instance of Word that is being run by another instance of this app, or an instance of Word that has been left "hanging" when the parent app crashed or didn't terminate Word for some reason).

You can look for stray instances of Word in task manager - use End Process to end any stray instances of the exe (WINWORD.EXE), but make sure that you don't have any documents open in Word as there is no way of knowing which instance is which!!
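One way to make it less likely that an instance is left behind is to make sure Quit() runs on every exit path. The following is only a sketch of that idea using a generic scope guard. `ScopeExit`, `MakeScopeExit`, `FakeWordApp` and `ConvertWithGuard` are hypothetical names, and `FakeWordApp` is a stand-in for the generated wrapper class so that the sample is self-contained; with the real wrapper, the lambda would call its Quit method with the arguments listed in the generated header.

```cpp
#include <utility>

// Runs a callable when the guard goes out of scope, on every exit path.
template <typename F>
class ScopeExit {
public:
    explicit ScopeExit(F fn) : fn_(std::move(fn)), active_(true) {}
    ScopeExit(ScopeExit&& other)
        : fn_(std::move(other.fn_)), active_(other.active_) { other.active_ = false; }
    ~ScopeExit() { if (active_) fn_(); }

    ScopeExit(const ScopeExit&) = delete;
    ScopeExit& operator=(const ScopeExit&) = delete;

private:
    F fn_;
    bool active_;   // false once the guard has been moved from
};

template <typename F>
ScopeExit<F> MakeScopeExit(F fn) { return ScopeExit<F>(std::move(fn)); }

// Hypothetical stand-in for the generated Word application wrapper.
struct FakeWordApp {
    bool running = false;
    bool CreateDispatch() { running = true; return true; }
    void Quit() { running = false; }
};

// Sketch of a conversion routine: Quit() is called whether or not
// the work in the middle succeeds.
inline bool ConvertWithGuard(FakeWordApp& app, bool failMidway)
{
    if (!app.CreateDispatch())
        return false;
    auto quitOnExit = MakeScopeExit([&app] { app.Quit(); });

    if (failMidway)
        return false;   // early exit: the guard still calls Quit()
    return true;        // normal exit: the guard calls Quit() here too
}
```

This does not help if the parent app crashes outright, but it covers early returns and exceptions thrown between CreateDispatch and the end of the function.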

Hope this helps
Martyn
 

Author Comment

by:jntu_hareesh
ID: 10061175
Hi Martyn,

      I have written the following code to check whether MS Word is installed on my machine.
      Every time I run this application, a new WINWORD.EXE process is started, as I have
      observed in Task Manager. Can you please explain why ReleaseDispatch
      is not releasing the memory?

       CApplication   objWordApp;//= new CApplication;

      bool isWordApp = true;

      isWordApp = objWordApp.CreateDispatch("Word.Application");
   
      if ( !isWordApp)      {
            isWordInstalled = false;
            AfxMessageBox ( " MS Word is not installed in your machine." ) ;
      }
      objWordApp.ReleaseDispatch() ;


waiting for your reply,
  hareesh.
 
LVL 8

Expert Comment

by:martynjpearson
ID: 10061353
ReleaseDispatch does not actually close Word; it just closes the "interface" that you have to the instance of Word.

You could call the Quit() method of the Word application object to close Word. I use the following method to determine if Word is installed on a machine :

bool IsMSWordInstalled()
{
    bool bInstalled = false;

    HKEY hKeyWordVersion;
    // The default value of this key holds the current ProgID, e.g. "Word.Application.9"
    if (RegOpenKeyEx(HKEY_CLASSES_ROOT, "Word.Application\\CurVer", 0, KEY_READ, &hKeyWordVersion) == ERROR_SUCCESS)
    {
        CString strVersionString;
        DWORD dwType = REG_SZ, dwSize = 128;
        LONG lResult = RegQueryValueEx(hKeyWordVersion, NULL, NULL, &dwType, (BYTE *)strVersionString.GetBuffer(dwSize + 1), &dwSize);
        strVersionString.ReleaseBuffer();

        if (lResult == ERROR_SUCCESS)
        {
            if (strVersionString.Left(17) == "Word.Application.")
            {
                // The number after the prefix is the Word version
                int nVersion = atoi(strVersionString.Mid(17));
                if (nVersion >= 8) // Corresponds to MS Word 97
                {
                    bInstalled = true;
                }
            }
        }

        RegCloseKey(hKeyWordVersion);
    }

    return bInstalled;
}
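
The version-parsing step can be tried on its own, without touching the registry. This is a minimal sketch using std::string instead of CString; `WordVersionFromProgId` is a hypothetical name, and returning -1 for a missing prefix or missing digits is an assumption of this sketch rather than something the function above does.

```cpp
#include <cstdlib>
#include <string>

// Returns the version number encoded in a ProgID such as "Word.Application.9",
// or -1 if the string does not start with the expected prefix or no digits follow.
inline long WordVersionFromProgId(const std::string& progId)
{
    static const std::string kPrefix = "Word.Application.";
    if (progId.compare(0, kPrefix.size(), kPrefix) != 0)
        return -1;

    const char* digits = progId.c_str() + kPrefix.size();
    char* end = nullptr;
    const long version = std::strtol(digits, &end, 10);
    if (end == digits)      // nothing numeric after the prefix
        return -1;
    return version;
}
```

The result can then be compared against 8 in the same way as nVersion above.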

All the best
Martyn