• Status: Solved

Convert MS Word file to text file

Hi,

I have to convert an MS Word (*.doc/*.docx) file to a text file (*.txt) using VC++ and MFC.

The problem is that when I convert a Word file to a text file, the resulting text file is not readable. It shows text like "´•IOÃ0…ïHü‡ÈW”¸p@5åÀr„JqvIj/ò¸Û¿gÜ%j«¶)P.‘ç½÷yœ™tfºN&àQY“³ë¬Ã0ÒÊT9û¼¤w,Á L!jk  ".

The reverse conversion, from text to *.doc, works fine.

I checked that the font properties are the same. I even pasted the converted garbage text into a Word file, but it showed the same garbage.

Let me know if any further information is required.

Thanks
Asked by: harshvir_drish

AndyAinscow (Freelance programmer / Consultant) commented:
How do you perform the conversion?

harshvir_drish (Author) commented:
I am doing this with a VC++ program.

Please find below the code snippet I wrote for this purpose.

         
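	CFile File1;
	char Buff[15024];
	// Read up to sizeof(Buff) bytes from the source file.
	File1.Open(m_SourceFile, CFile::modeRead);
	UINT Bytes = File1.Read(Buff, sizeof(Buff));
	File1.Close();
	// Write the bytes that were read to the output file.
	CFile File2;
	File2.Open(L"c:\\MyFile.txt", CFile::modeCreate | CFile::modeWrite);
	File2.Write(Buff, Bytes);
	File2.Close();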


I look forward to your response.

AndyAinscow (Freelance programmer / Consultant) commented:
Thought so.  A Word document is not plain text; it has other information in it.  What you are doing is effectively renaming xx.doc to xx.txt, hence you see garbage.
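
To make the point concrete, here is a minimal sketch in standard C++ that looks at the first bytes of a file to see which container it really is. A .docx is a ZIP archive (it starts with the bytes "PK" 03 04) and a legacy .doc is an OLE compound file (it starts with D0 CF 11 E0 A1 B1 1A E1); neither is plain text. The names WordFormat, ClassifyHeader and ClassifyFile are made up for this illustration, and a ZIP signature alone does not prove the file is a Word document.

```cpp
#include <cstddef>
#include <cstring>
#include <fstream>
#include <string>

// Hypothetical names for this sketch; not part of MFC or any Word API.
enum class WordFormat { Docx, LegacyDoc, Unknown };

// Classify a buffer by its leading signature ("magic") bytes.
WordFormat ClassifyHeader(const unsigned char* data, std::size_t size) {
    static const unsigned char kZip[] = {0x50, 0x4B, 0x03, 0x04};  // "PK\x03\x04"
    static const unsigned char kOle[] = {0xD0, 0xCF, 0x11, 0xE0,
                                         0xA1, 0xB1, 0x1A, 0xE1};
    if (size >= sizeof(kZip) && std::memcmp(data, kZip, sizeof(kZip)) == 0)
        return WordFormat::Docx;       // ZIP container (could be any ZIP file)
    if (size >= sizeof(kOle) && std::memcmp(data, kOle, sizeof(kOle)) == 0)
        return WordFormat::LegacyDoc;  // OLE compound file, used by .doc
    return WordFormat::Unknown;
}

// Read the first 8 bytes of a file and classify them.
WordFormat ClassifyFile(const std::string& path) {
    unsigned char header[8] = {0};
    std::ifstream in(path, std::ios::binary);
    if (!in) return WordFormat::Unknown;
    in.read(reinterpret_cast<char*>(header), sizeof(header));
    return ClassifyHeader(header, static_cast<std::size_t>(in.gcount()));
}
```

A caller could use the result to decide how to proceed, since the two formats need different handling.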
 
harshvir_drish (Author) commented:
Yes, I tested this by opening the same Word file in Notepad, and the output is the same as that generated by my code.

So, can you guide me further?

Thanks for your quick response.

pgorod commented:
1. I would do it from within Word, with a macro: a simple "Save As..." text.

From MFC, you can also drive Word through its COM automation object:
http://support.microsoft.com/kb/196776

The basic idea is that Microsoft Word knows its own format, so you should use its code to do the conversion.

2. A totally different approach, for the docx format only (it does not work for .doc!): you can unzip a docx and you will find files with the document content. These are in XML, and the format is documented by Microsoft.
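
As a rough sketch of approach 2, the main body of a docx lives in word/document.xml, where the visible text sits inside <w:t> elements and each paragraph is a <w:p> element. The standard C++ below assumes that file has already been extracted into a string by whatever ZIP library you use (that step is not shown). ExtractDocxText and DecodeEntities are names invented for this example. It only handles the five predefined XML entities, ignores tabs, line breaks, headers, footnotes and so on, and returns the text in the XML's own encoding (normally UTF-8).

```cpp
#include <cstddef>
#include <string>

// Decode the five predefined XML entities; anything else is copied unchanged.
std::string DecodeEntities(const std::string& s) {
    static const struct { const char* name; char ch; } kEntities[] = {
        {"&amp;", '&'}, {"&lt;", '<'}, {"&gt;", '>'},
        {"&quot;", '"'}, {"&apos;", '\''}};
    std::string out;
    for (std::size_t i = 0; i < s.size();) {
        bool matched = false;
        if (s[i] == '&') {
            for (const auto& e : kEntities) {
                const std::size_t n = std::char_traits<char>::length(e.name);
                if (s.compare(i, n, e.name) == 0) {
                    out += e.ch;
                    i += n;
                    matched = true;
                    break;
                }
            }
        }
        if (!matched) out += s[i++];
    }
    return out;
}

// Collect the contents of <w:t> elements and emit a newline at each </w:p>.
std::string ExtractDocxText(const std::string& xml) {
    std::string text;
    std::size_t pos = 0;
    while ((pos = xml.find('<', pos)) != std::string::npos) {
        std::size_t close = xml.find('>', pos);
        if (close == std::string::npos) break;
        const std::string tag = xml.substr(pos + 1, close - pos - 1);
        if (tag == "/w:p") {
            text += '\n';  // end of paragraph
        } else if ((tag == "w:t" || tag.compare(0, 4, "w:t ") == 0) &&
                   tag.back() != '/') {
            // Opening <w:t> (with or without attributes): copy up to </w:t>.
            const std::size_t end = xml.find("</w:t>", close + 1);
            if (end == std::string::npos) break;
            text += DecodeEntities(xml.substr(close + 1, end - close - 1));
            close = end + 5;  // index of the '>' that ends "</w:t>"
        }
        pos = close + 1;
    }
    return text;
}
```

Writing the returned string to the .txt file then gives readable text, one line per paragraph. A real XML parser would be more robust than this string scanning.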

AndyAinscow (Freelance programmer / Consultant) commented:
>>So, Can you guide further?

Two possibilities spring to mind.
One, already mentioned, is controlling Word via automation.
The other is to send a message to Word to copy the open document to the clipboard; your app then reads the clipboard and saves the text to a .txt file.