Solved

Extracting text from a Word doc, but every line ends with CR/CR

Posted on 2015-02-21
Last Modified: 2015-02-24
Hi, I'm using this code to extract text from a Word .doc file:

function ExtractTextFromWordFile(const FileName: string): string;
var
  WordApp    : Variant;
  CharsCount : Integer;
begin
  WordApp := CreateOleObject('Word.Application');
  try
    WordApp.Visible := False;
    WordApp.Documents.Open(FileName);
    CharsCount := WordApp.Documents.Item(1).Characters.Count; // get the number of chars to select
    Result := WordApp.Documents.Item(1).Range(0, CharsCount).Text; // select the text and retrieve it
    WordApp.Documents.Item(1).Close;
  finally
    WordApp.Quit;
  end;
end;

It works well except for one thing: every line of text it returns is terminated by CR/CR (i.e. #13#13) instead of CR/LF (i.e. #13#10). Is there a way to have the lines of the extracted text terminated by CR/LF?

Thanks!
    Shawn
Question by:shawn857

Expert Comment

by:jimyX
ID: 40623914
It seems that copying text by range loses the CR/LF.
Let's use the clipboard instead:

uses ClipBrd, ComObj;

function ExtractTextFromWordFile(const FileName: string): string;
var
  WordApp    : Variant;
  CharsCount : Integer;
begin
  WordApp := CreateOleObject('Word.Application');
  try
    WordApp.Visible := False;
    WordApp.Documents.Open(FileName);
    CharsCount := WordApp.Documents.Item(1).Characters.Count; // get the number of chars to select
    WordApp.Selection.SetRange(0, CharsCount); // make the selection
    WordApp.Selection.Copy; // copy to the clipboard
    Result := Clipboard.AsText; // get the text from the clipboard
    WordApp.Documents.Item(1).Close;
  finally
    WordApp.Quit;
  end;
end;


Author Comment

by:shawn857
ID: 40624690
Thanks Jimy, but the clipboard method runs so much slower than copying text by range. So nothing can be done in the original method to replace CRCR to CRLF?

Thanks
    Shawn

Accepted Solution

by:jimyX (earned 500 total points)
ID: 40625030
> "So nothing can be done in the original method to replace CRCR to CRLF?"

It is possible using StringReplace, but replacing every occurrence of #13#13 sounds unsafe, so you had better test it carefully.

Result:= StringReplace(CopiedText, CRCR, CRLF, [rfReplaceAll]);

uses SysUtils, ComObj; // SysUtils for StringReplace, ComObj for CreateOleObject

function ExtractTextFromWordFile(const FileName: string): string;
var
  WordApp    : Variant;
  CharsCount : Integer;
begin
  WordApp := CreateOleObject('Word.Application');
  try
    WordApp.Visible := False;
    WordApp.Documents.Open(FileName);
    CharsCount := WordApp.Documents.Item(1).Characters.Count; // get the number of chars to select
    Result := WordApp.Documents.Item(1).Range(0, CharsCount).Text; // select the text and retrieve it
    Result := StringReplace(Result, #13#13, #13#10, [rfReplaceAll]); // turn each CR/CR pair into CR/LF
    WordApp.Documents.Item(1).Close;
  finally
    WordApp.Quit;
  end;
end;
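
One reason to test carefully: if the document marks each paragraph with a single CR, then #13#13 would really be two breaks (a paragraph followed by an empty one), and collapsing the pair into one CR/LF would drop the blank line while leaving any lone CR untouched. As a rough sketch under that assumption (I have not checked it against your file), SysUtils.AdjustLineBreaks converts every break to CR/LF one for one. NormalizeWordText is just a name made up for this example:

```pascal
program NormalizeBreaksDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper (not part of the original code): converts every
// lone CR, lone LF, or existing CR/LF in S into CR/LF, one for one,
// so blank lines (two breaks in a row) are preserved.
function NormalizeWordText(const S: string): string;
begin
  Result := AdjustLineBreaks(S, tlbsCRLF);
end;

var
  Sample: string;
begin
  // 'one', a single CR, 'two', a doubled CR, 'three'
  Sample := 'one'#13'two'#13#13'three';
  // Print the normalized text; each original CR now ends a console line
  Writeln(NormalizeWordText(Sample));
end.
```

To try it in the function above, you would replace the StringReplace line with Result := AdjustLineBreaks(Result, tlbsCRLF); and compare both results against what the document looks like in Word. If the CR/CR pairs really do stand for single line ends in your files, the StringReplace version is the one to keep.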


Author Closing Comment

by:shawn857
ID: 40629301
Thanks Jimy!

Cheers
    Shawn