Solved

Extracting text from a Word doc, but every line ends with CR/CR

Posted on 2015-02-21
Last Modified: 2015-02-24
Hi, I'm using this code to extract text from a Word .doc file:

function ExtractTextFromWordFile(const FileName:string):string;
var
  WordApp    : Variant;
  CharsCount : integer;
begin
  WordApp := CreateOleObject('Word.Application');
  try
    WordApp.Visible := False;
    WordApp.Documents.open(FileName);
    CharsCount:=Wordapp.Documents.item(1).Characters.Count;//get the number of chars to select
    Result:=WordApp.Documents.item(1).Range(0, CharsCount).Text;//Select the text and retrieve the selection
    WordApp.documents.item(1).Close;
  finally
   WordApp.Quit;
  end;
end;

It works well except for one thing: every line of text it returns is terminated by CR/CR (i.e. #13#13) instead of CR/LF (i.e. #13#10). Is there a way to have the lines of the extracted text terminated by CR/LF?

Thanks!
    Shawn
Question by:shawn857

Expert Comment

by:jimyX
ID: 40623914
It seems that copying text by range loses the CR/LF pairs.
It is better to use the clipboard:

uses ClipBrd, ComObj;

function ExtractTextFromWordFile(const FileName:string):string;
var
  WordApp    : Variant;
  CharsCount : integer;
begin
  WordApp := CreateOleObject('Word.Application');
  try
    WordApp.Visible := False;
    WordApp.Documents.open(FileName);
    CharsCount:=Wordapp.Documents.item(1).Characters.Count; //get the number of chars to select
    WordApp.Selection.SetRange(0, CharsCount); //make the selection
    WordApp.Selection.Copy;//copy to the clipboard
    Result:= Clipboard.AsText;//get the text from the clipboard
    WordApp.documents.item(1).Close;
  finally
   WordApp.Quit;
  end;
end;

Author Comment

by:shawn857
ID: 40624690
Thanks Jimy, but the clipboard method runs so much slower than copying text by range. So nothing can be done in the original method to replace CRCR to CRLF?

Thanks
    Shawn

Accepted Solution

by:jimyX
ID: 40625030
> "So nothing can be done in the original method to replace CRCR to CRLF?"

It is possible using StringReplace, but replacing every occurrence of #13#13 sounds unsafe, so you had better test it carefully.

Result := StringReplace(CopiedText, #13#13, #13#10, [rfReplaceAll]); // CopiedText stands for the string returned by Range(...).Text

uses ComObj, SysUtils; // ComObj: CreateOleObject, SysUtils: StringReplace

function ExtractTextFromWordFile(const FileName:string):string;
var
  WordApp    : Variant;
  CharsCount : integer;
begin
  WordApp := CreateOleObject('Word.Application');
  try
    WordApp.Visible := False;
    WordApp.Documents.open(FileName);
    CharsCount:=Wordapp.Documents.item(1).Characters.Count;//get the number of chars to select
    Result:=WordApp.Documents.item(1).Range(0, CharsCount).Text;//Select the text and retrieve the selection
    Result:=StringReplace(Result, #13#13, #13#10, [rfReplaceAll]);
    WordApp.documents.item(1).Close;
  finally
   WordApp.Quit;
  end;
end;
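
To see why the pairwise replacement deserves careful testing, here is a minimal console sketch. It is not part of the original answer: the sample string and the ShowBreaks helper are made up for illustration, and it assumes a Delphi or Free Pascal compiler with the standard SysUtils unit. It runs an odd run of CRs through StringReplace, then through SysUtils.AdjustLineBreaks as one possible alternative.

```pascal
program CrLfDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper (not from the thread): makes CR and LF visible.
function ShowBreaks(const S: string): string;
begin
  Result := StringReplace(S, #13, '<CR>', [rfReplaceAll]);
  Result := StringReplace(Result, #10, '<LF>', [rfReplaceAll]);
end;

var
  Sample: string;
begin
  // Made-up sample: three consecutive CRs between two words.
  Sample := 'one'#13#13#13'two';

  // Pairwise replacement converts the first two CRs and leaves the third alone.
  Writeln(ShowBreaks(StringReplace(Sample, #13#13, #13#10, [rfReplaceAll])));
  // prints: one<CR><LF><CR>two

  // AdjustLineBreaks converts every CR that is not followed by LF into CR/LF.
  Writeln(ShowBreaks(AdjustLineBreaks(Sample, tlbsCRLF)));
end.
```

Note that the two calls are not equivalent. StringReplace collapses each CR/CR pair into a single CR/LF, which is what was asked for, but it leaves any unpaired CR untouched. AdjustLineBreaks treats every CR as its own line break, so a CR/CR pair comes out as two CR/LF breaks (a line break plus a blank line). Which result is right depends on how the paragraphs in the source document are laid out.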

Author Closing Comment

by:shawn857
ID: 40629301
Thanks Jimy!

Cheers
    Shawn