Solved

How to read pdf/xls/doc files in my app and be able to read the contents?

Posted on 2015-02-11
Last Modified: 2015-02-19
Hi, I have an app that reads plain ASCII text files (.txt, .csv) and processes the strings in those files. Is there a way in Delphi to read .pdf/.xls/.doc files directly and get at their text as well?

Thanks
    Shawn

P.S.: I use Delphi 7 (D7).
Question by:shawn857

Assisted Solution

by:Sinisa Vuk
Sinisa Vuk earned 100 total points
ID: 40605259
Try to read similar questions/answers on EE first:
Q_26787229, Q_28214245, Q_28573693

In short, there is no all-in-one solution. Depending on the file extension, you should run the appropriate reader.
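
A minimal sketch of that per-extension dispatch, written as a small console program. The names ReadPlainText and ReadAnyFile are made up for illustration, and only the plain-text branch is filled in; the PDF, DOC and XLS branches are placeholders for whichever reader you end up using.

```pascal
program ReadByExtension;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

// Plain-text reader for .txt/.csv. Readers for other formats would be
// separate functions with the same (hypothetical) signature.
function ReadPlainText(const FileName: string): string;
var
  SL: TStringList;
begin
  SL := TStringList.Create;
  try
    SL.LoadFromFile(FileName);
    Result := SL.Text;
  finally
    SL.Free;
  end;
end;

// Picks a reader from the file extension; raises for unsupported types.
function ReadAnyFile(const FileName: string): string;
var
  Ext: string;
begin
  Ext := LowerCase(ExtractFileExt(FileName));
  if (Ext = '.txt') or (Ext = '.csv') then
    Result := ReadPlainText(FileName)
  // else if Ext = '.pdf' then Result := <your PDF reader>(FileName)
  // else if Ext = '.doc' then Result := <your Word reader>(FileName)
  // else if Ext = '.xls' then Result := <your Excel reader>(FileName)
  else
    raise Exception.CreateFmt('No reader for "%s" files', [Ext]);
end;

begin
  try
    if ParamCount >= 1 then
      Write(ReadAnyFile(ParamStr(1)))
    else
      Writeln('Usage: ReadByExtension <file>');
  except
    on E: Exception do
      Writeln('Error: ', E.Message);
  end;
end.
```

Because every reader returns a plain string, the existing string processing in the app should not need to know which file type the text came from.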

Other options:
tmssoftware's FlexCel, kluug - xlsx-ods-delphi

Do you want to search for some strings in the files, or do you want to show the actual document to the customer?

Author Comment

by:shawn857
ID: 40605304
Thanks Sinisa, I will have a look at that.

I just want my app to read the strings internally... no need to show the actual document to the user.

Thanks!
   Shawn

Expert Comment

by:jimyX
ID: 40605413
Hi Shawn,
Do you need to extract all text from those files?

OR

Do all your files look alike, and you have a pattern for reading strings?

Could you give samples of what PDF, XLS and DOC files might look like?

Expert Comment

by:Sinisa Vuk
ID: 40605417
One note: the content of a PDF is usually compressed inside the file (typically with Flate/zlib), so you will not find the text in the clear.

Expert Comment

by:jimyX
ID: 40605598
Some pointers:

DOC: OLE Automation of Word makes it easy to read the content.

XLS: Read the .xls file as a table using ADO.

PDF: PDF is tough, but it can be parsed.
http://www.swissdelphicenter.ch/en/showcode.php?id=2169
http://www.foolabs.com/xpdf/about.html
Extracting text with Quick PDF free sdk:
http://www.quickpdflibrary.com/faq/extract-text-and-images-and-insert-into-new-pdf.php
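
To make the ADO pointer for XLS concrete, here is a rough sketch, assuming the 32-bit Microsoft Jet 4.0 OLE DB provider is available on the machine. The procedure name ReadXlsViaAdo is made up for illustration; it is meant for the implementation section of a unit in a VCL application, where COM is already initialised.

```pascal
uses
  SysUtils, Classes, DB, ADODB;

// Appends every row of every sheet to Lines, one row per line, with the
// cells separated by tabs. Assumes the Jet 4.0 provider is installed.
procedure ReadXlsViaAdo(const FileName: string; Lines: TStrings);
var
  Conn: TADOConnection;
  Qry: TADOQuery;
  Sheets: TStringList;
  i, c: Integer;
  Row: string;
begin
  Conn := TADOConnection.Create(nil);
  Qry := TADOQuery.Create(nil);
  Sheets := TStringList.Create;
  try
    Conn.LoginPrompt := False;
    // HDR=No: treat the first row as data; IMEX=1: read mixed columns as text
    Conn.ConnectionString :=
      'Provider=Microsoft.Jet.OLEDB.4.0;Data Source=' + FileName +
      ';Extended Properties="Excel 8.0;HDR=No;IMEX=1"';
    Conn.Open;
    // Sheets show up as tables named like Sheet1$; True also lists
    // entries the provider reports as system tables
    Conn.GetTableNames(Sheets, True);
    Qry.Connection := Conn;
    Qry.ParamCheck := False; // do not parse ':' in sheet names as parameters
    for i := 0 to Sheets.Count - 1 do
    begin
      Lines.Add('Sheet: ' + Sheets[i]);
      Qry.SQL.Text := 'SELECT * FROM [' + Sheets[i] + ']';
      Qry.Open;
      while not Qry.Eof do
      begin
        Row := '';
        for c := 0 to Qry.FieldCount - 1 do
        begin
          if c > 0 then
            Row := Row + #9;
          Row := Row + Qry.Fields[c].AsString; // Null cells become ''
        end;
        Lines.Add(Row);
        Qry.Next;
      end;
      Qry.Close;
    end;
  finally
    Sheets.Free;
    Qry.Free;
    Conn.Free;
  end;
end;
```

A call such as ReadXlsViaAdo(OpenDialog1.FileName, Memo1.Lines) would fill a memo. Unlike Automation, this route does not start Excel itself, but it does depend on the Jet provider being present.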

Author Comment

by:shawn857
ID: 40609122
Jimy - yes I need to extract/read *all* the text from the files.

Thanks
   Shawn

Accepted Solution

by:
jimyX earned 400 total points
ID: 40609523
Drop a Memo, an OpenDialog and three Buttons on your form.

PDF:
Using Xpdf: download the binaries they offer and put them next to your application.
You will need pdftotext.exe, which takes two parameters that matter right now (use the others as need be):

pdftotext PDF_File Txt_File
PDF_File is the input file.
Txt_File is the output file.

Use ShellExecuteEx from the ShellAPI unit to run it and wait for the external process to finish:

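uses Windows, Forms, ShellAPI;

// Runs ExecuteFile hidden and waits for it; True only if it exits with code 0
function ExecPdfToTxt(const ExecuteFile, ParamString: string): Boolean;
var
  SEInfo: TShellExecuteInfo;
  ExitCode: DWORD;
begin
  Result := False;
  FillChar(SEInfo, SizeOf(SEInfo), 0);
  SEInfo.cbSize := SizeOf(TShellExecuteInfo);
  with SEInfo do
  begin
    fMask := SEE_MASK_NOCLOSEPROCESS; // keep hProcess so we can wait on it
    Wnd := Application.Handle;
    lpFile := PChar(ExecuteFile);
    lpParameters := PChar(ParamString);
    nShow := SW_HIDE;
  end;
  if not ShellExecuteEx(@SEInfo) then
    Exit;
  try
    // Wait in 50 ms slices: the UI stays responsive without spinning the CPU
    while (WaitForSingleObject(SEInfo.hProcess, 50) = WAIT_TIMEOUT) and
      not Application.Terminated do
      Application.ProcessMessages;
    // pdftotext returns 0 on success
    Result := GetExitCodeProcess(SEInfo.hProcess, ExitCode) and (ExitCode = 0);
  finally
    CloseHandle(SEInfo.hProcess); // the original leaked this handle
  end;
end;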

procedure TForm1.Button1Click(Sender: TObject);
var
  param, pTxt,
  PdfToTxt: String;
begin
  // pdftotext.exe is expected next to your application; adjust the path otherwise
  PdfToTxt := ExtractFilePath(Application.ExeName) + 'pdftotext.exe';
  if OpenDialog1.Execute then
    begin
      // use full paths: the open dialog may change the current directory
      pTxt := OpenDialog1.FileName + '.txt';
      param := '"' + OpenDialog1.FileName + '" "' + pTxt + '"';

      if ExecPdfToTxt(PdfToTxt, param) then
        begin
          Memo1.Lines.LoadFromFile(pTxt);
          // then you can delete the file if it is of no further use
          // DeleteFile(pTxt);
        end
      else
        ShowMessage('Error: pdftotext failed or could not be started');
    end;
end;



Doc & Xls:
uses ComObj, Variants, SysUtils;

// Doc (source links provided above)
function ExtractTextFromWordFile(const FileName: string): string;
var
  WordApp, Doc: Variant;
begin
  WordApp := CreateOleObject('Word.Application');
  try
    WordApp.Visible := False;
    // Open(FileName, ConfirmConversions, ReadOnly): FileName must be a full path
    Doc := WordApp.Documents.Open(FileName, False, True);
    Result := Doc.Content.Text; // the whole main text of the document
    Doc.Close(False);           // close without saving
  finally
    WordApp.Quit;
    WordApp := Unassigned;
  end;
end;

procedure TForm1.Button2Click(Sender: TObject);
begin
  if OpenDialog1.Execute then
    // Word separates paragraphs with #13 only; convert to CR/LF for the memo
    Memo1.Lines.Add(AdjustLineBreaks(ExtractTextFromWordFile(OpenDialog1.FileName)));
end;

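//XLS (needs ComObj and Variants in the uses clause)
procedure ExtractTextFromExcelFile(xlsMem: TMemo; const FileName: string);
var
  XLApp, Book, Sheet, Data: Variant;
  s, r, c: Integer;
  Line: string;
begin
  XLApp := CreateOleObject('Excel.Application');
  try
    XLApp.Visible := False;
    XLApp.DisplayAlerts := False;
    Book := XLApp.Workbooks.Open(FileName);

    //Tailor to suit your need
    for s := 1 to Book.Worksheets.Count do
    begin
      Sheet := Book.Worksheets[s];
      xlsMem.Lines.Add('Sheet Name: ' + VarToStr(Sheet.Name));

      // Read the used cells in one call instead of going through the clipboard
      Data := Sheet.UsedRange.Value;
      if not VarIsArray(Data) then
      begin
        // a single-cell range comes back as a scalar, not as an array
        xlsMem.Lines.Add(VarToStr(Data));
        Continue;
      end;
      for r := VarArrayLowBound(Data, 1) to VarArrayHighBound(Data, 1) do
      begin
        Line := '';
        for c := VarArrayLowBound(Data, 2) to VarArrayHighBound(Data, 2) do
        begin
          if c > VarArrayLowBound(Data, 2) then
            Line := Line + #9; // tab between cells
          Line := Line + VarToStr(Data[r, c]);
        end;
        xlsMem.Lines.Add(Line);
      end;
    end;
    Book.Close(False); // close without saving
  finally
    XLApp.Quit;
    XLApp := Unassigned;
  end;
end;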

procedure TForm1.Button3Click(Sender: TObject);
begin
  if OpenDialog1.Execute then
    ExtractTextFromExcelFile(Memo1, OpenDialog1.FileName);
end;



PS: Doc & Xls Automation require MS Office to be installed.

Author Closing Comment

by:shawn857
ID: 40619917
Thank you gentlemen!

Cheers
   Shawn