Solved

How to read pdf/xls/doc files in my app and be able to read the contents?

Posted on 2015-02-11
Last Modified: 2015-02-19
Hi, I have an app that reads plain ASCII text files (.txt, .csv) and processes the strings in those files. Is there a way in Delphi to read .pdf/.xls/.doc files directly and get at their text as well?

Thanks
    Shawn

P.S: I use D7.
Question by:shawn857
8 Comments
 
LVL 27

Assisted Solution

by:Sinisa Vuk
Sinisa Vuk earned 100 total points
ID: 40605259
Try reading similar questions/answers on EE first:
Q_26787229, Q_28214245, Q_28573693

In short, there is no all-in-one solution. Depending on the file extension, you should run the appropriate reader.

Other:
tmssoftware's FlexCel, kluug - xlsx-ods-delphi

Do you want to read some strings from the files, or do you want to show the real thing to the customer?
 

Author Comment

by:shawn857
ID: 40605304
Thanks Sinisa, I will have a look at that.

I just want my app to internally read the strings... no need to show real thing to user.

Thanks!
   Shawn
 
LVL 24

Expert Comment

by:jimyX
ID: 40605413
Hi Shawn,
Do you need to extract all text from those files?

OR

Do all your files look alike, and you have a pattern for reading strings?

Could you give samples of what PDF, XLS and DOC files might look like?
 
LVL 27

Expert Comment

by:Sinisa Vuk
ID: 40605417
One note: the content inside a PDF is usually compressed, so you generally won't find clear text in the raw file.
 
LVL 24

Expert Comment

by:jimyX
ID: 40605598
Some pointers:

DOC: Word Automation makes it easy to read the content.

XLS: xls files can be read as tables using ADO.

PDF: PDF is tough, but it can be parsed.
http://www.swissdelphicenter.ch/en/showcode.php?id=2169
http://www.foolabs.com/xpdf/about.html
Extracting text with Quick PDF free sdk:
http://www.quickpdflibrary.com/faq/extract-text-and-images-and-insert-into-new-pdf.php
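
The XLS pointer above (reading the file as a table through ADO) could look roughly like the sketch below. It is only an illustration, not code from this thread: ReadXlsViaAdo is a made-up name, and it assumes the Jet 4.0 OLE DB provider is available on the machine. Office itself is not needed for this route.

```pascal
uses
  SysUtils, Classes, ADODB;

// Hypothetical helper: dumps every row of every sheet as tab-separated text.
// Assumes the Microsoft.Jet.OLEDB.4.0 provider is installed.
procedure ReadXlsViaAdo(const FileName: string; Output: TStrings);
var
  Conn: TADOConnection;
  Query: TADOQuery;
  Sheets: TStringList;
  s, f: Integer;
  Line: string;
begin
  Conn := TADOConnection.Create(nil);
  Query := TADOQuery.Create(nil);
  Sheets := TStringList.Create;
  try
    Conn.LoginPrompt := False;
    // HDR=No: the first row is data, not column names.
    // IMEX=1: read columns with mixed content as text.
    Conn.ConnectionString :=
      'Provider=Microsoft.Jet.OLEDB.4.0;Data Source=' + FileName +
      ';Extended Properties="Excel 8.0;HDR=No;IMEX=1"';
    Conn.Open;
    // Worksheets show up as tables with names like Sheet1$
    // (named ranges, if any, are listed as well).
    Conn.GetTableNames(Sheets, False);
    Query.Connection := Conn;
    for s := 0 to Sheets.Count - 1 do
    begin
      Output.Add('Sheet Name: ' + Sheets[s]);
      Query.SQL.Text := 'SELECT * FROM [' + Sheets[s] + ']';
      Query.Open;
      while not Query.Eof do
      begin
        Line := '';
        for f := 0 to Query.FieldCount - 1 do
        begin
          if f > 0 then
            Line := Line + #9;
          Line := Line + Query.Fields[f].AsString;
        end;
        Output.Add(Line);
        Query.Next;
      end;
      Query.Close;
    end;
  finally
    Sheets.Free;
    Query.Free;
    Conn.Free;
  end;
end;
```

It would be called as ReadXlsViaAdo(OpenDialog1.FileName, Memo1.Lines). Empty cells come back as empty strings, so columns stay aligned by tab.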
 

Author Comment

by:shawn857
ID: 40609122
Jimy - yes I need to extract/read *all* the text from the files.

Thanks
   Shawn
 
LVL 24

Accepted Solution

by:jimyX
jimyX earned 400 total points
ID: 40609523
Drop a Memo, an OpenDialog and three buttons on your form.

PDF:
Using xPDF: just download the binaries they offer and put them next to your application.
You will need PdfToText.exe, which takes two parameters that matter right now (use the others as need be):

PdfToText PDF_File Txt_File
PDF_File is the input.
Txt_File is the output.

Use ShellApi to launch it and wait for the external process to finish:

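uses Windows, Forms, ShellApi;

// Runs ExecuteFile hidden with ParamString and waits until it has finished.
// Returns True only if the process started and exited with code 0.
function ExecPdfToTxt(const ExecuteFile, ParamString: String): Boolean;
var
  SEInfo: TShellExecuteInfo;
  ExitCode: DWORD;
begin
  Result := False;
  FillChar(SEInfo, SizeOf(SEInfo), 0);
  SEInfo.cbSize := SizeOf(TShellExecuteInfo);
  SEInfo.fMask := SEE_MASK_NOCLOSEPROCESS;
  SEInfo.Wnd := Application.Handle;
  SEInfo.lpFile := PChar(ExecuteFile);
  SEInfo.lpParameters := PChar(ParamString);
  SEInfo.nShow := SW_HIDE;
  if not ShellExecuteEx(@SEInfo) then
    Exit;
  try
    // Wait in 50 ms slices instead of spinning, and keep the UI responsive
    while WaitForSingleObject(SEInfo.hProcess, 50) = WAIT_TIMEOUT do
    begin
      Application.ProcessMessages;
      if Application.Terminated then
        Exit;
    end;
    Result := GetExitCodeProcess(SEInfo.hProcess, ExitCode) and (ExitCode = 0);
  finally
    CloseHandle(SEInfo.hProcess); // the handle is ours to release
  end;
end;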

procedure TForm1.Button1Click(Sender: TObject);
var
  param, pTxt,
  PdfToTxt: String;
begin
  //change the path if the tool is not located in the same dir as your application
  //(a full path is used because OpenDialog can change the current directory)
  PdfToTxt:= ExtractFilePath(Application.ExeName) + 'pdftotext.exe';
  if OpenDialog1.Execute then
    begin
      //the text file is written next to the PDF, e.g. file.pdf -> file.pdf.txt
      pTxt:= OpenDialog1.FileName + '.txt';
      param:= '"' + OpenDialog1.FileName + '" "' + pTxt + '"';

      if ExecPdfToTxt(PdfToTxt, param) then
        begin
          Memo1.Lines.LoadFromFile(pTxt);
          //then you can delete the file, if it is of no further use
          //DeleteFile(pTxt);
        end
      else
        ShowMessage('Error: pdftotext failed or could not be started');
    end;
end;



Doc & Xls:
uses ComObj, Variants;

// Doc (source links provided above)
function ExtractTextFromWordFile(const FileName:string):string;
var
  WordApp, Doc : Variant;
begin
  WordApp := CreateOleObject('Word.Application');
  try
    WordApp.Visible := False;
    Doc := WordApp.Documents.Open(FileName);
    try
      Result := Doc.Content.Text; //text of the whole main document
    finally
      Doc.Close(False);           //close without saving changes
      Doc := Unassigned;
    end;
  finally
    WordApp.Quit;
    WordApp := Unassigned;
  end;
end;

procedure TForm1.Button2Click(Sender: TObject);
begin
  if OpenDialog1.Execute then
    Memo1.Lines.Add(ExtractTextFromWordFile(OpenDialog1.FileName));
end;

//XLS (needs ComObj and Variants in the uses clause)
procedure ExtractTextFromExcelFile(xlsMem:TMemo; const FileName:string);
var
  XLApp, Sheet: OleVariant;
  i, j: Integer;
begin
  XLApp := CreateOleObject('Excel.Application');
  try
    XLApp.Visible := False;
    XLApp.DisplayAlerts := False; //no prompts (e.g. about the clipboard) on quit
    XLApp.Workbooks.Open(FileName);

    //Tailor to suit your need
    for i := 1 to XLApp.Workbooks.Count do  //Just in case
      for j:= 1 to XLApp.Workbooks[i].Worksheets.Count do
        begin
          Sheet:= XLApp.Workbooks[i].Worksheets[j];
          xlsMem.Lines.Add('Sheet Name: '+ VarToStr(Sheet.Name) +#13+#10);

          //copy only the used cells instead of selecting the entire grid
          Sheet.UsedRange.Copy;
          xlsMem.SelStart := xlsMem.GetTextLen; //paste at the end of the memo
          xlsMem.PasteFromClipboard;
        end;
  finally
    Sheet := Unassigned;
    XLApp.Quit;
    XLApp := Unassigned;
  end;
end;

procedure TForm1.Button3Click(Sender: TObject);
begin
  if OpenDialog1.Execute then
    ExtractTextFromExcelFile(Memo1, OpenDialog1.FileName);
end;



PS: Doc & Xls Automation requires MS Office to be installed.
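
Since each file type needs its own reader, one way to tie the three buttons together is to pick the reader from the file extension. The following is only a sketch under that assumption; TReaderKind and ReaderKindForFile are illustrative names that do not come from the code above.

```pascal
uses
  SysUtils;

type
  // Hypothetical enumeration of the readers discussed in this thread
  TReaderKind = (rkPlainText, rkPdf, rkWord, rkExcel, rkUnknown);

// Maps a file name to the kind of reader it needs.
// Only the extension is inspected, not the file contents.
function ReaderKindForFile(const FileName: string): TReaderKind;
var
  Ext: string;
begin
  Ext := LowerCase(ExtractFileExt(FileName));
  if (Ext = '.txt') or (Ext = '.csv') then
    Result := rkPlainText
  else if Ext = '.pdf' then
    Result := rkPdf
  else if Ext = '.doc' then
    Result := rkWord
  else if Ext = '.xls' then
    Result := rkExcel
  else
    Result := rkUnknown;
end;
```

A single button handler could then use a case statement on ReaderKindForFile(OpenDialog1.FileName) and call ExecPdfToTxt, ExtractTextFromWordFile or ExtractTextFromExcelFile accordingly. Because only the extension is checked, a mislabeled file would still reach the wrong reader.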
 

Author Closing Comment

by:shawn857
ID: 40619917
Thank you gentlemen!

Cheers
   Shawn