How to read pdf/xls/doc files in my app and be able to read the contents?

Hi, I have an app that reads plain ASCII text files (.txt, .csv) and does processing on the strings in those files. Is there a way in Delphi to read .pdf/.xls/.doc files directly and get their contents as readable text as well?

Thanks
    Shawn

P.S.: I use D7 (Delphi 7).
Asked by: shawn857
 
Sinisa Vuk commented:
Try reading similar questions and answers on EE first:
Q_26787229, Q_28214245, Q_28573693

In short, there is no all-in-one solution. Depending on the file extension, you have to run the appropriate reader.
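A minimal sketch of that dispatch, in the same D7 dialect as the rest of the thread. TReaderKind and ChooseReader are hypothetical names for illustration; each branch would then call whichever reader you settle on (pdftotext, Word or Excel automation, or a plain text load).

```pascal
uses SysUtils;

type
  // Hypothetical: one value per kind of reader the app would run
  TReaderKind = (rkUnknown, rkPlainText, rkPdf, rkWord, rkExcel);

// Picks a reader from the file extension alone (case-insensitive).
// Delphi 7 cannot "case" on strings, hence the if-chain.
function ChooseReader(const FileName: string): TReaderKind;
var
  Ext: string;
begin
  Ext := LowerCase(ExtractFileExt(FileName));
  if (Ext = '.txt') or (Ext = '.csv') then
    Result := rkPlainText
  else if Ext = '.pdf' then
    Result := rkPdf
  else if (Ext = '.doc') or (Ext = '.docx') then
    Result := rkWord
  else if (Ext = '.xls') or (Ext = '.xlsx') then
    Result := rkExcel
  else
    Result := rkUnknown;
end;
```

Dispatching on the extension is the simplest option; it trusts the file name, so a misnamed file will simply fail inside the chosen reader.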

Other options:
TMS Software's FlexCel, kluug's xlsx-ods-delphi

Do you only want to read some strings from the files, or do you want to show the actual document to the customer?
 
shawn857 (author) commented:
Thanks Sinisa, I will have a look at that.

I just want my app to read the strings internally; there is no need to show the actual document to the user.

Thanks!
   Shawn
 
jimyX commented:
Hi Shawn,
Do you need to extract all text from those files?

OR

Do all your files look alike, and do you have a pattern for reading the strings?

Could you give samples of what your PDF, XLS and DOC files look like?
Sinisa Vuk commented:
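One note: the content streams inside PDFs are usually compressed (Flate, i.e. zlib/deflate, the same algorithm zip uses), so there is no clear text in the raw file at all.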
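To see this for yourself, a small sketch (hypothetical helper names, D7 dialect) that loads a file's raw bytes and checks for the "%PDF-" signature and for the /FlateDecode filter name that marks deflate-compressed streams. Finding the filter only tells you the streams are compressed; decoding them and interpreting the content operators is still the job of a PDF library or a tool like pdftotext.

```pascal
uses SysUtils, Classes;

// Returns True if Raw starts with the PDF signature "%PDF-"
function LooksLikePdf(const Raw: AnsiString): Boolean;
begin
  Result := Copy(Raw, 1, 5) = '%PDF-';
end;

// Returns True if any stream declares the Flate (zlib/deflate) filter,
// i.e. its content is compressed rather than stored as readable text
function HasFlateStreams(const Raw: AnsiString): Boolean;
begin
  Result := Pos('/FlateDecode', Raw) > 0;
end;

// Loads a whole file into an AnsiString, byte for byte
// (fine for a quick check; very large PDFs are read fully into memory)
function LoadRaw(const FileName: string): AnsiString;
var
  FS: TFileStream;
  Size: Integer;
begin
  FS := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    Size := FS.Size;
    SetLength(Result, Size);
    if Size > 0 then
      FS.ReadBuffer(Result[1], Size);
  finally
    FS.Free;
  end;
end;
```

Typical use: `Raw := LoadRaw(OpenDialog1.FileName); if LooksLikePdf(Raw) and HasFlateStreams(Raw) then ...` before deciding to hand the file to an external extractor.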
 
jimyX commented:
Some pointers:

Doc: OLE Automation of Word lets you read the content easily.

XLS: read .xls files as a table using ADO.

PDF: PDF is harder, but it can be parsed:
http://www.swissdelphicenter.ch/en/showcode.php?id=2169
http://www.foolabs.com/xpdf/about.html
Extracting text with the free Quick PDF SDK:
http://www.quickpdflibrary.com/faq/extract-text-and-images-and-insert-into-new-pdf.php
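For the ADO route on XLS, a sketch assuming the Microsoft Jet 4.0 OLE DB provider, which ships with Windows and reads Excel 97-2003 .xls files without Excel being installed. BuildExcelConnectionString is a hypothetical helper; HDR=No makes the first row data rather than column names, and IMEX=1 asks the driver to read mixed-type columns as text.

```pascal
// Hypothetical helper: builds a Jet 4.0 connection string for an
// Excel 97-2003 .xls file.
//   HDR=No : row 1 is data, not column names
//   IMEX=1 : read mixed-type columns as text
function BuildExcelConnectionString(const XlsFile: string): string;
begin
  Result := 'Provider=Microsoft.Jet.OLEDB.4.0;' +
            'Data Source=' + XlsFile + ';' +
            'Extended Properties="Excel 8.0;HDR=No;IMEX=1"';
end;
```

With a TADOQuery (ADODB unit) you would set its ConnectionString to this value, set SQL.Text to something like `SELECT * FROM [Sheet1$]` (the sheet name, here assumed to be Sheet1, gets a trailing $ inside brackets), call Open, and walk the Fields of each record.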
 
shawn857 (author) commented:
Jimy - yes I need to extract/read *all* the text from the files.

Thanks
   Shawn
 
jimyX commented:
Drop a TMemo, three TButtons and a TOpenDialog on your form (Memo1, Button1 to Button3 and OpenDialog1 in the code below).

PDF:
Using Xpdf: download the binaries they offer and put them next to your application.
You will need pdftotext.exe, which takes two parameters that matter right now (use the others as needed):

pdftotext PDF_File Txt_File
  PDF_File: the input PDF
  Txt_File: the output text file
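Paths containing spaces must be quoted, and options go before the two file names; xpdf's pdftotext has, for example, -layout to keep the physical layout of the page. A hypothetical helper that assembles the parameter string:

```pascal
// Hypothetical helper: builds the parameter string for
//   pdftotext [options] PDF_File Txt_File
// Both paths are wrapped in double quotes so spaces survive.
function BuildPdfToTextParams(const PdfFile, TxtFile: string;
  KeepLayout: Boolean): string;
begin
  Result := '';
  if KeepLayout then
    Result := '-layout ';   // xpdf option: maintain the original layout
  Result := Result + '"' + PdfFile + '" "' + TxtFile + '"';
end;
```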

Run it through ShellExecuteEx (ShellApi unit) and wait for the external process to finish:

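uses ShellApi; //Windows, SysUtils and Forms are already in a form unit's uses clause

//Starts ExecuteFile hidden, waits until it exits and returns True only if
//it started and finished with exit code 0 (pdftotext's "no error" code)
function ExecPdfToTxt(const ExecuteFile, ParamString: String): boolean;
var
  SEInfo: TShellExecuteInfo;
  ExitCode: DWORD;
begin
    Result:= False;
    FillChar(SEInfo, SizeOf(SEInfo), 0);
    SEInfo.cbSize := SizeOf(TShellExecuteInfo);
    with SEInfo do begin
      fMask := SEE_MASK_NOCLOSEPROCESS;
      Wnd := Application.Handle;
      lpFile := PChar(ExecuteFile);
      lpParameters:= PChar(ParamString);
      nShow := SW_HIDE;
    end;
    if ShellExecuteEx(@SEInfo) then
    try
      repeat
        Application.ProcessMessages;
        Sleep(50); //don't burn a CPU core while waiting
        GetExitCodeProcess(SEInfo.hProcess, ExitCode);
      until (ExitCode <> STILL_ACTIVE) or Application.Terminated;
      Result:= ExitCode = 0;
    finally
      CloseHandle(SEInfo.hProcess); //SEE_MASK_NOCLOSEPROCESS leaves the handle open
    end;
end;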

procedure TForm1.Button1Click(Sender: TObject);
var
  param, pTxt,
  PdfToTxt: String;
begin
  //full path to the tool; change it if pdftotext.exe is not in the same folder as your application
  PdfToTxt:= ExtractFilePath(Application.ExeName) + 'pdftotext.exe';
  if OpenDialog1.Execute then
    begin
      //write "<name>.pdf.txt" next to the PDF, as an absolute path so the
      //result does not depend on the current directory
      pTxt:= OpenDialog1.FileName + '.txt';
      param:= '"' + OpenDialog1.FileName + '" "' + pTxt + '"';

      if ExecPdfToTxt(PdfToTxt, param) then
        begin
          Memo1.Lines.LoadFromFile(pTxt);
          //then you can delete the file, if it is of no further use
          //DeleteFile(pTxt);
        end
      else
        ShowMessage('Error: pdftotext could not be started or failed');
    end;
end;



Doc & Xls:
uses ComObj; //Variants (Unassigned, VarToStr, ...) is already in a D7 form unit's uses clause

// Doc (source links provided above)
function ExtractTextFromWordFile(const FileName:string):string;
var
  WordApp, Doc: OleVariant;
begin
  WordApp := CreateOleObject('Word.Application');
  try
    WordApp.Visible := False;
    //Open(FileName, ConfirmConversions, ReadOnly); keep the returned document
    //instead of assuming it is Documents.Item(1)
    Doc := WordApp.Documents.Open(FileName, False, True);
    try
      //Content is a range over the whole main text; Word ends paragraphs
      //with a bare #13, and AdjustLineBreaks turns that into #13#10 for the memo
      Result := AdjustLineBreaks(Doc.Content.Text);
    finally
      Doc.Close(False); //SaveChanges = False
    end;
  finally
    WordApp.Quit(False);
    WordApp := Unassigned;
  end;
end;

procedure TForm1.Button2Click(Sender: TObject);
begin
  if OpenDialog1.Execute then
    Memo1.Lines.Add(ExtractTextFromWordFile(OpenDialog1.FileName));
end;

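//XLS: walk every worksheet's used range through a variant array
//(no clipboard involved, and only the cells actually in use are read)
procedure ExtractTextFromExcelFile(xlsMem:TMemo; const FileName:string);
var
  XLApp, WB, Sheet: OleVariant;
  Data, Cell: Variant;
  i, r, c: Integer;
  Line: string;
begin
  XLApp := CreateOleObject('Excel.Application');
  try
    XLApp.Visible := False;
    XLApp.DisplayAlerts := False;
    WB := XLApp.Workbooks.Open(FileName, 0, True); //UpdateLinks = 0, ReadOnly = True
    for i := 1 to WB.Worksheets.Count do
      begin
        Sheet := WB.Worksheets[i];
        xlsMem.Lines.Add('Sheet Name: ' + Sheet.Name);
        Data := Sheet.UsedRange.Value; //2-D array, or a single value for a one-cell range
        if VarIsArray(Data) then
          for r := VarArrayLowBound(Data, 1) to VarArrayHighBound(Data, 1) do
            begin
              Line := '';
              for c := VarArrayLowBound(Data, 2) to VarArrayHighBound(Data, 2) do
                begin
                  Cell := Data[r, c];
                  if VarType(Cell) <> varError then //skip #N/A, #DIV/0! etc.
                    Line := Line + VarToStr(Cell);
                  if c < VarArrayHighBound(Data, 2) then
                    Line := Line + #9; //tab between columns
                end;
              xlsMem.Lines.Add(Line);
            end
        else if VarType(Data) <> varError then
          xlsMem.Lines.Add(VarToStr(Data));
      end;
    WB.Close(False); //SaveChanges = False
  finally
    XLApp.Quit;
    XLApp := Unassigned;
  end;
end;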

procedure TForm1.Button3Click(Sender: TObject);
begin
  if OpenDialog1.Execute then
    ExtractTextFromExcelFile(Memo1, OpenDialog1.FileName);
end;



P.S.: Doc and Xls automation require MS Office (Word and Excel) to be installed.
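If Office may be missing on the target machine, you can check for it up front. A minimal sketch (IsAutomationServerRegistered is a hypothetical name) using ComObj's ProgIDToClassID, which raises an exception when a ProgID such as 'Word.Application' or 'Excel.Application' is not registered:

```pascal
uses SysUtils, ComObj;

// Hypothetical helper: True if a COM class is registered under ProgID.
// ProgIDToClassID raises when the ProgID is unknown, so we trap that.
function IsAutomationServerRegistered(const ProgID: string): Boolean;
begin
  try
    ProgIDToClassID(ProgID);
    Result := True;
  except
    Result := False;
  end;
end;
```

Calling it at the top of the Button2/Button3 handlers, e.g. `if not IsAutomationServerRegistered('Excel.Application') then ShowMessage('Excel is not installed')`, gives the user a clear message instead of the EOleSysError that CreateOleObject would raise.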
 
shawn857 (author) commented:
Thank you gentlemen!

Cheers
   Shawn