Solved

How to read pdf/xls/doc files in my app and be able to read the contents?

Posted on 2015-02-11
Last Modified: 2015-02-19
Hi, I have an app that reads plain ASCII text files (.txt, .csv) and does processing on the strings in those files. Is there a way in Delphi to read .pdf/.xls/.doc files directly and get at their text in the same way?

Thanks
    Shawn

P.S: I use D7.
Question by:shawn857
8 Comments
 
LVL 25

Assisted Solution

by:Sinisa Vuk
Sinisa Vuk earned 100 total points
ID: 40605259
Try to read similar questions/answers on EE first:
Q_26787229, Q_28214245, Q_28573693

In short, there is no all-in-one solution. Depending on the file extension, you should run the appropriate reader.
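
A minimal sketch of that idea, choosing a reader from the file extension. The type and function names here (TReaderKind, ReaderForFile) are made up for illustration and do not come from any library; the actual readers still have to be supplied per format.

```pascal
uses SysUtils;

type
  // Hypothetical enumeration of the readers the app would need
  TReaderKind = (rkPlainText, rkPdf, rkWord, rkExcel, rkUnknown);

// Maps a file name to the kind of reader that should handle it,
// based only on the (case-insensitive) file extension.
function ReaderForFile(const FileName: string): TReaderKind;
var
  Ext: string;
begin
  Ext := LowerCase(ExtractFileExt(FileName));
  if (Ext = '.txt') or (Ext = '.csv') then
    Result := rkPlainText
  else if Ext = '.pdf' then
    Result := rkPdf
  else if Ext = '.doc' then
    Result := rkWord
  else if Ext = '.xls' then
    Result := rkExcel
  else
    Result := rkUnknown;
end;
```

The caller can then use a case statement on the result to call the matching extraction routine, and skip or report files that come back as rkUnknown.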

Other options:
tmssoftware's FlexCel, kluug's xlsx-ods-delphi

Do you want to search for some strings in the files, or do you want to show the actual document to the customer?
 

Author Comment

by:shawn857
ID: 40605304
Thanks Sinisa, I will have a look at that.

I just want my app to internally read the strings... no need to show real thing to user.

Thanks!
   Shawn
 
LVL 24

Expert Comment

by:jimyX
ID: 40605413
Hi Shawn,
Do you need to extract all text from those files?

OR

Do all your files look alike, and you have a pattern for reading strings?

Could you give samples of what PDF, XLS and DOC files might look like?
 
LVL 25

Expert Comment

by:Sinisa Vuk
ID: 40605417
One note: the content inside a PDF is usually compressed, so you generally won't find clear text by reading the file directly.
 
LVL 24

Expert Comment

by:jimyX
ID: 40605598
Some pointers:

Doc: Word Automation (OLE) makes it easy to read the content.

XLS: Read the .xls file as a table using ADO.

PDF: PDF is tough, but it can be parsed.
http://www.swissdelphicenter.ch/en/showcode.php?id=2169
http://www.foolabs.com/xpdf/about.html
Extracting text with Quick PDF free sdk:
http://www.quickpdflibrary.com/faq/extract-text-and-images-and-insert-into-new-pdf.php
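
For the ADO pointer, a rough sketch of reading one worksheet as a table. It assumes the Jet 4.0 OLE DB provider is available on the machine and that you know the worksheet name (for example 'Sheet1'); the helper name ReadXlsViaAdo is hypothetical.

```pascal
uses SysUtils, Classes, ADODB;

// Hypothetical helper: appends each row of one worksheet to Lines
// as tab-separated text. SheetName is the worksheet name without '$'.
procedure ReadXlsViaAdo(const FileName, SheetName: string; Lines: TStrings);
var
  Conn: TADOConnection;
  Qry: TADOQuery;
  i: Integer;
  Row: string;
begin
  Conn := TADOConnection.Create(nil);
  Qry := TADOQuery.Create(nil);
  try
    Conn.LoginPrompt := False;
    // HDR=No: treat the first row as data; IMEX=1: read mixed columns as text
    Conn.ConnectionString :=
      'Provider=Microsoft.Jet.OLEDB.4.0;Data Source=' + FileName +
      ';Extended Properties="Excel 8.0;HDR=No;IMEX=1"';
    Conn.Open;
    Qry.Connection := Conn;
    Qry.SQL.Text := 'SELECT * FROM [' + SheetName + '$]';
    Qry.Open;
    while not Qry.Eof do
    begin
      Row := '';
      for i := 0 to Qry.FieldCount - 1 do
      begin
        if i > 0 then
          Row := Row + #9;
        Row := Row + Qry.Fields[i].AsString;
      end;
      Lines.Add(Row);
      Qry.Next;
    end;
  finally
    Qry.Free;
    Conn.Free;
  end;
end;
```

Usage would be something like ReadXlsViaAdo(OpenDialog1.FileName, 'Sheet1', Memo1.Lines). Unlike Automation, this route does not start Excel itself, but it does depend on the provider being installed.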
 

Author Comment

by:shawn857
ID: 40609122
Jimy - yes I need to extract/read *all* the text from the files.

Thanks
   Shawn
 
LVL 24

Accepted Solution

by:
jimyX earned 400 total points
ID: 40609523
Drop a Memo, an OpenDialog and three buttons on your form.

PDF:
Using xPDF: download the binaries they offer and put them next to your application.
You will need PdfToText.exe, which takes two parameters that matter right now (the other options can be used as needed):

PdfToText PDF_File Txt_File
PDF_File is the input file.
Txt_File is the output file.

Use ShellApi to launch it and wait for the external process to finish:

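uses ShellApi; // Windows, Forms, SysUtils and Dialogs are already in the form unit

// Runs ExecuteFile hidden with ParamString and waits for it to exit.
// Returns True only if the process started and finished with exit code 0.
function ExecPdfToTxt(const ExecuteFile, ParamString: string): Boolean;
var
  SEInfo: TShellExecuteInfo;
  ExitCode: DWORD;
begin
  Result := False;
  FillChar(SEInfo, SizeOf(SEInfo), 0);
  SEInfo.cbSize := SizeOf(TShellExecuteInfo);
  SEInfo.fMask := SEE_MASK_NOCLOSEPROCESS;
  SEInfo.Wnd := Application.Handle;
  SEInfo.lpFile := PChar(ExecuteFile);
  SEInfo.lpParameters := PChar(ParamString);
  SEInfo.nShow := SW_HIDE;
  if not ShellExecuteEx(@SEInfo) then
    Exit;
  try
    // Wait in 50 ms slices so the UI stays responsive without a busy loop
    while (WaitForSingleObject(SEInfo.hProcess, 50) = WAIT_TIMEOUT) and
      not Application.Terminated do
      Application.ProcessMessages;
    Result := GetExitCodeProcess(SEInfo.hProcess, ExitCode) and (ExitCode = 0);
  finally
    CloseHandle(SEInfo.hProcess); // release the process handle
  end;
end;

procedure TForm1.Button1Click(Sender: TObject);
var
  PdfFile, TxtFile, Params, PdfToTxt: string;
begin
  // Assumes the tool sits in the same dir as your application; otherwise provide the full path
  PdfToTxt := ExtractFilePath(Application.ExeName) + 'pdftotext.exe';
  if OpenDialog1.Execute then
  begin
    PdfFile := OpenDialog1.FileName;
    // Write the output next to the PDF, e.g. report.pdf -> report.pdf.txt
    TxtFile := PdfFile + '.txt';
    // Quote both paths so names containing spaces survive
    Params := '"' + PdfFile + '" "' + TxtFile + '"';

    if ExecPdfToTxt(PdfToTxt, Params) then
    begin
      Memo1.Lines.LoadFromFile(TxtFile);
      // then you can delete the file, if it is of no further use
      // DeleteFile(TxtFile);
    end
    else
      ShowMessage('Error: pdftotext did not run successfully');
  end;
end;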



Doc & Xls:
uses ComObj;

// Doc (source links provided above)
function ExtractTextFromWordFile(const FileName: string): string;
var
  WordApp, Doc: Variant;
begin
  WordApp := CreateOleObject('Word.Application');
  try
    WordApp.Visible := False;
    Doc := WordApp.Documents.Open(FileName);
    try
      // Content is a range covering the whole main document body
      Result := Doc.Content.Text;
    finally
      Doc.Close(False); // close without saving changes
    end;
  finally
    WordApp.Quit;
  end;
end;

procedure TForm1.Button2Click(Sender: TObject);
begin
  if OpenDialog1.Execute then
    // Word separates paragraphs with a bare #13; convert to CR/LF for the memo
    Memo1.Lines.Add(AdjustLineBreaks(ExtractTextFromWordFile(OpenDialog1.FileName)));
end;

//XLS
procedure ExtractTextFromExcelFile(xlsMem: TMemo; const FileName: string);
var
  XLApp, Book, Sheet: OleVariant;
  j: Integer;
begin
  XLApp := CreateOleObject('Excel.Application');
  try
    XLApp.Visible := False;
    XLApp.DisplayAlerts := False; // suppress prompts, e.g. about clipboard data on quit
    Book := XLApp.Workbooks.Open(FileName);

    //Tailor to suit your need
    for j := 1 to Book.Worksheets.Count do
    begin
      Sheet := Book.Worksheets[j];
      xlsMem.Lines.Add('Sheet Name: ' + Sheet.Name + #13 + #10);

      // Copy only the used cells rather than every cell on the sheet.
      // Note: this overwrites whatever is on the user's clipboard.
      Sheet.UsedRange.Copy;
      xlsMem.PasteFromClipboard;
    end;
    Book.Close(False); // close without saving changes
  finally
    XLApp.Quit;
    XLApp := Unassigned;
  end;
end;

procedure TForm1.Button3Click(Sender: TObject);
begin
  if OpenDialog1.Execute then
    ExtractTextFromExcelFile(Memo1, OpenDialog1.FileName);
end;



PS: Doc & Xls Automation requires MS Office to be installed.
 

Author Closing Comment

by:shawn857
ID: 40619917
Thank you gentlemen!

Cheers
   Shawn