Solved

How to read pdf/xls/doc files in my app and be able to read the contents?

Posted on 2015-02-11
Last Modified: 2015-02-19
Hi, I have an app that reads plain ASCII text files (.txt, .csv) and processes the strings in those files. Is there a way in Delphi to read .pdf/.xls/.doc files directly and get at their text as well?

Thanks
    Shawn

P.S: I use D7.
Question by:shawn857
8 Comments
 
LVL 27

Assisted Solution

by:Sinisa Vuk
Sinisa Vuk earned 400 total points
ID: 40605259
Try reading the similar questions/answers on EE first:
Q_26787229, Q_28214245, Q_28573693

In short: there is no all-in-one solution. Depending on the file extension, you should run the appropriate reader.

Other options:
TMS Software's FlexCel, Kluug's xlsx-ods-delphi

Do you want to read some strings from the files, or do you want to show the real document to the customer?
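
To illustrate the "appropriate reader per extension" idea, here is a minimal sketch of a dispatcher. The names TReaderKind and ReaderKindForFile are made up for this example and do not come from any library; it only decides which reader to use and leaves the actual reading to whatever tool you pick for each format.

```pascal
program PickReader;
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // Hypothetical enumeration of the readers an app might plug in
  TReaderKind = (rkPlainText, rkPdf, rkWord, rkExcel, rkUnknown);

const
  ReaderNames: array[TReaderKind] of string =
    ('plain text', 'PDF', 'Word', 'Excel', 'unknown');

// Maps a file name to a reader kind by its extension (case-insensitive)
function ReaderKindForFile(const FileName: string): TReaderKind;
var
  Ext: string;
begin
  Ext := LowerCase(ExtractFileExt(FileName));
  if (Ext = '.txt') or (Ext = '.csv') then
    Result := rkPlainText
  else if Ext = '.pdf' then
    Result := rkPdf
  else if Ext = '.doc' then
    Result := rkWord
  else if Ext = '.xls' then
    Result := rkExcel
  else
    Result := rkUnknown;
end;

begin
  // Sample file names are placeholders
  WriteLn(ReaderNames[ReaderKindForFile('C:\data\report.PDF')]);
  WriteLn(ReaderNames[ReaderKindForFile('C:\data\prices.xls')]);
  WriteLn(ReaderNames[ReaderKindForFile('C:\data\photo.jpg')]);
end.
```

In a real app the case on the returned kind would call the matching extraction routine and feed the resulting strings into the existing text processing.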
 

Author Comment

by:shawn857
ID: 40605304
Thanks Sinisa, I will have a look at that.

I just want my app to read the strings internally... no need to show the real thing to the user.

Thanks!
   Shawn
 
LVL 24

Expert Comment

by:jimyX
ID: 40605413
Hi Shawn,
Do you need to extract all text from those files?

OR

Do all your files look alike, and you have a pattern for reading strings?

Could you give samples of what PDF, XLS and DOC files might look like?
 
LVL 27

Expert Comment

by:Sinisa Vuk
ID: 40605417
One note: the content inside a PDF is usually compressed, so there is no clear text to read directly.
 
LVL 24

Expert Comment

by:jimyX
ID: 40605598
Some pointers:

Doc: Automation (driving Word through COM) lets you read the content easily.

XLS: Read the xls file as a table using ADO.

PDF: PDF is tough, but it can be parsed.
http://www.swissdelphicenter.ch/en/showcode.php?id=2169
http://www.foolabs.com/xpdf/about.html
Extracting text with the Quick PDF free SDK:
http://www.quickpdflibrary.com/faq/extract-text-and-images-and-insert-into-new-pdf.php
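
For the ADO pointer on XLS, a rough sketch follows. It assumes the Jet 4.0 OLE DB provider is available on the machine and that the code runs inside a normal VCL application (so COM is already initialized). ReadXlsViaAdo is a name invented for this example. Note that GetTableNames can also return named ranges, and sheet names containing spaces may come back wrapped in quotes, so treat this as a starting point rather than a finished routine.

```pascal
uses
  SysUtils, Classes, ADODB; // ADODB is the ADO component unit shipped with Delphi 7

// Appends the cells of every table (sheet) in an .xls file to Dest,
// one line per row, cells separated by tabs.
procedure ReadXlsViaAdo(const FileName: string; Dest: TStrings);
var
  Conn: TADOConnection;
  Query: TADOQuery;
  Sheets: TStringList;
  i, c: Integer;
  Line: string;
begin
  Conn := TADOConnection.Create(nil);
  Query := TADOQuery.Create(nil);
  Sheets := TStringList.Create;
  try
    Conn.LoginPrompt := False;
    // HDR=No: do not treat the first row as column names
    // IMEX=1: read mixed-type columns as text
    Conn.ConnectionString :=
      'Provider=Microsoft.Jet.OLEDB.4.0;Data Source=' + FileName +
      ';Extended Properties="Excel 8.0;HDR=No;IMEX=1"';
    Conn.Open;
    Conn.GetTableNames(Sheets, False); // sheets show up as e.g. Sheet1$
    Query.Connection := Conn;
    for i := 0 to Sheets.Count - 1 do
    begin
      Dest.Add('Sheet Name: ' + Sheets[i]);
      Query.SQL.Text := 'SELECT * FROM [' + Sheets[i] + ']';
      Query.Open;
      while not Query.Eof do
      begin
        Line := '';
        for c := 0 to Query.FieldCount - 1 do
        begin
          if c > 0 then
            Line := Line + #9;
          Line := Line + Query.Fields[c].AsString; // empty cells give ''
        end;
        Dest.Add(Line);
        Query.Next;
      end;
      Query.Close;
    end;
  finally
    Sheets.Free;
    Query.Free;
    Conn.Free;
  end;
end;
```

A call such as ReadXlsViaAdo(OpenDialog1.FileName, Memo1.Lines) would fill a memo. Unlike Automation, this route does not need Excel itself installed, only the provider.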
 

Author Comment

by:shawn857
ID: 40609122
Jimy - yes I need to extract/read *all* the text from the files.

Thanks
   Shawn
 
LVL 24

Accepted Solution

by:jimyX
jimyX earned 1600 total points
ID: 40609523
Drop a Memo, an OpenDialog and three buttons on your form.

PDF:
Using xPDF: just download the binaries they offer and put them next to your application.
You will need PdfToText.exe, which takes two parameters that matter right now (the others you can use as need be):

PdfToText PDF_File Txt_File
PDF_File is the input.
Txt_File is the output.

Use ShellApi to run it and wait for the external process to finish:

uses ShellApi; //in addition to the default Windows, SysUtils and Forms units

//Runs ExecuteFile hidden and waits for it to finish.
//Returns True only if the process started and exited with code 0.
function ExecPdfToTxt(const ExecuteFile, ParamString: String): boolean;
var
  SEInfo: TShellExecuteInfo;
  ExitCode: DWORD;
begin
  Result := False;
  FillChar(SEInfo, SizeOf(SEInfo), 0);
  SEInfo.cbSize := SizeOf(TShellExecuteInfo);
  with SEInfo do begin
    fMask := SEE_MASK_NOCLOSEPROCESS;
    Wnd := Application.Handle;
    lpFile := PChar(ExecuteFile);
    lpParameters := PChar(ParamString);
    nShow := SW_HIDE;
  end;
  if ShellExecuteEx(@SEInfo) then
  try
    //wait in 50 ms slices instead of spinning, so the UI stays responsive
    while WaitForSingleObject(SEInfo.hProcess, 50) = WAIT_TIMEOUT do
    begin
      Application.ProcessMessages;
      if Application.Terminated then Exit;
    end;
    if GetExitCodeProcess(SEInfo.hProcess, ExitCode) then
      Result := ExitCode = 0;
  finally
    //SEE_MASK_NOCLOSEPROCESS leaves the handle open, so release it here
    CloseHandle(SEInfo.hProcess);
  end;
end;

procedure TForm1.Button1Click(Sender: TObject);
var
  param, pTxt,
  PdfToTxt: String;
begin
  //OpenDialog changes the current directory, so use the full path of the tool
  //(here: the same dir as your application)
  PdfToTxt := ExtractFilePath(Application.ExeName) + 'pdftotext.exe';
  if OpenDialog1.Execute then
    begin
      param := OpenDialog1.FileName;
      //full path, so the output lands next to the PDF: file.pdf.txt
      pTxt := param + '.txt';
      param := '"' + param + '" "' + pTxt + '"';

      if ExecPdfToTxt(PdfToTxt, param) then
        begin
          Memo1.Lines.LoadFromFile(pTxt);
          //then you can delete the file, if at no further use
          //DeleteFile(pTxt)
        end
      else
        ShowMessage('Error: not executed');
    end;
end;



Doc & Xls:
uses ComObj;

// Doc (source links provided above)
function ExtractTextFromWordFile(const FileName:string):string;
var
  WordApp, Doc : Variant;
begin
  WordApp := CreateOleObject('Word.Application');
  try
    WordApp.Visible := False;
    Doc := WordApp.Documents.Open(FileName);
    try
      Result := Doc.Content.Text; //the whole main text of the document
    finally
      Doc.Close(False); //close without saving changes
    end;
  finally
    WordApp.Quit;
  end;
end;

procedure TForm1.Button2Click(Sender: TObject);
begin
  if OpenDialog1.Execute then
    Memo1.Lines.Add(ExtractTextFromWordFile(OpenDialog1.FileName));
end;

//XLS
procedure ExtractTextFromExcelFile(xlsMem:TMemo; const FileName:string);
var
  XLApp, Sheet: OleVariant;
  i, j: Integer;
begin
  XLApp := CreateOleObject('Excel.Application');
  try
    XLApp.Visible := False;
    XLApp.DisplayAlerts := False; //no prompts while Excel is hidden
    XLApp.Workbooks.Open(FileName);

    //Tailor to suit your need
    for i := 1 to XLApp.Workbooks.Count do  //Just in case
      for j := 1 to XLApp.Workbooks[i].Worksheets.Count do
        begin
          Sheet := XLApp.Workbooks[i].Worksheets[j];
          xlsMem.Lines.Add('Sheet Name: ' + Sheet.Name + #13 + #10);

          //copy only the cells in use, not the entire sheet
          //(note: this overwrites the user's clipboard)
          Sheet.UsedRange.Copy;
          xlsMem.PasteFromClipboard;
        end;
  finally
    //quit Excel even if something above raised an exception
    XLApp.Quit;
    XLApp := Unassigned;
  end;
end;

procedure TForm1.Button3Click(Sender: TObject);
begin
  if OpenDialog1.Execute then
    ExtractTextFromExcelFile(Memo1, OpenDialog1.FileName);
end;



PS: Doc and Xls Automation require MS Office (Word and Excel) to be installed on the machine.
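
Since Automation depends on Office being present, it may be worth checking up front instead of letting CreateOleObject raise an exception. A small sketch, where IsComServerRegistered is a helper name invented for this example; it only checks that the ProgID is registered, which does not guarantee the application will actually start.

```pascal
uses
  Windows, ActiveX;

// Returns True when a COM server is registered under the given ProgID,
// e.g. 'Word.Application' or 'Excel.Application'.
function IsComServerRegistered(const ProgID: WideString): Boolean;
var
  ClassID: TCLSID;
begin
  // CLSIDFromProgID looks the ProgID up in the registry
  Result := Succeeded(CLSIDFromProgID(PWideChar(ProgID), ClassID));
end;
```

For example, a button handler could test IsComServerRegistered('Word.Application') first and show a message to the user when it returns False, rather than calling ExtractTextFromWordFile.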
 

Author Closing Comment

by:shawn857
ID: 40619917
Thank you gentlemen!

Cheers
   Shawn