Mutley2003

Using IFilter to examine a PDF downloaded in a TWebBrowser


There is an excellent discussion of using an IFilter to get the text out of a PDF document here:

https://www.experts-exchange.com/questions/20293579/Using-IFilter-With-Delphi.html

but my problem is a bit different.

I want to extract the text of a PDF that has been downloaded in a browser (IE, TWebBrowser), but I need to know:

a) when the download is complete. Can I use OnDocumentComplete, or does that only work for HTML pages?

b) where the PDF is, so I can examine it. I suppose it is in a cache somewhere, but how can I find it, or establish the correspondence between the original PDF URL and the name in the cache?
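For question b), one thing worth trying is the WinInet cache API, which maps a URL to its local cache file when such an entry exists. The following is a minimal sketch, not a confirmed solution: it assumes the `GetUrlCacheEntryInfo` declaration from Delphi's `WinInet` unit (with `var` parameters), the function name `CacheFileForUrl` is made up for illustration, and it returns an empty string if IE never wrote the PDF to the cache.

```pascal
program CacheLookup;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils, WinInet;

// Hypothetical helper: returns the local cache file for Url, or '' if the
// URL has no cache entry (for example, if the PDF was streamed in memory).
function CacheFileForUrl(const Url: string): string;
var
  Dummy: TInternetCacheEntryInfo;
  Info: PInternetCacheEntryInfo;
  Size: DWORD;
begin
  Result := '';
  Size := 0;
  // First call with a zero-sized buffer: it is expected to fail with
  // ERROR_INSUFFICIENT_BUFFER and report the required size in Size.
  if GetUrlCacheEntryInfo(PChar(Url), Dummy, Size) then
    Exit;
  if GetLastError <> ERROR_INSUFFICIENT_BUFFER then
    Exit; // typically ERROR_FILE_NOT_FOUND: no cache entry for this URL
  GetMem(Info, Size);
  try
    // Second call with a buffer large enough for the entry and its strings.
    if GetUrlCacheEntryInfo(PChar(Url), Info^, Size) then
      Result := Info^.lpszLocalFileName;
  finally
    FreeMem(Info);
  end;
end;

var
  FileName: string;
begin
  if ParamCount < 1 then
  begin
    Writeln('Usage: CacheLookup <url>');
    Exit;
  end;
  FileName := CacheFileForUrl(ParamStr(1));
  if FileName = '' then
    Writeln('No cache entry found for ', ParamStr(1))
  else
    Writeln('Cached at: ', FileName);
end.
```

The URL passed in must match the cached URL exactly, so the value reported by the browser events is the one to use.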

thanks
ASKER CERTIFIED SOLUTION
Jacco
Mutley2003

ASKER

Hi Jacco

I also monitored the TWebBrowser events and got:
onBeforeNavigate2 not busy , loading
http://www.fia.com/resources/documents/1797101136__Appendix_L_a.pdf
onDownloadBegin busy , loading
onDownloadComplete not busy , loading
onDownloadBegin busy , loading
onNavigateComplete2 busy , loading
http://www.fia.com/resources/documents/1797101136__Appendix_L_a.pdf
onCommandStateChange busy , interactive
onDownloadComplete not busy , interactive
onDocumentComplete not busy , complete
http://www.fia.com/resources/documents/1797101136__Appendix_L_a.pdf
onDocumentComplete not busy , complete
http://www.fia.com/resources/documents/1797101136__Appendix_L_a.pdf
onDownloadBegin busy , complete
onDownloadComplete not busy , complete

As you say, a whole bunch of completion events.

This reminds me of what TWebBrowser does with frames.
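With frames, the usual way to pick out the final completion event is to compare the `pDisp` argument of OnDocumentComplete with the browser's own interface. The sketch below shows that check. It is an assumption that the same test separates the two OnDocumentComplete events seen for the PDF; that has not been verified here. The form, the `WebBrowser1` component placed at design time, and the handler signature (Delphi 6/7 style; newer versions declare `URL` as `const`) are also assumptions.

```pascal
unit MainForm;

interface

uses
  Windows, Classes, Controls, Forms, OleCtrls, SHDocVw;

type
  TForm1 = class(TForm)
    WebBrowser1: TWebBrowser; // assumed to be placed on the form at design time
    procedure WebBrowser1DocumentComplete(Sender: TObject;
      const pDisp: IDispatch; var URL: OleVariant);
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}

procedure TForm1.WebBrowser1DocumentComplete(Sender: TObject;
  const pDisp: IDispatch; var URL: OleVariant);
var
  TopLevel: IUnknown;
  UrlText: string;
begin
  // Each frame raises its own DocumentComplete. The event that belongs to
  // the top-level browser passes the browser's own interface in pDisp.
  // COM identity is compared on IUnknown, hence the two "as" casts.
  TopLevel := (Sender as TWebBrowser).DefaultInterface as IUnknown;
  if (pDisp as IUnknown) = TopLevel then
  begin
    UrlText := URL; // OleVariant converts to string on assignment
    OutputDebugString(PChar('Top-level document complete: ' + UrlText));
  end;
end;

end.
```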



As for using Indy and a direct download, thanks for the idea, but that won't work for what I want.

So that leaves the problem:

b) where the PDF is, so I can examine it. I suppose it is in a cache somewhere, but how can I find it, or establish the correspondence between the original PDF URL and the name in the cache?

and you suggest that
". IE probable directly streams it to the AcrobatReader and it not on disk"

I guess that is possible, and I might believe it if I had a good utility that watched changes to the disk: some wrapper around FindFirstChangeNotification or some such.
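A wrapper of that kind can be quite small. The following console program is a rough sketch, assuming Delphi 6 or later (for `RaiseLastOSError`); the program name and the choice of notification flags are illustrative. Note that FindFirstChangeNotification only signals that something changed under the watched directory, not which file; ReadDirectoryChangesW is the API that reports file names.

```pascal
program WatchDir;

{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

var
  Dir: string;
  Handle: THandle;
begin
  // Directory to watch: first command-line argument, else the current directory.
  if ParamCount > 0 then
    Dir := ParamStr(1)
  else
    Dir := GetCurrentDir;

  // True = also watch subdirectories. Signal on file creation, deletion,
  // renaming, size changes and writes.
  Handle := FindFirstChangeNotification(PChar(Dir), True,
    FILE_NOTIFY_CHANGE_FILE_NAME or FILE_NOTIFY_CHANGE_SIZE or
    FILE_NOTIFY_CHANGE_LAST_WRITE);
  if Handle = INVALID_HANDLE_VALUE then
    RaiseLastOSError;
  try
    Writeln('Watching ', Dir, ' (Ctrl+C to stop)');
    while WaitForSingleObject(Handle, INFINITE) = WAIT_OBJECT_0 do
    begin
      Writeln(FormatDateTime('hh:nn:ss', Now), ' change detected');
      // Re-arm the notification before waiting again.
      if not FindNextChangeNotification(Handle) then
        RaiseLastOSError;
    end;
  finally
    FindCloseChangeNotification(Handle);
  end;
end.
```

Running it against the IE cache directory and the temp directory while the PDF loads would show whether anything is written to disk at that moment.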


Also, I vaguely remember that there is a mechanism for telling IE how to handle certain file types. It is not plugins and not pluggable protocols; the name escapes me. If I knew how that worked, then maybe I would know what IE does with PDF.


Any ideas?


 
I have searched my whole C drive and found nothing. I really think the PDF exists only in memory.

Regards, Jacco

Mutley2003

ASKER

Well, when I get some time I will monitor disk changes with FindFirstChangeNotification and let you know what I find out.