Mutley2003
asked on
Using IFilter to examine a PDF downloaded in a TWebBrowser
There is an excellent discussion of using an IFilter to get the text out of a PDF document here:
https://www.experts-exchange.com/questions/20293579/Using-IFilter-With-Delphi.html
But my problem is a bit different.
I want to extract the text of a PDF that has been downloaded in a browser (IE, TWebBrowser), but I need to know:
a) when that download is complete. Can I use OnDocumentComplete, or does that only work for HTML pages?
b) where the PDF is, so I can examine it. I suppose it is in a cache somewhere, but how can I find it, i.e. establish the correspondence between the original PDF URL and the file name in the cache?
Thanks
ASKER CERTIFIED SOLUTION
I have searched my whole C drive and found nothing. I really think the PDF exists only in memory.
Regards Jacco
ASKER
Well, when I get some time I will monitor disk changes with FindFirstChangeNotification and let you know what I find out.
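For the disk-monitoring experiment, here is a minimal portable sketch that works by polling rather than through FindFirstChangeNotification. It records the size and modification time of every file under a folder before and after the PDF is loaded, then lists what was created, modified or deleted. Which folder to watch (for example IE's Temporary Internet Files, whose location varies between Windows versions) is an assumption, and `snapshot` and `diff_snapshots` are hypothetical helper names. Polling is used here because FindFirstChangeNotification only signals that something in the directory changed without naming the file. ReadDirectoryChangesW is the Win32 call that reports file names.

```python
import os
from typing import Dict, List, Tuple

# Maps a file path to (size in bytes, modification time in nanoseconds).
Snapshot = Dict[str, Tuple[int, int]]


def snapshot(root: str) -> Snapshot:
    """Record size and mtime of every regular file under root, recursively."""
    result: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                # File vanished or is locked between listing and stat; skip it.
                continue
            result[path] = (st.st_size, st.st_mtime_ns)
    return result


def diff_snapshots(before: Snapshot,
                   after: Snapshot) -> Tuple[List[str], List[str], List[str]]:
    """Return sorted (created, modified, deleted) path lists.

    A file counts as modified if its size or mtime changed; a rewrite that
    keeps both identical (within mtime resolution) will not be noticed.
    """
    created = sorted(p for p in after if p not in before)
    deleted = sorted(p for p in before if p not in after)
    modified = sorted(p for p in after if p in before and after[p] != before[p])
    return created, modified, deleted
```

Usage: take `before = snapshot(folder)`, load the PDF in the TWebBrowser, then inspect `diff_snapshots(before, snapshot(folder))`. An empty result for the cache folder would support the "only in memory" theory, but only for the folders you actually watched.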
ASKER
I also monitored the TWebBrowser events and got:
onBeforeNavigate2 not busy , loading
http://www.fia.com/resources/documents/1797101136__Appendix_L_a.pdf
onDownloadBegin busy , loading
onDownloadComplete not busy , loading
onDownloadBegin busy , loading
onNavigateComplete2 busy , loading
http://www.fia.com/resources/documents/1797101136__Appendix_L_a.pdf
onCommandStateChange busy , interactive
onDownloadComplete not busy , interactive
onDocumentComplete not busy , complete
http://www.fia.com/resources/documents/1797101136__Appendix_L_a.pdf
onDocumentComplete not busy , complete
http://www.fia.com/resources/documents/1797101136__Appendix_L_a.pdf
onDownloadBegin busy , complete
onDownloadComplete not busy , complete
As you say, a whole bunch of completion events.
This reminds me of what TWebBrowser does with frames.
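To make question a) concrete, here is a small sketch that replays the log above and picks a single completion point: the first onDocumentComplete for the requested URL that fires with Busy false and ReadyState complete, ignoring the repeats. `BrowserEvent` and `first_completion` are hypothetical names, the log is transcribed by hand, and the rule is a heuristic read off this one trace rather than documented TWebBrowser behaviour. In a Delphi OnDocumentComplete handler, the equivalent test would read the browser's Busy and ReadyState properties. Microsoft's documentation for DocumentComplete also suggests comparing the pDisp argument with the browser's own dispatch interface to recognise the top-level document, but this log does not record pDisp. Either way, it only tells you when the PDF has been handed to the viewer, not where (or whether) it is on disk.

```python
from typing import List, NamedTuple, Optional


class BrowserEvent(NamedTuple):
    name: str          # event name as logged, e.g. "onDocumentComplete"
    busy: bool         # value of Busy when the event fired
    ready_state: str   # "loading", "interactive" or "complete"
    url: str = ""      # URL logged with the event, if any


def first_completion(events: List[BrowserEvent], target_url: str) -> Optional[int]:
    """Index of the first onDocumentComplete for target_url that fired while
    the browser was idle with ReadyState complete; later duplicates are ignored.
    Returns None if no such event occurs."""
    for i, ev in enumerate(events):
        if (ev.name == "onDocumentComplete" and ev.url == target_url
                and not ev.busy and ev.ready_state == "complete"):
            return i
    return None


PDF_URL = "http://www.fia.com/resources/documents/1797101136__Appendix_L_a.pdf"

# The event sequence from the log above, one entry per event.
LOG = [
    BrowserEvent("onBeforeNavigate2", False, "loading", PDF_URL),
    BrowserEvent("onDownloadBegin", True, "loading"),
    BrowserEvent("onDownloadComplete", False, "loading"),
    BrowserEvent("onDownloadBegin", True, "loading"),
    BrowserEvent("onNavigateComplete2", True, "loading", PDF_URL),
    BrowserEvent("onCommandStateChange", True, "interactive"),
    BrowserEvent("onDownloadComplete", False, "interactive"),
    BrowserEvent("onDocumentComplete", False, "complete", PDF_URL),
    BrowserEvent("onDocumentComplete", False, "complete", PDF_URL),
    BrowserEvent("onDownloadBegin", True, "complete"),
    BrowserEvent("onDownloadComplete", False, "complete"),
]
```

On this trace, `first_completion(LOG, PDF_URL)` selects the first of the two onDocumentComplete events, so the earlier onDownloadComplete notifications (fired while ReadyState is still loading or interactive) are not mistaken for the end of the load.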
As for using Indy and a direct download, thanks for the idea, but that won't work for what I want.
So that leaves the problem:
b) where the PDF is, so I can examine it. I suppose it is in a cache somewhere, but how can I find it, i.e. establish the correspondence between the original PDF URL and the file name in the cache?
and you suggest that
"IE probably streams it directly to the Acrobat Reader and it is not on disk"
I guess that is possible, and I might believe it if I had a good utility app that watched changes to the disk, some wrapper around FindFirstChangeNotificatio
Also, I vaguely remember that there is a mechanism for telling IE how to handle certain file types. It is not plugins and not pluggable protocols; the name escapes me. If I knew how that worked, maybe I would know what IE does with PDFs.
Any ideas?