Solved

How to read the contents of a .pdf/.zip?

Posted on 2002-07-29
Last Modified: 2010-08-05
Hello all,

Any pointers on how to:

1. Get the text contents of files such as *.html (or any URL), *.pdf, etc. and store them in a DB?
I have the VBA/Word code to get the contents of a Word file, but I have not decided how to approach .pdf and .html files. Any pointers here?

2. Programmatically unpack a .zip file's contents to a folder of choice, and then go to step 1?

On another note (you get the points even if the following is not answered): I need to do programmatic searches on the contents of files (that's why I store the contents in a DB and run a SQL Server full-text search), but I might also need to do regular expression searches.
Any leads here?

TIA

Question by:dkjnkm
4 Comments
 

Accepted Solution

by:
AlonHirsch earned 200 total points
ID: 7187274
Hi,

For HTML and other text-based files it's very easy: read the file into a string variable and write that variable into a Text field in SQL Server using AppendChunk.
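
A minimal sketch of that read-and-store step, written in Python rather than VB. Everything named here is an assumption for illustration: the `documents` table, the helper names `html_to_text` and `store_document`, and the use of SQLite as a stand-in for SQL Server (a parameterized INSERT plays the role that AppendChunk plays in ADO). It also strips the HTML tags first, so that only the visible text is stored for searching.

```python
import sqlite3
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside script/style.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML string, joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def store_document(conn, name, text):
    """Insert or replace one document row (hypothetical 'documents' table)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents (name TEXT PRIMARY KEY, body TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO documents (name, body) VALUES (?, ?)",
        (name, text),
    )
    conn.commit()
```

To use it on a file, read the file into a string (for example with `open(path, encoding="utf-8").read()`), pass it through `html_to_text`, and hand the result to `store_document`.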

For PDF and other binary file types, you would need a control or library that can read those formats. Then do the same thing: convert the content to text and AppendChunk it to the database.

To unzip the files in a ZIP archive you would need an unzip control or DLL. Info-ZIP has a freeware (I think) DLL with that capability. Go to http://www.infozip.com or http://www.infozip.org and search from there.
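
For comparison, here is a rough sketch of the unpack step in Python, where the standard `zipfile` module does the job an unzip DLL would do in VB. The function name `unzip_to_folder` is made up for this example, and it assumes the archive comes from a trusted source.

```python
import zipfile
from pathlib import Path


def unzip_to_folder(zip_path, target_dir):
    """Extract every member of zip_path into target_dir.

    Returns the paths of the extracted files (directory entries are
    skipped) so each one can be passed on to the text-extraction step.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        return [
            target / name
            for name in archive.namelist()
            if not name.endswith("/")
        ]
```

The returned list is what feeds step 1 of the question: loop over it and pick a reader for each file based on its extension.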

HTH,
Alon

Expert Comment

by:Éric Moreau
ID: 7187535
To unzip, you may use this free component: http://vbaccelerator.com/codelib/zip/zipvb.htm

Expert Comment

by:DanRollins
ID: 8049086
Hi dkjnkm,
It appears that you have forgotten this question. I will ask Community Support to close it unless you finalize it within 7 days. I will ask a Community Support Moderator to:

    Accept AlonHirsch's comment(s) as an answer.

dkjnkm, if you think your question was not answered at all or if you need help, just post a new comment here; Community Support will help you.  DO NOT accept this comment as an answer.

EXPERTS: If you disagree with that recommendation, please post an explanatory comment.
==========
DanRollins -- EE database cleanup volunteer

Expert Comment

by:SpideyMod
ID: 8095929
per recommendation

SpideyMod
Community Support Moderator @Experts Exchange