peispud

Convert the contents of a pdf file to a string in VBA

Hi.

I am using Microsoft Access.

I have a PDF file.  I need to use VBA to parse the file to extract data.
I have tried the following.

1) Open the PDF.
2) Select All, then copy.
3) Now I go into my Microsoft Access application. Using VBA, I convert the clipboard into a string.
4) Parse the string and extract the info that I require.

The PDF document can be between 20 and 100 pages long. (It is generated dynamically by another system.)

The solution that I have tried is not working well for me.

I need the text and the spacing to remain intact so that I can parse the string effectively.

Thank you.

PatHartman

Not working well?  What does that mean?  Are you getting errors?  Are you getting text but not what you expected?  Are you not getting any text at all?

I think you're going to need the full version of Adobe Acrobat to do this with VBA.
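
For illustration, here is a rough sketch of what the Acrobat route could look like. It assumes the full (paid) Adobe Acrobat is installed, since the free Reader does not expose this automation interface. The function name AcrobatPdfWords is made up for this example. Note that it pulls the text word by word, so the original column spacing is not preserved.

```vba
' Sketch only: requires the full Adobe Acrobat (not the free Reader).
' AcrobatPdfWords is a hypothetical helper name, not part of any library.
Public Function AcrobatPdfWords(ByVal pdfPath As String) As String
    Dim pdDoc As Object
    Dim jso As Object
    Dim pageIndex As Long
    Dim wordIndex As Long
    Dim result As String

    Set pdDoc = CreateObject("AcroExch.PDDoc")
    If Not pdDoc.Open(pdfPath) Then
        Err.Raise vbObjectError + 513, "AcrobatPdfWords", _
                  "Could not open " & pdfPath
    End If

    ' The JavaScript bridge exposes Acrobat's word-level text access
    Set jso = pdDoc.GetJSObject

    For pageIndex = 0 To pdDoc.GetNumPages - 1
        For wordIndex = 0 To jso.getPageNumWords(pageIndex) - 1
            ' False = keep trailing punctuation and whitespace on each word
            result = result & jso.getPageNthWord(pageIndex, wordIndex, False)
        Next wordIndex
        result = result & vbCrLf   ' one line break per page
    Next pageIndex

    pdDoc.Close
    AcrobatPdfWords = result
End Function
```

Because the words come back one at a time, this is probably only suitable when the parsing does not depend on exact spacing.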
Arana (G.P.)

A PDF contains more than just text. It also holds PostScript-like page description code, so it is not a plain text file. If what you want is the raw contents of the file, then:
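    Dim f As Integer
    Dim myVar As String

    f = FreeFile()
    ' Open the PDF as raw bytes; filename holds the full path to the file
    Open filename For Binary Access Read Lock Write As #f
        ' Size the buffer to the file length, then fill it in a single read
        myVar = Space$(FileLen(filename))
        Get #f, , myVar
    Close #f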

That will read the whole file into myVar.

If what you want is the TEXT of the PDF as if you were reading it, then you will need a PDF library such as PDFBox so that VB can read the file the way you want.
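
As one possible free route (not something tested in this thread), the text can be extracted by an external tool and then read back into VBA. The sketch below assumes the pdftotext command-line tool (shipped with Xpdf and Poppler) is installed. Its -layout option asks it to keep the physical layout of the page, which may help when the spacing matters for parsing. The executable path, the temp file name, and the function name PdfToLayoutText are all assumptions made for this example.

```vba
Option Explicit

' Sketch only: assumes pdftotext.exe (Xpdf or Poppler) is installed.
' This path is hypothetical; point it at your own copy.
Private Const PDFTOTEXT_EXE As String = "C:\Tools\pdftotext.exe"

' Hypothetical helper: returns the PDF's text with layout preserved.
Public Function PdfToLayoutText(ByVal pdfPath As String) As String
    Dim txtPath As String
    Dim cmd As String
    Dim exitCode As Long
    Dim f As Integer
    Dim buffer As String

    txtPath = Environ$("TEMP") & "\pdf_extract.txt"

    ' -layout keeps the physical layout (column spacing) of each page
    cmd = """" & PDFTOTEXT_EXE & """ -layout """ & pdfPath & _
          """ """ & txtPath & """"

    ' Window style 0 = hidden; True = wait until the tool has finished
    exitCode = CreateObject("WScript.Shell").Run(cmd, 0, True)
    If exitCode <> 0 Then
        Err.Raise vbObjectError + 513, "PdfToLayoutText", _
                  "pdftotext exited with code " & exitCode
    End If

    ' Read the generated text file into a string in one pass
    f = FreeFile()
    Open txtPath For Binary Access Read As #f
        buffer = Space$(LOF(f))
        Get #f, , buffer
    Close #f
    Kill txtPath   ' remove the temporary file

    PdfToLayoutText = buffer
End Function
```

It could be called as, for example, Debug.Print Left$(PdfToLayoutText("C:\Reports\sample.pdf"), 500), where the PDF path is again only a placeholder. The output encoding depends on the pdftotext build, so non-ASCII characters may need extra conversion.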
1. Or skip the whole PDF step and get the data directly from the source (if you can).
"The PDF document can be between 20 and 100 pages long. (It is generated dynamically by another system.)"
Do you have access to this "other system"?
If so, see if you can get the data *before* it is converted to a PDF.

2. Or perhaps we are spinning our wheels on this one...
Is it possible to post a sample of this PDF, and also provide a sample of the exact output needed?
Perhaps an expert here can propose a more efficient process.

JeffCoachman
peispud

ASKER

I cannot post an example of the pdf file because of privacy issues.

To further describe the issue: I had hoped to open the PDF file, press Ctrl-A, and copy. At that point, I had hoped the entire text of the document would be copied to the clipboard. This is not the case.

I have never heard of PDFBox. It may be a challenge for me because the example was not using VBA.

I was hoping for a free pdf to word tool, or programming ideas.  I'll leave this up for a bit longer.

I appreciate your help so far.
peispud

ASKER

Thanks everyone for your assistance.