Solved

PDF Fields

Posted on 2010-09-21
Last Modified: 2012-05-10
I want to read the fields in a PDF file. Is there an Adobe component, or other component, that I can reference in Access/VBA and/or in VB.Net to reliably read the PDF document fields, with a view to extracting the data and updating a database?

I appreciate that Adobe components may require purchase of their products, but I need to know which one. I would prefer not to use third-party software, as this would require updates/licensing, and I will be supplying to a company who already have Adobe PDF writer installed.


Question by:kokane
28 Comments
 
LVL 44

Expert Comment

by:Karl Heinz Kremer
ID: 33733381
You can use the Acrobat IAC interface to read form fields in your VB/VBA or .NET application. Take a look at http://www.adobe.com/devnet/acrobat for more information. If you want a free third-party component, try the .NET version of iText: http://sourceforge.net/projects/itextsharp/
 

Author Comment

by:kokane
ID: 33741768
Thanks khkremer, that gives me a healthy nudge in the right direction.

However, I would like more specific information regarding the Adobe component that would allow me to read PDF fields in VBA/VB.NET. Specifically, I need to know which products it ships with, so I can check whether the customer I want to supply the software to already has it installed (he already has a PDF writer), and also because I want to be able to download it and trial it with Access VBA and/or VB.NET. The component mentioned in the answer was the Acrobat IAC interface, and I was directed to the Acrobat website to locate further information rather than given the specific information I require. There seems to be a real shortage of anyone saying this method is successful when I look on Google, which leads me to believe that it may work in theory rather than in practice. I am also hoping to use VB.NET Express 2008 or VBA, and most references seem to be about problems within Visual Studio and to be out of date.

I find it very surprising that I can't find a single working example on the web, which leads me to think the third-party route is the only viable one, and itextsharp, for example, seems to have many posts suggesting compatibility problems.
 
LVL 44

Expert Comment

by:Karl Heinz Kremer
ID: 33743554
It's part of Adobe Acrobat. You can find the API here: http://livedocs.adobe.com/acrobat_sdk/9.1/Acrobat9_1_HTMLHelp/IAC_API_OLE_Objects.103.1.html (click on the little button in the upper left corner to show the navigation pane).

Let me see if I can whip up an example for you. So, just to recap, you want to read a form field in VBA. I don't use Access, but I can give you a Word or Excel example.
 
LVL 44

Accepted Solution

by:
Karl Heinz Kremer earned 500 total points
ID: 33743786
Here is an example I just did with Word: I created a button and in the button handler I am reading two values from a PDF file. The program uses the JSO (the JavaScript Object, you can read more about that in one of my blog posts: http://www.khk.net/wordpress/2009/03/11/acrobat-javascript-and-vb-walk-into-a-bar/)


Private Sub CommandButton1_Click()
    ' Requires a reference to the Adobe Acrobat type library (Tools > References).
    Dim AcroApp As Acrobat.CAcroApp
    Dim theForm As Acrobat.CAcroPDDoc
    Dim jso As Object
    ' Declare each variable explicitly: "Dim text1, text2 As String"
    ' would make text1 a Variant.
    Dim text1 As String, text2 As String
    
    Set AcroApp = CreateObject("AcroExch.App")
    Set theForm = CreateObject("AcroExch.PDDoc")
    theForm.Open "C:\temp\sampleForm.pdf"
    Set jso = theForm.GetJSObject
    
    ' Get the values of the form fields Text1 and Text2
    text1 = jso.getField("Text1").Value
    text2 = jso.getField("Text2").Value
    
    MsgBox "Values read from PDF: " & text1 & " " & text2
    theForm.Close
     
    ' Shut Acrobat down and release the object references
    AcroApp.Exit
    Set AcroApp = Nothing
    Set theForm = Nothing
     
    MsgBox "Done"
End Sub

 

Author Comment

by:kokane
ID: 33749130
khkremer,

That is excellent, thanks. Looks very doable - presumably it is helpful if the field names are meaningful, or are they just named Text1-n sequentially?

My only remaining problem is to work out which Acrobat product I have to buy. I asked Adobe technical support this some time ago, and the guy first insisted that there was no such thing and then directed me to the help pages. Ideally I would like to download a trial version and test it rather than buying it and finding there is a problem. Don't like to be greedy, but any ideas? I'm off now to have a look.
 
LVL 44

Expert Comment

by:Karl Heinz Kremer
ID: 33749228
You can name the fields whatever you want - of course, having meaningful names is better than just using GUIDs for example.

You need Adobe Acrobat - the full version (either Standard, Pro or Pro Extended). The free Reader does not provide these APIs.

There is a 30 day free trial for Adobe Acrobat Pro and Pro Ex at http://www.adobe.com/products/acrobat/ - just click on the "Try" button below the product image.
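
To check whether a customer's machine already has what is needed, one option is to test whether the Acrobat automation object can be created at all. The following is only a sketch: AcrobatIsAvailable is a hypothetical helper name, and it assumes that a failed CreateObject("AcroExch.App") means the full Acrobat product is not installed or not registered on that machine.

```vba
' Sketch only: returns True if the Acrobat automation object can be created.
' Assumption: CreateObject("AcroExch.App") fails when only the free Reader
' (or no Adobe product at all) is installed.
Public Function AcrobatIsAvailable() As Boolean
    Dim app As Object

    ' Suppress the run-time error raised when the class is not registered
    On Error Resume Next
    Set app = CreateObject("AcroExch.App")
    On Error GoTo 0

    AcrobatIsAvailable = Not (app Is Nothing)

    ' Release the reference; this sketch deliberately does not call Exit
    Set app = Nothing
End Function
```

It uses late binding, so it compiles even on a machine where the Acrobat type library reference is missing.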
 

Author Comment

by:kokane
ID: 33749327
khkremer

Thanks again. I was just returning to say I was downloading the Pro version, having read your previous article on the topic - I'm developing a warm feeling that this fecking thing is going to work.
 
LVL 44

Expert Comment

by:Karl Heinz Kremer
ID: 33749341
Let me know if you run into problems. As I said, I am not familiar with Access, but anything else I can help you with.
 

Author Comment

by:kokane
ID: 33749865
I have downloaded it and the code now compiles in Access and Excel (2003), but I get an error on the field read. I checked the PDF document to get the name of one field, in this case 'WB1.2.Title', which I am using as: Field1 = jso.getField("WMB1.2.Title").Value. I have the references Adobe Acrobat 9.0 Type Library and Access 3.0 Type Library selected.

The error is Run-time error '91': Object variable or With block variable not set - the same in both Excel and Access. The help text for this error says: "You attempted to use an object variable that isn't yet referencing a valid object."

It does not complain with the line "Set jso = theForm.GetJSObject" which seems strange given the error message.

Is there some reference to the JavaScript that I am missing in the VBA? Alternatively, is there a simple command other than getField that will tell me whether it can do anything, in case it is just this type of command that is not working?

I will try tomorrow on my other machine, which has Office 2007 - but someone on your website said they had got it (or something similar) working with 2003.

 
LVL 44

Expert Comment

by:Karl Heinz Kremer
ID: 33750071
Is this a Designer form?
 

Author Comment

by:kokane
ID: 33750116
Is this a Designer form?

Sorry, not sure what that is - a type of PDF document? I know they use it to collect company information and they have a PDF writer.
 
LVL 44

Expert Comment

by:Karl Heinz Kremer
ID: 33750160
There are two different form systems built into Acrobat: AcroForms, and XFA (also called Designer) forms. Bring up the document information (Ctrl-D) and see what application created the document.
 

Author Comment

by:kokane
ID: 33750178
PDF Producer: Acrobat Distiller 9.0.0 (Macintosh)
PDF Version 1.6 (Acrobat 7.x)
 
LVL 44

Expert Comment

by:Karl Heinz Kremer
ID: 33750245
Would you be able to share the file with me? It looks like an AcroForm, so it should work with the code you have.
 

Author Comment

by:kokane
ID: 33750315
I would prefer it not to be publicly viewable - is there any other means of sending it? Perhaps I can amend it.

I just tried it with another PDF (template1, which comes with version 9) and it worked first time; then I updated the field and tried it again, and it failed with the same error as before. Will retry that in case I did something silly.
 

Author Comment

by:kokane
ID: 33750346
Yes, exactly the same error when I try to read an Adobe file (template1) which has been updated by me, but fine if not updated. I will try another field in the updated document to see if it is the field or the fact that the document has been updated.
 

Author Comment

by:kokane
ID: 33750359
Apologies, just tried that again and it is successfully reading the updated file. It is half 2 here in the UK and I am beginning to flag.

Is there any way I can send you the file privately?
 

Author Comment

by:kokane
ID: 33750383
I have now realised that it is a folder problem: I was pointing it to the wrong place. Apologies, and thanks for your help.
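
A wrong path like this surfaces as run-time error 91 only at the getField line, which makes it hard to spot. The sketch below checks each step in turn so the failing one is reported directly. ReadPdfField is a hypothetical helper name, late binding is used so no type library reference is needed, and it is assumed (not confirmed in this thread) that GetJSObject yields Nothing when the document was never opened.

```vba
' Sketch only: read one field value, reporting which step failed
' instead of letting error 91 appear at the getField line.
Public Function ReadPdfField(ByVal pdfPath As String, _
                             ByVal fieldName As String) As Variant
    Dim pdDoc As Object
    Dim jso As Object
    Dim fld As Object

    ReadPdfField = Null   ' Null signals "nothing read" to the caller

    ' A wrong folder or file name is caught here, before Acrobat is involved
    If Dir(pdfPath) = "" Then
        MsgBox "File not found: " & pdfPath
        Exit Function
    End If

    Set pdDoc = CreateObject("AcroExch.PDDoc")
    If Not pdDoc.Open(pdfPath) Then
        MsgBox "Acrobat could not open: " & pdfPath
        Exit Function
    End If

    Set jso = pdDoc.GetJSObject
    If jso Is Nothing Then
        MsgBox "No JavaScript object available for: " & pdfPath
    Else
        ' What exactly comes back for a missing field is not assumed here:
        ' any error from the Set is suppressed and fld simply stays Nothing.
        On Error Resume Next
        Set fld = jso.getField(fieldName)
        On Error GoTo 0

        If fld Is Nothing Then
            MsgBox "No field named: " & fieldName
        Else
            ReadPdfField = fld.Value
        End If
    End If

    ' Release references before closing the document
    Set fld = Nothing
    Set jso = Nothing
    pdDoc.Close
    Set pdDoc = Nothing
End Function
```

A caller would test the result with IsNull before using it, for example: v = ReadPdfField("C:\temp\sampleForm.pdf", "Text1").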

 

Author Comment

by:kokane
ID: 33750393
Please reply to this so I can award you the points - and thanks again.
 
LVL 44

Expert Comment

by:Karl Heinz Kremer
ID: 33750638
See my profile page, there is an email address you can use to send the file to me.
 

Author Comment

by:kokane
ID: 33752507
khkremer:

As a supplementary question (greedy type, me): any idea how you can extract ALL the field names in a PDF document? This would remove the requirement to identify them first, and you could simply move the values to corresponding DB fields, which you could create dynamically.
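
One possible approach, not confirmed anywhere in this thread, is to go through the same JavaScript object and use the Acrobat JavaScript members numFields and getNthFieldName. The sketch below prints each field name and value to the Immediate window. ListPdfFields is a hypothetical name, and the code assumes every field holds a simple text-like value (a field whose value is an array, such as a multi-select list, would need extra handling).

```vba
' Sketch only: list every field name and value in a PDF.
Public Sub ListPdfFields(ByVal pdfPath As String)
    Dim pdDoc As Object
    Dim jso As Object
    Dim i As Long
    Dim fieldName As String

    Set pdDoc = CreateObject("AcroExch.PDDoc")
    If Not pdDoc.Open(pdfPath) Then
        MsgBox "Acrobat could not open: " & pdfPath
        Exit Sub
    End If

    Set jso = pdDoc.GetJSObject

    ' numFields is the field count; getNthFieldName takes a 0-based index
    For i = 0 To jso.numFields - 1
        fieldName = jso.getNthFieldName(i)
        Debug.Print fieldName & " = " & jso.getField(fieldName).Value
    Next i

    ' Release references before closing the document
    Set jso = Nothing
    pdDoc.Close
    Set pdDoc = Nothing
End Sub
```

The names collected this way could then drive the creation of matching database columns, rather than being printed.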

 

Author Comment

by:kokane
ID: 33753174
Moderator,

I did not email to resolve the issue - the confusion has arisen because I mis-implemented a previous answer I was given and only realised after a number of other messages were posted. Is it acceptable that I now seek additional information on my original question AFTER I have selected the answer from above?
 
LVL 44

Expert Comment

by:Karl Heinz Kremer
ID: 33753299
SouthMod - I don't think we've met before. As you can see, I've been doing this for quite some time, and I think I know how EE works. Sometimes it is necessary to have access to a file that the asker does not want to make available publicly (for a number of reasons), and in such cases, I offer to receive such a file in a private email to me. That does not mean that the question would be resolved in private email - it's just used to provide information that is required to resolve the issue; the remainder of the resolution would still be posted on EE (e.g. as in "you've used the wrong field name when trying to extract data from the document, replace xyz with abc and it will work"). Also, I don't understand why you are under the impression that kokane did answer his/her own question. As a moderator stepping into a long exchange of information, I expect you to read the whole exchange, and understand how we ended up here. I did clearly provide a sample script that the asker used to validate that Acrobat can actually be controlled via VBA (in http:#a33743786).
 
LVL 44

Expert Comment

by:Karl Heinz Kremer
ID: 33753338
kokane - I think you should accept the comment that has the code snippet as the answer to this question. The discussion we had afterwards was about getting Acrobat and does not really provide any more information.

As far as your related question goes, I would post a second question, and reference this question in it. This way, if somebody else wants to participate, they will have all the information.

Hope that helps.
 
LVL 44

Expert Comment

by:Karl Heinz Kremer
ID: 33753624
Now that the question is closed, there is a link to "ask a related question" in the comment box, use that to create your new question.
Thanks.
 

Author Comment

by:kokane
ID: 33760268
SouthMod

"Which seemed to indicate that the Author had resolved their issue outside of EE assistance. Of course, if I'm wrong, then the author is free to choose whatever method is appropriate to close the matter."

I obviously should have made that clearer - I had not implemented the solution correctly as given earlier by khkremer. As soon as I realised this I sent the message explaining that I was looking in the wrong folder.

On another point, I was advised via the online help to make a request for assistance. I now understand I should have simply added another comment to my question - can you close the request down please, or tell me how to do it?
