kokane

asked on

PDF Fields

I want to read the fields in a PDF file. Is there an Adobe component, or other component, that I can reference in Access/VBA and/or in VB.Net to reliably read the PDF document fields with a view to extracting the data and updating a database?

I appreciate that Adobe components may require purchase of their products, but I need to know which one. I would prefer not to use third-party software, as this would require updates/licensing, and I will be supplying to a company who already have Adobe PDF writer installed.


Karl Heinz Kremer

You can use the Acrobat IAC interface to read form fields in your VB/VBA or .NET application. Take a look at http://www.adobe.com/devnet/acrobat for more information. If you want a free 3rd party component, try the .NET version of iText: http://sourceforge.net/projects/itextsharp/
kokane

ASKER

Thanks khkremer, that gives me a healthy nudge in the right direction.

However, I would like more specific information regarding the Adobe component that would allow me to read PDF fields in VBA/VB.NET. Specifically, I need to know which products it ships with, so I can check whether the customer I want to supply the software to already has it installed (he already has a PDF writer), and also because I want to be able to download it and trial it with Access VBA and/or VB.NET. The component mentioned in the answer was the Acrobat IAC interface, and I was directed to the Acrobat website to locate further information rather than the specific information I require. There seems to be a real shortage of anyone saying this method is successful when I look on Google, which leads me to believe that it may work in theory rather than in practice. I am also hoping to use VB.NET Express 2008 or VBA, and most references seem to be for problems within Visual Studio and to be out of date.

I find it very surprising that I can't find a single working example on the web, which leads me to think the 3rd party route is the only viable one, and itextsharp, for example, seems to have many posts suggesting compatibility problems.
It's part of Adobe Acrobat. You can find the API here: http://livedocs.adobe.com/acrobat_sdk/9.1/Acrobat9_1_HTMLHelp/IAC_API_OLE_Objects.103.1.html (click on the little button in the upper left corner to show the navigation pane).

Let me see if I can whip up an example for you. So, just to recap, you want to read a form field in VBA. I don't use Access, but I can give you a Word or Excel example.
ASKER CERTIFIED SOLUTION
Karl Heinz Kremer

This solution is only available to members.
kokane

ASKER

khkremer,

That is excellent, thanks. Looks very doable. Presumably it is helpful if the field names are meaningful, or are they just named Text1-n sequentially?

My only remaining problem is to work out which Acrobat product I have to buy. I asked Adobe technical support this some time ago, and the guy first insisted that there was no such thing and then directed me to the help pages. Ideally I would like to download a trial version and test it rather than buying it and finding there is a problem. Don't like to be greedy, but any ideas? I'm off now to have a look.
You can name the fields whatever you want - of course, having meaningful names is better than just using GUIDs for example.

You need Adobe Acrobat - the full version (either Standard, Pro or Pro Extended). The free Reader does not provide these APIs.

There is a 30 day free trial for Adobe Acrobat Pro and Pro Ex at http://www.adobe.com/products/acrobat/ - just click on the "Try" button below the product image.
kokane

ASKER

khkremer

Thanks again. I was just returning to say I was downloading the Pro version, having read your previous article on the topic. I'm developing a warm feeling that this fecking thing is going to work.
   
Let me know if you run into problems. As I said, I am not familiar with Access, but anything else I can help you with.
kokane

ASKER

I have downloaded it and the code now compiles in Access and Excel (2003), but I get an error on the field read. I have checked the PDF document to get the name of one field, in this case 'WB1.2.Title', which I am using in "Field1 = jso.getField("WMB1.2.Title").Value". I have the references Adobe Acrobat 9.0 Type Library and Access 3.0 Type Library selected.

The error is Run-time error '91': Object variable or With block variable not set. It is the same in both Excel and Access. The help describes the cause as "You attempted to use an object variable that isn't yet referencing a valid object."

It does not complain about the line "Set jso = theForm.GetJSObject", which seems strange given the error message.

Is there some reference to the JavaScript that I am missing in the VBA? Alternatively, is there a simple command other than getField that will tell me whether it can do anything, in case it is just this type of command that is not working?

I will try tomorrow on my other machine, which has Office 2007, but someone on your website said they had got it (or something similar) working with 2003.

Is this a Designer form?
kokane

ASKER

Is this a Designer form?

Sorry, not sure what that is. A type of PDF document? I know they use it to collect company information and they have a PDF writer.
There are two different form systems built into Acrobat: AcroForms and XFA or Designer forms. Bring up the document information (Ctrl-D) and see what application created the document.
kokane

ASKER

PDF Producer: Acrobat Distiller 9.0.0 (Macintosh)
PDF Version 1.6 (Acrobat 7.x)
 
Would you be able to share the file with me? It looks like an AcroForm, so it should work with the code you have.
kokane

ASKER

I would prefer it not to be publicly viewable. Is there any other means of sending it? Perhaps I can amend it.

I just tried it with another PDF (template1, which comes with version 9) and it worked first time. Then I updated the field and tried it again, and it failed with the same error as before. I will retry that in case I did something silly.
kokane

ASKER

Yes, exactly the same error when I try to read an Adobe file (template1) which has been updated by me, but fine if not updated. I will try another field in the updated document to see whether it is the field or the fact that the document has been updated.
kokane

ASKER

Apologies, I just tried that again and am successfully reading the updated file. It is half 2 here in the UK and I am beginning to flag.

Is there any way I can send you the file privately?
kokane

ASKER

I have now realised that it is a folder problem: I was pointing it to the wrong place. Apologies, and thanks for your help.
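
For anyone who hits the same run-time error 91, here is a minimal VBA sketch of guarding the open step so that a wrong folder is reported as such. It is not the hidden accepted solution. It assumes the full Adobe Acrobat (not Reader) is installed and uses late binding through the IAC objects; the function name `ReadPdfField` and the path and field name passed to it are hypothetical.

```vba
Option Explicit

' Sketch only: needs the full Adobe Acrobat, since Reader does not expose IAC.
' ReadPdfField is a hypothetical helper name, not part of any Adobe API.
Public Function ReadPdfField(ByVal pdfPath As String, _
                             ByVal fieldName As String) As Variant
    Dim pdDoc As Object   ' AcroExch.PDDoc, late bound
    Dim jso As Object     ' JavaScript bridge object

    ReadPdfField = Null
    On Error GoTo Fail

    ' Guard 1: catch a wrong folder here instead of as error 91 later.
    If Len(pdfPath) = 0 Then GoTo Cleanup
    If Len(Dir$(pdfPath)) = 0 Then
        Debug.Print "File not found: " & pdfPath
        GoTo Cleanup
    End If

    Set pdDoc = CreateObject("AcroExch.PDDoc")

    ' Guard 2: Open returns False when Acrobat cannot open the file.
    If Not pdDoc.Open(pdfPath) Then
        Debug.Print "Acrobat could not open: " & pdfPath
        GoTo Cleanup
    End If

    ' Guard 3: make sure the bridge object is set before using it.
    Set jso = pdDoc.GetJSObject
    If jso Is Nothing Then
        Debug.Print "No JavaScript object for: " & pdfPath
        GoTo Cleanup
    End If

    ' A misspelled field name is trapped by the error handler below.
    ReadPdfField = jso.getField(fieldName).Value

Cleanup:
    On Error Resume Next
    If Not pdDoc Is Nothing Then pdDoc.Close
    Exit Function

Fail:
    Debug.Print "Could not read '" & fieldName & "': " & Err.Description
    Resume Cleanup
End Function
```

A call such as `Debug.Print ReadPdfField("C:\Forms\sample.pdf", "WB1.2.Title")` (placeholder path) would then print either the field value or `Null`, with the reason written to the Immediate window.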

kokane

ASKER

Please reply to this so I can award you the points, and thanks again.
See my profile page, there is an email address you can use to send the file to me.
kokane

ASKER

khkremer:

As a supplementary question (greedy type, me): any idea how you can extract ALL the field names in a PDF document? This would remove the requirement to identify them first; you could simply move the values to corresponding DB fields, which you could create dynamically.
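
One possible approach, sketched here and not confirmed anywhere in this thread, is to walk the document's fields through the same JavaScript bridge object using the Acrobat JavaScript members `numFields` and `getNthFieldName`. The procedure name `ListPdfFields` is hypothetical, and as before this assumes the full Adobe Acrobat is installed.

```vba
Option Explicit

' Sketch only: prints every field name in the PDF to the Immediate window.
' ListPdfFields is a hypothetical name; pdfPath is supplied by the caller.
Public Sub ListPdfFields(ByVal pdfPath As String)
    Dim pdDoc As Object   ' AcroExch.PDDoc, late bound
    Dim jso As Object     ' JavaScript bridge object
    Dim i As Long

    Set pdDoc = CreateObject("AcroExch.PDDoc")
    If Not pdDoc.Open(pdfPath) Then
        Debug.Print "Acrobat could not open: " & pdfPath
        Exit Sub
    End If

    Set jso = pdDoc.GetJSObject
    If Not jso Is Nothing Then
        ' numFields and getNthFieldName belong to the Acrobat JavaScript Doc object.
        For i = 0 To jso.numFields - 1
            Debug.Print jso.getNthFieldName(i)
        Next i
    End If

    pdDoc.Close
End Sub
```

Each printed name could then be passed to `getField` to read its value, which is the step the earlier posts already exercise.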

kokane

ASKER

Moderator,

I did not email to resolve the issue. The confusion has arisen because I mis-implemented a previous answer I was given and only realised after a number of other messages were posted. Is it acceptable that I now seek additional information on my original question AFTER I have selected the answer from above?
SouthMod - I don't think we've met before. As you can see, I've been doing this for quite some time, and I think I know how EE works. Sometimes it is necessary to have access to a file that the asker does not want to make available publicly (for a number of reasons), and in such cases, I offer to receive such a file in a private email to me. That does not mean that the question would be resolved in private email - it's just used to provide information that is required to resolve the issue; the remainder of the resolution would still be posted on EE (e.g. as in "you've used the wrong field name when trying to extract data from the document, replace xyz with abc and it will work"). Also, I don't understand why you are under the impression that kokane did answer his/her own question. As a moderator stepping into a long exchange of information, I expect you to read the whole exchange, and understand how we ended up here. I did clearly provide a sample script that the asker used to validate that Acrobat can actually be controlled via VBA (in http:#a33743786).
kokane - I think you should accept the comment that has the code snippet as the answer to this question. The discussion we had afterwards was about getting Acrobat and does not really provide any more information.

As far as your related question goes, I would post a second question, and reference this question in it. This way, if somebody else wants to participate, they will have all the information.

Hope that helps.
Now that the question is closed, there is a link to "ask a related question" in the comment box, use that to create your new question.
Thanks.
kokane

ASKER

SouthMod

"Which seemed to indicate that the Author had resolved their issue outside of EE assistance. Of course, if I'm wrong, then the author is free to choose whatever method is appropriate to close the matter."

I obviously should have made that clearer. I had not correctly implemented the solution given earlier by khkremer. As soon as I realised this, I sent the message explaining that I was looking in the wrong folder.

On another point, I was advised via the on-line help to make a request for assistance. I now understand I should have simply added another comment to my question. Can you close the request down please, or tell me how to do it?