
Java and Converting PDF files to XML Documents

dogsareit asked
Last Modified: 2020-01-12
I need to programmatically convert PDF files to XML so that I can extract data and insert it into a database.
I have researched and seen many examples.
My environment: I am developing on localhost with Java 13.0.1 installed. I have added the Java bin directory to the PATH environment variable (and rebooted afterwards). I have both inetpub and WampServer installed (listening on different ports), and I have successfully compiled Java classes (beginner examples) on my computer.
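The end goal here is to get values out of XML and into a database. The sketch below covers the reading side and is meant as a starting point rather than a finished solution. It assumes the XML has a <data> root with <Message> children, which is the layout the code later in this question writes to data.xml. The class MessageReader and method readMessages are hypothetical names. The sketch uses only the JDK's built-in DOM parser. The database insert is left as a comment because the database and table are not specified.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper: reads the text of every <Message> element.
public class MessageReader {

    // Returns the text content of each <Message> element, in document order.
    public static List<String> readMessages(InputStream in) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(in);
            NodeList nodes = doc.getElementsByTagName("Message");
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Report the failure instead of silently ignoring it.
            throw new IllegalStateException("Could not read XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<data><Message>first</Message><Message>second</Message></data>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        for (String value : readMessages(in)) {
            // Hypothetical next step: bind value to a JDBC PreparedStatement here.
            System.out.println(value);
        }
    }
}
```

To read the generated file instead of an in-memory string, pass new FileInputStream("data.xml"), ideally inside a try-with-resources block.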
I found the code listed below at: https://stackoverflow.com/questions/16936013/java-code-for-pdf-to-xml-conversion.

I am not very skilled at Java. I compiled the class at the command prompt with javac c:\wamp\www\PDFConvert\ConvertPDFToXML.java and received errors (36 of them!). The errors concern the first three lines after the public class declaration: static StreamResult streamResult; static TransformerHandler handler; static AttributesImpl atts;
The compiler reports that it "cannot find" StreamResult, TransformerHandler and AttributesImpl. It reports this once for each place those three names appear in the code.
So I decided to add the following lines to the top of the code:
import java.util.stream;
import javax.xml.transform.sax;
import org.xml.sax.helpers;



That just produced the same type of errors for those lines.
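A Java import declaration must name either a single type (import javax.xml.transform.stream.StreamResult;) or a whole package with a trailing .* (import javax.xml.transform.sax.*;). A line such as import java.util.stream; names a package without the .*, so javac rejects it.

The three missing types also live in different packages from the ones guessed:
- StreamResult is in javax.xml.transform.stream. The package java.util.stream is the unrelated Streams API.
- TransformerHandler is in javax.xml.transform.sax.
- AttributesImpl is in org.xml.sax.helpers.

The sketch below lists the JDK imports the posted class appears to need and should compile on a plain JDK. The class name ImportCheck is hypothetical. The PDF classes (Document, Paragraph, PdfReader, PRTokeniser and so on) look like they come from the iText library. They would need their own imports plus the iText jar on the classpath, and this sketch does not cover them.

```java
// JDK imports used by the posted ConvertPDFToXML class.
// The iText classes (Document, PdfReader, ...) are NOT covered here.
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical class: the same three static fields now compile because
// each type is imported from its real package.
public class ImportCheck {
    static StreamResult streamResult;
    static TransformerHandler handler;
    static AttributesImpl atts;

    public static void main(String[] args) {
        streamResult = new StreamResult("data.xml"); // only records the target name
        atts = new AttributesImpl();
        System.out.println(streamResult.getSystemId());
    }
}
```

Alternatively, each type can be written out in full where it is used (static javax.xml.transform.stream.StreamResult streamResult;), which needs no import at all.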
I have attached a screenshot of the errors.
Could someone be so kind as to help me and explain what I am doing wrong? I can't figure it out.
Below is the complete code, including the three import lines I inserted at the top.

import java.util.stream;
import javax.xml.transform.sax;
import org.xml.sax.helpers;
// FROM:  https://stackoverflow.com/questions/16936013/java-code-for-pdf-to-xml-conversion


public class ConvertPDFToXML {
    static StreamResult streamResult;
    static TransformerHandler handler;
    static AttributesImpl atts;

    public static void main(String[] args) throws IOException {

        try {
            Document document = new Document();
            document.open();
            PdfReader reader = new PdfReader("C:\\PaymodeRCCL.pdf");
            PdfDictionary page = reader.getPageN(1);
            PRIndirectReference objectReference = (PRIndirectReference) page
                    .get(PdfName.CONTENTS);
            PRStream stream = (PRStream) PdfReader
                    .getPdfObject(objectReference);
            byte[] streamBytes = PdfReader.getStreamBytes(stream);
            PRTokeniser tokenizer = new PRTokeniser(streamBytes);

            StringBuffer strbufe = new StringBuffer();
            while (tokenizer.nextToken()) {
                if (tokenizer.getTokenType() == PRTokeniser.TK_STRING) {
                    strbufe.append(tokenizer.getStringValue());
                }
            }
            String test = strbufe.toString();
            streamResult = new StreamResult("data.xml");
            initXML();
            process(test);
            closeXML();
            document.add(new Paragraph(".."));
            document.close();
        } catch (Exception e) {
        }
    }

    public static void initXML() throws ParserConfigurationException,
            TransformerConfigurationException, SAXException {
        SAXTransformerFactory tf = (SAXTransformerFactory) SAXTransformerFactory
                .newInstance();

        handler = tf.newTransformerHandler();
        Transformer serializer = handler.getTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");
        serializer.setOutputProperty(
                "{http://xml.apache.org/xslt}indent-amount", "4");
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        handler.setResult(streamResult);
        handler.startDocument();
        atts = new AttributesImpl();
        handler.startElement("", "", "data", atts);
    }

    public static void process(String s) throws SAXException {
        String[] elements = s.split("\\|");
        atts.clear();
        handler.startElement("", "", "Message", atts);
        handler.characters(elements[0].toCharArray(), 0, elements[0].length());
        handler.endElement("", "", "Message");
    }

    public static void closeXML() throws SAXException {
        handler.endElement("", "", "data");
        handler.endDocument();
    }
}
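The initXML, process and closeXML methods above only use JDK classes, so that part can be tried on its own. Below is a self-contained sketch of the same SAX-to-stream serialization. It writes to a StringWriter instead of data.xml. The class SaxXmlSketch and method toXml are hypothetical names.

It differs from the posted code in a few deliberate ways:
- It writes one <Message> element per |-separated field, whereas the posted process() writes only elements[0].
- It rethrows failures, whereas the posted main() has an empty catch block that would hide them.
- It declares UTF-8 rather than ISO-8859-1.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

public class SaxXmlSketch {

    // Serializes a '|'-separated string as <data><Message>field</Message>...</data>.
    public static String toXml(String text) {
        try {
            // The JDK's default TransformerFactory also implements SAXTransformerFactory.
            SAXTransformerFactory tf =
                    (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler handler = tf.newTransformerHandler();

            Transformer serializer = handler.getTransformer();
            serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            serializer.setOutputProperty(OutputKeys.INDENT, "yes");

            StringWriter out = new StringWriter();
            handler.setResult(new StreamResult(out));

            AttributesImpl noAtts = new AttributesImpl(); // elements carry no attributes
            handler.startDocument();
            handler.startElement("", "", "data", noAtts);
            for (String field : text.split("\\|")) {
                handler.startElement("", "", "Message", noAtts);
                // characters() escapes markup such as '<' and '&' on output.
                handler.characters(field.toCharArray(), 0, field.length());
                handler.endElement("", "", "Message");
            }
            handler.endElement("", "", "data");
            handler.endDocument();
            return out.toString();
        } catch (TransformerConfigurationException | SAXException e) {
            // Surface failures instead of swallowing them.
            throw new IllegalStateException("XML serialization failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml("first|second"));
    }
}
```

To write a file as the original does, pass new StreamResult("data.xml") to handler.setResult in place of the StringWriter.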



[Screenshot of errors: Error Messages when compiling Java Class]
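The while loop in the posted main() walks the tokens of page 1's content stream and appends every string token. In a PDF content stream, text is usually drawn by operators such as Tj and TJ. Their operands are literal strings in parentheses, for example (Hello) Tj or [(He) -20 (llo)] TJ, which is why the appended fragments can join up into words.

The sketch below is a deliberately simplified, hypothetical stand-in for that loop; it is not how iText's PRTokeniser works. It has these limits:
- It handles only literal strings, including nested parentheses and backslash escapes.
- It ignores hex strings and octal escapes.
- It does not decode font encodings, so text from real PDFs may not come out readable.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified extractor of PDF literal strings "( ... )".
public class LiteralStringSketch {

    public static List<String> literalStrings(String content) {
        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < content.length()) {
            if (content.charAt(i) != '(') {
                i++; // skip operators, numbers and other tokens
                continue;
            }
            StringBuilder sb = new StringBuilder();
            int depth = 1;
            i++; // step past the opening '('
            while (i < content.length() && depth > 0) {
                char c = content.charAt(i);
                if (c == '\\' && i + 1 < content.length()) {
                    // Keep the escaped character as-is (\n is NOT turned into a newline).
                    sb.append(content.charAt(i + 1));
                    i += 2;
                    continue;
                }
                if (c == '(') {
                    depth++; // balanced nested parentheses are part of the string
                } else if (c == ')') {
                    depth--;
                }
                if (depth > 0) {
                    sb.append(c);
                }
                i++;
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // Concatenating the fragments mirrors the StringBuffer in the posted loop.
        System.out.println(String.join("", literalStrings("[(He) -20 (llo)] TJ")));
    }
}
```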