
Java and Converting PDF files to XML Documents

Asked by dogsareit
Tags: PDF, Java, XML
I need to programmatically convert PDF files to XML so that I can extract data and insert it into a database.
I have researched and seen many examples.
My environment: I am developing on localhost with Java 13.0.1 installed, and I have added the Java bin path to the PATH environment variable (and rebooted afterwards). I have both inetpub and WampServer installed (listening on different ports) and have successfully compiled Java classes (beginner examples) on my computer.
I found this code (listed below) at: https://stackoverflow.com/questions/16936013/java-code-for-pdf-to-xml-conversion.

I am not very skilled at Java. I compiled the class at the command line with javac c:\wamp\www\PDFConvert\ConvertPDFToXML.java and received errors (36 of them!). The errors concern the first 3 lines after the public class declaration: static StreamResult streamResult; static TransformerHandler handler; static AttributesImpl atts;
The errors are "cannot find static streamResult steamResult", "cannot find static streamResult TransformerHandler" and "cannot find static streamResult AttributesImpl", repeated each time one of those 3 names appears in the code.
So I decided to add the following lines to the top of the code:
import java.util.stream;
import javax.xml.transform.sax;
import org.xml.sax.helpers;

That just produced the same type of errors for those lines.
I have attached a screenshot of the errors.
Could someone be so kind as to help and educate me? I don't know what I am doing wrong.
Below is the complete code, including the 3 lines I inserted at the top.

import java.util.stream;
import javax.xml.transform.sax;
import org.xml.sax.helpers;
// FROM:  https://stackoverflow.com/questions/16936013/java-code-for-pdf-to-xml-conversion


public class ConvertPDFToXML {
            static StreamResult streamResult;
            static TransformerHandler handler;
            static AttributesImpl atts;

            public static void main(String[] args) throws IOException {

                    try {
                            Document document = new Document();
                            document.open();
                            PdfReader reader = new PdfReader("C:\\PaymodeRCCL.pdf");
                            PdfDictionary page = reader.getPageN(1);
                            PRIndirectReference objectReference = (PRIndirectReference) page
                                            .get(PdfName.CONTENTS);
                            PRStream stream = (PRStream) PdfReader
                                            .getPdfObject(objectReference);
                            byte[] streamBytes = PdfReader.getStreamBytes(stream);
                            PRTokeniser tokenizer = new PRTokeniser(streamBytes);

                            StringBuffer strbufe = new StringBuffer();
                            while (tokenizer.nextToken()) {
                                    if (tokenizer.getTokenType() == PRTokeniser.TK_STRING) {
                                            strbufe.append(tokenizer.getStringValue());
                                    }
                            }
                            String test = strbufe.toString();
                            streamResult = new StreamResult("data.xml");
                            initXML();
                            process(test);
                            closeXML();
                            document.add(new Paragraph(".."));
                            document.close();
                    } catch (Exception e) {
                            // Print the failure instead of silently discarding it
                            e.printStackTrace();
                    }
            }

            public static void initXML() throws ParserConfigurationException,
                            TransformerConfigurationException, SAXException {
                    SAXTransformerFactory tf = (SAXTransformerFactory) SAXTransformerFactory
                                    .newInstance();

                    handler = tf.newTransformerHandler();
                    Transformer serializer = handler.getTransformer();
                    serializer.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");
                    serializer.setOutputProperty(
                                    "{http://xml.apache.org/xslt}indent-amount", "4");
                    serializer.setOutputProperty(OutputKeys.INDENT, "yes");
                    handler.setResult(streamResult);
                    handler.startDocument();
                    atts = new AttributesImpl();
                    handler.startElement("", "", "data", atts);
            }

            public static void process(String s) throws SAXException {
                    String[] elements = s.split("\\|");
                    atts.clear();
                    handler.startElement("", "", "Message", atts);
                    handler.characters(elements[0].toCharArray(), 0, elements[0].length());
                    handler.endElement("", "", "Message");
            }

            public static void closeXML() throws SAXException {
                    handler.endElement("", "", "data");
                    handler.endDocument();
            }
    }
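
For comparison, here is a minimal sketch of imports that name the three types explicitly. A Java import has to name a class (or end in .* for a whole package), so a line such as import javax.xml.transform.sax; is rejected, and StreamResult lives in javax.xml.transform.stream rather than java.util.stream. The sketch uses only the XML classes that ship with the JDK and writes to a string instead of data.xml. The class name XmlImportSketch, the method toXml and the sample input are made up for illustration. The iText classes used above (Document, PdfReader, PRTokeniser and so on) are left out on purpose, since they need their own imports and the iText jar on the classpath.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical class: mirrors initXML/process/closeXML from the listing,
// but with fully qualified imports and no PDF (iText) dependency.
public class XmlImportSketch {

    // Wraps the text before the first '|' in <data><Message>...</Message></data>.
    static String toXml(String s) {
        try {
            StringWriter out = new StringWriter();
            SAXTransformerFactory tf =
                    (SAXTransformerFactory) SAXTransformerFactory.newInstance();
            TransformerHandler handler = tf.newTransformerHandler();
            Transformer serializer = handler.getTransformer();
            // Leave out the <?xml ...?> header to keep the output short
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            handler.setResult(new StreamResult(out));

            AttributesImpl atts = new AttributesImpl();
            handler.startDocument();
            handler.startElement("", "", "data", atts);

            String[] elements = s.split("\\|");
            handler.startElement("", "", "Message", atts);
            handler.characters(elements[0].toCharArray(), 0, elements[0].length());
            handler.endElement("", "", "Message");

            handler.endElement("", "", "data");
            handler.endDocument();
            return out.toString();
        } catch (Exception e) {
            // Surface the failure rather than swallowing it
            throw new IllegalStateException("XML generation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml("hello|world"));
    }
}
```

If this compiles and runs, the JDK side of the listing is fine and any remaining "cannot find symbol" errors would point at the PDF library classes, which javac can only resolve when that library's jar is passed with -cp.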

[Screenshot: error messages when compiling the Java class]
ASKER CERTIFIED SOLUTION
kenfcamp
