Solved

Convert PDF tables into Excel

Posted on 2002-07-01
Last Modified: 2009-12-16
I know this is supposed to be possible with Acrobat 5.0 (not Reader), but I only have Reader and was wondering whether there is a way to convert a table stored in PDF format into Excel. The table spans 931 pages, with a heading at the top of each page, which prevents easy copying and pasting.
Would Acrobat 5 be able to do this anyway, given that the table spans so many pages? Buying the full version will be my last resort.
As this is urgent I will post the same question in the Microsoft Office section, so if more than one person gives me a helpful answer I can share the points.
Question by:dcollis
8 Comments
 
LVL 8

Expert Comment

by:tskelly082598
ID: 7122373
I searched and located these instructions for using Adobe Acrobat (the full version).

http://www.library.mcgill.ca/edrs/services/publications/howto/PDFtoXLS/PDFtoExcel.html
 

Author Comment

by:dcollis
ID: 7122388
Yes, I also found this, but was not sure whether it would allow me to do all 931 pages at once. It talks about selecting text, exporting it to a text file and then importing it.
Not sure if it would let me do the entire document.
 
LVL 30

Expert Comment

by:weed
ID: 7122500
You can't do any sort of export from the Reader version. You do need Acrobat to do that.
 

Author Comment

by:dcollis
ID: 7123358
Okay, never mind.
I have copied and pasted the text from Acrobat Reader into a text file.
Then I wrote a Java program to extract the information and put in relevant separators, e.g. "#". Then I imported it into Excel using "#" as the delimiter.

If anyone wants to see my code in the future, let me know.

I will cancel this question now. Thanks for the replies I did get, though.
 
LVL 27

Expert Comment

by:Asta Cu
ID: 7146475
Rather than deleting this, you may consider asking Community Support for a refund and moving this to our PAQ instead, where it can help others.  This would be especially helpful if you added the specifics of what you created to address this.

":0)  Asta
 
LVL 6

Accepted Solution

by:Mindphaser
ID: 7149242
Points refunded and moved to PAQ

** Mindphaser - Community Support Moderator **
 
LVL 27

Expert Comment

by:Asta Cu
ID: 7149263
Thank you, Mindphaser, for your excellent and quick response.
":0) Asta
 

Author Comment

by:dcollis
ID: 7367316
I have had a lot of email about this recently, so here is an update with a (slightly) better explanation.

I'm not sure how helpful my code will be, but I'll explain basically what I
did.
Adobe Acrobat Reader allows you to select and copy all the text, so I selected
all the text and pasted it into Notepad (all 100,000 lines of it).
Unfortunately, the text does not retain its formatting once it is in
Notepad, so there is seemingly no way to import it into Excel, i.e. there are no
separators etc.
Luckily, in this case the data was relatively uniform, and I was able to run
a program that went through the text file line by line and inserted
separators (e.g. #, or any symbol that does not already appear in the PDF
text). The hard part here is getting the program to know where to insert the separators.
This would not work in all cases: when the data is *very* messy with no pattern
at all, it will be close to impossible to get a program to recognise where
to put the separators.
Anyway, once the separators are in, it is a simple process to import into Excel.
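
Since everything depends on the separators landing in the same places on every line, one cheap sanity check before importing is to count the fields per line and list the lines that do not match. The following is only a minimal sketch under assumptions: the class `FieldCountCheck`, its method names and the sample rows are made up for illustration and are not part of the original program, and it assumes a single separator character such as '#'.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original program.
class FieldCountCheck
{
      // Number of fields on a line = number of separators + 1.
      static int countFields(String line, char sep)
      {
            int fields = 1;
            for (int i = 0; i < line.length(); i++)
            {
                  if (line.charAt(i) == sep)
                  {
                        fields++;
                  }
            }
            return fields;
      }

      // Returns the 1-based numbers of the lines whose field count differs from 'expected'.
      static List<Integer> oddLines(List<String> lines, char sep, int expected)
      {
            List<Integer> odd = new ArrayList<Integer>();
            for (int i = 0; i < lines.size(); i++)
            {
                  if (countFields(lines.get(i), sep) != expected)
                  {
                        odd.add(i + 1);
                  }
            }
            return odd;
      }

      public static void main(String[] args)
      {
            List<String> sample = new ArrayList<String>();
            sample.add("1001#Smith#Flik#12#34");   // made-up row with 5 fields
            sample.add("1002#Jones Flik#56#78");   // made-up row with a separator missing
            System.out.println(oddLines(sample, '#', 5));   // prints [2]
      }
}
```

Any line numbers it reports are the ones worth looking at by hand before the import.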

Apparently the full version of Acrobat will export to Excel properly, so if you
can't do it manually then try to get hold of a copy.

But anyway, here is the Java code:

[code]
import java.io.*;
import java.util.ArrayList;
import java.util.List;

public class textedit
{
      public static void main(String[] args)
      {
            if (args.length < 1)
            {
                  System.err.println("Usage: java textedit filename.txt");
                  return;
            }
            loadfile(args[0]);
      }

      // Loads the file line by line into an array so the rest of the program can work on it.
      // A list is used while reading so that files longer than 100000 lines do not overflow.
      public static void loadfile(String file)
      {
            List<String> lines = new ArrayList<String>();
            try (BufferedReader inFile = new BufferedReader(new FileReader(file)))
            {
                  String line;
                  while ((line = inFile.readLine()) != null)
                  {
                        lines.add(line);
                  }
            }
            catch (FileNotFoundException e)
            {
                  System.err.println("Caught FileNotFoundException: " + e.getMessage());
            }
            catch (IOException e)
            {
                  System.err.println("Caught IOException: " + e.getMessage());
            }

            checkit(lines.toArray(new String[0]), lines.size());
      }

      // This is the part that does the actual work...
      public static void checkit(String[] line, int lnum)
      {
            for (int i = 0; i < lnum; i++)
            {
                  char[] chArray = line[i].toCharArray();
                  // Start at 1 so that the character before the keyword (t-1) always exists.
                  for (int t = 1; t < chArray.length - 4; t++)
                  {
                        // Luckily for me the word Flik appeared on every single line, so I used it
                        // as a reference to work from - basically you need to find some kind of
                        // algorithm that will allow you to divide every line with separators.
                        if (chArray[t] == 'F' && chArray[t+1] == 'l' &&
                            chArray[t+2] == 'i' && chArray[t+3] == 'k')
                        {
                              // Separator just before the keyword, and in place of the spaces after it.
                              chArray[t-1] = '#';
                              for (int z = t + 5; z < chArray.length - 2; z++)
                              {
                                    if (chArray[z] == ' ')
                                    {
                                          chArray[z] = '#';
                                    }
                              }
                              // Turn the nearest space 3 to 7 characters before the keyword into a separator.
                              boolean norec = false;
                              if (t >= 3 && chArray[t-3] == ' ')
                              {
                                    chArray[t-3] = '#';
                              }
                              else
                              {
                                    for (int back = 4; back <= 7 && back <= t; back++)
                                    {
                                          if (chArray[t-back] == ' ')
                                          {
                                                chArray[t-back] = '#';
                                                norec = true;
                                                break;
                                          }
                                    }
                              }
                              // There were times when a field was missing on a record, so I replaced # with ^
                              // so I knew that it should be a double separator. I later replaced ^ with ##
                              // in a text editor.
                              if (!norec)
                              {
                                    chArray[t-1] = '^';
                              }
                        }
                  }
                  line[i] = new String(chArray);
            }
            tester(line, lnum);
      }

      // And now we print it out...
      public static void tester(String[] line, int lnum)
      {
            for (int i = 0; i < lnum; i++)
            {
                  System.out.println(line[i]);
            }
      }
}
[/code]

And this can be run to export into a text file by typing
java textedit filename.txt > newfile.txt

Not great code, but it works...
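
The code comments mention replacing the ^ placeholder with ## afterwards in a text editor. That step could also be done with a few lines of Java. This is just a sketch under assumptions: the class name `ExpandPlaceholders` and the file name `final.txt` are made up for illustration, and it assumes ^ does not appear anywhere else in the data.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// Hypothetical helper, not part of the original program.
class ExpandPlaceholders
{
      // Turns each '^' placeholder into two '#' separators, i.e. an empty field.
      static String expand(String line)
      {
            return line.replace("^", "##");
      }

      public static void main(String[] args) throws IOException
      {
            // Reads lines from standard input and writes the expanded lines to standard output.
            BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
            String line;
            while ((line = in.readLine()) != null)
            {
                  System.out.println(expand(line));
            }
      }
}
```

It would be run on the output of the first program, for example:
java ExpandPlaceholders < newfile.txt > final.txt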