11.05.2007 at 08:45AM PST, ID: 22939487
Perl parsing PDF

Tags: perl, pdf, parse
I want to parse the data in a PDF file using Perl. I need to look for the string "Department" and retrieve the value assigned to it; this is in the header of the PDF file. Below that, the PDF file has 4 columns whose values I need to parse. I can use regular expressions for the parsing, but how do I apply them to a PDF file?

      Student id       credits      fee    scholarship

John
Harry
Question Stats
Zone: Programming
Question Asked By: saibsk
Solution Provided By: saibsk
Participating Experts: 2
Solution Grade: B
Views: 54
11.05.2007 at 09:45AM PST, ID: 20217877

Rank: Genius

That will depend on how the data is set up in the PDF file.  Can you post the PDF file somewhere?
 
11.05.2007 at 11:15AM PST, ID: 20218546
It has this format.

 Department:

            Student id       credits      fee    scholarship

John
Harry

But I can't post the PDF file data. I need to parse the above data in the PDF file.
 
11.05.2007 at 12:00PM PST, ID: 20219003

Rank: Genius

Without seeing the actual data, it will be hard to help.  Anyways, here is something that should get you going.


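use strict;
use warnings;
use CAM::PDF;

my $pdf = CAM::PDF->new('test1.pdf') or die "Could not open test1.pdf\n";
# getPageText returns the page's extracted text;
# getPageContent would return the raw PDF content stream instead
my $page1 = $pdf->getPageText(1);  # or whatever page you need

my @lines = split(/\n/, $page1);
my $names = 0;  # becomes 1 once the column header line has been seen
foreach (@lines) {
    if (/Student id\s+credits\s+fee\s+scholarship/) {
        $names = 1;
    }
    elsif ($names == 1) {
        # a data row: name followed by the four column values
        if (/(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)/) {
            print "Name=$1\n";
            print "ID=$2\n";
            print "Credits=$3\n";
            print "Fee=$4\n";
            print "Scholarship=$5\n";
        }
    }
}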
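
The question also asks for the value assigned to "Department", which the snippet above does not pick up. The following is only a sketch of how both pieces could be parsed once the page text is available as a string. The helper name parse_page, the "Department: <value>" line format, and the sample values are all assumptions made for illustration, since the real PDF data was not posted.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one page of already-extracted text.
# Assumes a "Department: <value>" line and whitespace-separated columns.
sub parse_page {
    my ($text) = @_;
    my ($department, @rows);
    my $in_table = 0;
    for my $line (split /\n/, $text) {
        if ($line =~ /Department:\s*(\S.*?)\s*$/) {
            $department = $1;
        }
        elsif ($line =~ /Student id\s+credits\s+fee\s+scholarship/) {
            $in_table = 1;    # rows follow the column header
        }
        elsif ($in_table
            && $line =~ /^\s*(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$/) {
            push @rows, { name => $1, id => $2, credits => $3,
                          fee => $4, scholarship => $5 };
        }
    }
    return ($department, \@rows);
}

# Made-up sample values, for illustration only.
my $sample = join "\n",
    ' Department: Physics',
    '',
    '            Student id       credits      fee    scholarship',
    '',
    'John        1001             12           3000   500',
    'Harry       1002             9            2250   0';

my ($dept, $rows) = parse_page($sample);
print "Department=$dept\n";
print "$_->{name}: id=$_->{id} credits=$_->{credits}\n" for @$rows;
```

Names or values that contain spaces would break the five-field pattern, so the row regex would need adjusting to the real layout.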

 
11.05.2007 at 12:58PM PST, ID: 20219420
But what if there is more than 1 page in a PDF document?
 
11.05.2007 at 01:20PM PST, ID: 20219601

Rank: Genius

Does each page have the "Student id       credits      fee    scholarship" header, or is that only listed once?

You can get the number of pages, and loop through all of them with:
my $pages = $pdf->numPages();
for(1..$pages) {
    my $page = $pdf->getPageText($_);
    .....
}


 
11.05.2007 at 01:26PM PST, ID: 20219658
#!/usr/bin/perl
      
      use CAM::PDF;
      
      $fileName = 'test.pdf';
      
      print "File: $fileName";
      
      my $pdf = CAM::PDF->new($filename);
      
      print "PDF:$pdf";
      
      
      my $page1 = $pdf->getPageContent(1);

When I execute this code, it says it cannot call the method getPageContent on an undefined value. I also tried printing $pdf, and it is empty.

The headers are listed on each page.
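
Because the column header repeats on every page, one possible approach is to reset the "header seen" flag for each page and skip the header line itself. This is a rough sketch under that assumption: rows_from_pages is a hypothetical helper, and the page strings below are made-up stand-ins for what a call such as $pdf->getPageText($_) would return for each page.

```perl
use strict;
use warnings;

# Hypothetical helper: collect table rows from a list of page texts,
# where every page repeats the column header.
sub rows_from_pages {
    my (@pages) = @_;
    my @rows;
    for my $page_no (1 .. @pages) {
        my $seen_header = 0;    # reset for every page
        for my $line (split /\n/, $pages[$page_no - 1]) {
            if ($line =~ /Student id\s+credits\s+fee\s+scholarship/) {
                $seen_header = 1;
                next;
            }
            next unless $seen_header;
            my @fields = split ' ', $line;
            next unless @fields == 5;    # skip blank and partial lines
            push @rows, { page => $page_no,
                          name => $fields[0], id => $fields[1] };
        }
    }
    return @rows;
}

# Made-up page texts, for illustration only.
my @pages = (
    "Department: Physics\nStudent id   credits   fee   scholarship\n"
      . "John 1001 12 3000 500\n",
    "Department: Physics\nStudent id   credits   fee   scholarship\n"
      . "Harry 1002 9 2250 0\nMary 1003 15 3750 1000\n",
);

my @rows = rows_from_pages(@pages);
print "page $_->{page}: $_->{name} ($_->{id})\n" for @rows;
```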
 
11.05.2007 at 01:31PM PST, ID: 20219695

Rank: Genius

What is the output from this:
...
my $pdf = CAM::PDF->new($filename) or die "Could not create CAM::PDF:\n  $!\n  $@\n";
...
 
11.05.2007 at 01:43PM PST, ID: 20219775
It says "No such file or directory". I am in the /export/home/user/Students directory, and my Perl script and the PDF file are both located in the Students directory. I tried executing it with the full path to the file and with just the file name; both give the same error.
 
11.05.2007 at 01:55PM PST, ID: 20219869

Rank: Genius

Do you have proper permissions on the directory and file?

my $fileName = 'test.pdf';
die "File does not exist\n" unless -e $fileName;
die "File is not readable\n" unless -r $fileName;
my $pdf = CAM::PDF->new($filename);
die "Could not create CAM::PDF:\n  $!\n  $@\n" unless $pdf;
 
11.05.2007 at 02:09PM PST, ID: 20219966
Use of uninitialized value in pattern match (m//) at /usr/perl5/site_perl/5.8.4/CAM/PDF.pm line 293.
Use of uninitialized value in length at /usr/perl5/site_perl/5.8.4/CAM/PDF.pm line 303.
Use of uninitialized value in string eq at /usr/perl5/site_perl/5.8.4/CAM/PDF.pm line 306.
Use of uninitialized value in open at /usr/perl5/site_perl/5.8.4/CAM/PDF.pm line 320.
Use of uninitialized value in concatenation (.) or string at /usr/perl5/site_perl/5.8.4/CAM/PDF.pm line 322.
Could not create CAM::PDF:
  No such file or directory

prints the above
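
One possible explanation, judging only from the code as posted: the scripts assign the name to $fileName but pass $filename (lowercase n) to CAM::PDF->new. Perl variable names are case-sensitive, so without "use strict" the second name is a separate, undefined variable, which would be consistent with the "uninitialized value" warnings above. The sketch below does not use CAM::PDF at all; it only shows that "use strict" rejects such a mismatch at compile time.

```perl
use strict;
use warnings;

# Compile a snippet containing the same case mismatch under "use strict".
my $snippet = q{
    use strict;
    my $fileName = 'test.pdf';
    my $copy = $filename;    # different case: not the variable declared above
    1;
};

my $ok    = eval $snippet;   # string eval: compile errors end up in $@
my $error = $@;
print "Compile failed: $error" unless $ok;
```

With "use strict" at the top of the original script, the typo would be reported before CAM::PDF is ever called.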
 
11.06.2007 at 07:22AM PST, ID: 20224561

Rank: Genius

Maybe the CAM::PDF module isn't installed properly, or you have an old version.  Try upgrading to the latest, or reinstalling.
 
11.06.2007 at 08:19AM PST, ID: 20225062
I got the pdftotext tool installed on my system. For now I am able to convert the file to text.
Accepted Solution
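
For anyone who wants to drive that conversion from Perl, here is a minimal sketch. It assumes the pdftotext command is on the PATH; pdf_to_text is a hypothetical helper name, and the "-layout" option is used on the assumption that keeping the column layout makes the table easier to match.

```perl
use strict;
use warnings;

# Hypothetical helper: run pdftotext and return the extracted text.
# Assumes the pdftotext command is available on PATH.
sub pdf_to_text {
    my ($file) = @_;
    die "File does not exist: $file\n"  unless -e $file;
    die "File is not readable: $file\n" unless -r $file;

    # List-form pipe open avoids passing the file name through a shell.
    # "-layout" keeps the column layout; "-" sends the text to stdout.
    open(my $fh, '-|', 'pdftotext', '-layout', $file, '-')
        or die "Could not run pdftotext: $!\n";
    local $/;                 # slurp mode: read all output at once
    my $text = <$fh>;
    close($fh) or die "pdftotext failed on $file\n";
    return $text;
}
```

A call such as my $text = pdf_to_text('test.pdf'); would then give a plain string that can be split into lines and matched with the same regular expressions used earlier in the thread.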
 
01.10.2008 at 02:25PM PST, ID: 20632048
A request has been made in Community Support to close this question:
http://www.experts-exchange.com/Q_23073853.html

If there are no objections, a moderator will finalize this question in approximately 4 days as follows:
PAQ with refund using {http:#a20225062}

Please leave any recommendations here.

Vee_Mod
Community Support Moderator
 
 
01.14.2008 at 03:38PM PST, ID: 20658465
Closed, 500 points refunded.
Vee_Mod
Community Support Moderator
 
 
 