
Convert PDFs on a CD to text files in a local directory with pdftotext (keeping the PDF file names)

Posted on 2003-11-09
Last Modified: 2013-12-26
Hello,
I have a set of *.pdf files on a CD. I'd like to convert all of them with pdftotext into a local directory, so that each output is a text file with the same name as its PDF.
For example
~cdrom/myfirst.pdf
~cdrom/mysecond.pdf
would be converted to text files in a local directory:
~targettextdir/myfirst.txt
~targettextdir/mysecond.txt


The directory is
/mnt/cdrom1/PDFs/

I want to use
pdftotext -layout

#use the pdf file name as the name of the text file
#I need to get all .pdfs in the directory
#then convert them to .txt
pdftotext -layout /mnt/cdrom1/PDFs/*.pdf
#the finished text files keep the original pdf name
#and the converted text files go into the directory
#/ConvertedPDFtoText
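pdftotext converts one PDF per invocation (its usage is `pdftotext [options] PDF-file [text-file]`), so the wildcard in the pseudocode above has to be expanded by the shell and handed over one file at a time; with exactly two matches, the second PDF would even be taken as the text-file argument. A minimal sketch, assuming bash; the default paths come from the question and the function name convert_pdfs is hypothetical:

```bash
#!/bin/bash
# Sketch: convert every *.pdf in one directory to a .txt with the same base name.
# SOURCE and DEST default to the directories named in the question (assumption).
SOURCE=${SOURCE:-/mnt/cdrom1/PDFs}
DEST=${DEST:-/ConvertedPDFtoText}
shopt -s nullglob                  # if nothing matches, *.pdf expands to nothing

# convert_pdfs SRC DEST  (hypothetical helper name)
convert_pdfs() {
    local src=$1 dest=$2 pdf name
    mkdir -p "$dest" || return 1
    for pdf in "$src"/*.pdf; do
        name=${pdf##*/}            # /mnt/cdrom1/PDFs/myfirst.pdf -> myfirst.pdf
        pdftotext -layout "$pdf" "$dest/${name%.pdf}.txt"   # -> .../myfirst.txt
    done
}

convert_pdfs "$SOURCE" "$DEST" || echo "conversion failed" >&2
```

For example, `SOURCE=/mnt/cdrom1/PDFs DEST=~/ConvertedPDFtoText bash convert.sh` (convert.sh being whatever name the script is saved under) turns myfirst.pdf into myfirst.txt. The `${name%.pdf}` parameter expansion strips the extension inside the shell, so no extra basename or sed process is started per file.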

Question by:dgdaniels

Author Comment

by:dgdaniels
ID: 9710952
OK, I'm trying to sort it out myself... help appreciated! I lifted choice chunks from other posts.

The script will be used multiple times on different CDs of PDFs.

My new script, recursivepdftotext.sh:

#!/bin/bash
#####################################################################
# setup variables
#####################################################################
PDFDIR='/mnt/cdrom1/PDFs/'
#####################################################################
# Convert pdf files to text
####################################################################
cd ${PDFDIR}

for files in ${PDFDIR}#<--------I need to tell it to look for *.pdfs but not sure how to do that
#Maybe for *.pdf files in ${PDFDIR}???
do
#       if [ -f ${files} ]; then
        echo converting ${files} >>/tmp/convertedpdf #<-------- I like seeing what's happening
        pdftotext -layout "${files}" ~/ConvertedPDFtoText/"$(basename "${files}" .pdf)".txt
done
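The commented-out `if [ -f ... ]` test and the log file in the script above are both worth keeping: without nullglob, a pattern such as `/mnt/cdrom1/PDFs/*.pdf` that matches nothing is passed through literally, and the `-f` test filters it out. A sketch, assuming bash; the default paths are taken from the script, and log_and_convert is a hypothetical name:

```bash
#!/bin/bash
# Sketch: convert each PDF and append one line per file to a log.
# Default paths are assumptions taken from the script in the question.
PDFDIR=${PDFDIR:-/mnt/cdrom1/PDFs}
OUTDIR=${OUTDIR:-$HOME/ConvertedPDFtoText}
LOG=${LOG:-/tmp/convertedpdf}

# log_and_convert SRC OUT LOG  (hypothetical helper name)
log_and_convert() {
    local src=$1 out=$2 log=$3 pdf txt
    mkdir -p "$out" || return 1
    for pdf in "$src"/*.pdf; do
        if [ -f "$pdf" ]; then           # skips the literal "*.pdf" when nothing matches
            txt="$out/$(basename "$pdf" .pdf).txt"
            if pdftotext -layout "$pdf" "$txt"; then
                echo "converted $pdf -> $txt" >>"$log"
            else
                echo "FAILED $pdf" >>"$log"
            fi
        fi
    done
}

log_and_convert "$PDFDIR" "$OUTDIR" "$LOG" || echo "conversion failed" >&2
```

Checking pdftotext's exit status means a damaged PDF shows up as a FAILED line in the log instead of silently producing no text file.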

Or this one?
####################################
#!/bin/bash
#probably need to set a variable for the PDF files
#####################################################################
# setup variables
#####################################################################
PDFDIR='/mnt/cdrom1/PDFs/'
FILE=*.pdf #<--------I don't think that's it :(
#copy it (create the target directory first)
mkdir -p ~/tmp/PDF
cp /mnt/cdrom1/PDFs/$FILE ~/tmp/PDF

#do  processing here
for f in ~/tmp/PDF/*.pdf
do
  N=${f%.pdf}
  pdftotext -layout "$f" "$N.txt"
  echo "$f converted into $N.txt" #<-------- I like seeing what's happening
done
exit 0
############################
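About the `FILE=*.pdf` line in the second script: the shell does not perform pathname expansion in a variable assignment, so FILE holds the literal text `*.pdf`; it is expanded only later, where `$FILE` appears unquoted (as in the `cp` line). A bash array makes the list of matches explicit. A sketch, assuming bash; find_pdfs and PDFS are hypothetical names:

```bash
#!/bin/bash
# Sketch: FILE=*.pdf stores the literal text; an array stores the matching paths.
PDFDIR=${PDFDIR:-/mnt/cdrom1/PDFs}

FILE=*.pdf                         # no pathname expansion in an assignment
echo "FILE holds: $FILE"           # prints: FILE holds: *.pdf

# find_pdfs DIR  (hypothetical helper): fills the global array PDFS
find_pdfs() {
    shopt -s nullglob              # no match -> empty array, not a literal pattern
    PDFS=("$1"/*.pdf)              # one array element per matching file
    shopt -u nullglob              # then switch it off again
}

find_pdfs "$PDFDIR"
echo "found ${#PDFS[@]} PDF file(s) in $PDFDIR"
for f in "${PDFS[@]}"; do          # quoted: names with spaces stay one element
    echo "$f"
done
```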

Author Comment

by:dgdaniels
ID: 9711321
OK, getting closer.

This command does what I want when run from /mnt/cdrom1:

for i in `find -name '*.pdf'`; do pdftotext -layout $i ~/convertedpdfs/$i.txt; done

But it reads from the CD-ROM at conversion time, which is slow and hard on the CD drive. I need to copy all of the files to the local disk first.

The -layout option does not take an argument, which gave me some trouble.
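To keep the CD drive out of the conversion step, the PDFs can be copied to a local staging directory in one pass and converted from there. A sketch, assuming bash; the staging directory ~/tmp/PDF is borrowed from the second script, ~/convertedpdfs from the one-liner, and copy_then_convert is a hypothetical name:

```bash
#!/bin/bash
# Sketch: read the CD once (copy), then convert from the local disk.
# Default paths are assumptions: ~/tmp/PDF from the second script,
# ~/convertedpdfs from the find one-liner.
SOURCE=${SOURCE:-/mnt/cdrom1/PDFs}
STAGE=${STAGE:-$HOME/tmp/PDF}
DEST=${DEST:-$HOME/convertedpdfs}

# copy_then_convert SRC STAGE DEST  (hypothetical helper name)
copy_then_convert() {
    local src=$1 stage=$2 dest=$3 pdf
    [ -d "$src" ] || return 1                  # CD not mounted?
    mkdir -p "$stage" "$dest" || return 1
    # Step 1: one pass over the CD. PDFs from subdirectories land flat in
    # $stage, so identical names from different folders overwrite each other.
    find "$src" -type f -name '*.pdf' -exec cp -p {} "$stage" \;
    # Step 2: convert the local copies; the CD is no longer read.
    for pdf in "$stage"/*.pdf; do
        [ -f "$pdf" ] || continue              # nothing was copied
        pdftotext -layout "$pdf" "$dest/$(basename "$pdf" .pdf).txt"
    done
}

copy_then_convert "$SOURCE" "$STAGE" "$DEST" || echo "could not process $SOURCE" >&2
```

Because the staging directory is reused, PDFs left over from a previous CD are converted again on the next run; emptying it first, or creating a fresh one with `mktemp -d`, avoids that when working through several CDs.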

Expert Comment

by:HamdyHassan
ID: 9715771

Try the following

cd $PDFDIR
for i in *.pdf
do
   echo "$i"
done


If the PDFs are in subfolders, then you need the find command:

cd $PDFDIR
for i in `find . -name "*.pdf"`
do
   echo "$i"
done
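Loops that take their list from `ls` or `find` in backticks split it on whitespace, so `My Report.pdf` becomes the two words `My` and `Report.pdf`. One common bash alternative is `find -print0` (supported by GNU and BSD find) together with `read -d ''`, which passes each path through intact. A sketch; the function name convert_tree and the default paths are assumptions:

```bash
#!/bin/bash
# Sketch: walk a directory tree and convert every PDF, even with spaces in its name.
PDFDIR=${PDFDIR:-/mnt/cdrom1/PDFs}
OUTDIR=${OUTDIR:-$HOME/ConvertedPDFtoText}

# convert_tree SRC OUT  (hypothetical helper name)
convert_tree() {
    local src=$1 out=$2 pdf
    mkdir -p "$out" || return 1
    # -print0 ends each path with a NUL byte; read -d '' splits on exactly that,
    # and IFS= plus -r keep leading blanks and backslashes unchanged.
    find "$src" -type f -name '*.pdf' -print0 |
    while IFS= read -r -d '' pdf; do
        pdftotext -layout "$pdf" "$out/$(basename "$pdf" .pdf).txt"
    done
}

convert_tree "$PDFDIR" "$OUTDIR" || echo "conversion failed" >&2
```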

Accepted Solution

by: Tintin (earned 250 total points)
ID: 9718346
Here's a script that will do what you require

#!/bin/sh
SOURCE=/mnt/cdrom1/PDFs
DEST=/target/dir

#
# If the PDF files are all in one directory, then use
#
# for file in $SOURCE/*.pdf
#
# otherwise use the one below

for file in `find $SOURCE -name "*.pdf"`
do
    filename=`basename "$file" .pdf`.txt
    pdftotext -layout "$file" "$DEST/$filename"
done
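The accepted script writes every text file into the one DEST directory, so `a/intro.pdf` and `b/intro.pdf` on the CD would both end up as `$DEST/intro.txt`. A sketch that mirrors the CD's subdirectories under DEST instead, assuming POSIX sh and find's `-exec ... {} +`; SOURCE and DEST mean the same as in the script above, while mirror_convert is a hypothetical name:

```sh
#!/bin/sh
# Sketch: like the accepted script, but mirrors the subdirectories of SOURCE
# under DEST, so a/intro.pdf and b/intro.pdf cannot overwrite each other.
# Give SOURCE without a trailing slash.
SOURCE=${SOURCE:-/mnt/cdrom1/PDFs}
DEST=${DEST:-/target/dir}

# mirror_convert SRC DEST  (hypothetical helper name)
mirror_convert() {
    # find passes batches of paths to one inline sh script: $1=SRC, $2=DEST, rest=PDFs
    find "$1" -type f -name '*.pdf' -exec sh -c '
        src=$1 dest=$2
        shift 2
        for pdf in "$@"; do
            rel=${pdf#"$src"/}                # e.g. a/intro.pdf
            out="$dest/${rel%.pdf}.txt"       # e.g. DEST/a/intro.txt
            mkdir -p "$(dirname "$out")" &&
                pdftotext -layout "$pdf" "$out"
        done
    ' sh "$1" "$2" {} +
}

mirror_convert "$SOURCE" "$DEST" || echo "conversion failed" >&2
```

The `{} +` form hands the matches to the inline script in batches rather than starting one shell per file, and because find passes each path as a separate argument, names containing spaces are not split.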

Expert Comment

by:liddler
ID: 10191662
No comment has been added lately, so it's time to clean up this TA.
I will leave a recommendation in the Cleanup topic area that this question is:

Answered by Tintin

Please leave any comments here within the next seven days.

PLEASE DO NOT ACCEPT THIS COMMENT AS AN ANSWER!

liddler
EE Cleanup Volunteer