Solved

Convert PDFs on a CD to text files in a local directory with pdftotext (preserve the PDF file name)

Posted on 2003-11-09
Last Modified: 2013-12-26
Hello,
I have a set of PDF files on a CD. I'd like to convert all of them with pdftotext and write the results to a local directory. Each output should be a text file named after its PDF.
For example
~cdrom/myfirst.pdf
~cdrom/mysecond.pdf
would be converted to text files in a local directory:
~targettextdir/myfirst.txt
~targettextdir/mysecond.txt


The directory is
/mnt/cdrom1/PDFs/

I want to use
pdftotext -layout

#and use the pdf file as the title of the text file
#I need to get all .pdfs in the directory
#then convert them to .txt
pdftotext -layout /mnt/cdrom1/PDFs/*.pdf
#the finished text files will have the original pdf title
#and then put the converted text files to a directory
/ConvertedPDFtoText

Question by:dgdaniels

Author Comment

by:dgdaniels
ID: 9710952
OK, I'm trying to sort it out myself. Help appreciated! I lifted choice chunks from other posts.
 
The script will be used multiple times on different CDs of PDFs

My new script, recursivepdftotext.sh:

#!/bin/bash
#####################################################################
# setup variables
#####################################################################
PDFDIR='/mnt/cdrom1/PDFs/'
#####################################################################
# Convert pdf files to text
####################################################################
cd ${PDFDIR}

for files in ${PDFDIR}#<--------I need to tell it to look for *.pdfs but not sure how to do that
#Maybe for *.pdf files in ${PDFDIR}???
do
#       if [ -f ${files} ]; then
        echo converting ${files} >>/tmp/convertedpdf #<-------- I like seeing what's happening
        pdftotext ${files} --layout ${files}.txt >>~/ConvertedPDFtoText
done

or this one?
####################################
#!/bin/bash
#probably need to set a variable for the PDF files
#####################################################################
# setup variables
#####################################################################
PDFDIR='/mnt/cdrom1/PDFs/'
FILE=*.pdf #<--------I don't think that's it :(
#copy it
cp /mnt/cdrom1/PDFs/$FILE ~/tmp/PDF

#do  processing here
for f in $*{PDFDIR}
do
  mv $f $N.pdf
  pdftotext -layout $N
  echo $f renamed into $N.txt #<-------- I like seeing what's happening
done
exit 0
############################

Author Comment

by:dgdaniels
ID: 9711321
OK, getting closer.

This command does what I want when run from /mnt/cdrom1:

for i in `find -name '*.pdf'`; do pdftotext -layout $i ~/convertedpdfs/$i.txt; done

But this reads every file from the CD-ROM at conversion time, which is slow and hard on the CD drive. I need to copy all of the files to local disk first.

Note that the -layout option takes no argument, which gave me some trouble at first.
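
One possible way to take the load off the CD drive is to copy the PDFs to a local staging directory in a single pass and convert the local copies afterwards. The following is only a sketch under assumptions: `stage_and_convert` and the `PDFTOTEXT` override variable are names made up for illustration, the staging path is borrowed from the earlier draft script, and the PDFs are assumed to sit in one directory.

```shell
#!/bin/sh
# Sketch: copy the PDFs off the CD once, then convert the local copies.
# stage_and_convert and PDFTOTEXT are illustrative names, not part of pdftotext.

# Usage: stage_and_convert SRC_DIR STAGE_DIR OUT_DIR
stage_and_convert() (
    src=$1 stage=$2 out=$3
    mkdir -p "$stage" "$out" || exit 1

    # The CD is read only here; everything after this uses local disk.
    cp "$src"/*.pdf "$stage"/ || exit 1

    for pdf in "$stage"/*.pdf; do
        name=$(basename "$pdf" .pdf)
        echo "converting $name.pdf"
        # -layout takes no argument; the input and output paths follow it.
        "${PDFTOTEXT:-pdftotext}" -layout "$pdf" "$out/$name.txt"
    done
)

# Example call (commented out so the file can be sourced safely):
# stage_and_convert /mnt/cdrom1/PDFs "$HOME/tmp/PDF" "$HOME/convertedpdfs"
```

Because every variable is quoted, file names containing spaces should survive the copy and the conversion.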

Expert Comment

by:HamdyHassan
ID: 9715771

Try the following

cd "$PDFDIR"
for i in *.pdf
do
   echo "$i"
done


If you have PDFs in subfolders, then you need the find command:

cd "$PDFDIR"
for i in `find . -name "*.pdf"`
do
   echo "$i"
done
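
A `for` loop over backtick output splits on whitespace, so a file name containing a space arrives in pieces. As a hedged alternative, the sketch below pipes `find` into `while read`, which keeps each path whole (it still assumes no file name contains a newline). `plan_conversions` is a hypothetical helper name; it only prints what would be converted and touches nothing.

```shell
#!/bin/sh
# plan_conversions is an illustrative name. It prints, for every PDF under
# the given directory (subfolders included), the text file it would become.
plan_conversions() {
    # The quotes keep the shell from expanding *.pdf before find sees it.
    find "$1" -type f -name '*.pdf' | while IFS= read -r pdf; do
        # IFS= and -r keep leading spaces and backslashes in the path intact.
        printf '%s -> %s.txt\n' "$pdf" "$(basename "$pdf" .pdf)"
    done
}

# Example call, using the directory from the question:
# plan_conversions /mnt/cdrom1/PDFs
```

Running a dry listing like this first is a cheap way to check the file names before any conversion starts.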

Accepted Solution

by:Tintin (earned 1000 total points)
ID: 9718346
Here's a script that will do what you require

#!/bin/sh
SOURCE=/mnt/cdrom1/PDFs
DEST=/target/dir

#
# If the PDF files are all in one directory, then use
#
# for file in $SOURCE/*.pdf
#
# otherwise use the one below

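for file in `find $SOURCE -name "*.pdf"`
do
    # Swap only the trailing .pdf for .txt
    filename=`basename "$file" | sed 's/\.pdf$/.txt/'`
    # With no output name, pdftotext writes next to the PDF (on the CD), not to stdout
    pdftotext -layout "$file" "$DEST/$filename"
done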
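
For CDs whose file names may contain spaces, a variant along the following lines might be more forgiving. It is a sketch, not a drop-in replacement: `convert_tree` and the `PDFTOTEXT` override variable are hypothetical names, and it assumes no file name contains a newline. It passes the output path to pdftotext as the second file argument, which is where pdftotext expects the text file name.

```shell
#!/bin/sh
SOURCE=/mnt/cdrom1/PDFs
DEST=/target/dir

# convert_tree and PDFTOTEXT are illustrative names, not part of pdftotext.
# Usage: convert_tree SOURCE_DIR DEST_DIR
convert_tree() (
    src=$1 dest=$2
    mkdir -p "$dest" || exit 1
    find "$src" -type f -name '*.pdf' | while IFS= read -r file; do
        # basename strips the directory and only a trailing ".pdf".
        name=$(basename "$file" .pdf)
        # Name the output file explicitly as the second path argument.
        "${PDFTOTEXT:-pdftotext}" -layout "$file" "$dest/$name.txt" ||
            echo "failed: $file" >&2
    done
)

# Example call (commented out so the sketch can be sourced without side effects):
# convert_tree "$SOURCE" "$DEST"
```

One caveat of flattening everything into a single destination: two PDFs with the same name in different subfolders would overwrite each other's text file.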

Expert Comment

by:liddler
ID: 10191662
No comment has been added lately, so it's time to clean up this TA.
I will leave a recommendation in the Cleanup topic area that this question is:

Answered by Tintin

Please leave any comments here within the next seven days.

PLEASE DO NOT ACCEPT THIS COMMENT AS AN ANSWER!

liddler
EE Cleanup Volunteer