Solved

Convert PDFs on a CD to text in a local directory with pdftotext (preserve PDF title)

Posted on 2003-11-09
Last Modified: 2013-12-26
Hello,
I've got PDF files on a CD. I'd like to convert all of them with pdftotext and write the results to a local directory. Each output text file should take the name of the PDF it came from.
For example
~cdrom/myfirst.pdf
~cdrom/mysecond.pdf
would be converted to text on a local directory
~targettextdir/myfirst.txt
~targettextdir/mysecond.txt


The directory is
/mnt/cdrom1/PDFs/

I want to use
pdftotext -layout

#and use the pdf file as the title of the text file
#I need to get all .pdfs in the directory
#then convert them to .txt
pdftotext -layout /mnt/cdrom1/PDFs/*.pdf
#the finished text files will have the original pdf title
#and then put the converted text files to a directory
/ConvertedPDFtoText

Question by:dgdaniels

Author Comment

by:dgdaniels
ID: 9710952
Ok, trying to sort it out myself...Help appreciated! I lifted choice chunks from other posts
 
The script will be used multiple times on different CDs of PDFs

My new script =
recursivepdftotext.sh

#!/bin/bash
#####################################################################
# setup variables
#####################################################################
PDFDIR=' /mnt/cdrom1/PDFs/'
#####################################################################
# Convert pdf files to text
####################################################################
cd ${PDFDIR}

for files in ${PDFDIR}#<--------I need to tell it to look for *.pdfs but not sure how to do that
#Maybe for *.pdf files in ${PDFDIR}???
do
#       if [ -f ${files} ]; then
        echo converting ${files} >>/tmp/convertedpdf #<-------- I like seeing what's happening
        pdftotext ${files} --layout ${files}.txt >>~/ConvertedPDFtoText
done

or this one?
####################################
#!/bin/bash
#probably need to set a variable for the PDF files
#####################################################################
# setup variables
#####################################################################
PDFDIR=' /mnt/cdrom1/PDFs/'
FILE=*.pdf #<--------I don't think that's it :(
#copy it
cp /mnt/cdrom1/PDFs/$FILE ~/tmp/PDF

#do  processing here
for f in $*{PDFDIR}
do
  mv $f $N.pdf
  pdftotext -layout $N
  echo $f renamed into $N.txt #<-------- I like seeing what's happening
done
exit 0
############################

Author Comment

by:dgdaniels
ID: 9711321
ok getting closer

This command will do what I want from the /mnt/cdrom1

for i in `find -name '*.pdf'`; do pdftotext -layout $i ~/convertedpdfs/$i.txt; done

But this reads every file from the CD-ROM at conversion time, which is slow and hard on the CD drive. I need to copy all of the files to local disk first.

The -layout option does not take an argument, which gave me some trouble.
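One way to avoid converting straight off the CD is to copy the PDFs to a local staging directory first and run pdftotext on the copies. The sketch below is only an illustration: the function name stage_and_convert, its three arguments, and the PDFTOTEXT override variable are hypothetical and not part of the thread. It handles only PDFs at the top level of the staging directory.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): copy the CD contents to a local
# staging directory, then convert the local copies.
# PDFTOTEXT is an assumed override for the converter; it defaults to pdftotext.
stage_and_convert() {
    src=$1      # e.g. /mnt/cdrom1/PDFs
    stage=$2    # local scratch directory for the copied PDFs
    dest=$3     # directory that receives the .txt files
    mkdir -p "$stage" "$dest" || return 1
    # Read everything from the CD once, before any conversion starts
    cp -R "$src"/. "$stage"/ || return 1
    for file in "$stage"/*.pdf; do
        [ -f "$file" ] || continue          # glob matched nothing
        name=$(basename "$file" .pdf)       # myfirst.pdf -> myfirst
        echo "converting $file"
        "${PDFTOTEXT:-pdftotext}" -layout "$file" "$dest/$name.txt"
    done
}
```

It could be called as, for example, stage_and_convert /mnt/cdrom1/PDFs ~/tmp/PDF ~/ConvertedPDFtoText, and run again for each new CD.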
LVL 9

Expert Comment

by:HamdyHassan
ID: 9715771

Try the following

cd $PDFDIR
for i in `ls *.pdf`
do
   echo $i
done


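If you have PDFs in subfolders, then you need the find command. Note that the pattern must be "*.pdf"; a bare ".pdf" would only match a file named exactly .pdf.

cd $PDFDIR
for i in `find . -name "*.pdf"`
do
   echo $i
done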
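Once the loop visits each PDF, the remaining step is turning its path into the name of the text file. A minimal sketch using shell parameter expansion follows; the function name txt_name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: map /some/dir/name.pdf to name.txt
txt_name() {
    base=${1##*/}                       # drop everything up to the last slash
    printf '%s\n' "${base%.pdf}.txt"    # replace the trailing .pdf with .txt
}
```

Inside a loop this could be used as echo "$i -> $(txt_name "$i")". Quoting "$i" keeps a file name that contains spaces in one piece.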
LVL 48

Accepted Solution

by:
Tintin earned 250 total points
ID: 9718346
Here's a script that will do what you require

#!/bin/sh
SOURCE=/mnt/cdrom1/PDFs
DEST=/target/dir

#
# If the PDF files are all in one directory, then use
#
# for file in $SOURCE/*.pdf
#
# otherwise use the one below

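for file in `find $SOURCE -name "*.pdf"`
do
    # anchor the pattern so only the trailing .pdf is replaced
    filename=`basename $file | sed 's/\.pdf$/.txt/'`
    # pdftotext takes the output file as its second argument; it does not
    # write to stdout unless that argument is "-"
    pdftotext -layout $file $DEST/$filename
done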
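A loop over the output of find in backticks splits on whitespace, so a PDF with a space in its name would be handed to pdftotext as two separate words. The following is a sketch of one way around that, letting find pass the names directly to a small inline script. The function name convert_pdfs and the PDFTOTEXT override variable are assumptions made for this example, not part of the accepted answer.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): convert every PDF under SRC,
# including subdirectories, to a .txt file in DEST.
# PDFTOTEXT is an assumed override for the converter; it defaults to pdftotext.
convert_pdfs() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # find hands the file names to the inline script as separate arguments,
    # so names containing spaces are never re-split by the shell
    find "$src" -type f -name '*.pdf' -exec sh -c '
        dest=$1; tool=$2; shift 2
        for file in "$@"; do
            name=$(basename "$file" .pdf)
            echo "converting $file"
            "$tool" -layout "$file" "$dest/$name.txt"
        done
    ' sh "$dest" "${PDFTOTEXT:-pdftotext}" {} +
}
```

For example: convert_pdfs /mnt/cdrom1/PDFs /target/dir. Two PDFs with the same name in different subdirectories would overwrite each other's text file, since only the base name is kept.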
LVL 18

Expert Comment

by:liddler
ID: 10191662
No comment has been added lately, so it's time to clean up this TA.
I will leave a recommendation in the Cleanup topic area that this question is:

Answered by Tintin

Please leave any comments here within the next seven days.

PLEASE DO NOT ACCEPT THIS COMMENT AS AN ANSWER!

liddler
EE Cleanup Volunteer