Solved

Convert PDFs on a CD to text files in a local directory with pdftotext (preserve the PDF file name)

Posted on 2003-11-09
Last Modified: 2013-12-26
Hello,
I have a set of *.pdf files on a CD. I'd like to convert all of them with pdftotext and write the results to a local directory. Each output would be a text file named after the PDF it came from.
For example,
~cdrom/myfirst.pdf
~cdrom/mysecond.pdf
would be converted to text files in a local directory:
~targettextdir/myfirst.txt
~targettextdir/mysecond.txt


The directory is
/mnt/cdrom1/PDFs/

I want to use
pdftotext -layout

# Rough pseudocode for what I'm after:
# get all the .pdf files in the directory
# and convert each one to .txt
pdftotext -layout /mnt/cdrom1/PDFs/*.pdf
# each finished text file keeps the name of its original PDF
# and the converted text files go into this directory:
/ConvertedPDFtoText

Question by:dgdaniels
Author Comment

by:dgdaniels
ID: 9710952
OK, trying to sort it out myself. Help appreciated! I lifted choice chunks from other posts.

The script will be used multiple times on different CDs of PDFs.

My new script, recursivepdftotext.sh:

#!/bin/bash
#####################################################################
# setup variables
#####################################################################
PDFDIR='/mnt/cdrom1/PDFs'
DESTDIR=~/ConvertedPDFtoText
#####################################################################
# Convert pdf files to text
#####################################################################
mkdir -p "$DESTDIR"

# Loop over the *.pdf files in ${PDFDIR} (the part I wasn't sure how to write)
for file in "$PDFDIR"/*.pdf
do
        [ -f "$file" ] || continue      # skip if the glob matched nothing
        echo "converting $file" >>/tmp/convertedpdf # I like seeing what's happening
        # -layout takes a single dash and no argument; the output file comes last
        pdftotext -layout "$file" "$DESTDIR/$(basename "$file" .pdf).txt"
done

Or this one?
####################################
#!/bin/bash
#####################################################################
# setup variables
#####################################################################
PDFDIR='/mnt/cdrom1/PDFs'
WORKDIR=~/tmp/PDF
# copy the PDFs off the CD first (my original FILE=*.pdf variable wasn't needed)
mkdir -p "$WORKDIR"
cp "$PDFDIR"/*.pdf "$WORKDIR"

# do processing here
for f in "$WORKDIR"/*.pdf
do
  N="${f%.pdf}"                     # path without the .pdf extension
  pdftotext -layout "$f" "$N.txt"
  echo "$f converted into $N.txt"   # I like seeing what's happening
done
exit 0
############################
Author Comment

by:dgdaniels
ID: 9711321
OK, getting closer.

Run from /mnt/cdrom1, this command does what I want:

for i in `find . -name '*.pdf'`; do pdftotext -layout "$i" ~/convertedpdfs/"`basename "$i" .pdf`".txt; done

But it reads each file from the CD-ROM at conversion time, which is slow and hard on the CD drive. I need to copy all of the files to local disk first.

Note that the -layout option does not take an argument, which gave me some trouble.
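
The copy-first idea above could look something like the sketch below. It is only an illustration, not anything posted in the thread: stage_pdfs and convert_staged are made-up helper names, and the optional third argument to convert_staged is an assumed hook so that a stand-in converter can be swapped in for pdftotext.

```shell
#!/bin/sh
# Sketch only: stage_pdfs and convert_staged are hypothetical helper names.

# Copy every *.pdf found under SRC (searched recursively) into the flat
# directory STAGE. Same-named PDFs from different subdirectories overwrite
# each other.
# Usage: stage_pdfs SRC STAGE
stage_pdfs() {
    mkdir -p "$2" || return 1
    find "$1" -type f -name '*.pdf' -exec cp {} "$2" \;
}

# Convert each staged PDF to DEST/<name>.txt.
# Usage: convert_staged STAGE DEST [CONVERTER]
# CONVERTER is an assumed hook for testing; it defaults to pdftotext.
convert_staged() {
    mkdir -p "$2" || return 1
    for pdf in "$1"/*.pdf; do
        [ -f "$pdf" ] || continue          # the glob matched nothing
        echo "converting $pdf"
        "${3:-pdftotext}" -layout "$pdf" "$2/$(basename "$pdf" .pdf).txt"
    done
}
```

With these defined, running stage_pdfs /mnt/cdrom1/PDFs ~/tmp/PDF followed by convert_staged ~/tmp/PDF ~/ConvertedPDFtoText would read the CD once during the copy and do all the conversion work from local disk.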

Expert Comment

by:HamdyHassan
ID: 9715771

Try the following:

cd "$PDFDIR"
for i in *.pdf
do
   echo "$i"
done


If you have PDFs in subfolders, then you need the find command:

cd "$PDFDIR"
for i in `find . -name "*.pdf"`
do
   echo "$i"
done
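
One caveat with looping over backquoted find output is that the shell splits it on whitespace, so a file called "my report.pdf" arrives as two words. A possible way around that, sketched here under assumptions, is to let find hand the names to a small inline sh script through -exec. convert_tree is a hypothetical function name, and the PDF2TXT variable is an assumed override for the converter command, not something pdftotext itself reads.

```shell
#!/bin/sh
# Sketch only: convert_tree is a hypothetical helper name.
# Usage: convert_tree SOURCE_DIR DEST_DIR
# PDF2TXT is an assumed override for the converter command (default: pdftotext).
convert_tree() {
    mkdir -p "$2" || return 1
    # DEST and PDF2TXT are placed in find's environment so the inline
    # sh script started by -exec can see them.
    DEST=$2 PDF2TXT=${PDF2TXT:-pdftotext} \
    find "$1" -type f -name '*.pdf' -exec sh -c '
        for pdf do
            name=$(basename "$pdf" .pdf)
            echo "converting $pdf"
            "$PDF2TXT" -layout "$pdf" "$DEST/$name.txt"
        done
    ' sh {} +
}
```

Because find passes each path as its own argument, spaces in file or directory names survive intact. All output lands in one flat directory, so PDFs with the same name in different subfolders would overwrite each other.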

Accepted Solution

by:Tintin (earned 250 total points)
ID: 9718346
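Here's a script that will do what you require:

#!/bin/sh
SOURCE=/mnt/cdrom1/PDFs
DEST=/target/dir

#
# If the PDF files are all in one directory, then use
#
# for file in "$SOURCE"/*.pdf
#
# otherwise use the one below
# (the backquoted find splits on whitespace, so file names must not contain spaces)

for file in `find "$SOURCE" -name "*.pdf"`
do
    # Swap the trailing .pdf for .txt
    filename=`basename "$file" | sed 's/\.pdf$/.txt/'`
    # pdftotext takes the output file as its second argument; redirecting
    # stdout would leave it trying to write next to the PDF on the CD
    pdftotext -layout "$file" "$DEST/$filename"
done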
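
As an aside, the basename-plus-sed step can also be done with the shell's own parameter expansion, which avoids starting two extra processes per file and only ever touches the trailing .pdf. This is a small sketch, with txt_name as a hypothetical helper name:

```shell
#!/bin/sh
# Sketch only: txt_name is a hypothetical helper name.
# Print the .txt file name that corresponds to a PDF path.
txt_name() {
    base=${1##*/}                     # drop everything up to the last slash
    printf '%s\n' "${base%.pdf}.txt"  # replace only the trailing .pdf
}

txt_name /mnt/cdrom1/PDFs/myfirst.pdf   # → myfirst.txt
```

Inside the loop this would be used as pdftotext -layout "$file" "$DEST/$(txt_name "$file")".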

Expert Comment

by:liddler
ID: 10191662
No comment has been added lately, so it's time to clean up this TA.
I will leave a recommendation in the Cleanup topic area that this question is:

Answered by Tintin

Please leave any comments here within the next seven days.

PLEASE DO NOT ACCEPT THIS COMMENT AS AN ANSWER!

liddler
EE Cleanup Volunteer