Convert PDFs on a CD to text files in a local directory with pdftotext (preserving each PDF's file name)

Posted on 2003-11-09
Last Modified: 2013-12-26
Hello,
I have a CD full of *.pdf files. I'd like to convert all of them with pdftotext into a local directory, so that each output text file has the same name as its PDF.
For example,
~cdrom/myfirst.pdf
~cdrom/mysecond.pdf
would be converted to text in a local directory as
~targettextdir/myfirst.txt
~targettextdir/mysecond.txt
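The naming rule in this example (drop the directory and the .pdf suffix, then add .txt) can be sketched as a small shell function. This is only an illustration: `txt_path_for` is a hypothetical name, and it assumes the files end in lowercase `.pdf`.

```shell
#!/bin/sh
# Hypothetical helper: map a PDF path to "<destdir>/<name>.txt".
# basename drops the directory and a trailing ".pdf" (lowercase only;
# a file named X.PDF would become X.PDF.txt).
txt_path_for() {
    name=$(basename "$1" .pdf)
    printf '%s/%s.txt\n' "$2" "$name"
}
```

For example, `pdftotext -layout "$pdf" "$(txt_path_for "$pdf" ~/targettextdir)"` would write `~/targettextdir/myfirst.txt` for `/mnt/cdrom1/PDFs/myfirst.pdf`.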


The source directory is
/mnt/cdrom1/PDFs/

I want to use
pdftotext -layout

#and use the PDF file name as the name of the text file
#I need to get all .pdfs in the directory
#then convert them to .txt
#NOTE: pdftotext takes one input PDF (and an optional output file) per run, so this glob won't convert them all
pdftotext -layout /mnt/cdrom1/PDFs/*.pdf
#the finished text files will keep the original PDF file names
#and then put the converted text files in a directory
#/ConvertedPDFtoText

Question by:dgdaniels

Author Comment

by:dgdaniels
ID: 9710952
OK, I'm trying to sort it out myself... help appreciated! I lifted choice chunks from other posts.

The script will be used multiple times, on different CDs of PDFs.

My new script, recursivepdftotext.sh:

#!/bin/bash
#####################################################################
# setup variables
#####################################################################
PDFDIR='/mnt/cdrom1/PDFs'
OUTDIR=~/ConvertedPDFtoText
#####################################################################
# Convert pdf files to text
####################################################################
mkdir -p "${OUTDIR}"
cd "${PDFDIR}" || exit 1

for files in *.pdf #<-------- the glob picks out only the *.pdf files
do
        [ -f "${files}" ] || continue #<-------- skip the literal "*.pdf" if nothing matched
        echo converting "${files}" >>/tmp/convertedpdf #<-------- I like seeing what's happening
        pdftotext -layout "${files}" "${OUTDIR}/${files%.pdf}.txt"
done

Or this one?
####################################
#!/bin/bash
#####################################################################
# setup variables
#####################################################################
PDFDIR='/mnt/cdrom1/PDFs'
LOCALDIR=~/tmp/PDF          # local copy, so the CD is only read during the copy
OUTDIR=~/ConvertedPDFtoText
mkdir -p "$LOCALDIR" "$OUTDIR"
#copy all the PDFs off the CD first
cp "$PDFDIR"/*.pdf "$LOCALDIR"/ || exit 1

#do processing here
for f in "$LOCALDIR"/*.pdf
do
  N=$(basename "$f" .pdf)
  pdftotext -layout "$f" "$OUTDIR/$N.txt"
  echo "$f converted into $OUTDIR/$N.txt" #<-------- I like seeing what's happening
done
exit 0
############################
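Since the script will be reused on different CDs, one option is to pass the directories as arguments instead of hard-coding them. A minimal sketch, assuming a hypothetical helper `resolve_dirs`; the default paths are simply the ones used in this thread.

```shell
#!/bin/sh
# Hypothetical helper: set SOURCE and DEST from the script's arguments,
# falling back to the paths used in this thread when none are given.
resolve_dirs() {
    SOURCE=${1:-/mnt/cdrom1/PDFs}
    DEST=${2:-$HOME/ConvertedPDFtoText}
    # Fail early if the CD is not mounted or the path is wrong
    if [ ! -d "$SOURCE" ]; then
        echo "no such directory: $SOURCE" >&2
        return 1
    fi
    # Create the target directory; the function's status is mkdir's status
    mkdir -p "$DEST"
}
```

At the top of the script, `resolve_dirs "$@" || exit 1` would then let you run it as `./recursivepdftotext.sh /mnt/cdrom1/PDFs ~/ConvertedPDFtoText` for each new CD.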

Author Comment

by:dgdaniels
ID: 9711321
OK, getting closer.

This command does what I want when run from /mnt/cdrom1:

find . -name '*.pdf' | while IFS= read -r i; do pdftotext -layout "$i" ~/convertedpdfs/"$(basename "$i" .pdf)".txt; done

But this reads each file from the CD-ROM at conversion time, which is slow and hard on the CD drive... I need to copy all of the files to disk first.

Note that the -layout option does not take an argument, which gave me some trouble.

Expert Comment

by:HamdyHassan
ID: 9715771

Try the following:

cd "$PDFDIR"
for i in *.pdf
do
   echo "$i"
done


If you have PDFs in subfolders, then you need the find command:

cd "$PDFDIR"
find . -name "*.pdf" | while IFS= read -r i
do
   echo "$i"
done

Accepted Solution

by:Tintin (earned 250 total points)
ID: 9718346
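Here's a script that will do what you require:

#!/bin/sh
SOURCE=/mnt/cdrom1/PDFs
DEST=/target/dir

mkdir -p "$DEST"

#
# If the PDF files are all in one directory, then use
#
# for file in "$SOURCE"/*.pdf
#
# otherwise use the find loop below, which also searches subdirectories

find "$SOURCE" -name "*.pdf" | while IFS= read -r file
do
    # myfirst.pdf -> myfirst.txt
    filename=$(basename "$file" .pdf).txt
    # Name the output file explicitly: without one, pdftotext writes
    # <name>.txt next to the PDF, which fails on a read-only CD
    pdftotext -layout "$file" "$DEST/$filename"
done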
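One caveat with the find variant: every text file lands directly in $DEST, so two PDFs with the same name in different subfolders would overwrite each other. A sketch that mirrors the subfolder layout under DEST instead; `convert_tree` is a hypothetical name, and the loop assumes file names contain no newlines.

```shell
#!/bin/sh
# Hypothetical helper: convert every PDF under SOURCE, writing
# SOURCE/sub/a.pdf to DEST/sub/a.txt so same-named files don't collide.
# Assumes file names contain no newlines (read takes one line per file).
convert_tree() {
    src=${1%/}    # drop a trailing slash so the prefix strip below matches
    dest=$2
    find "$src" -type f -name '*.pdf' | while IFS= read -r pdf; do
        rel=${pdf#"$src"/}           # e.g. sub/a.pdf
        out="$dest/${rel%.pdf}.txt"  # e.g. DEST/sub/a.txt
        mkdir -p "$(dirname "$out")"
        # Explicit output file, so nothing is written next to the PDF
        pdftotext -layout "$pdf" "$out" || echo "failed: $pdf" >&2
    done
}
```

Usage would be `convert_tree /mnt/cdrom1/PDFs /target/dir`; the trade-off is that the text files are spread over subdirectories rather than collected in one flat directory.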

Expert Comment

by:liddler
ID: 10191662
No comment has been added lately, so it's time to clean up this TA.
I will leave a recommendation in the Cleanup topic area that this question is:

Answered by Tintin

Please leave any comments here within the next seven days.

PLEASE DO NOT ACCEPT THIS COMMENT AS AN ANSWER!

liddler
EE Cleanup Volunteer