Solved

Convert Powerpoint to Text/HTML from the command line

Posted on 2008-10-25
Last Modified: 2009-10-06
I didn't know quite how to categorise this question as it's an odd one, so apologies if I've added it to the wrong category. If I have, please point me to the correct category so I can ask the question again.

I have been using catppt in batch files to extract text from PowerPoint files on my Windows 2003 server. It works fine EXCEPT there seems to be a bug in it: the -b option, which is supposed to let you specify a character string to be placed at the end of each slide, is unrecognised, and the default behaviour of inserting a formfeed character doesn't work either.

Up until now it's been OK because I have only been cataloguing the text for searching purposes. However I've just been handed a new project for creating a text preview for these files and of course I've no way of splitting the text into separate slides.

I've been googling for a couple of days on the subject and have drawn a complete blank. There are plenty of solutions for doing it in the Windows GUI but not at the command line. So I thought I would throw it open to the experts.

Does anyone know of a command-line DOS/Windows programme which can extract text or HTML from a PowerPoint file?

500 points as ever :-)
Question by:Dave6969
 
LVL 44

Expert Comment

by:scrathcyboy
ID: 22806136
catppt is the only one I have seen --
http://www.s2services.com/powerpoint-text-extractors.htm

But you can also check this list --
http://www.google.com/search?num=30&q=DOS+extract+text+from+.PPT

However, remember that PPT is graphics like PDF, so you could convert all of them to PDF and run a text extractor on the PDF files. If there are fewer than 50 files, it's probably easier to do it by hand.
 

Author Comment

by:Dave6969
ID: 22807259
Cheers for that.

Yeah, catppt would suit my purposes perfectly -- if only it weren't broken in the important respect of reliably splitting up slides. :-( I can't believe the guy who wrote it missed that out when he managed it in the companion apps! And the app is pretty hard to track down (even the link on that page doesn't work), so I'm guessing the guy isn't developing it any further.

It did occur to me that if I could change the ppt into a pdf with a page for every slide I could extract the text that way, but how to convert it to a pdf with a command-line app? Again I've looked, but all the ones I've seen have a GUI (and not often a very good one either!)

Unfortunately there are hundreds of the things, so doing it by hand isn't an option I'm afraid.
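
[Editor's sketch] One possible way to script the PDF route, offered as an untested sketch and not as a confirmed fix. It assumes LibreOffice's `soffice` and Poppler's `pdftotext` are installed and on the PATH (neither tool is mentioned in this thread), and it relies on `pdftotext` ending each page with a form feed. The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def split_pages(text):
    """Split pdftotext output into pages on the form feed that ends each page."""
    pages = [page.strip() for page in text.split("\f")]
    # Drop the empty chunk left after the final form feed.
    if pages and pages[-1] == "":
        pages.pop()
    return pages


def ppt_to_slide_texts(ppt_path, work_dir):
    """Convert a presentation to PDF, then return one text chunk per page."""
    ppt_path = Path(ppt_path)
    work_dir = Path(work_dir)
    # Assumption: LibreOffice is installed; this writes <stem>.pdf into work_dir.
    subprocess.run(
        ["soffice", "--headless", "--convert-to", "pdf",
         "--outdir", str(work_dir), str(ppt_path)],
        check=True,
    )
    pdf_path = work_dir / (ppt_path.stem + ".pdf")
    # Assumption: Poppler's pdftotext is installed; "-" sends the text to stdout.
    result = subprocess.run(
        ["pdftotext", "-layout", str(pdf_path), "-"],
        check=True, capture_output=True, text=True, encoding="utf-8",
    )
    return split_pages(result.stdout)
```

Calling `ppt_to_slide_texts` on each file from a batch loop would give one list entry per slide, provided the PDF export really does produce one page per slide.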
 
LVL 44

Expert Comment

by:scrathcyboy
ID: 22808761
"but how to convert it to a pdf with a command line app?"

It's no easier. You have to take what you can get, or convert them all manually to PDF and then you can select text on each one in PDF. You can't call that person's app "broken" just because he decided to do it one way. That is the decision of independent developers, especially when you get it for FREE!
 
LVL 21

Expert Comment

by:GlennaShaw
ID: 22809021
Here's this one, but I can't attest to its effectiveness:
http://www.softpedia.com/get/Office-tools/Other-Office-Tools/All2Txt.shtml
 

Accepted Solution

by:
Dave6969 earned 0 total points
ID: 22822264
OK. Well thanks.

I did try All2txt but to be honest it was even worse because all the text came out on a single line, so at least catppt breaks it down to one line per line. I guess I'll just have to do the best I can with catppt.

Oh, and I can call that app broken. On all the man pages I've seen for it, it specifically documents a -b option for adding a string to place at the end of each slide. I know it's free and I'm grateful to the guy, but why put in a useful option and then not ensure it works?
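
[Editor's sketch] For anyone checking whether their own catppt build honours the separator, a rough sketch under stated assumptions: it passes the `-b` option described above, then looks for either that marker or the default form feed in the output. The marker string and function names are hypothetical, and the exact `-b` syntax should be checked against the man page for your build.

```python
import subprocess

SLIDE_MARKER = "=====SLIDE====="  # hypothetical marker; any unlikely string works


def split_slides(text, marker=SLIDE_MARKER):
    """Return one chunk per slide, or the whole text if no separator is found."""
    if marker in text:
        separator = marker
    elif "\f" in text:
        separator = "\f"
    else:
        # Neither separator was emitted, so slide boundaries are unknown.
        return [text.strip()]
    chunks = [chunk.strip() for chunk in text.split(separator)]
    # Empty chunks are dropped, so slides with no text are not represented.
    return [chunk for chunk in chunks if chunk]


def catppt_slides(ppt_path, marker=SLIDE_MARKER):
    """Run catppt with a slide-break string and split its output."""
    # Assumption: catppt is on the PATH and accepts "-b <string>" as documented.
    result = subprocess.run(
        ["catppt", "-b", marker, str(ppt_path)],
        capture_output=True, text=True, check=True,
    )
    return split_slides(result.stdout, marker)
```

If the build rejects `-b` outright, as reported in this thread, `check=True` will raise `subprocess.CalledProcessError`; if it accepts the option but emits nothing, the function returns a single chunk, which makes the problem easy to detect in a batch run.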
