Solved

export mediawiki pages to individual text files

Posted on 2010-09-06
Last Modified: 2013-12-14
I have a MediaWiki server with hundreds of pages that I need to export to Word, PDF, RTF (or maybe TXT). I need to export each MediaWiki page into its own file, not all into one file. And the export needs to be of the rendered content, not the MediaWiki code: formatted text, pictures and all, instead of tags. How can I do that?

Because there are hundreds of pages, manual copy-paste or Save As is not an option.

ta.
Question by:KristjanLaane
9 Comments
 
LVL 40

Expert Comment

by:Richard Quadling
ID: 33613429
From the client side (i.e. a browser), the pages will be rendered using HTML.

So, an HTML-to-xxx converter (PDF, Word, RTF, etc.) would certainly do the job.

Combine that with some sort of scripting engine to navigate the links and you should be done.

If you have all the links already on one page (an index or something), then that would certainly save some time.
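If the wiki's web API (`api.php`) is enabled, the list of pages can also be pulled programmatically instead of scraping an index page. The sketch below is only an illustration under assumptions: it uses the standard `action=query&list=allpages` query, the function names and the `fetch` callable are made up for this example, and it only handles the `continue` style of paging used by current MediaWiki releases (older releases reported continuation differently).

```python
import json
from urllib.parse import urlencode


def allpages_url(api_url, continue_from=None, limit=500):
    """Build a list=allpages query URL for the MediaWiki API."""
    params = {
        "action": "query",
        "list": "allpages",
        "aplimit": str(limit),
        "format": "json",
    }
    if continue_from is not None:
        params["apcontinue"] = continue_from
    return api_url + "?" + urlencode(params)


def parse_allpages(body):
    """Return (titles, next_continue) from one JSON response body."""
    data = json.loads(body)
    titles = [page["title"] for page in data["query"]["allpages"]]
    # Assumption: continuation is reported under the top-level "continue" key.
    cont = data.get("continue", {}).get("apcontinue")
    return titles, cont


def list_all_titles(api_url, fetch):
    """Collect every page title; fetch(url) must return the response text."""
    titles, cont = [], None
    while True:
        batch, cont = parse_allpages(fetch(allpages_url(api_url, cont)))
        titles.extend(batch)
        if cont is None:
            return titles
```

Here `fetch` could be as simple as `lambda url: urllib.request.urlopen(url).read().decode("utf-8")` for a public wiki; a log-in-only wiki would need a fetcher that carries an authenticated session.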

Just googling around ...

A commercial offering ... http://html2pdf.seven49.net/Web/

There are also apps available on SourceForge and other places that do HTML 2 PDF.

What do you intend to do with the end results?

If you've got hundreds of good pages being edited/maintained, then why bother with a frozen snapshot? If you need a snapshot, clone the DB and lock it away.


There is also http://www.mediawiki.org/wiki/Extension:Pdf_Export

This looks promising. You would need to script the navigation (I think), but certainly looks the way I would go if it was me.
 
LVL 51

Expert Comment

by:Ted Bouskill
ID: 33613732
Any way you look at the problem it's a lot of hard work.

At our company a MediaWiki server was created to store documentation, and over time the stakeholders have realized it's not what they want. My team inherited maintenance of the server and we've been tasked with trying to find a way to move the content into other formats, and all the solutions require a lot of time.
 
LVL 40

Expert Comment

by:Richard Quadling
ID: 33613783
@Ted. Not for the faint-hearted.

Author Comment

by:KristjanLaane
ID: 33613908
Thanks for the replies! I have the Special:AllPages page, which collects links to all the pages and will probably come in very handy. I should have specified this earlier, but my first preference is to output Word files: do you know of any HTML-to-Word tools that can be scripted to spit out a Word file per link?
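One scriptable option that is not mentioned in this thread is the pandoc command-line converter, which can read HTML and write `.docx`. The following is a rough sketch only, assuming pandoc is installed and on the PATH and that each page has already been saved as an `.html` file in one folder; the function names are invented for illustration, and images or styling may not survive the conversion.

```python
import subprocess
from pathlib import Path


def pandoc_command(html_path):
    """Build the pandoc argument list that turns one HTML file into .docx."""
    src = Path(html_path)
    dst = src.with_suffix(".docx")
    return ["pandoc", "-f", "html", "-t", "docx", "-o", str(dst), str(src)]


def convert_all(folder):
    """Convert every .html file in folder, one Word file per page."""
    for src in sorted(Path(folder).glob("*.html")):
        # check=True raises CalledProcessError if pandoc reports a failure.
        subprocess.run(pandoc_command(src), check=True)
```

Calling `convert_all("export")` would leave a `.docx` next to each `.html` file in the hypothetical `export` folder.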
 
LVL 40

Expert Comment

by:Richard Quadling
ID: 33614256
What version of Word?

You could probably just use MSWord to open the URLs directly.

Can you try doing one manually?

Load MSWord.
File|Open
Enter URL of page.

Does it look OK?

You are going to get some differences. Word is NOT natively an HTML rendering engine (or did that all change in Office 2007+? Yuck!)

If it does look OK, then a simple VBA macro would probably do the trick.

 

Author Comment

by:KristjanLaane
ID: 33616011
I tried opening directly, but 1) the rendering differences are big and ugly, and 2) I don't know how to log in to my wiki from Word to access most of the content, which is log-in only.

I think what is needed is something that can export just the main content of any given wiki page, without the MediaWiki navigation and so on, and then take that "pure" export (also without any MediaWiki tags) and convert it to Word somehow. My thinking is to try to convert all the pages to PDF files somehow (I might try http://www.mediawiki.org/wiki/Extension:Pdf_Export ), but only if I know of a way to then convert those PDFs into Word afterwards.

P.S. I also need to work out how to script http://www.mediawiki.org/wiki/Extension:Pdf_Export

P.P.S. It's harder than I thought it would be, I agree!
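For the "main content only" part, MediaWiki's `index.php` accepts `action=render`, which returns the rendered HTML of the page body without the navigation, sidebar or other skin elements. The sketch below saves one HTML file per title under that assumption. The helper names, the `fetch` callable and the example URL are hypothetical, and for a log-in-only wiki `fetch` would have to send an authenticated session cookie, which is not shown here.

```python
import re
from pathlib import Path
from urllib.parse import urlencode


def render_url(index_url, title):
    """URL that returns only the rendered page body (no skin, no sidebar)."""
    return index_url + "?" + urlencode({"title": title, "action": "render"})


def safe_filename(title, ext=".html"):
    """Map a page title to a file name, replacing awkward characters with '_'."""
    return re.sub(r"[^\w.-]+", "_", title).strip("_") + ext


def export_pages(index_url, titles, out_dir, fetch):
    """Write one file per title; fetch(url) must return the HTML as text."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for title in titles:
        html = fetch(render_url(index_url, title))
        path = out / safe_filename(title)
        path.write_text(html, encoding="utf-8")
        written.append(path)
    return written
```

Note that two different titles can map to the same file name after cleaning (for example `A/B` and `A:B`), so a real run would want a check for collisions. Images stay as links back to the wiki in the saved HTML.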
 
LVL 40

Expert Comment

by:Richard Quadling
ID: 33616161
What do you intend to do with all those Word documents?
 

Author Comment

by:KristjanLaane
ID: 33616316
The content does not need to be shared anymore, so only I need local access, and I need offline access to this content in an editable form. Word is good for that, in addition to providing WYSIWYG.
 
LVL 40

Accepted Solution

by:
Richard Quadling earned 500 total points
ID: 33616361
I think getting the exporter working would be the best way to go. Once it is in PDF format, there are any number of PDF-to-Word converters.
Question has a verified solution.
