Solved

export mediawiki pages to individual text files

Posted on 2010-09-06
Last Modified: 2013-12-14
I have a MediaWiki server with hundreds of pages that I need to export to Word, PDF or RTF (or maybe TXT). Each MediaWiki page needs to be exported into its own file, not all into one file. The export also needs to be of the rendered content, not the MediaWiki markup: formatted text, pictures and all, rather than raw tags. How can I do that?

Because there are hundreds of pages, manual copy-and-paste or Save As is not an option.

ta.
Question by:KristjanLaane
 
LVL 40

Expert Comment

by:Richard Quadling
ID: 33613429
From the client side (i.e. a browser), the pages will be rendered using HTML.

So an HTML-to-xxx (PDF, Word, RTF, etc.) converter would certainly do the job.

Combine that with some sort of scripting engine to navigate the links and you should be done.

If you already have all the links on one page (an index or something), then that would certainly save some time.

Just googling around ...

A commercial offering ... http://html2pdf.seven49.net/Web/

There are also apps available on SourceForge and other places that do HTML 2 PDF.

What do you intend to do with the end results?

If you've got hundreds of good pages being edited/maintained, then why bother with a frozen snapshot? If you need a snapshot, clone the DB and lock it away.


There is also http://www.mediawiki.org/wiki/Extension:Pdf_Export

This looks promising. You would need to script the navigation (I think), but certainly looks the way I would go if it was me.
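
A minimal Python sketch of the "navigate the links" step suggested above, assuming the wiki serves pages under a /wiki/ path and that an index page (one page holding all the links) has already been saved as HTML. The class and function names, the /wiki/ prefix and the sample markup are illustrative assumptions, not part of any tool mentioned in this thread.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class WikiLinkCollector(HTMLParser):
    """Collect page titles from <a href="/wiki/..."> links in an index page."""

    def __init__(self, prefix="/wiki/"):
        super().__init__()
        self.prefix = prefix  # assumed URL layout; adjust for your wiki
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if not href.startswith(self.prefix):
            return  # external link or some other path
        title = unquote(href[len(self.prefix):])
        # Crude filter: skip namespaced pages (Special:, Talk:, ...) and
        # duplicates. It also drops ordinary titles that contain a colon.
        if ":" in title or title in self.titles:
            return
        self.titles.append(title)


def extract_titles(index_html, prefix="/wiki/"):
    """Return the page titles linked from a saved index page."""
    parser = WikiLinkCollector(prefix)
    parser.feed(index_html)
    return parser.titles


# Hypothetical index markup, for illustration only.
sample = """
<a href="/wiki/Main_Page">Main Page</a>
<a href="/wiki/Install_Guide">Install Guide</a>
<a href="/wiki/Special:AllPages">All pages</a>
<a href="http://example.org/">External</a>
"""
print(extract_titles(sample))
```

The resulting list of titles can then drive whichever converter is chosen, one page at a time.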
 
LVL 51

Expert Comment

by:Ted Bouskill
ID: 33613732
Any way you look at the problem it's a lot of hard work.

At our company a MediaWiki server was created to store documentation, and over time the stakeholders have realized it's not what they want. My team inherited maintenance of the server, and we've been tasked with finding a way to move the content into other formats. All the solutions require a lot of time.
 
LVL 40

Expert Comment

by:Richard Quadling
ID: 33613783
@Ted. Not for the faint hearted.

 

Author Comment

by:KristjanLaane
ID: 33613908
Thanks for the replies! I have the Special:AllPages page, which collects links to all the pages and will probably come in very handy. I should have specified this earlier, but my first preference is to output Word files: do you know of any HTML-to-Word tools that can be scripted to spit out one Word file per link?
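
For the "one file per link" requirement, a small Python sketch of mapping each page title to its own output path. The function names, the "export" directory and the ".html" extension are assumptions made for illustration; the actual conversion to Word is a separate step that this sketch does not attempt.

```python
import re
from pathlib import Path


def safe_filename(title, ext=".html"):
    """Turn a wiki page title into a filename that is safe on most systems."""
    # Replace runs of anything other than word characters, dots and hyphens.
    name = re.sub(r"[^\w.-]+", "_", title).strip("_")
    return (name or "untitled") + ext


def plan_outputs(titles, out_dir="export"):
    """Map each title to a unique path, one file per page."""
    used = set()
    plan = {}
    for title in titles:
        candidate = safe_filename(title)
        stem, ext = candidate.rsplit(".", 1)
        n = 2
        # Compare case-insensitively so names cannot clash on Windows.
        while candidate.lower() in used:
            candidate = f"{stem}_{n}.{ext}"
            n += 1
        used.add(candidate.lower())
        plan[title] = Path(out_dir) / candidate
    return plan


for title, path in plan_outputs(["Main Page", "Install Guide/Linux"]).items():
    print(title, "->", path)
```

Numbering the clashes keeps two titles that differ only in punctuation or case from overwriting each other.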
 
LVL 40

Expert Comment

by:Richard Quadling
ID: 33614256
What version of Word?

You could probably just use MSWord to open the URLs directly.

Can you try doing one manually?

Load MSWord.
File|Open
Enter URL of page.

Does it look OK?

You are going to get some differences. Word is NOT natively an HTML rendering engine (or did that all change in Office 2007+? Yuech!)

If the page does look OK, then a simple VBA macro would probably do the trick.

 

Author Comment

by:KristjanLaane
ID: 33616011
I tried opening the pages directly, but 1) the rendering differences are big and ugly, and 2) I don't know how to log in to my wiki from Word to access the content, most of which is login-only.

I think what is needed is something that can export just the main content of any given wiki page, without the MediaWiki navigation and so on, and then take that "pure" export (also without any MediaWiki tags) and convert it to Word somehow. My thinking is to try to convert all the pages to PDF files somehow (I might try http://www.mediawiki.org/wiki/Extension:Pdf_Export ), but only if I know of a way to then convert those PDFs into Word afterwards.

P.S. I also need to work out how to script http://www.mediawiki.org/wiki/Extension:Pdf_Export

P.P.S. It's harder than I thought it would be, I agree!
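
One possible route to the "pure" export described above is MediaWiki's action=render request, which returns the rendered page body without the surrounding navigation. The Python sketch below only builds the request URL and wraps a returned fragment into a standalone HTML file. The index.php location, the helper names and the example host are assumptions, and fetching login-only pages would additionally need the session cookie of a logged-in user, which is left out here.

```python
from html import escape
from urllib.parse import urlencode


def render_url(index_php_url, title):
    """Build a URL asking MediaWiki for the rendered page body only."""
    # action=render returns the parsed page HTML without the skin
    # (sidebar, tabs, footer). The path to index.php varies per install.
    return index_php_url + "?" + urlencode({"title": title, "action": "render"})


def wrap_fragment(title, body_html):
    """Wrap a rendered fragment in a minimal standalone HTML document."""
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{escape(title)}</title></head>\n"
        f"<body>\n{body_html}\n</body></html>\n"
    )


# Hypothetical host name, for illustration only.
print(render_url("http://wiki.example/index.php", "Main_Page"))
```

The wrapped file can then be handed to an HTML-to-PDF or HTML-to-Word converter of choice.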
 
LVL 40

Expert Comment

by:Richard Quadling
ID: 33616161
What do you intend to do with all those word documents?
 

Author Comment

by:KristjanLaane
ID: 33616316
The content does not need to be shared anymore, so only I need local access. I need offline access to this content in an editable form, so Word is good for that, in addition to providing WYSIWYG. ...
 
LVL 40

Accepted Solution

by:
Richard Quadling earned 500 total points
ID: 33616361
I think getting the exporter working would be the best way to go. Once it is in PDF format, there are any number of PDF 2 Word converters.


Question has a verified solution.
