Solved

Export MediaWiki pages to individual text files

Posted on 2010-09-06
Last Modified: 2013-12-14
I have a MediaWiki server with hundreds of pages that I need to export to Word, PDF, RTF (or maybe TXT), and I need to export each MediaWiki page into its own file, not all into one file. The export also needs to be of the rendered content, not the MediaWiki markup: formatted text and pictures instead of tags. How can I do that?

Because there are hundreds of pages, manual copy-and-paste or Save As is not an option.

ta.
Question by:KristjanLaane
 
LVL 40

Expert Comment

by:Richard Quadling
ID: 33613429
From the client side (i.e. a browser), the pages will be rendered using HTML.

So an HTML-to-xxx (PDF, Word, RTF, etc.) converter would certainly do the job.

Combine that with some sort of scripting engine to navigate the links and you should be done.
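
For example, one way to script this (a minimal sketch, not something posted in the thread) is to skip the wiki's skin entirely and ask the MediaWiki API for each page's rendered HTML with action=parse, writing one file per page. It assumes Python 3, a MediaWiki recent enough to accept formatversion=2 (1.25+), and a hypothetical endpoint https://wiki.example.com/w/api.php; the helper names are illustrative only.

```python
import json
import os
import re
import urllib.parse
import urllib.request

# Hypothetical endpoint: replace with your own wiki's api.php URL.
API_URL = "https://wiki.example.com/w/api.php"


def parse_url(api_url: str, title: str) -> str:
    """Build an action=parse request that returns only the rendered article body."""
    params = {
        "action": "parse",
        "page": title,
        "prop": "text",        # rendered HTML of the content area, no skin/navigation
        "format": "json",
        "formatversion": "2",  # 'text' comes back as a plain string
    }
    return api_url + "?" + urllib.parse.urlencode(params)


def safe_filename(title: str, ext: str = ".html") -> str:
    """Replace characters that are not allowed in Windows/Unix file names."""
    return re.sub(r'[\\/:*?"<>|]+', "_", title).strip() + ext


def export_page(api_url: str, title: str, out_dir: str, opener=None) -> str:
    """Fetch one page's rendered HTML and write it to its own file; return the path."""
    opener = opener or urllib.request.build_opener()
    with opener.open(parse_url(api_url, title)) as resp:
        body = json.load(resp)["parse"]["text"]
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, safe_filename(title))
    with open(path, "w", encoding="utf-8") as f:
        f.write(body)
    return path
```

Usage would be a loop calling export_page(API_URL, title, "export") for each title; for login-only pages, pass an opener that already carries a logged-in session cookie.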

If you already have all the links on one page (an index or something similar), that would certainly save some time.
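
Instead of scraping an index page, the API can also list every title itself. A minimal sketch, assuming an api.php endpoint that supports list=allpages with the modern "continue" paging style; by default it only covers the main namespace (apnamespace=0). The fetch function is passed in so the paging logic can be tried without a server; all names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterator


def http_get_json(url: str) -> dict:
    """Plain (anonymous) HTTP GET returning decoded JSON."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iter_all_titles(api_url: str,
                    fetch_json: Callable[[str], dict] = http_get_json) -> Iterator[str]:
    """Yield every page title from list=allpages, following API continuation."""
    base = {
        "action": "query",
        "list": "allpages",
        "aplimit": "max",      # as many titles per request as the server allows
        "format": "json",
        "formatversion": "2",
    }
    cont: Dict[str, str] = {}
    while True:
        url = api_url + "?" + urllib.parse.urlencode({**base, **cont})
        data = fetch_json(url)
        for page in data["query"]["allpages"]:
            yield page["title"]
        if "continue" not in data:
            break                  # last batch reached
        cont = data["continue"]    # e.g. {"apcontinue": ..., "continue": "-||"}
```

Combined with any per-page fetch, a loop over iter_all_titles(API_URL) would walk the whole wiki without copying links from Special:AllPages by hand.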

Just googling around ...

A commercial offering ... http://html2pdf.seven49.net/Web/

There are also apps on SourceForge and elsewhere that convert HTML to PDF.

What do you intend to do with the end results?

If you've got hundreds of good pages being edited/maintained, then why bother with a frozen snapshot? If you need a snapshot, clone the DB and lock it away.


There is also http://www.mediawiki.org/wiki/Extension:Pdf_Export

This looks promising. You would need to script the navigation (I think), but it certainly looks like the way I would go if it were me.
 
LVL 51

Expert Comment

by:Ted Bouskill
ID: 33613732
Any way you look at the problem it's a lot of hard work.

At our company a MediaWiki server was set up to store documentation, and over time the stakeholders have realized it's not what they want. My team inherited maintenance of the server, and we've been tasked with finding a way to move the content into other formats; every solution we've found requires a lot of time.
 
LVL 40

Expert Comment

by:Richard Quadling
ID: 33613783
@Ted: Not for the faint-hearted.

Author Comment

by:KristjanLaane
ID: 33613908
Thanks for the replies! I have the Special:AllPages page, which collects links to all the pages, so that would probably come in very handy. I should have specified this earlier, but my first preference is to output Word files: do you know of any HTML-to-Word tools that can be scripted to produce one Word file per link?
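
Word can open HTML files directly, so one low-tech route (a sketch under assumptions, not a tool recommended in the thread) is to save each rendered page body as a standalone UTF-8 HTML file, one per link, and then open or batch-convert those files in Word and Save As .docx. The site root below is hypothetical; root-relative image and link paths such as /w/images/... are rewritten to absolute URLs so they can still be fetched, though login-protected images will not load unless Word can authenticate. Word's HTML import is not a full browser engine, so expect layout differences.

```python
import html
import os
import re

# Hypothetical wiki root; used to make image/link URLs absolute.
SITE_ROOT = "https://wiki.example.com"


def absolutize_links(body_html: str, site_root: str) -> str:
    """Rewrite root-relative src/href values (e.g. "/w/images/...") to absolute URLs.

    Protocol-relative URLs ("//host/...") are left untouched.
    """
    root = site_root.rstrip("/")
    return re.sub(r'\b(src|href)="/(?!/)',
                  lambda m: f'{m.group(1)}="{root}/', body_html)


def wrap_as_html_document(title: str, body_html: str, site_root: str) -> str:
    """Wrap a rendered page body in a standalone UTF-8 HTML document."""
    t = html.escape(title)
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{t}</title></head>\n'
        f"<body><h1>{t}</h1>\n{absolutize_links(body_html, site_root)}\n</body></html>\n"
    )


def save_for_word(title: str, body_html: str, out_dir: str,
                  site_root: str = SITE_ROOT) -> str:
    """Write one .html file per page; open it in Word and use Save As to get .docx."""
    os.makedirs(out_dir, exist_ok=True)
    name = re.sub(r'[\\/:*?"<>|]+', "_", title).strip() + ".html"
    path = os.path.join(out_dir, name)
    with open(path, "w", encoding="utf-8") as f:
        f.write(wrap_as_html_document(title, body_html, site_root))
    return path
```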
 
LVL 40

Expert Comment

by:Richard Quadling
ID: 33614256
What version of Word?

You could probably just use MSWord to open the URLs directly.

Can you try doing one manually?

Load MSWord.
File|Open
Enter URL of page.

Does it look OK?

You are going to get some differences. Word is NOT natively an HTML rendering engine (or did that all change in Office 2007+? Yuck!)

If it does look OK, then a simple VBA macro would probably do the trick.
 

Author Comment

by:KristjanLaane
ID: 33616011
I tried opening directly, but 1) the rendering differences are big and ugly, and 2) I don't know how to log in to my wiki from Word to access most of the content, which is login-only.
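
On the login problem: a script can log in through the API once, keep the session cookie in a cookie jar, and reuse that opener for every page request. A minimal standard-library sketch, assuming MediaWiki 1.27 or later (meta=tokens&type=login) and a bot password created at Special:BotPasswords, since newer versions reserve action=login for bot passwords; older installs use a different two-step token flow. The function names are illustrative.

```python
import http.cookiejar
import json
import urllib.parse
import urllib.request


def make_session() -> urllib.request.OpenerDirector:
    """An opener that stores and resends the wiki's session cookies."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def login_form(username: str, password: str, token: str) -> bytes:
    """URL-encoded POST body for action=login (bot-password credentials assumed)."""
    return urllib.parse.urlencode({
        "action": "login",
        "lgname": username,       # e.g. "User@BotName" for a bot password
        "lgpassword": password,
        "lgtoken": token,
        "format": "json",
    }).encode("utf-8")


def login(opener, api_url: str, username: str, password: str):
    """Fetch a login token, POST the credentials, and return the logged-in opener."""
    token_url = api_url + "?" + urllib.parse.urlencode(
        {"action": "query", "meta": "tokens", "type": "login", "format": "json"})
    with opener.open(token_url) as resp:
        token = json.load(resp)["query"]["tokens"]["logintoken"]
    with opener.open(api_url, data=login_form(username, password, token)) as resp:
        result = json.load(resp)["login"]
    if result.get("result") != "Success":
        raise RuntimeError(f"login failed: {result}")
    return opener
```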

I think what is needed is something that can export the main content of any given wiki page, without the MediaWiki navigation and so on, and then convert that "pure" export (also without any MediaWiki tags) to Word somehow. My idea is to convert all the pages to PDF files first (I might try http://www.mediawiki.org/wiki/Extension:Pdf_Export), but only if I know of a way to convert those PDFs into Word afterwards.
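
For the "pure" content step: the HTML the API returns for action=parse (prop=text) already leaves out the skin and navigation, and if plain text is acceptable the remaining tags can be stripped with the standard library's html.parser. A rough sketch: the class names it skips (mw-editsection for the [edit] links, toc for the table of contents) are assumptions about MediaWiki's default output and may differ between versions, and it assumes reasonably well-formed HTML. It drops images and formatting, so it only covers the TXT case, not Word or PDF.

```python
from html.parser import HTMLParser

VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
BLOCK = {"p", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6",
         "pre", "table", "ul", "ol", "dl", "dt", "dd", "blockquote"}
# Assumed MediaWiki class names for [edit] links and the table of contents.
SKIP_CLASSES = {"mw-editsection", "toc"}


class TextExtractor(HTMLParser):
    """Collect visible text, skipping scripts, styles and wiki chrome."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # >0 while inside an element being skipped

    def handle_starttag(self, tag, attrs):
        if tag in VOID:  # no end tag follows, so never touch skip_depth
            if tag == "br" and not self.skip_depth:
                self.parts.append("\n")
            return
        if self.skip_depth:
            self.skip_depth += 1
            return
        classes = set((dict(attrs).get("class") or "").split())
        if tag in ("script", "style") or classes & SKIP_CLASSES:
            self.skip_depth = 1
        elif tag in BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.skip_depth:
            self.skip_depth -= 1
        elif tag in BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(fragment: str) -> str:
    """Convert a rendered page fragment to plain text with tidy blank lines."""
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()
    lines = [" ".join(line.split()) for line in "".join(parser.parts).splitlines()]
    out, blank = [], False
    for line in lines:
        if line:
            out.append(line)
            blank = False
        elif out and not blank:  # keep at most one blank line in a row
            out.append("")
            blank = True
    return "\n".join(out).strip()
```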

P.S. I also need to work out how to script http://www.mediawiki.org/wiki/Extension:Pdf_Export.

P.P.S. It's harder than I thought it would be, I agree!
 
LVL 40

Expert Comment

by:Richard Quadling
ID: 33616161
What do you intend to do with all those word documents?
 

Author Comment

by:KristjanLaane
ID: 33616316
The content does not need to be shared anymore, so only I need local access. I need offline access to this content in an editable form, so Word is good for that, and it also gives me WYSIWYG. ...
 
LVL 40

Accepted Solution

by:Richard Quadling (earned 500 total points)
ID: 33616361
I think getting the exporter working would be the best way to go. Once the content is in PDF format, there are any number of PDF-to-Word converters.