Solved

export mediawiki pages to individual text files

Posted on 2010-09-06
Last Modified: 2013-12-14
I have a MediaWiki server with hundreds of pages that I need to export to Word, PDF, RTF (or maybe txt). I need to export each MediaWiki page into its own file, not all into one file. And the export needs to be of the rendered content, not the MediaWiki code: so instead of tags, formatted text and pictures and all. How do I do that?

Because there are hundreds of pages, manual copy/paste or Save As is not an option.

ta.
Question by:KristjanLaane
9 Comments
 

Expert Comment

by:RQuadling
From the client side (i.e. a browser), the pages will be rendered using HTML.

So, an HTML to xxx (PDF, Word, RTF, etc.) converter would certainly do the job.

Combine that with some sort of scripting engine to navigate the links and you should be done.

If you have all the links already on 1 page (or an index or something), then that would certainly save some time.

Just googling around ...

A commercial offering ... http://html2pdf.seven49.net/Web/

There are also apps available on SourceForge and other places that do HTML 2 PDF.

What do you intend to do with the end results?

If you've got hundreds of good pages being edited/maintained, then why bother with a frozen snapshot? If you need a snapshot, clone the DB and lock it away.


There is also http://www.mediawiki.org/wiki/Extension:Pdf_Export

This looks promising. You would need to script the navigation (I think), but certainly looks the way I would go if it was me.

Expert Comment

by:tedbilly
Any way you look at the problem it's a lot of hard work.

At our company a MediaWiki server was created to store documentation, and over time the stakeholders have realized it's not what they want. My team inherited maintenance of the server and we've been tasked with trying to find a way to move the content into other formats, and all the solutions require a lot of time.

Expert Comment

by:RQuadling
@Ted. Not for the faint hearted.

Author Comment

by:KristjanLaane
Thanks for the replies! I have the Special:AllPages page, which collects links to all the pages and would probably come in very handy. I should have specified this earlier, but my first preference is to output Word files: do you know any HTML 2 Word tools that can be scripted to spit out a Word file per link?
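
For the "one file per link" scripting, the list of page titles does not have to be scraped from the Special:AllPages HTML. A rough sketch in Python, assuming the wiki exposes the standard MediaWiki web API (api.php) and that it returns the current JSON continuation format (older versions used a different one). The address http://wiki.example/api.php and the function names are made up for illustration, and no login handling is shown:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def allpages_url(api_base, cont=None):
    """Build an api.php URL that lists page titles (list=allpages)."""
    params = {
        "action": "query",
        "list": "allpages",
        "aplimit": "500",
        "format": "json",
    }
    # Continuation values from the previous response are passed back as-is.
    params.update(cont or {})
    return api_base + "?" + urlencode(params)


def fetch_json(url):
    """Download a URL and decode the JSON body (anonymous, no login)."""
    with urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def iter_titles(api_base, fetch=fetch_json):
    """Yield every page title, following the API's 'continue' block."""
    cont = None
    while True:
        data = fetch(allpages_url(api_base, cont))
        for page in data["query"]["allpages"]:
            yield page["title"]
        cont = data.get("continue")
        if not cont:
            break


if __name__ == "__main__":
    # Hypothetical address; replace with the real api.php location.
    for title in iter_titles("http://wiki.example/api.php"):
        print(title)
```

Each title can then be fed to whatever converter ends up being used. The fetch function is passed in as a parameter so that an authenticated version can be swapped in for a log-in-only wiki.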

Expert Comment

by:RQuadling
What version of Word?

You could probably just use MSWord to open the URLs directly.

Can you try doing one manually?

Load MSWord.
File|Open
Enter URL of page.

Does it look OK?

You are going to get some differences. Word is NOT natively an HTML rendering engine (or did that all change in Office 2007+ - yuech!)

If that is the case then a simple VBA macro would probably do the trick.


Author Comment

by:KristjanLaane
I tried opening directly, but 1) the rendering differences are big and ugly, and 2) I don't know how to log in to my wiki from Word to access most of the content, which is log-in only.

I think what is needed is something that can export the main content of any given wiki page, without the MediaWiki navigation stuff etc., and then take that "pure" export (also without any MediaWiki tags) and convert it to Word somehow. My thinking is to try to convert all the pages to PDF files somehow (I might try http://www.mediawiki.org/wiki/Extension:Pdf_Export ), but only if I know of a way to then convert those PDFs into Word afterwards?

P.S. I also need to work out how to script http://www.mediawiki.org/wiki/Extension:Pdf_Export

P.P.S. It's harder than I thought it would be, I agree!
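
On the "main content only" part: MediaWiki's index.php accepts action=render, which returns just the rendered page body as HTML, without the navigation and skin. A rough sketch in Python that saves one HTML file per page under that assumption. The address http://wiki.example/index.php, the output folder and the function names are made up for illustration. Logging in is not shown, so on a log-in-only wiki the fetch function would have to be replaced with one that sends an authenticated session, and images stay as links to the wiki rather than being downloaded:

```python
import re
from html import escape
from pathlib import Path
from urllib.parse import urlencode
from urllib.request import urlopen


def render_url(index_base, title):
    """URL that asks MediaWiki for the rendered page body only."""
    return index_base + "?" + urlencode({"title": title, "action": "render"})


def safe_filename(title):
    """Turn a page title into a file name that is safe on most systems."""
    return re.sub(r'[\\/:*?"<>|\s]+', "_", title).strip("_") + ".html"


def wrap_html(title, body):
    """Wrap the HTML fragment in a minimal standalone document."""
    return (
        '<html><head><meta charset="utf-8">'
        f"<title>{escape(title)}</title></head>"
        f"<body>{body}</body></html>"
    )


def fetch_text(url):
    """Download a URL as text (anonymous, no login)."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")


def export_page(index_base, title, out_dir, fetch=fetch_text):
    """Save one wiki page as its own HTML file and return the path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / safe_filename(title)
    body = fetch(render_url(index_base, title))
    path.write_text(wrap_html(title, body), encoding="utf-8")
    return path


if __name__ == "__main__":
    # Hypothetical address and title; replace with real ones.
    print(export_page("http://wiki.example/index.php", "Main Page", "export"))
```

The resulting .html files have none of the wiki chrome, so they may be a cleaner starting point for a Word or PDF conversion than the full pages, though how well any converter handles them would need to be tried on a few pages first.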

Expert Comment

by:RQuadling
What do you intend to do with all those word documents?

Author Comment

by:KristjanLaane
The content does not need to be shared anymore, so only I need local access, and I need offline access to this content in an editable form, so Word is good for that, in addition to providing WYSIWYG.

Accepted Solution

by:RQuadling (earned 500 total points)
I think getting the exporter working would be the best way to go. Once it is in PDF format, there are any number of PDF 2 Word converters.