Extract site content
Posted on 2009-05-16
One of my clients has a rather large site with many pages, links to PDFs, and external links.
They are currently using a CMS called MySource Matrix. I am redeveloping the new site (no CMS) on a dedicated server. I do have access to the back end of the current CMS, but it is impossible to extract anything meaningful from it. I also have no root access, or indeed any access to the server itself. It's a complete mess.
So, my question is: is there any way to extract or build a hierarchy of every page (in essence a site map) and also extract all linked PDFs, ideally keeping some record of which parent page each PDF is linked from?
Is there any method or software package to perform such a task?
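To give an idea of what I mean, below is a rough, untested Python sketch of the kind of crawl I have in mind, run against the rendered site rather than the CMS back end. The www.example.com URL is just a placeholder for the real site, and I'd happily swap this for an existing tool (even something like wget's recursive mirroring) if it can preserve the parent/child relationships.

    # Rough sketch: crawl the rendered site, record which page links to which,
    # and note every PDF together with the page it was found on.
    # Python 3 standard library only; START is a placeholder URL.
    from collections import deque
    from html.parser import HTMLParser
    from urllib.parse import urljoin, urlparse
    from urllib.request import urlopen

    START = "http://www.example.com/"   # hypothetical starting URL
    HOST = urlparse(START).netloc

    class LinkCollector(HTMLParser):
        """Collects href values from <a> tags on a single page."""
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                for name, value in attrs:
                    if name == "href" and value:
                        self.links.append(value)

    queue = deque([START])
    seen = {START}
    parent = {START: None}   # page -> page it was first reached from
    pdfs = []                # (pdf url, page it was linked from)

    while queue:
        page = queue.popleft()
        try:
            with urlopen(page) as resp:
                if "html" not in resp.headers.get("Content-Type", ""):
                    continue
                html = resp.read().decode("utf-8", errors="replace")
        except Exception:
            continue   # skip pages that fail to load

        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            url = urljoin(page, href).split("#")[0]
            if urlparse(url).netloc != HOST:
                continue                   # ignore external links
            if url.lower().endswith(".pdf"):
                pdfs.append((url, page))   # remember the parent page
            elif url not in seen:
                seen.add(url)
                parent[url] = page
                queue.append(url)

    # Crude "site map": each page with the page it was discovered from.
    for url in sorted(seen):
        print(parent[url] or "(root)", "->", url)
    for url, page in pdfs:
        print("PDF:", url, "linked from", page)

That only gives discovery order rather than the CMS's true hierarchy, which is why I'm hoping someone knows of a proper method or package.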
I am desperate. Please, any ideas at all.