Extract site content

rgoggins (Australia) asked:

One of my clients has a rather large site with many pages, linked PDFs, and external links.

They are currently using a CMS called MySource Matrix. I am redeveloping the site (no CMS) on a dedicated server. I do have access to the back end of the current CMS, but extracting anything meaningful from it has proven impossible. I also have no root access, or indeed any access to the server itself. It's a complete mess.

So, my question is: is there any way to extract or build a hierarchy of every page (in essence a site map) and also to extract all linked PDFs, ideally preserving some form of link back to the parent page?

Is there any method or software package to perform such a task?
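
To make the requirement concrete, here is roughly the crawl I have in mind (a sketch only: the start URL below is a placeholder, not the real site, and a real run would also need throttling, robots.txt handling, and better error handling):

```python
# Rough sketch of the crawl (Python 3, standard library only).
# START is a placeholder URL; substitute the client's real domain.
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


START = "https://www.example.com/"   # placeholder URL
HOST = urlparse(START).netloc

hierarchy = {}                       # page URL -> child pages first discovered on it
pdfs = []                            # (parent page URL, PDF URL) pairs
seen, queue = {START}, [START]

while queue:
    page = queue.pop(0)
    try:
        with urlopen(page) as resp:
            if "text/html" not in resp.headers.get("Content-Type", ""):
                continue             # skip images and other non-HTML resources
            body = resp.read().decode("utf-8", errors="replace")
    except OSError:
        continue                     # unreachable page; move on
    collector = LinkCollector()
    collector.feed(body)
    children = []
    for href in collector.links:
        url = urljoin(page, href).split("#")[0]
        if urlparse(url).netloc != HOST:
            continue                 # ignore external links
        if url.lower().endswith(".pdf"):
            pdfs.append((page, url))  # keep the parent-page association
        elif url not in seen:
            seen.add(url)
            queue.append(url)
            children.append(url)
    hierarchy[page] = children

for parent, pdf in pdfs:
    print(pdf, "<- linked from", parent)
```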

I am desperate. Please, any ideas at all.

Thanks

paololabe (Italy):

I think you could use a utility or library to generate a sitemap.xml, then parse it to extract the PDF links.
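
For the parsing half, something like this short sketch would do (the sitemap URL is a placeholder; note also that some sitemap generators list only HTML pages, in which case the PDFs would have to be collected during the crawl itself):

```python
# Sketch: pull PDF entries out of a standard sitemap.xml.
# SITEMAP is a placeholder URL, not a real sitemap.
import xml.etree.ElementTree as ET
from urllib.request import urlopen

SITEMAP = "https://www.example.com/sitemap.xml"   # placeholder URL
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

with urlopen(SITEMAP) as resp:
    tree = ET.parse(resp)

# Each <url><loc> entry holds one page URL; keep only the PDFs.
for loc in tree.findall(".//sm:loc", NS):
    url = (loc.text or "").strip()
    if url.lower().endswith(".pdf"):
        print(url)
```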

 
rgoggins (Asker):

Thanks paololabe. Could you elaborate a little more on your suggestion?

Anyone?

James Williams:

Offline Explorer Enterprise will do all that and MORE.

ASKER CERTIFIED SOLUTION: James Williams (United States)

rgoggins (Asker):

Thank you selvol.

OEE is absolutely perfect for what I need.

Thanks again,
Rob