rgoggins
asked on
Extract site content
One of my clients currently has a site that is rather large with many pages, links to pdfs and external links.
They are currently using a CMS called MySource Matrix. I am redeveloping the new site (no CMS) on a dedicated server. I do have access to the back end of the current CMS but this thing is impossible to extract anything meaningful from. I also have no root access or any access to the server itself. It's a complete mess.
So, my question is... Is there any way to extract or build a hierarchy of each and every page (in essence a site map) and also extract all linked PDFs (hopefully maintaining some form of link to the parent page)?
Is there any method or software package to perform such a task?
I am desperate. Please, any ideas at all.
Thanks
I think you could use a utility or library to generate a sitemap.xml and then parse it to extract the PDF links.
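As a rough illustration of that approach, here is a minimal Python sketch (standard library only; the URL and page content below are placeholders, not the client's actual site). It parses one fetched page for links, resolves them to absolute URLs, and separates PDFs from ordinary pages — running this over every crawled page gives you both the page hierarchy and a parent-page-to-PDF mapping:

```python
# Minimal sketch: extract links from a fetched HTML page, splitting
# PDF links from ordinary page links. Placeholder URLs throughout.
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(base_url, html):
    """Return (pages, pdfs): absolute page URLs and PDF URLs found in html."""
    parser = LinkParser()
    parser.feed(html)
    pages, pdfs = [], []
    for href in parser.links:
        url = urljoin(base_url, href)  # resolve relative links against the parent page
        (pdfs if url.lower().endswith(".pdf") else pages).append(url)
    return pages, pdfs

# Stand-in for a page fetched from the CMS:
sample = """
<html><body>
  <a href="/about.html">About</a>
  <a href="docs/report.pdf">Annual report</a>
  <a href="https://external.example.org/">External</a>
</body></html>
"""
pages, pdfs = extract_links("http://www.example.com/index.html", sample)
print(pages)
print(pdfs)
```

To build the full site map you would feed each page's `pages` list back into a breadth-first crawl (restricted to the client's domain) and record which parent page each PDF was found on.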
ASKER
Thanks paololabe. Could you elaborate a little more on your suggestion?
ASKER
Anyone?
Offline Explorer Enterprise.
Will do all that and MORE.
ASKER
Thank you selvol.
OEE is absolutely perfect for what I need.
Thanks again,
Rob