Extract site content

Posted on 2009-05-16
Last Modified: 2013-12-20
One of my clients currently has a rather large site with many pages, links to PDFs and external links.

They are currently using a CMS called MySource Matrix. I am redeveloping the site (no CMS) on a dedicated server. I do have access to the back end of the current CMS, but it is impossible to extract anything meaningful from it. I also have no root access or any other access to the server itself. It's a complete mess.

So, my question is... Is there any way to extract or build a hierarchy of each and every page (in essence a site map) and also extract all linked PDFs (hopefully maintaining some form of link to the parent page)?

Is there any method or software package to perform such a task?

I am desperate. Please, any ideas at all.

Question by:rgoggins

Expert Comment

ID: 24402893
I think you could use a utility or library to generate a sitemap.xml and then parse it to extract the PDF links.
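As a rough illustration of that idea, here is a minimal sketch in Python (standard library only) that crawls the public site, records which page each page was first found on, and notes which pages link to each PDF. The names `LinkParser`, `fetch` and `crawl` are made up for this example and do not come from any tool mentioned in this thread. It assumes the pages are plain HTML reachable over HTTP and that PDF links end in `.pdf`.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch(url):
    """Download a page as text (illustrative helper: no retries, no throttling)."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=500):
    """Breadth-first crawl restricted to the host of start_url.

    Returns (parents, pdfs):
      parents maps each page URL to the page it was first found on
              (None for the start page), which gives a site hierarchy;
      pdfs    maps each PDF URL to the list of pages that link to it.
    """
    host = urlparse(start_url).netloc
    parents = {start_url: None}
    pdfs = {}
    queue = deque([start_url])
    fetched = 0
    while queue and fetched < max_pages:
        page = queue.popleft()
        fetched += 1
        try:
            html = fetch_page(page)
        except OSError:
            continue  # skip pages that fail to load
        parser = LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            # Resolve relative links and drop any #fragment.
            url, _ = urldefrag(urljoin(page, href))
            parts = urlparse(url)
            if parts.scheme not in ("http", "https"):
                continue  # ignore mailto:, javascript:, etc.
            if parts.path.lower().endswith(".pdf"):
                linking_pages = pdfs.setdefault(url, [])
                if page not in linking_pages:
                    linking_pages.append(page)
            elif parts.netloc == host and url not in parents:
                parents[url] = page
                queue.append(url)
    return parents, pdfs
```

Calling `parents, pdfs = crawl("http://www.example.com/")` (a placeholder URL) would give two dictionaries that can be written out as a site map and a PDF list. A crawl like this only sees links that are present in the HTML, so anything generated by scripts or hidden behind a login would be missed.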


Author Comment

ID: 24402906
Thanks paololabe. Could you elaborate a little more on your suggestion?

Author Comment

ID: 24406475

LVL 17

Expert Comment

ID: 24406482
Offline Explorer Enterprise.

Will do all that and MORE.
LVL 17

Accepted Solution

selvol earned 500 total points
ID: 24407344
As I said, MetaProducts Offline Explorer is an excellent data extractor.

I have used it many times to do almost exactly what you need to do.
This app is not a cheap program, as many are.
The best thing is that they have a free 30-day trial. I believe it is unrestricted.

Author Closing Comment

ID: 31582231
Thank you selvol.

OEE is absolutely perfect for what I need.

Thanks again,
