Solved

Extract site content

Posted on 2009-05-16
Last Modified: 2013-12-20
One of my clients currently has a rather large site with many pages, links to PDFs, and external links.

They are currently using a CMS called MySource Matrix. I am redeveloping the new site (no CMS) on a dedicated server. I do have access to the back end of the current CMS but this thing is impossible to extract anything meaningful from. I also have no root access or any access to the server itself. It's a complete mess.

So, my question is... Is there any way to extract or build a hierarchy of each and every page (in essence a site map) and also extract all linked PDFs (hopefully maintaining some form of link to the parent page)?

Is there any method or software package to perform such a task?

I am desperate. Please, any ideas at all.

Thanks
Question by:rgoggins
LVL 8

Expert Comment

by:paololabe
ID: 24402893
I think you could use a utility or library to generate a sitemap.xml and then parse it to extract the PDF links.
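To make the parsing half of that suggestion concrete, here is a minimal Python sketch. It assumes you already have the text of a sitemap.xml in the standard sitemaps.org format (produced by whatever generator you choose); the function name `split_sitemap_urls` and the example.com URLs are made up for illustration. Note that a sitemap only lists what the generator found and does not record which page links to which PDF.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace used by the standard sitemaps.org format.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def split_sitemap_urls(sitemap_xml):
    """Return (pages, pdfs) from the text of a sitemap.xml file."""
    root = ET.fromstring(sitemap_xml)
    pages, pdfs = [], []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        if not url:
            continue
        # Assumption: anything whose path ends in .pdf is a PDF link.
        path = urlparse(url).path.lower()
        if path.endswith(".pdf"):
            pdfs.append(url)
        else:
            pages.append(url)
    return pages, pdfs


if __name__ == "__main__":
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>http://example.com/</loc></url>"
        "<url><loc>http://example.com/docs/report.pdf</loc></url>"
        "</urlset>"
    )
    pages, pdfs = split_sitemap_urls(sample)
    print("Pages:", pages)
    print("PDFs:", pdfs)
```

If the generator you use does not include PDFs in its sitemap, this will return an empty PDF list, and you would need to scan the pages themselves for links instead.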

 
 
LVL 1

Author Comment

by:rgoggins
ID: 24402906
Thanks paololabe. Could you elaborate a little more on your suggestion?
 
LVL 1

Author Comment

by:rgoggins
ID: 24406475
Anyone?
LVL 17

Expert Comment

by:selvol
ID: 24406482
Offline Explorer Enterprise will do all that and more.
 
LVL 17

Accepted Solution

by:
selvol earned 500 total points
ID: 24407344
As I stated, MetaProducts Offline Explorer is an excellent data extractor.

I have used it many times, doing almost exactly what you need to do.
It is not a cheap program, as many are.
The best thing is that they have a free 30-day trial. I believe it is unrestricted.

http://dl.filekicker.com/send/file/167627-YJTV/eesetup.exe
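For anyone who would rather script this than use a commercial tool, the core idea (read each page, collect its links, and keep every PDF tied to the page that links to it) can be sketched in Python with only the standard library. This is a rough sketch, not how Offline Explorer works: the names `LinkCollector`, `classify_links`, and `build_site_map` are made up for illustration, PDFs are detected by file extension only, and downloading the pages is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def classify_links(page_url, html):
    """Return (internal_pages, pdfs, external) linked from one page."""
    collector = LinkCollector()
    collector.feed(html)
    site = urlparse(page_url).netloc
    internal, pdfs, external = [], [], []
    for href in collector.hrefs:
        # Resolve relative links against the page and drop #fragments.
        url = urldefrag(urljoin(page_url, href)).url
        parts = urlparse(url)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar
        if parts.path.lower().endswith(".pdf"):
            pdfs.append(url)
        elif parts.netloc == site:
            internal.append(url)
        else:
            external.append(url)
    return internal, pdfs, external


def build_site_map(pages):
    """pages maps page URL -> HTML text; returns page URL -> its links."""
    site_map = {}
    for url, html in pages.items():
        internal, pdfs, external = classify_links(url, html)
        site_map[url] = {"pages": internal, "pdfs": pdfs, "external": external}
    return site_map
```

To turn this into a full crawl you would fetch each page (for example with `urllib.request.urlopen`), pass its HTML in, and queue the returned internal links that have not been visited yet. The resulting dictionary gives a parent page for every PDF, which is the link back to the parent that the question asks for.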
 
LVL 1

Author Closing Comment

by:rgoggins
ID: 31582231
Thank you selvol.

OEE is absolutely perfect for what I need.

Thanks again,
Rob
