Solved

Extract site content

Posted on 2009-05-16
Last Modified: 2013-12-20
One of my clients currently has a rather large site with many pages, links to PDFs, and external links.

They are currently using a CMS called MySource Matrix. I am redeveloping the new site (no CMS) on a dedicated server. I do have access to the back end of the current CMS but this thing is impossible to extract anything meaningful from. I also have no root access or any access to the server itself. It's a complete mess.

So, my question is... Is there any way to extract or build a hierarchy of each and every page (in essence a site map) and also extract all linked PDFs (hopefully maintaining some form of link to the parent page)?

Is there any method or software package to perform such a task?

I am desperate. Please, any ideas at all.

Thanks
Question by:rgoggins
 
LVL 8

Expert Comment

by:paololabe
ID: 24402893
I think you could use a utility or library to generate a sitemap.xml, then parse it to extract the PDF links.
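
To illustrate the parsing half of that idea, here is a minimal Python sketch. It assumes you already have a sitemap.xml in the standard sitemaps.org format and that the generator you used lists PDF URLs as well as pages (many only list HTML pages, so check the output first). The function name and the sample data are made up for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace used by the standard sitemap format.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def pdf_links_from_sitemap(xml_text):
    """Return every <loc> URL in the sitemap whose path ends in .pdf."""
    root = ET.fromstring(xml_text)
    urls = [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]
    # Compare on the URL path only, so query strings do not hide a PDF,
    # and ignore case so "report.PDF" is matched too.
    return [u for u in urls if urlparse(u).path.lower().endswith(".pdf")]


# Hypothetical sample data, not taken from any real site.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/about</loc></url>
  <url><loc>http://example.com/docs/report.PDF</loc></url>
</urlset>"""

if __name__ == "__main__":
    for link in pdf_links_from_sitemap(SAMPLE):
        print(link)
```

Note that a sitemap is a flat list of URLs, so on its own it will not tell you which page links to which PDF.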

 
0
 
LVL 1

Author Comment

by:rgoggins
ID: 24402906
Thanks paololabe. Could you elaborate a little more on your suggestion?
0
 
LVL 1

Author Comment

by:rgoggins
ID: 24406475
Anyone?
0

 
LVL 17

Expert Comment

by:selvol
ID: 24406482
Offline Explorer Enterprise.

It will do all that and more.
0
 
LVL 17

Accepted Solution

by:
selvol earned 500 total points
ID: 24407344
As I said, MetaProducts Offline Explorer is an excellent data extractor.

I have used it many times, doing almost exactly what you need to do.
It is not a cheap program, as many are.
The best thing is that they have a free 30-day trial. I believe it is unrestricted.

http://dl.filekicker.com/send/file/167627-YJTV/eesetup.exe
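
If you would rather script it than use a tool, the general idea (walk the links from a start page, and record each PDF together with the page that linked to it) can be sketched in Python. This is only a rough illustration under some assumptions, not how Offline Explorer works internally: `crawl`, `LinkParser`, and the `fetch` callback are hypothetical names, only `<a href>` links are followed, and pages are assumed to be plain HTML on a single host.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(start_url, fetch, max_pages=500):
    """Breadth-first crawl of one host.

    fetch(url) must return the page HTML as a string.
    Returns (children, pdf_parents):
      children    maps each visited page to the new pages first found on it
      pdf_parents maps each PDF URL to the pages that link to it
    """
    host = urlparse(start_url).netloc
    children, pdf_parents = {}, {}
    queue, seen = deque([start_url]), {start_url}
    while queue and len(children) < max_pages:
        page = queue.popleft()
        parser = LinkParser()
        parser.feed(fetch(page))
        children[page] = []
        for href in parser.hrefs:
            # Resolve relative links and drop any #fragment.
            url = urldefrag(urljoin(page, href)).url
            parts = urlparse(url)
            if parts.path.lower().endswith(".pdf"):
                pdf_parents.setdefault(url, []).append(page)
            elif parts.netloc == host and url not in seen:
                seen.add(url)
                children[page].append(url)
                queue.append(url)
    return children, pdf_parents
```

For a live site, `fetch` could be a small wrapper around `urllib.request.urlopen(url).read().decode("utf-8", "replace")`. The `children` mapping gives you a rough site hierarchy, and `pdf_parents` keeps the link from each PDF back to its parent page. External links are skipped here; you would need to collect them separately if you want them listed.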
0
 
LVL 1

Author Closing Comment

by:rgoggins
ID: 31582231
Thank you selvol.

OEE is absolutely perfect for what I need.

Thanks again,
Rob
0

Question has a verified solution.
