Extract site content

rgoggins (Australia) asked:

One of my clients has a rather large site with many pages, linked PDFs, and external links.

They are currently using a CMS called MySource Matrix. I am redeveloping the site (no CMS) on a dedicated server. I do have access to the back end of the current CMS, but extracting anything meaningful from it has proven impossible. I also have no root access, or indeed any access to the server itself. It's a complete mess.

So, my question is: is there any way to extract or build a hierarchy of every page (in essence a site map) and also to extract all linked PDFs, ideally preserving some form of link back to the parent page?

Is there any method or software package to perform such a task?
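
To make the requirement concrete, here is roughly the crawl I have in mind (a sketch only: the start URL below is a placeholder, not the real site, and a real run would also need throttling, robots.txt handling, and better error handling):

```python
# Rough sketch of the crawl (Python 3, standard library only).
# START is a placeholder URL; substitute the client's real domain.
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


START = "https://www.example.com/"   # placeholder URL
HOST = urlparse(START).netloc

hierarchy = {}                       # page URL -> child pages first discovered on it
pdfs = []                            # (parent page URL, PDF URL) pairs
seen, queue = {START}, [START]

while queue:
    page = queue.pop(0)
    try:
        with urlopen(page) as resp:
            if "text/html" not in resp.headers.get("Content-Type", ""):
                continue             # skip images and other non-HTML resources
            body = resp.read().decode("utf-8", errors="replace")
    except OSError:
        continue                     # unreachable page; move on
    collector = LinkCollector()
    collector.feed(body)
    children = []
    for href in collector.links:
        url = urljoin(page, href).split("#")[0]
        if urlparse(url).netloc != HOST:
            continue                 # ignore external links
        if url.lower().endswith(".pdf"):
            pdfs.append((page, url))  # keep the parent-page association
        elif url not in seen:
            seen.add(url)
            queue.append(url)
            children.append(url)
    hierarchy[page] = children

for parent, pdf in pdfs:
    print(pdf, "<- linked from", parent)
```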

I am desperate. Please, any ideas at all.

Thanks

paololabe (Italy):

I think you could use a utility or library to generate a sitemap.xml, then parse it to extract the PDF links.
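
For the parsing half, something like this short sketch would do (the sitemap URL is a placeholder; note also that some sitemap generators list only HTML pages, in which case the PDFs would have to be collected during the crawl itself):

```python
# Sketch: pull PDF entries out of a standard sitemap.xml.
# SITEMAP is a placeholder URL, not a real sitemap.
import xml.etree.ElementTree as ET
from urllib.request import urlopen

SITEMAP = "https://www.example.com/sitemap.xml"   # placeholder URL
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

with urlopen(SITEMAP) as resp:
    tree = ET.parse(resp)

# Each <url><loc> entry holds one page URL; keep only the PDFs.
for loc in tree.findall(".//sm:loc", NS):
    url = (loc.text or "").strip()
    if url.lower().endswith(".pdf"):
        print(url)
```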

 
rgoggins (Asker):

Thanks paololabe. Could you elaborate a little more on your suggestion?

Anyone?

James Williams:

Offline Explorer Enterprise will do all that and MORE.

ASKER CERTIFIED SOLUTION: James Williams (United States)

rgoggins (Asker):

Thank you selvol.

OEE is absolutely perfect for what I need.

Thanks again,
Rob