My question has two parts. The immediate need is a targeted crawl of one particular site (as part of a demo application), which I don't expect to be too difficult. The second, less pressing concern is an extensible solution that can be applied to most websites.
First, let me say that I've looked through a lot of web crawlers, and many of them handle dynamic content poorly or not at all. I realize it's a tough problem, so you don't need to solve this harder part to get full points. Just point me in the right direction (though if you have a solution, I'd be grateful).
I am trying to gather data from the following site:
27-Oct-2003 Sergeant Aubrey D. Bell Baghdad Hostile - hostile fire - IED attack
28-Oct-2003 Specialist Isaac Campoy Balad (near) - Salah ad Din Hostile - hostile fire - IED attack
06-Oct-2003 Specialist Spencer Timothy Karol Al Haswah - Babil Hostile - hostile fire - IED attack
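To make the target concrete, here is a rough sketch of the kind of record I'd like to end up with, working only from the flattened rows shown above. It does not fetch anything; the regex, the `parse_row` helper, and the field names are my own placeholders, and the optional `Non-` prefix on the cause is a guess about rows I haven't shown. Rank, name, and place stay in one field because I can't split them reliably without the page's real column markup.

```python
import re
from datetime import datetime

# Assumed row shape: a DD-Mon-YYYY date, then rank/name/place, then a cause
# that starts with a capitalized "Hostile" (or, as a guess, "Non-Hostile").
ROW_RE = re.compile(
    r"^(?P<date>\d{2}-[A-Za-z]{3}-\d{4})\s+"
    r"(?P<middle>.+?)\s+"
    r"(?P<cause>(?:Non-)?Hostile\s+-\s+.+)$"
)

def parse_row(line):
    """Return a dict for one flattened row, or None if it doesn't match."""
    m = ROW_RE.match(line.strip())
    if m is None:
        return None
    return {
        # %b expects English month abbreviations (default C locale).
        "date": datetime.strptime(m.group("date"), "%d-%b-%Y").date(),
        # Rank, name, and place are left unsplit on purpose.
        "rank_name_place": m.group("middle"),
        "cause": m.group("cause"),
    }

SAMPLE_ROWS = [
    "27-Oct-2003 Sergeant Aubrey D. Bell Baghdad Hostile - hostile fire - IED attack",
    "28-Oct-2003 Specialist Isaac Campoy Balad (near) - Salah ad Din Hostile - hostile fire - IED attack",
    "06-Oct-2003 Specialist Spencer Timothy Karol Al Haswah - Babil Hostile - hostile fire - IED attack",
]

records = [parse_row(row) for row in SAMPLE_ROWS]
for record in records:
    print(record)
```

A real crawler would presumably read the table cells directly rather than guess at boundaries like this, which is part of what I'm asking for help with.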
I will award full points to anyone who can provide a script targeted at the above site, along with general advice on a more generic approach. I need the targeted crawler by the end of next week for a demo. Thank you.