Shannon_Lowder


Web Content Scraper with Graphic Interface

I've been researching tools and services that would let me gather publicly posted data and store it in a database, so that I can index the data and build tools to search it more efficiently.

The problem is that all the tools I find require a fairly technical user to set up the "scraping" or extraction. I am looking for a tool that business users can configure to extract the data themselves. I'm currently meeting with several software-as-a-service (SaaS) providers to see what they can offer, but I'd like to be able to run this process internally.

If I used a service to collect this data, the provider would have a copy and might be able to compete with my business. The perfect solution would have three key features:

1. Be able to extract data from HTML, PDF, XLS, and image data (OCR functionality).
2. Be easy enough to configure that it wouldn't require a programmer or equally skilled user to set up the extract configuration.
3. Output directly to SQL, or to a common intermediate file that could be imported into SQL via an SSIS package.
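
To make features 1 and 3 concrete, here is a minimal sketch of the simplest path only: pulling an HTML table into a CSV intermediate file that an SSIS flat-file import could pick up. It assumes Python and its standard library; the names `TableExtractor`, `html_table_to_csv`, and the sample markup are hypothetical illustrations, not part of any existing product, and PDF, XLS, and OCR input are not covered.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Hypothetical helper: collects <td>/<th> cell text, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows
        self._row = None   # row currently being built, or None
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


# Made-up sample page standing in for publicly posted data.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>City</th></tr>
  <tr><td>Acme, Inc.</td><td>Raleigh</td></tr>
</table>
"""

if __name__ == "__main__":
    print(html_table_to_csv(SAMPLE_HTML), end="")
```

Even this small case shows why feature 2 is the hard one: someone still has to decide which tags hold the data, which is the step a business user would need a graphical tool for.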

Hopefully this solution already exists; if not, SaaS will have to be the way to go. I look forward to any help you can provide.
Michel Plungjan

Hi Shannon,

I think this is a request for software development, no?
If so, you may not get much help at EE, where we mostly answer specific questions about existing code rather than write (biggish) software from scratch.
ASKER CERTIFIED SOLUTION
Nenad Rajsic

This solution is only available to members.
SOLUTION
This solution is only available to members.
Shannon_Lowder

ASKER

mplungjan -- I wanted to make sure something hadn't already been built before I commission new work.

vukovarcan -- I hadn't considered that.  I'll definitely consider it now.

cs97jjm3 -- I'll check that site out now.

Sorry for the delay in my reply.
Both are good suggestions. I'll approach a few open source developers and see what they think of extending their products. I've also sent a request for more information to pixieware. It sounds like you still have to "code" a scrape configuration file, though I may be mistaken. Thank you all for contributing!