Web Content Scraper with Graphic Interface

I've been researching tools and services that will allow me to gather publicly posted data and store it in a database.  I'm trying to do this so I can index the data and create tools to search the data more efficiently.  

The problem is that all the tools I find require a fairly technical user to set up the "scraping" or extraction.  I am looking for a tool that I can have business users configure to extract the data.  I'm currently meeting with several software-as-a-service providers to see what they can offer, but I'd like to be able to run this process internally.

If I used a service to collect this data, they would have a copy and might be able to compete with my business. The perfect solution would have 3 key features.

1. Be able to extract data from HTML, PDF, XLS, and image data (OCR functionality).
2. Be easy enough to configure that it wouldn't require a programmer or equally talented user to set up the extract configuration.
3. Output directly to SQL or to a common intermediate file that could be imported into SQL via an SSIS package.

Hopefully this solution already exists; if not, then SAAS will have to be the way to go.  I look forward to any help you can provide.
Shannon_Lowder Asked:

Michel Plungjan (IT Expert) Commented:
Hi Shannon,

I think this is a request for software development, no?
If so, you may not get much help at EE, where we mostly answer specific questions about existing code rather than write (biggish) software from scratch.
Nenad Rajsic Commented:
Just a thought.

Rather than contacting SAAS companies and developing things from scratch, why not contact one of the developers who already build content scrapers and ask them to build something for you? It should be easy for them and cheap for you.
James Murrell (Product Specialist) Commented:
Unsure, but a while back someone recommended http://www.pixieware.com/ for a project like this.
Shannon_Lowder (Author) Commented:
mplungjan -- I was wanting to make sure something hadn't already been built before I commission new work.

vukovarcan -- I hadn't considered that.  I'll definitely consider it now.

cs97jjm3 -- I'll check that site out now.

Sorry for the delay in my reply.
Shannon_Lowder (Author) Commented:
Both are good suggestions.  I'll approach a few open source developers and see what they would think of extending their products.  I've also sent out a request for more information from pixieware.  It sounds like you still have to "code" a scrape configuration file.  I may be mistaken though.  Thank you all for contributing!
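
For anyone landing on this thread later, here is a rough sketch of what a "scrape configuration" could look like if the goal is to keep the part a business user edits separate from the code. It is not Pixieware's format or any product's; the names CONFIG, ConfiguredExtractor, extract_to_csv and the sample HTML are all made up for illustration. It assumes simple, non-nested elements identified by tag and class, covers HTML only (no PDF, XLS, or OCR), and writes CSV as the intermediate file that an SSIS package could then load.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical configuration a non-programmer might edit: each output
# column maps to the HTML tag and class attribute that holds its value.
CONFIG = {
    "name": {"tag": "span", "class": "name"},
    "price": {"tag": "span", "class": "price"},
}


class ConfiguredExtractor(HTMLParser):
    """Collects the text of every element that matches a config entry."""

    def __init__(self, config):
        super().__init__()
        self.config = config
        self.values = {column: [] for column in config}
        self._current = None  # column currently being captured, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for column, rule in self.config.items():
            if tag == rule["tag"] and rule["class"] in classes:
                self._current = column
                self.values[column].append("")

    def handle_data(self, data):
        if self._current is not None:
            self.values[self._current][-1] += data

    def handle_endtag(self, tag):
        # Assumes matched elements do not contain nested tags of the same name.
        if self._current is not None and tag == self.config[self._current]["tag"]:
            self._current = None


def extract_to_csv(html, config):
    """Return CSV text: a header row, then one row per matched record."""
    parser = ConfiguredExtractor(config)
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(list(config))
    # Pair up the n-th value of every column to form the n-th row.
    writer.writerows(zip(*(parser.values[column] for column in config)))
    return out.getvalue()


# Made-up sample page used only to exercise the sketch.
SAMPLE = """
<ul>
  <li><span class="name">Widget</span> <span class="price">9.99</span></li>
  <li><span class="name">Gadget</span> <span class="price">19.50</span></li>
</ul>
"""

print(extract_to_csv(SAMPLE, CONFIG))
```

The point of the design is that only CONFIG changes from site to site; whether a business user could realistically fill in tag and class names without a point-and-click front end is exactly the open question in this thread.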