What tool or utility can speed up collecting jokes, sayings, and pictures from websites that return them in response to queries?

I need to collect a great many jokes, sayings, quotes, clipart, etc. related to specific subjects. Is there any software, utility, robot, or similar tool that will help collect or harvest these text and picture files and allow them to be stored and categorized in MS Excel or a similar application?
Dov_B Asked:
 
MacroShadow Commented:
I don't know of any such utility and it would seem that neither do any of EE's experts.

Using VBA, you can get the HTML of a website and try to parse it to separate out the jokes etc., but that is probably more work than collecting them manually.
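To give a feel for what that parsing involves, here is a minimal sketch in Python rather than VBA (a swapped-in language, not what the comment above proposes). It assumes, purely for illustration, that the page wraps each joke in an element with a hypothetical class="joke"; real sites will use different markup, so the class name would have to be adapted per site.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class  # hypothetical class name, e.g. "joke"
        self.items = []    # finished text snippets
        self._depth = 0    # > 0 while inside a matching element
        self._buffer = []  # text pieces of the element being read

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            if self._depth:
                self._buffer.append(" ")  # treat <br> etc. as a word break
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1              # nested tag inside a match
        elif self.target_class in classes:
            self._depth = 1               # entering a matching element
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            # Collapse runs of whitespace into single spaces.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.items.append(text)

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def extract_items(html, target_class="joke"):
    """Return the text of all elements carrying target_class, in page order."""
    parser = ClassTextExtractor(target_class)
    parser.feed(html)
    parser.close()
    return parser.items
```

Even this small sketch has to deal with nesting, line breaks, and whitespace, and it only works for one site's markup, which is why manual collection may still be less work.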
 
Scott Fell, EE MVE, Developer & EE Moderator Commented:
You would need to start with a manual search for a site you like. From there you can download it using http://www.httrack.com/, but please be aware of how NOT to use it (see http://www.httrack.com/html/abuse.html), including:

Are the pages copyrighted?
Can you copy them only for private purpose?
Do not make online mirrors unless you are authorized to do so
Do not steal private information
Do not grab emails
Do not grab private information
 
Scott Fell, EE MVE, Developer & EE Moderator Commented:
The tool I suggested would be the easiest way I can think of. You don't need any special coding skills or a data repository. Collecting manually is going to be the easiest overall and will help you weed out what is copyrighted.

The only other option would be an automatic search. Search APIs from Google or Bing are not meant for screen scraping, and therefore your option is to create your own search logs. There are services like 80legs (http://80legs.com/) that will do the crawl work for you. You will still need to program how to find jokes and extract only the joke content. This is not trivial in terms of either money or time.
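As a rough idea of what "programming how to find jokes" might look like once you have plain text snippets, here is a minimal Python sketch that tags snippets by subject using keyword matching. The function name tag_by_subject and the keyword lists are hypothetical, and keyword matching is only the crudest possible approach; it will miss relevant items and let irrelevant ones through.

```python
import re


def tag_by_subject(snippets, subjects):
    """Pair each snippet with every subject whose keywords it mentions.

    snippets: list of strings.
    subjects: dict mapping a subject name to a list of keywords.
    Returns a list of (subject, snippet) tuples; snippets that match
    no subject are dropped.
    """
    tagged = []
    for snippet in snippets:
        # Whole-word, case-insensitive matching.
        words = set(re.findall(r"[a-z']+", snippet.lower()))
        for subject, keywords in subjects.items():
            if words & {k.lower() for k in keywords}:
                tagged.append((subject, snippet))
    return tagged
```

A snippet can end up under more than one subject, which suits a spreadsheet where each row is one (subject, text) pair.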

Manual searching for what you want will lead you to the sources you need. For instance, my first Google result for "wc fields quotes" is http://www.brainyquote.com/quotes/authors/w/w_c_fields.html. However, read their TOS at http://www.brainyquote.com/inquire/terms.html, which says:
In other words, by accepting this Agreement, you can use our stuff for legitimate academic, research, and reporting projects, but you can't use it to just copy and paste a bunch of our stuff on your own website. That hurts our search engine rankings, not to mention our feelings. We'd also point out that we don't pay for anything you submit to us via our submission form or suggestion email inbox simply because you provide it of your own volition. By submitting material to us, you acknowledge that you have the right to do so, and that you completely transfer to us any rights you might have had in the submission.


Good luck on your project.
 
Dov_B Author Commented:
Super cool Hashgocha Protis! Interestingly, after googling forever, I suddenly got an email asking me to make a spreadsheet to help automate a bikur cholim effort. As I began working on the bikur cholim project, lo and behold, a link showing how to use MS Excel to get data from a web page showed up! It worked like a dream! (Link: Access web data from Excel)
 
Dov_B Author Commented:
I very much appreciate your emphasis on respecting the hard work and rights of other people. I do not put any jokes on my own website. I am a teacher and public speaker, and I spend a great deal of time looking for interesting things to keep my listeners awake while I lecture. The riddles, quotes, etc. are kept for easy access in my own Excel spreadsheet on my personal hard drive.
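For anyone who would rather script the storage step for a personal spreadsheet like this, here is a minimal Python sketch that writes categorized items to a CSV file that Excel can open. The function name save_for_excel and the three column headings are assumptions chosen for illustration, not part of any tool mentioned in this thread.

```python
import csv


def save_for_excel(rows, path):
    """Write (category, text, source) rows to a CSV file Excel can open.

    The utf-8-sig encoding adds a byte order mark so that Excel
    recognizes the file as UTF-8 and shows accented characters correctly.
    """
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["Category", "Text", "Source"])  # header row
        writer.writerows(rows)
```

Keeping a Source column makes it easy to go back and check each site's terms of use before reusing an item.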
Question has a verified solution.
