Solved

What tool or utility can be used to speed up collecting text from websites that return jokes, sayings, and pictures from queries?

Posted on 2014-01-26
Last Modified: 2014-03-03
I need to collect a great many jokes, sayings, quotes, clipart, etc. related to specific subjects. Is there any software, utility, robot, or similar tool that will aid in collecting or harvesting this text and these picture files, and allow them to be stored and categorized in MS Excel or a similar application?
Question by:Dov_B
LVL 52

Assisted Solution

by:Scott Fell, EE MVE
Scott Fell,  EE MVE earned 350 total points
ID: 39811247
You would need to start with a manual search for a site you like. From there you can download it using HTTrack (http://www.httrack.com/), but please be aware of how NOT to use it (http://www.httrack.com/html/abuse.html), including:

Are the pages copyrighted?
Can you copy them only for private purpose?
Do not make online mirrors unless you are authorized to do so
Do not steal private information
Do not grab emails
Do not grab private information
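
If you script the download yourself instead of using HTTrack, one way to act on cautions like these is to check the site's robots.txt before fetching anything. The sketch below is not taken from HTTrack or its abuse page; it uses Python's standard `urllib.robotparser`, and the robots.txt content, the `example.com` URLs, and the `my-collector` user agent are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real one would be downloaded from
# the site's /robots.txt before any pages are collected.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt):
    """Parse robots.txt text (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser, url, user_agent="my-collector"):
    """Return True if the rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS_TXT)
print(may_fetch(parser, "https://example.com/jokes/page1.html"))
print(may_fetch(parser, "https://example.com/private/emails.html"))
print(parser.crawl_delay("my-collector"))  # seconds to wait between requests
```

Passing a robots.txt check says nothing about copyright or a site's terms of service, so those still need to be read by hand.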
 
LVL 27

Accepted Solution

by:MacroShadow
MacroShadow earned 150 total points
ID: 39811465
I don't know of any such utility and it would seem that neither do any of EE's experts.

Using VBA, you can get the HTML of a website and TRY to parse it properly to separate out the jokes etc., but that is probably more work than collecting them manually.
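
To show what "get the HTML and try to parse it" involves, here is a minimal sketch in Python rather than VBA, using only the standard library's `html.parser`. It assumes the jokes sit in elements marked with a CSS class named `joke`; that class name and the sample page are invented, and every real site will use different markup, which is exactly why this approach is fragile.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag; they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}

# Hypothetical page; on a real site the HTML would be downloaded first.
SAMPLE_HTML = """
<html><body>
<h1>Jokes</h1>
<p class="joke">Why did the chicken cross the road?<br>To get to the <b>other</b> side.</p>
<p class="ad">Buy now!</p>
<p class="joke featured">I told my computer a joke. It didn't laugh.</p>
</body></html>
"""


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element carrying a given CSS class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # greater than 0 while inside a matching element
        self.current = []   # text pieces of the element being read
        self.items = []     # finished text snippets

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            if self.depth:
                self.current.append(" ")  # treat <br> and friends as a space
            return
        if self.depth:
            self.depth += 1
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # Join the pieces and collapse runs of whitespace.
            text = " ".join("".join(self.current).split())
            if text:
                self.items.append(text)
            self.current = []

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def extract_by_class(html, target_class):
    """Return the text of all elements whose class list includes target_class."""
    parser = ClassTextExtractor(target_class)
    parser.feed(html)
    parser.close()
    return parser.items


jokes = extract_by_class(SAMPLE_HTML, "joke")
for joke in jokes:
    print(joke)
```

The advertisement paragraph is skipped because it lacks the assumed class. Any change to the site's markup would break the extraction, which supports the point that manual collection may be less work.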
 
LVL 52

Assisted Solution

by:Scott Fell, EE MVE
Scott Fell,  EE MVE earned 350 total points
ID: 39811589
The tool I suggested would be the easiest way I can think of. You don't need any special coding skills or a data repository. Collecting manually is going to be the easiest approach and will help you sort out what is copyrighted and what is not.

The only other option would be an automatic search. Search APIs from Google or Bing are not meant for screen scraping, and therefore your option is to create your own search logs. There are services like 80legs (http://80legs.com/) that will do the crawl work for you. You will still need to program how to find jokes and extract only the joke content. This is not a trivial thing to do, in either money or time.
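
As a rough idea of the programming this would still involve, the sketch below takes text snippets that have already been extracted, assigns each a subject by simple keyword matching, and writes the result as CSV, which Excel can open. The category names, keywords, and source URL are hypothetical, and keyword matching is a crude stand-in for the much harder problem of recognizing what is actually a joke.

```python
import csv
import io
import re

# Hypothetical subjects and keywords; adjust to your own topics.
CATEGORIES = {
    "animals": ("chicken", "dog", "cat"),
    "technology": ("computer", "software", "internet"),
}


def categorize(text, categories=CATEGORIES, default="uncategorized"):
    """Return the first category with a keyword that appears as a whole word."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    for name, keywords in categories.items():
        if any(keyword in words for keyword in keywords):
            return name
    return default


def to_csv(snippets, source_url):
    """Build CSV text with one row per snippet: category, text, source."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(["category", "text", "source"])
    for text in snippets:
        writer.writerow([categorize(text), text, source_url])
    return buffer.getvalue()


snippets = [
    "Why did the chicken cross the road? To get to the other side.",
    "I told my computer a joke. It didn't laugh.",
    "A stitch in time saves nine.",
]
csv_text = to_csv(snippets, "https://example.com/jokes")
print(csv_text)
```

Keeping the source URL in each row makes it easier to go back and check the terms of use for anything you later want to quote.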

Manual searching for what you want will lead you to the sources you need. For instance, my first Google result for "wc fields quotes" is http://www.brainyquote.com/quotes/authors/w/w_c_fields.html. However, read their TOS at http://www.brainyquote.com/inquire/terms.html, which says:
In other words, by accepting this Agreement, you can use our stuff for legitimate academic, research, and reporting projects, but you can't use it to just copy and paste a bunch of our stuff on your own website. That hurts our search engine rankings, not to mention our feelings. We'd also point out that we don't pay for anything you submit to us via our submission form or suggestion email inbox simply because you provide it of your own volition. By submitting material to us, you acknowledge that you have the right to do so, and that you completely transfer to us any rights you might have had in the submission.


Good luck on your project.
 

Author Comment

by:Dov_B
ID: 39811598
Super cool Hashgocha Protis! Interestingly, after googling forever, I suddenly got an email asking me to make a spreadsheet to help automate a bikur cholim effort. As I began working on the bikur cholim project, lo and behold, a link showing how to use MS Excel to get data from a webpage showed up! It worked like a dream! The link: "Access web data from Excel"
 

Author Comment

by:Dov_B
ID: 39811611
I very much appreciate your emphasis on respecting the hard work and rights of other people. I do not put any jokes on my own website. I am a teacher and public speaker and spend a great deal of time looking for interesting things to keep my listeners awake while I lecture. The riddles, quotes, etc. are kept for easy access in my own Excel spreadsheet on my personal hard drive.
