Solved

What tool or utility can speed up collecting text from websites that return jokes, sayings, and pictures from queries?

Posted on 2014-01-26
Last Modified: 2014-03-03
I need to collect a great many jokes, sayings, quotes, clipart, etc. related to specific subjects. Is there any software, utility, robot, or similar tool that will aid in collecting or harvesting this text and these picture files, and allow them to be stored and categorized in MS Excel or a similar application?
Question by:Dov_B
5 Comments
 
LVL 52

Assisted Solution

by:Scott Fell, EE MVE
Scott Fell,  EE MVE earned 350 total points
ID: 39811247
You would need to start with a manual search for a site you like. From there you can download it using HTTrack (http://www.httrack.com/), but please be aware of how NOT to use it (http://www.httrack.com/html/abuse.html), including:

Are the pages copyrighted?
Can you copy them only for private purpose?
Do not make online mirrors unless you are authorized to do so
Do not steal private information
Do not grab emails
Do not grab private information
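
One machine-readable signal of what a site allows crawlers to fetch is its robots.txt file. It does not answer the copyright questions above, but it is a reasonable first check before downloading anything. Here is a minimal sketch in Python using the standard library's urllib.robotparser. The function name, the user-agent string, and the example robots.txt content are made up for illustration:

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, for illustration only.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

print(allowed_by_robots(EXAMPLE_ROBOTS, "my-collector", "http://example.com/quotes/page1.html"))
print(allowed_by_robots(EXAMPLE_ROBOTS, "my-collector", "http://example.com/private/list.html"))
```

For a real site you would first download its robots.txt and pass the text in.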
 
LVL 26

Accepted Solution

by:
MacroShadow earned 150 total points
ID: 39811465
I don't know of any such utility and it would seem that neither do any of EE's experts.

Using VBA you can get the HTML of a web page and TRY to parse it to separate out the jokes etc., but it is probably more work than collecting them manually.
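
To give an idea of what that parsing involves, here is a rough sketch in Python rather than VBA, using only the standard library's html.parser. It assumes the page wraps each joke in an element with a known class name. The class name "joke" and the sample HTML are invented for illustration, and real pages are often much messier:

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # greater than 0 while inside a matching element
        self.current = []   # text pieces of the element being read
        self.items = []     # finished, whitespace-normalized snippets

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1
        elif self.target_class in classes:
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            text = " ".join("".join(self.current).split())
            if text:
                self.items.append(text)

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def extract_text(html, target_class):
    """Return the text of each element carrying target_class, in page order."""
    parser = ClassTextExtractor(target_class)
    parser.feed(html)
    return parser.items


# Invented sample page, for illustration only.
SAMPLE = ('<body><p class="joke">Why did the <b>chicken</b> cross the road?</p>'
          '<p>Site navigation</p>'
          '<div class="joke big">First line <br> second line</div></body>')
print(extract_text(SAMPLE, "joke"))
```

Every site marks up its content differently, so the class name has to be found by hand for each source. That is a large part of why this tends to be more work than it first appears.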
 
LVL 52

Assisted Solution

by:Scott Fell, EE MVE
Scott Fell,  EE MVE earned 350 total points
ID: 39811589
The tool I suggested would be the easiest way I can think of. You don't need any special coding skills or a data repository. Searching manually is going to be the easiest route and will help you weed out what is copyrighted and what is not.

The only other option would be an automatic search. Search APIs from Google or Bing are not meant for screen scraping, and therefore your option is to create your own search logs. There are services like 80legs (http://80legs.com/) that will do the crawl work for you. You will still need to program how to find jokes and extract only the joke content. This is not trivial in either money or time.
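
As an illustration of the "crawl work" part, here is a minimal Python sketch of one step a crawler repeats: collecting the links on a page and keeping only those that stay on the same site. It uses only the standard library. The function name and the example URLs are hypothetical, and a real crawler would also need fetching, rate limiting, and error handling:

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Record the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def same_site_links(html, page_url):
    """Return absolute http(s) links on the same host as page_url, without duplicates."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    links = []
    for href in collector.hrefs:
        # Resolve relative links against the page URL and drop any #fragment.
        absolute = urldefrag(urljoin(page_url, href)).url
        parts = urlparse(absolute)
        if parts.scheme in ("http", "https") and parts.netloc == host and absolute not in links:
            links.append(absolute)
    return links


# Invented page content and URLs, for illustration only.
PAGE_URL = "http://example.com/quotes/index.html"
PAGE_HTML = ('<a href="page2.html">next</a> <a href="/jokes/">jokes</a> '
             '<a href="http://other.example.org/x">elsewhere</a> '
             '<a href="page2.html#top">next again</a> '
             '<a href="mailto:someone@example.com">mail</a>')
print(same_site_links(PAGE_HTML, PAGE_URL))
```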

Manual searching for what you want will lead you to the sources you need. For instance, my first Google result for "wc fields quotes" is http://www.brainyquote.com/quotes/authors/w/w_c_fields.html. However, read their TOS (http://www.brainyquote.com/inquire/terms.html), which says:
"In other words, by accepting this Agreement, you can use our stuff for legitimate academic, research, and reporting projects, but you can't use it to just copy and paste a bunch of our stuff on your own website. That hurts our search engine rankings, not to mention our feelings. We'd also point out that we don't pay for anything you submit to us via our submission form or suggestion email inbox simply because you provide it of your own volition. By submitting material to us, you acknowledge that you have the right to do so, and that you completely transfer to us any rights you might have had in the submission."


Good luck on your project.
 

Author Comment

by:Dov_B
ID: 39811598
Super cool Hashgocha Protis! Interestingly, after googling forever I suddenly got an email asking me to make a spreadsheet to help automate a bikur cholim effort. As I began working on the bikur cholim project, lo and behold, a link showing how to use MS Excel to get data from a web page showed up! It worked like a dream! Link: "Access web data from Excel"
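
For anyone who collects items with a script instead of Excel's own web import, a simple way to get them into a spreadsheet is to write a CSV file, which Excel opens directly. Here is a minimal Python sketch using the standard csv module. The column names and the sample row are assumptions for illustration:

```python
import csv
import io


def to_csv(items):
    """Render collected items as CSV text with category, text, and source columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["category", "text", "source"])
    for item in items:
        writer.writerow([item["category"], item["text"], item["source"]])
    return buffer.getvalue()


# Invented sample row, for illustration only.
ITEMS = [{"category": "quotes",
          "text": "Short, sharp line",
          "source": "http://example.com/a"}]
print(to_csv(ITEMS))
```

To save the result, write the returned text to a file opened with newline="" and encoding="utf-8-sig", which helps Excel recognize non-ASCII characters.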
 

Author Comment

by:Dov_B
ID: 39811611
I very much appreciate your emphasis on respecting the hard work and rights of other people. I do not put any jokes on my own website. I am a teacher and public speaker, and I spend a great deal of time looking for interesting things to keep my listeners awake while I lecture. The riddles, quotes, etc. are kept for easy access in my own Excel spreadsheet on my personal hard drive.
