Solved

What tool or utility can speed up collecting text from websites that return jokes, sayings, and pictures from queries?

Posted on 2014-01-26
Medium Priority
164 Views
Last Modified: 2014-03-03
I need to collect a great many jokes, sayings, quotes, clipart, etc. related to specific subjects. Is there any software, utility, robot, or similar tool that will aid in collecting or harvesting such text and picture files and allow them to be stored and categorized in MS Excel or a similar application?
Question by:Dov_B
5 Comments
 
LVL 53

Assisted Solution

by:Scott Fell, EE MVE
Scott Fell,  EE MVE earned 1400 total points
ID: 39811247
You would need to start with a manual search for a site you like. From there you can download it using HTTrack (http://www.httrack.com/), but please be aware of how NOT to use it (http://www.httrack.com/html/abuse.html), including:

Are the pages copyrighted?
Can you copy them only for private purpose?
Do not make online mirrors unless you are authorized to do so
Do not steal private information
Do not grab emails
Do not grab private information
 
LVL 27

Accepted Solution

by:
MacroShadow earned 600 total points
ID: 39811465
I don't know of any such utility and it would seem that neither do any of EE's experts.

Using VBA, you can get the HTML of a website and TRY to parse it properly to separate out the jokes etc., but that is probably more work than collecting them manually.
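To give a sense of what that parsing involves, here is a minimal sketch in Python rather than VBA, using only the standard library. It assumes, purely for illustration, that each quote on the page sits inside an element with the CSS class "quote"; the class name and the function name `extract_quotes` are hypothetical, and every real site would need its own rule.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link", "area", "base",
             "col", "embed", "source", "track", "wbr"}


class QuoteExtractor(HTMLParser):
    """Collect the text of every element that carries a given CSS class."""

    def __init__(self, target_class="quote"):  # "quote" is an assumed class name
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # greater than 0 while inside a matching element
        self.buffer = []    # text pieces of the element being read
        self.quotes = []    # finished quotes, in page order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            if self.depth > 0:
                self.buffer.append(" ")  # treat a line break as a space
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth > 0:
            self.depth += 1              # nested tag inside a quote
        elif self.target_class in classes:
            self.depth = 1               # a new quote starts here
            self.buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            # Collapse runs of whitespace into single spaces.
            text = " ".join("".join(self.buffer).split())
            if text:
                self.quotes.append(text)

    def handle_data(self, data):
        if self.depth > 0:
            self.buffer.append(data)


def extract_quotes(html, target_class="quote"):
    """Return the text of all elements with the given class in an HTML string."""
    parser = QuoteExtractor(target_class)
    parser.feed(html)
    parser.close()
    return parser.quotes
```

Even this small sketch has to deal with nested tags, line breaks, and whitespace, and it breaks as soon as a site uses different markup, which is the point being made above about the effort involved.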
 
LVL 53

Assisted Solution

by:Scott Fell, EE MVE
Scott Fell,  EE MVE earned 1400 total points
ID: 39811589
The tool I suggested would be the easiest way I can think of. You don't need any special coding skills or a data repository. Collecting manually is going to be the easiest approach overall and will help you weed out what is copyrighted and what is not.

The only other option would be an automated search. The search APIs from Google and Bing are not meant for screen scraping, and therefore your option is to build your own crawl. There are services like 80legs (http://80legs.com/) that will do the crawl work for you. You will still need to program how to find the jokes and extract only the joke content. This is not trivial in either money or time.
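As a rough illustration of the "program how to find jokes" step, the sketch below sorts already-collected text snippets by subject using simple keyword matching and writes them as CSV, which Excel can open. The keyword table, the column names, and the function names `categorize` and `write_rows` are all assumptions made up for this example; real filtering would need far more care.

```python
import csv

# Hypothetical subject keywords; adjust to the topics being collected.
SUBJECT_KEYWORDS = {
    "teaching": ["teacher", "student", "school"],
    "health": ["doctor", "hospital", "medicine"],
}


def categorize(text, keywords=SUBJECT_KEYWORDS):
    """Return the first subject whose keyword appears in the text."""
    lowered = text.lower()
    for subject, words in keywords.items():
        if any(word in lowered for word in words):
            return subject
    return "uncategorized"


def write_rows(items, out):
    """Write (text, source) pairs to an open text file as CSV rows."""
    writer = csv.writer(out)
    writer.writerow(["subject", "text", "source"])
    for text, source in items:
        writer.writerow([categorize(text), text, source])
```

To produce a file for Excel, open it with `open("quotes.csv", "w", newline="", encoding="utf-8")` (the file name is only an example) and pass the file object as `out`. Keeping the source in its own column makes it easier to go back and check each site's terms later.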

Manual searching for what you want will lead you to the sources you need. For instance, my first Google result for "wc fields quotes" is http://www.brainyquote.com/quotes/authors/w/w_c_fields.html. However, read their TOS at http://www.brainyquote.com/inquire/terms.html, which says:
"In other words, by accepting this Agreement, you can use our stuff for legitimate academic, research, and reporting projects, but you can't use it to just copy and paste a bunch of our stuff on your own website. That hurts our search engine rankings, not to mention our feelings. We'd also point out that we don't pay for anything you submit to us via our submission form or suggestion email inbox simply because you provide it of your own volition. By submitting material to us, you acknowledge that you have the right to do so, and that you completely transfer to us any rights you might have had in the submission."


Good luck on your project.
 

Author Comment

by:Dov_B
ID: 39811598
Super cool Hashgocha Protis! Interestingly, after googling forever, I suddenly got an email asking me to make a spreadsheet to help automate a bikur cholim effort. As I began working on the bikur cholim project, lo and behold, a link showing how to use MS Excel to get data from a web page showed up! It worked like a dream! The link: "Access web data from Excel"
 

Author Comment

by:Dov_B
ID: 39811611
I very much appreciate your emphasis on respecting the hard work and rights of other people. I do not put any jokes on my own website. I am a teacher and public speaker, and I spend a great deal of time looking for interesting things to keep my listeners awake while I lecture. The riddles, quotes, etc. are kept for easy access in my own Excel spreadsheet on my personal hard drive.
