  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 208

Recommendations for web scraping based on certain criteria?

Well, experts, I have a bit of a challenge ...

I'm unfamiliar with what is available in the data collection/web scraping arena (either free or chargeable). Bottom line, this is the type of information I need to get:

Photography-related websites or blogs (NOT photographers) which meet certain SEO criteria (a high volume of visitor traffic would be a good example). I know there are numerous ways to gauge traffic (Alexa rank, backlinks, etc.). The ideal information, although I have no idea how it could be obtained, would be the number of visitors (monthly, annually, etc.). The other critical piece of information is an e-mail address by which I could contact each website or blog (typically found on most websites under one or more of the following categories: support, information, contact, etc.).

The ultimate goal is to assemble a list of at least several hundred websites that meet the criteria (I would hope several thousand is more likely). I guess the minimum criteria would be: URL, brief website description, some indication of traffic rank, and e-mail address. The other criteria are harder to define for purposes of this post, but since I'm just trying to get a handle on this whole web scraping/data collection area, I don't want to muddy the waters with difficult-to-understand selection criteria.

I've done numerous searches, none of which turned up anything close to what I need. My hope is that someone at EE is aware of an online or standalone software package which could supply most or all of what I need. Alternatively, I suppose I could purchase an e-mail list; however, I have never done that either, so I don't know where to start.

I do not program, so any solution involving programming would not work in my case. I also don't have the time or money to have custom programs developed to accomplish this (unless I am overestimating what it would actually take).

I can't help but believe that somewhere, someone has developed this type of software tool, but I have no idea where to even start looking. Any suggestions or guidance would be greatly appreciated. Thank you.

If anyone has a suggestion as to a better zone to identify, please let me know because I don't understand what half of the zones mean anyway.
photoman11
Asked:
1 Solution
 
dpearson Commented:
You should be able to get everything except an email address from traffic information. Companies that sell this sort of data (e.g. comScore, http://www.comscore.com/) should have tools that let you search for companies in specific spaces, such as photography sites. You should be able to use their tools to identify the top photography sites on the web, with a link etc.

Collecting more than that by automatic screen scraping would be a challenge.  Most web sites specifically work to block screen scraping of their email addresses.  There are a number of ways to block this and any major site will adopt one of them in order to stop spamming of their support/contact aliases.  (This is why so many sites use a form for contacting them rather than a visible email address).  So I suspect you'll be out of luck there.  Somebody may sell a database with that sort of contact info but now you're really surfing in the dark underbelly of the web.  Be VERY CAREFUL if you start down that road.  E.g. if you buy a list of email addresses using a credit card you should expect that card to immediately be resold and used fraudulently.

Doug
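
To illustrate the point above about addresses being hard to scrape, here is a minimal Python sketch, not a recommended tool. It assumes the page HTML has already been downloaded; the function name `find_emails`, the simplified regular expression, and the three sample page fragments are all hypothetical and invented for this example. It shows that a plain pattern match finds an address that is written out in the page, but finds nothing when the address is obfuscated or replaced by a contact form.

```python
import re
from html import unescape

# Deliberately simple pattern: it matches common address shapes,
# not every address that is technically valid.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(html_text):
    """Return distinct addresses visible in the HTML, in order of first appearance."""
    text = unescape(html_text)  # decode entities such as &#64; for "@"
    seen = []
    for match in EMAIL_RE.findall(text):
        addr = match.lower()
        if addr not in seen:
            seen.append(addr)
    return seen


# Hypothetical page fragments, invented for illustration only.
plain_page = '<a href="mailto:info@example.com">Contact us</a>'
obfuscated_page = "<p>Write to info [at] example [dot] com</p>"
form_only_page = '<form action="/contact" method="post"><textarea name="msg"></textarea></form>'

print(find_emails(plain_page))       # address written out in the page
print(find_emails(obfuscated_page))  # obfuscated address is missed
print(find_emails(form_only_page))   # contact form exposes no address
```

Only the first fragment yields an address. A real collection run would also have to respect each site's terms of use, and it would hit many pages of the second and third kind.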
 
photoman11 (Author) Commented:
dpearson,

I think I understand what you're saying. However, I'm not sure how to get everything except the e-mail addresses. I looked at the comScore site and couldn't find any category or product that matched what I am looking for. Do you know of any online or downloadable software products which do this?

Thanks
 
dpearson Commented:
I'm not aware of any specific products that do this - but it seems odd that sites which aggregate and sell traffic data (like comScore) wouldn't provide these sorts of search tools.  Seems like an obvious need if you're looking to identify traffic levels or competitors in a particular industry sector.  Did you try contacting them to make sure they can't meet this need?

Doug
 
photoman11 (Author) Commented:
Doug,

I have not contacted them yet. I based my conclusion on going through their website and reviewing their products/services. But I will contact them to find out for sure. Thanks again.
 
photoman11 (Author) Commented:
Unfortunately, I was right about them. However, I did find somebody through oDesk who has experience working with the Firefox add-on SEOquake, which will provide most of the information I need… I think.