Solved

NoSQL or RDBMS? Web crawler

Posted on 2012-03-27
Last Modified: 2016-02-10
I need to crawl selected websites for about 5-10 different attributes.

For example, let's say it's a car website, and each page of the site has information about a particular car for sale, including the vehicle make, model, year, price, etc.
I need all this information to be collected and stored in a database, but since a good car sales website could have thousands of pages, it can become a lot of data to collect.

I don't expect to collect more than a few hundred words from each page, so I think it would be under 1 KB of data per record I store.

At the moment I don't know whether I should be using a NoSQL database or MySQL, since I will have a huge number of rows/records created.

Any thoughts on going one way or the other? I need to do certain data manipulation across all the rows/records, such as sorting the cars by price from highest to lowest, etc.
Question by: checkmofoshoduno
3 Comments
 
Accepted Solution

by:
johanntagle earned 500 total points
ID: 37774846
Either MySQL or MongoDB (the only NoSQL database I've used so far) should be able to handle your needs, though MongoDB provides better horizontal scaling via sharding, should your data really be that large. If everything can fit on one server, I think the decision depends on whether you can pre-define all the fields you need. If so, I would personally use MySQL, because I find querying via SQL more straightforward than having to deal with MapReduce and the like. But if you need to store different field names dynamically, then a NoSQL database is the way to go.
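To illustrate the pre-defined-fields case, here is a minimal sketch in Python. It uses the standard-library sqlite3 module purely as a stand-in so the example runs without a server; with MySQL the SQL would be much the same, though the driver and placeholder syntax would differ. The table name, column names, URLs, and helper functions are assumptions made up for this example and are not taken from the question.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with one fixed-schema table (hypothetical layout)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE cars (
            id    INTEGER PRIMARY KEY,
            url   TEXT NOT NULL,
            make  TEXT,
            model TEXT,
            year  INTEGER,
            price INTEGER
        )
        """
    )
    # An index on price keeps "sort by price" queries cheap as the table grows.
    conn.execute("CREATE INDEX idx_cars_price ON cars (price)")
    return conn


def save_car(conn, record):
    """Insert one crawled record; `record` is a dict with the column names as keys."""
    conn.execute(
        "INSERT INTO cars (url, make, model, year, price) "
        "VALUES (:url, :make, :model, :year, :price)",
        record,
    )


def cars_by_price_desc(conn):
    """Return all cars ordered from highest to lowest price."""
    cursor = conn.execute(
        "SELECT make, model, year, price FROM cars ORDER BY price DESC"
    )
    return cursor.fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    # Made-up sample rows standing in for crawler output.
    samples = [
        {"url": "http://example.com/1", "make": "Honda", "model": "Civic", "year": 2008, "price": 9000},
        {"url": "http://example.com/2", "make": "BMW", "model": "328i", "year": 2011, "price": 30000},
        {"url": "http://example.com/3", "make": "Toyota", "model": "Camry", "year": 2010, "price": 15000},
    ]
    for sample in samples:
        save_car(conn, sample)
    for row in cars_by_price_desc(conn):
        print(row)
```

The whole "organize by price" requirement reduces to a single ORDER BY clause here, which is the kind of straightforwardness meant above. If the crawled pages turned out to have attributes that vary from site to site, a fixed column list like this one would get awkward, and that is where a document store becomes attractive.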