I need to crawl selected websites for about 5-10 different attributes.
For example, let's say it's a car sales website, and each page has information about a particular car for sale: the make, model, year, price, etc.
I need all of this information collected and stored in a database, but since a good car sales website can have thousands of pages, that can add up to a lot of data.
I don't expect to collect more than a few hundred words from each page, so I think each record I store would be under 1 KB.
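As a rough sanity check on the volume, here is a back-of-the-envelope calculation. The number of sites and pages per site are assumptions picked purely for illustration, not measurements; only the 1 KB per record upper bound comes from my estimate above.

```python
def estimate_storage_bytes(sites: int, pages_per_site: int, bytes_per_record: int) -> int:
    """Return the raw data size, ignoring index and storage-engine overhead."""
    return sites * pages_per_site * bytes_per_record


if __name__ == "__main__":
    # Hypothetical inputs: 10 sites, 5,000 pages each, 1 KB (1024 bytes) per record.
    total = estimate_storage_bytes(sites=10, pages_per_site=5_000, bytes_per_record=1024)
    print(f"{total} bytes, about {total / 1024**2:.1f} MiB")
```

Real usage would be higher once indexes and per-row overhead are included, so this is only a lower bound on disk space.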
At the moment I don't know whether I should use a NoSQL database or MySQL, since I will end up with a very large number of rows/records.
Any thoughts on going one way or the other? I need to run certain operations across all the rows/records, such as sorting the cars by price from highest to lowest.
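To make the sorting requirement concrete, here is a minimal sketch of the kind of query I have in mind. It uses Python's built-in sqlite3 module as a stand-in for MySQL so it can run anywhere; the table name, column names, index name, and sample rows are all made up for illustration.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical 'cars' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE cars (
            id    INTEGER PRIMARY KEY,
            make  TEXT NOT NULL,
            model TEXT NOT NULL,
            year  INTEGER,
            price REAL
        )
        """
    )
    # An index on price lets the database sort by price without scanning
    # and sorting the whole table each time.
    conn.execute("CREATE INDEX idx_cars_price ON cars (price)")
    # Made-up sample rows standing in for crawled pages.
    rows = [
        ("Toyota", "Corolla", 2015, 9500.0),
        ("BMW", "320i", 2018, 21000.0),
        ("Ford", "Focus", 2012, 4800.0),
    ]
    conn.executemany(
        "INSERT INTO cars (make, model, year, price) VALUES (?, ?, ?, ?)", rows
    )
    conn.commit()
    return conn


def cars_by_price_desc(conn: sqlite3.Connection) -> list:
    """Return all cars ordered from highest to lowest price."""
    cur = conn.execute(
        "SELECT make, model, year, price FROM cars ORDER BY price DESC"
    )
    return cur.fetchall()


if __name__ == "__main__":
    for row in cars_by_price_desc(build_demo_db()):
        print(row)
```

The same `ORDER BY price DESC` query would look essentially identical in MySQL; what I am unsure about is whether this approach holds up at the row counts I expect, or whether a NoSQL store would handle it better.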