NoSQL Databases





A NoSQL database provides a mechanism for storage and retrieval of data which is modeled in means other than the tabular relations used in relational databases. Motivations for this approach include: simplicity of design, simpler "horizontal" scaling to clusters of machines, and finer control over availability. The data structures used by NoSQL databases (e.g. key-value, wide column, graph, or document) are different from those used by default in relational databases, making some operations faster in NoSQL. Sometimes the data structures used by NoSQL databases are also viewed as "more flexible" than relational database tables.
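The contrast between the tabular and document models can be sketched with hypothetical Python structures:

```python
# A relational row set: fixed columns, one value per cell,
# relationships expressed through foreign keys.
users_table = [
    ("u1", "Alice", 30),   # (id, name, age)
    ("u2", "Bob", 25),
]
orders_table = [
    ("o1", "u1", 19.99),   # (id, user_id, total) -- joined back to users
]

# A document-store record: the same data denormalized into one
# nested structure, retrieved by a single key lookup.
users_by_id = {
    "u1": {"name": "Alice", "age": 30,
           "orders": [{"id": "o1", "total": 19.99}]},
    "u2": {"name": "Bob", "age": 25, "orders": []},
}

# One key lookup replaces a join.
alice = users_by_id["u1"]
print(alice["orders"][0]["total"])  # 19.99
```

The flexibility claim in the paragraph above comes from the fact that each document can carry a different shape, whereas every relational row must fit the declared columns.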



I developed a dynamic web application in Eclipse which uses a servlet to access a database. Whenever I restart the application, it works as expected. After a few runs, however, it stops working and the database access returns no results.

Is there anything I have missed when configuring either Tomcat or the web.xml file?

I attached the web.xml below. Part of the servlet file follows.


import java.util.List;

import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;

/** Servlet implementation class RunServlet */
public class RunServlet extends HttpServlet {
      private static final long serialVersionUID = 1L;

We're selecting a low-cost but capable database for use in our Linux appliances (4-16 GB RAM, quad-core CPUs, 500 GB-2 TB drives) that gather information from devices. Each appliance collects 10-30 KB of information from 5,000+ devices, updates this information every minute, and completely refreshes the data 10+ times daily.

The data must be stored in a redundant, persistent database that scales horizontally automatically with each added appliance, is shared among all appliances in groups of up to 20, and fails over automatically. The database should be virtualizable, because the appliance ships in both virtual and physical form factors and we may want to move it into a cloud like AWS.

The device data does not need more than a dozen indexes, joins are unnecessary, and most information can be stored in XML/JSON format or as normalized fields with a big blob for the bulk of the data that is not indexed.

We've investigated NoSQL databases like MongoDB, but they seem to have restrictive requirements (e.g. MongoDB uses all the memory on the server), and Couchbase is new and looked rough when we tried to integrate with it.

We're not sure whether SQL databases like Postgres can scale horizontally the way we'd like. MySQL is out because it requires payment for this application.

Any ideas on which databases warrant a closer look?
My scenario:

I have many machines which store data in local databases (SQLite).
There is one `client` machine that pulls data from all the machines and saves it to its own local database for use by other parts of the software. By "pull" I mean that it fetches data between two dates, e.g. a whole day or a whole month. (By the way, what is this scenario called?)

I am struggling to choose the right technology.

Currently I am doing RPC with the Thrift framework. It is easy to use, but I don't know whether it is a good approach, because I am building the replication mechanism myself.
With that RPC I call a function, e.g. GetNewData(Revision), get all the new table data, and save it to the client's database.

Another idea was to use MongoDB or CouchDB with replication (I don't know if these databases can be set up for my scenario).

Another idea was to save the SQLite database to a file, download it, and merge it; or save only the new records and merge those. It is simple, and precisely because it is simple I am afraid it will close off future extensions.

I need some advice and discussion.
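The pull pattern described here is usually called incremental replication (or change data capture): each row carries a monotonically increasing revision, and the client asks only for rows above its own high-water mark. A minimal sketch with SQLite in Python; the schema and function names are invented for illustration:

```python
import sqlite3

def make_db(rows):
    """Create an in-memory stand-in for one machine's local database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE data (revision INTEGER, payload TEXT)")
    db.executemany("INSERT INTO data VALUES (?, ?)", rows)
    return db

def get_new_data(db, since_revision):
    """Server side of GetNewData(Revision): only rows the client lacks."""
    cur = db.execute(
        "SELECT revision, payload FROM data WHERE revision > ? ORDER BY revision",
        (since_revision,))
    return cur.fetchall()

# The client pulls only what it is missing, then remembers the new high-water mark.
server = make_db([(1, "a"), (2, "b"), (3, "c")])
client = make_db([(1, "a")])
last_seen = client.execute("SELECT MAX(revision) FROM data").fetchone()[0]
new_rows = get_new_data(server, last_seen)
client.executemany("INSERT INTO data VALUES (?, ?)", new_rows)
print(len(new_rows))  # 2
```

The same shape works behind a Thrift RPC: the client sends its last revision, the server answers with the delta.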
How best do I store and retrieve matrices from a database? Taking MySQL, for example, do I simply use three tables (documents, attributes, and document_attributes), or is that too simplistic and poor for performance? Should I skip MySQL and use something from the NoSQL world like MongoDB?
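The three-table layout in the question is the classic entity-attribute-value (EAV) design. A sketch with SQLite standing in for MySQL; table and column names are illustrative:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE document (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE attribute (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE document_attribute (
    document_id INTEGER REFERENCES document(id),
    attribute_id INTEGER REFERENCES attribute(id),
    value REAL,                       -- one matrix cell
    PRIMARY KEY (document_id, attribute_id)
);
CREATE INDEX ix_da_attr ON document_attribute(attribute_id);
""")
db.execute("INSERT INTO document VALUES (1, 'doc1')")
db.execute("INSERT INTO attribute VALUES (1, 'weight')")
db.execute("INSERT INTO document_attribute VALUES (1, 1, 0.75)")

# Reassembling one document's attribute vector requires a join,
# which is the main performance cost of this layout.
row = db.execute("""
    SELECT a.name, da.value
    FROM document_attribute da JOIN attribute a ON a.id = da.attribute_id
    WHERE da.document_id = 1""").fetchone()
print(row)  # ('weight', 0.75)
```

That reassembly join is the usual complaint against EAV, and it is one reason a document store, which keeps each document's attributes together, can feel more natural for matrix-like data.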
I need to crawl selected websites for about 5-10 different attributes.

For example, let's say it's a car website, and on each page of the site there is information about a particular car for sale, including the vehicle make, model, year, price, etc.
I need all this information to be collected and stored in a database, but since a good car-sale website can have thousands of pages, it can add up to a lot of data.

I don't expect to collect more than a few hundred words from each page, so I think it would be under 1 KB of data per record I store.

At the moment I don't know if I should be using a NoSQL or a MySQL database, since I will have an enormous number of rows/records.

Any thoughts on going one way or the other? I need to do certain manipulations across all the rows/records, such as ordering the cars by price from highest to lowest.
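At this record size, row count alone rarely decides the SQL-vs-NoSQL question; what matters more is that the columns being sorted and filtered on are indexed. A sketch of the price ordering with SQLite (the schema is invented for illustration):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE car (
    make TEXT, model TEXT, year INTEGER, price REAL)""")
db.executemany("INSERT INTO car VALUES (?, ?, ?, ?)", [
    ("Toyota", "Corolla", 2009, 8500.0),
    ("Ford", "Focus", 2010, 9200.0),
    ("Honda", "Civic", 2008, 7800.0),
])
# The index is what keeps ORDER BY cheap as the row count grows.
db.execute("CREATE INDEX ix_car_price ON car(price)")

rows = db.execute("SELECT make, price FROM car ORDER BY price DESC").fetchall()
print(rows[0])  # ('Ford', 9200.0)
```

The same "index the sort key" rule applies to MongoDB, so either class of database can handle the highest-to-lowest-price query.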

I have a Maven project in Java. The project has a few dependent jars, such as Apache Commons, MongoDB, etc.

I want to build a single runnable jar file so that I can run it from the Linux command line with java -jar myjar.jar or a similar command.

The main entry point is:

My pom.xml is attached.

Please help!
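One common way to produce a single runnable jar from a Maven project is the maven-shade-plugin, which bundles all dependencies into the output jar. A sketch of the pom.xml fragment; com.example.Main is a placeholder for the project's actual entry point:

```xml
<!-- In the <build><plugins> section of pom.xml -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <transformers>
          <!-- Writes Main-Class into the jar's manifest so java -jar works -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
            <mainClass>com.example.Main</mainClass>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```

After `mvn package`, the shaded jar in target/ should run with `java -jar myjar.jar`.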
This is returning nothing.


Find objects within a 5-mile radius using PHP and MongoDB:

$m = new Mongo(); // open a connection to the server
$db = $m->myapp; // get the myapp database

$collection = $db->shops;
$collection->ensureIndex(array('geo_coords' => '2d'));

$lat = 37.0625;
$lng = -95.677068;

$radiusMiles = 5; // get all results within 5 miles
$radiusOfEarth = 3956; // avg radius of earth in miles

$result = $collection->find(
    array('geo_coords' =>
        array('$within' =>
            array('$centerSphere' =>
                array(array($lng, $lat), $radiusMiles / $radiusOfEarth)
            )
        )
    )
);



I originally had a multi-threaded application, with each thread creating a list of objects and writing the list to the database. This approach creates a lot of database I/O, slowing down the application.

I now want to create a static list, have each thread add objects to it, and save the list of objects to the database whenever its size reaches 5000.

How can I pass the static list to each of the threads effectively?

I currently define the static list in a class that utilizes the Java ExecutorService framework, as follows:

public static List<DBObject> toMove;

The way to call and use the toMove variable is:

            synchronized (Incremental.toMove) {

                  for (String code : code2erid.keySet()) {
                        DBObject q = new BasicDBObject();
                        q.put("code", code);
                        DBObject o = NeferCollection.getInstance().getCollection().findOne(q);
                        o.put("erid", code2erid.get(code));

                        int size = Incremental.toMove.size();

                        if (size % 1000 == 0 && size > 0) {

                              System.out.println("Moving " + size + " records...");

My question is whether this is an effective way to pass a static object into each thread.
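One alternative to sharing a raw static list is to hand every worker the same synchronized batcher object, which owns the buffer, the lock, and the flush threshold. A sketch of that idea; the class and names are invented, and the "database write" is simulated with a plain list:

```python
import threading

class Batcher:
    def __init__(self, threshold, sink):
        self.threshold = threshold
        self.sink = sink            # stands in for the DB collection
        self.buffer = []
        self.lock = threading.Lock()

    def add(self, record):
        with self.lock:             # every thread shares one buffer safely
            self.buffer.append(record)
            if len(self.buffer) >= self.threshold:
                self.sink.extend(self.buffer)   # one bulk write
                self.buffer.clear()

written = []
batcher = Batcher(threshold=5000, sink=written)

def worker(n):
    for i in range(n):
        batcher.add({"code": i})

# 4 workers x 3000 adds = 12000 records -> two full flushes of 5000.
threads = [threading.Thread(target=worker, args=(3000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(written))  # 10000 flushed, 2000 still buffered
```

Encapsulating the lock inside the batcher means no caller can forget to synchronize, which is the main hazard of passing a bare static list around.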
I am reading up on MongoDB and came upon this JavaScript (I am not good at JavaScript either): {$where : function() { return this.x == this.y; }}

How is this interpreted?
What is $where? Is it an argument of some sort? How is it entered?
What is "this", and how is it evaluated?
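For context: $where is a MongoDB query operator whose value is a JavaScript function the server runs against each document in the collection; `this` is bound to the document being tested, and the document matches when the function returns true. A rough Python analog of the same predicate:

```python
# Each dict stands in for a document; the lambda plays the role of the
# $where function, with 'doc' in place of JavaScript's 'this'.
documents = [
    {"x": 1, "y": 1},
    {"x": 1, "y": 2},
    {"x": 3, "y": 3},
]
where = lambda doc: doc["x"] == doc["y"]   # return this.x == this.y;
matches = [doc for doc in documents if where(doc)]
print(len(matches))  # 2
```

Because $where evaluates JavaScript per document, it cannot use indexes and is much slower than ordinary query operators.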

I have encountered an issue mostly related to MongoDB I/O.

I need to read a large number of records out of MongoDB whenever a new record comes in, because each new record causes every record in the DB to change.

So I am considering a solution where I read all the records once, convert them into a HashMap object, modify the object with each new record, and then save the HashMap object to the hard drive.

I wonder if this solution is feasible and functionally doable.
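A minimal sketch of the read-once / modify-in-memory / persist idea, assuming the records are keyed by a hypothetical "code" field and the whole set fits in RAM:

```python
import json
import os
import tempfile

def apply_new_record(records_by_code, new_record):
    """Stand-in for the per-arrival update that touches every record."""
    records_by_code[new_record["code"]] = new_record
    # ... recompute whatever derived state each existing record needs ...
    return records_by_code

records = {"a": {"code": "a", "rank": 1}}
records = apply_new_record(records, {"code": "b", "rank": 2})

# Persist the whole map to disk instead of issuing per-record DB writes.
path = os.path.join(tempfile.mkdtemp(), "records.json")
with open(path, "w") as f:
    json.dump(records, f)
with open(path) as f:
    restored = json.load(f)
print(sorted(restored))  # ['a', 'b']
```

The approach is functionally doable; the caveats are memory (every record must fit) and durability (updates made between flushes are lost on a crash).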

Hi there,

I have MongoDB installed on my server. It works, but I am having a very tough time including the extension for PHP.

- I've upgraded my PHP to the latest version (but it also didn't work on 5.2.x)
- I've used the pecl mongo install and uninstall commands many times
- I've made sure that the extension file is in the correct inclusion directory for PHP extensions
- I've added the extension line to the correct php.ini file
- I've also tried manually compiling the driver

When I try phpinfo(), the mongo extension is simply not there. But interestingly, I do see the mongo module if I run php -m on the server. Weird, right?

I'm on CentOS, using Apache 2.2 with PHP 5.3.8.

Thank you

On this page, when I try to enter text into the input box in Chrome, nothing happens, i.e. no text gets entered, but it seems to be fine in IE, Firefox, Safari... I think it's something simple?

System: Ubuntu, Apache, CakePHP with MongoDB

When I try to sort by multiple keys in MongoDB, I get the error below. How can I fix it?

error: { "$err" : "too much data for sort() with no index. add an index or specify a smaller limit", "code" : 10128 }
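Error 10128 means MongoDB refused to sort a large result set in memory; the standard fix is a compound index whose keys match the sort, e.g. ensureIndex({"status": 1, "created": 1}) with your own field names. As a plain-Python illustration of what a multi-key sort computes (field names hypothetical):

```python
# Sorting by (status, created): ties on the first key fall through to the
# second. A MongoDB compound index stores entries in exactly this order,
# which is why it lets sort() skip the in-memory sort that caused the error.
rows = [
    {"status": "b", "created": 2},
    {"status": "a", "created": 3},
    {"status": "a", "created": 1},
]
rows.sort(key=lambda r: (r["status"], r["created"]))
print([(r["status"], r["created"]) for r in rows])
# [('a', 1), ('a', 3), ('b', 2)]
```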

There are records in one of my MongoDB collections, 'exmple' (please see the attached code).

I would like to have Java queries so that I can get the records for each of the following scenarios:

1) records whose "nextfetchtime" field does not exist;

2) records whose "nextfetchtime" field has a value of null;

{ "_id" : { "$oid" : "4e72eac3e4b0a2c48b0eb84d"} , "code" : "5g9a4gini94404mp1inoip5ni4gpcgn4gnin50i0ci5a0n95g555o4pgg1m45o15" , "path" : "/data2/data/htmlsource/2011-09-16/dianpingcom/" , "finishtime" : 1316154241099 , "nextfetchtime" : 1318746051658 , "codeTime" : 1316154051696 , "predicted" : false}
{ "_id" : { "$oid" : "4e7564247af2f2aa38b3fbe8"} , "code" : "mw2Da0toDk6xhkz6IVNxmEQSsQ3nLnJ84Ccxvw0734i5Hv5auTHk5aQtmUq1KxhY" , "path" : "c:/data/berkeley/database/parsedcodedb-bak/2011-07-03/dianping" , "finishtime" : 1316316196568}
{ "_id" : { "$oid" : "4e7564377af2587c0e06e5df"} , "code" : "fl6Q3Nw3BCMrX5LxHoY49llQ81vfYtW0yTGec9arAHYch8yvDvGPJ37LOzKwOOgc" , "path" : "c:/data/berkeley/database/parsedcodedb-bak/2011-07-03/dianping" , "finishtime" : 1316316215222}
{ "_id" : { "$oid" : "4e7568e07af237357a034671"} , "code" : "QfDfFpWqBHAtJYkJbMdTWFl5UrLx6lp96bdOFGzIOs0m1RlqRPxhIC0RA9ONB2yh" , "path" : "c:/data/berkeley/database/parsedcodedb-bak/2011-07-03/dianping" , "finishtime" : 1316317408821 , "nextfetchtime" :  null }

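One subtlety worth knowing before writing those queries: in MongoDB, {"nextfetchtime": null} matches both missing and explicitly-null fields, {"$exists": false} matches only missing ones, and {"$type": 10} matches only explicit nulls. A rough Python illustration of the two cases, using trimmed stand-ins for the sample records (the shortened codes are placeholders):

```python
# 'field absent' and 'field present but null' are distinct cases.
records = [
    {"code": "5g9a", "nextfetchtime": 1318746051658},
    {"code": "mw2D"},                            # field absent
    {"code": "QfDf", "nextfetchtime": None},     # explicit null
]
field_absent  = [r for r in records if "nextfetchtime" not in r]
field_is_null = [r for r in records
                 if "nextfetchtime" in r and r["nextfetchtime"] is None]
print(len(field_absent), len(field_is_null))  # 1 1
```

So scenario 1 maps to an $exists:false query and scenario 2 to a $type:10 (null) query; a plain equality-with-null query would lump both together.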


I need to handle millions of JSON files, which involves multiple steps, saving a copy of the handled files at each step.

Originally I wanted to save those files in the file system, but now I am thinking of MongoDB. I would like to know what the differences are, in terms of speed.

I know about MySQL replication, but I want to know what tools exist to replicate file systems (Lucene, MongoDB, etc.).

Please advise!


Currently, we are using MySQL and Hadoop for our distributed computing solution. The architecture looks like this:

The issue, however, is that connecting to MySQL turns out to be very expensive. I want to know if there are other database solutions that work well with Hadoop.

The solution should do the following job:

1. Both reading and writing to the database (distributed computing friendly) are needed;
2. Both Namenode and datanodes can read and write to the database;
3. Seamlessly working with Hadoop HDFS is preferred, if possible.

Hi, I have a MongoDB with image data stored in its table in the unique MongoDB way. I'm assuming it's what BLOB is to MySQL. So what are the statements I would use to retrieve an image from the database and display it on a page? I've provided a VERY SIMPLE example of what I would do if it were MySQL, so please show me how to do it with PHP and MongoDB. Thanks in advance, experts!

//This is to display the image
//get the image from the db
$sql = 'SELECT image FROM image_table WHERE image_id = "'.$_GET['image_id'].'"';

//the result of the query
$result = mysql_query($sql);

//set the header for the image
header("Content-type: image/jpeg");
echo mysql_result($result,0);


//This is to display the gallery

//get the images
$sql = 'SELECT * FROM image_table';

//the result of the query
$result = mysql_query($sql);

//set the img tags
while ($row = mysql_fetch_array($result)) {
    echo '<img src="view.php?image_id='.$row['image_id'].'" />';
}


I'm stuck, and I NEED the mongo drivers installed. I'm a reseller, so I have WHM access and cPanel access to the server as well. How do I install the mongo drivers?
We are looking at developing a cloud (probably Amazon EC2) based application that might potentially scale to tens of thousands of small transactions per second via a RESTful API. The architecture we are looking at is to allow our client facing application to put transactions into a high performance message queue from which they will be processed asynchronously by background worker agents and logged into a database. The client facing application will probably be based on Ruby on Rails, while the backend will either be Ruby or Java.

We have been looking at using some of the EC2 services such as SQS, SimpleDB as basic building blocks as we were attracted by the scalability and performance promised by Amazon. We've done some initial testing and find that throughput performance is very poor. For example with SQS we've only been able to get around 1-2 transactions per second. Our initial testing shows we may have similar issues with SimpleDB.

We're now looking at some other options, including using an AMQP standards based messaging tools such as RabbitMQ. While using MySQL as a backend db is attractive we're also considering using a nosql solution such as MongoDB.

We'd be interested in any ideas or feedback on the approach we are taking. We would also like to hear from anyone with experience running an architecture like this in the cloud (in particular EC2).
Hi experts,

I was wondering if any of you know what database Twitter uses.

I read this article, but I still don't get an answer.

Since Twitter gets so much daily traffic, I just wanted to know what database they use.

I know Google uses their own custom in-house solution created from scratch, instead of storing their data in SQL Server, Oracle, or Teradata, so I was wondering if Twitter did the same, or if they use Oracle or Teradata for their data storage to be able to serve so many daily transactions.

Thanks for any help you can give me on this question.
I am trying to install the Perl driver for MongoDB on my CentOS 5 64 bit machine.

I found an RPM here:

But the RPM was for Fedora, not CentOS.

I tried it anyway, like this:

    rpm -Uvh perl-MongoDB-0.41-3.fc16.x86_64.rpm

These errors were returned:
       error: Failed dependencies:
        perl(:MODULE_COMPAT_5.12.3) is needed by perl-MongoDB-0.41-3.fc16.x86_64
        rpmlib(FileDigests) <= 4.6.0-1 is needed by perl-MongoDB-0.41-3.fc16.x86_64
        perl(Any::Moose) is needed by perl-MongoDB-0.41-3.fc16.x86_64
        perl(boolean) is needed by perl-MongoDB-0.41-3.fc16.x86_64
        perl(DateTime) is needed by perl-MongoDB-0.41-3.fc16.x86_64
        perl(Tie::IxHash) is needed by perl-MongoDB-0.41-3.fc16.x86_64
        rpmlib(PayloadIsXz) <= 5.2-1 is needed by perl-MongoDB-0.41-3.fc16.x86_64

Is there any way I can make this Fedora RPM work on my CentOS 5 machine?
I'm a reasonably decent PHP programmer, but I just don't "get" the object-oriented side, and it's high time I sorted it out. This may not be worth 500 points, but I thought I'd try anyway.

In the code snippet I have some code I've written to pull some info from a MongoDB database. I realise I've used some classes in there, but I haven't constructed any; they're just what the MongoDB driver provides.
In the foreach loop I have some variables that are printed out from MongoDB. I'd like to make that foreach loop a class so that I can call something like:


which makes it nice and clean and simple. I figure that if someone can help me create a class from something I've already written, it might help me "get" classes and objects. All help gratefully considered.
try {
  // open connection to MongoDB server
  $conn = new Mongo('cbb-cs03-bt26-17');

  // access database
  $db = $conn->factdb;

  // access collection
  $collection = $db->hosts;

  // define what to find
  $host = array(
        'host' => $gethost
  );

  // disconnect from server
} catch (MongoConnectionException $e) {
  die('Error connecting to MongoDB server');
} catch (MongoException $e) {
  die('Error: ' . $e->getMessage());
}

$cursor = $collection->find($host);
foreach ($cursor as $value) {

I have installed Nagios 3.2.2 and the plug-ins. For some reason the plug-ins show some servers in a critical state with this message: CRITICAL - Connection to MongoDB failed!
It's in a secure environment, so no authentication. I ran each check command that I have in my nrpe.cfg, i.e.:
command[check_mongo_connect]=/usr/local/nagios/libexec/ -A connect -W 2 -C 4
command[check_mongo_free]=/usr/local/nagios/libexec/ -A connections -W 70 -C 80
command[check_mongo_rep]=/usr/local/nagios/libexec/ -A check_rep_lag -W 2 -C 80

On the client and on the Nagios host (with -H) they all returned connection OK, but Nagios still shows them as critical. NRPE is running under xinetd.
I have an existing array that needs to be converted into a table of its own in a database. Can anyone help me find or create a function in PHP that could do this?

But then I also need another function that would easily read the new table back into an array, so I don't have to change a large section of my existing code.

Any help would be appreciated.
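The requested round trip can be sketched in Python with SQLite standing in for the database; the function names are invented, every row is assumed to have the same keys, and the same two-function shape ports directly to PHP with mysqli:

```python
import sqlite3

def array_to_table(db, name, rows):
    """Create a table named after the array and insert one row per dict."""
    cols = list(rows[0])
    # NB: identifiers are interpolated directly into the SQL; only safe
    # for trusted, whitelisted table/column names.
    db.execute("CREATE TABLE %s (%s)" % (name, ", ".join(cols)))
    marks = ", ".join("?" for _ in cols)
    db.executemany("INSERT INTO %s VALUES (%s)" % (name, marks),
                   [tuple(r[c] for c in cols) for r in rows])

def table_to_array(db, name):
    """Read the table back into the original list-of-dicts shape."""
    cur = db.execute("SELECT * FROM %s" % name)
    cols = [d[0] for d in cur.description]
    return [dict(zip(cols, row)) for row in cur.fetchall()]

db = sqlite3.connect(":memory:")
data = [{"name": "ann", "score": 10}, {"name": "bob", "score": 7}]
array_to_table(db, "players", data)
print(table_to_array(db, "players") == data)  # True
```

Because table_to_array reproduces the original associative-array shape, the rest of the existing code should not need to change.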
