
NoSQL Databases





A NoSQL database provides a mechanism for storage and retrieval of data that is modeled by means other than the tabular relations used in relational databases. Motivations for this approach include simplicity of design, simpler "horizontal" scaling to clusters of machines, and finer control over availability. The data structures used by NoSQL databases (e.g. key-value, wide column, graph, or document) differ from those used by default in relational databases, making some operations faster in NoSQL. The data structures used by NoSQL databases are also sometimes viewed as "more flexible" than relational database tables.


Can anybody give me an example of how they converted one or two relational tables to Redis hashes and then queried them?

We are planning to move some of our fat tables into Redis on a cheap Linux box, and I need some pointers for that.
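
One possible layout, sketched under assumptions: each row becomes one hash keyed by table name and primary key, and each column you need to filter on gets a set that acts as a secondary index. The `customer` table, its columns, and the class `RedisHashSketch` are hypothetical, and the code uses in-memory maps as a stand-in for Redis so that it runs without a server. The comments name the Redis commands (HSET, HGETALL, SADD, SMEMBERS) that a real client would issue at each step.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// In-memory stand-in for Redis; it illustrates the key layout only.
class RedisHashSketch {
    // key -> hash fields, mirroring what HSET would store
    static final Map<String, Map<String, String>> hashes = new HashMap<>();
    // key -> set members, mirroring what SADD would store
    static final Map<String, Set<String>> sets = new HashMap<>();

    // Stores one row of a hypothetical "customer" table.
    static void saveCustomer(String id, String name, String city) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("name", name);
        row.put("city", city);
        // Redis: HSET customer:<id> name <name> city <city>
        hashes.put("customer:" + id, row);
        // Redis: SADD customer:city:<city> <id>  (secondary index)
        sets.computeIfAbsent("customer:city:" + city, k -> new TreeSet<>()).add(id);
    }

    // Rough equivalent of SELECT * FROM customer WHERE id = ?
    static Map<String, String> findById(String id) {
        // Redis: HGETALL customer:<id>
        return hashes.get("customer:" + id);
    }

    // Rough equivalent of SELECT id FROM customer WHERE city = ?
    static Set<String> idsByCity(String city) {
        // Redis: SMEMBERS customer:city:<city>
        return sets.getOrDefault("customer:city:" + city, new TreeSet<>());
    }

    public static void main(String[] args) {
        saveCustomer("1", "Ann", "Toronto");
        saveCustomer("2", "Bob", "Toronto");
        System.out.println(findById("1"));
        System.out.println(idsByCity("Toronto"));
    }
}
```

The trade-off in this layout is that every filterable column needs its own index set, kept up to date on each write, because Redis hashes cannot be queried by field value.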

Is Cassandra good for reporting purposes?
I want to install Cassandra.

I want to write my first hello world program using Cassandra with Java.
Is Cassandra good for OLTP purposes?
What client interface can I use to read/update a Cassandra database using JavaScript?
Also, can I execute Java classes from JavaScript?
What is meant by Hadoop?

When do we go for Hadoop?

What are the advantages of Hadoop?

Do we go for Hadoop instead of Vectorwise?
What is meant by Cassandra?

When do we go for Cassandra?

What are the advantages of Cassandra?
I am working with Node.js, but this one should only need some JavaScript knowledge.

Process:
1. I receive one input in the form of a String from the user.
2. I capture that information and also the date (the time is not necessary), and then I call another view.
3. This view consists of a table populated from MongoDB with a varying number of objects, each with three properties: {firstname, lastname, address}.
4. The user then selects one or more objects from that table. (Opinions on the best way to select multiple items are appreciated.)
5. I would like to create and store a complex object consisting of three things: the string, the date, and an array (or object?) of all the items selected.

Does this make sense?
Thank you. Even partial answers would be appreciated, because I will keep working on it. 500 points; I would give more if I could, because I'm stuck here.
Thank you
How can I set the JNDI name in MongoDB using Java?
How can I set up connection pooling in MongoDB using Java?
I'm a seasoned data architect with more than 10 years of experience designing, developing, and performance tuning RDBMS applications, primarily on the Oracle platform.

To broaden my skills, I want to become proficient in big data technologies like Hadoop and, primarily, in NoSQL databases (Riak, HBase, MongoDB, CouchDB).

I have been actively following big data and NoSQL articles and concepts for the past 5-6 months, but what I want now is hands-on, in-house project experience with these technologies.

My questions are:
1. What technologies, programming languages, and concepts must I learn to get a good grip on NoSQL technologies and Hadoop? (I know the list can be very big, but I want to get a quick grip in the medium term, say 6-8 months.)

2. Is there any training or certification valued by the industry?

3. What is the step-by-step approach for a database guy to conquer NoSQL technologies and Hadoop? For example:

A sample path to learn Riak
1. Learn general NoSQL concepts
2. Learn Ruby (prerequisite for talking to many NoSQL APIs)
---- what key concepts must I learn in Ruby?
3. Learn / revisit web service basics - REST etc.
4. Learn Riak
The above is a sample that I've developed to express what I want to know. I would appreciate your comments and suggestions.
I want to insert a date into MongoDB using a Java program.
The date format should be 2-Jun-2012.

Please give me the code for that.
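
A minimal sketch of the parsing step, assuming the input arrives as text such as 2-Jun-2012. The class `MongoDateExample` and its helper methods are hypothetical. The MongoDB Java driver stores a `java.util.Date` value as a BSON date, so a common approach is to parse the text into a `Date`, store that, and apply the `d-MMM-yyyy` pattern again only when displaying the value. The insert call itself appears only as a comment, because it assumes an existing `DBCollection` named `collection`.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper class; it is not part of the MongoDB driver.
class MongoDateExample {

    // Builds a formatter for dates such as "2-Jun-2012".
    // Locale.ENGLISH fixes the month names; UTC avoids day shifts between machines.
    static SimpleDateFormat formatter() {
        SimpleDateFormat f = new SimpleDateFormat("d-MMM-yyyy", Locale.ENGLISH);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        f.setLenient(false); // reject impossible dates such as 31-Feb-2012
        return f;
    }

    // Parses the text into a Date, which the driver can store as a BSON date.
    static Date parse(String text) {
        try {
            return formatter().parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a d-MMM-yyyy date: " + text, e);
        }
    }

    // Formats a stored Date back into the display form.
    static String format(Date date) {
        return formatter().format(date);
    }

    public static void main(String[] args) {
        Date date = parse("2-Jun-2012");
        // With the legacy driver API, and assuming a DBCollection named "collection":
        // collection.insert(new BasicDBObject("created", date));
        System.out.println(format(date));
    }
}
```

Storing a real date rather than the formatted string keeps range queries and sorting by date possible; the string form is then only a presentation detail.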

This Canadian client is a 100% Microsoft shop (Windows, SQL, IIS, Visual Studio, etc.). They had a performance issue where one monthly report would fail to load, timing out after 2 minutes. They resolved the problem, and the same report now loads in under 5 seconds.

- Load balanced IIS front end.
- MS SQL cluster back end with only 150GB data on 4-core VM with 64GB RAM.
- All pages/reports load between 2 - 5 seconds.
- Average SQL volume IOPS 24.7, peaking around 500 IOPS, on a SAN with 15% IOPS workload.
- Most pages contain calculated stats from transactional data.
- Data contains personal information including US, Canada, and EU.
- The current product is selling extremely well.

One of their managers is preaching that a 5-second load is still unacceptable and that even the most complex report should load in under a second, like a Google search. To achieve this, he's proposing that the owner give him the budget and staff to redo pretty much everything from scratch.

1. Redesign the database in NoSQL (Cassandra).
Him: NoSQL is free, scalable and fast.  Used by Google, Facebook, Twitter.
Me: There is no way MS SQL can't support 150GB of data, and NoSQL won't necessarily be faster with so little data. NoSQL is not mature, support is only offered by start-ups, it is hard to maintain (for now), there is a lack of trained developers/admins (for now), and eventual consistency in the cluster might result in inaccurate data.

2. Rewrite the application in Java.
Him: Java is cheaper to write and maintain than .NET.  …

I'm pretty new to the world of UNIX.
I'm a windows guy new to all of this.

I'm trying to identify and kill processes on my box.

sudo pgrep -f cassandra
<returns the associated process IDs (good)>

So, to kill all the "cassandra" processes on the system:
sudo pgrep -f cassandra | xargs kill -9

I get the following results:
kill 14166: No such process
kill 28140: Operation not permitted

What am I doing wrong here?

I am looking to build a cloud system that would have a central database system, which would store the core client data and then feed this data to various external applications. External applications will also be updating the centralised database. The external applications can be PHP websites, mobile applications, back office systems, etc.
The centralised DB must be able to handle high volumes of data and therefore must be a high-performance DB.
I want to know what the best high-performance open-source database systems are that I can use for my central DB. I know there is MySQL, PostgreSQL, and Firebird, but I don't know if they are powerful enough. I've read that Amazon Relational Database Service is pretty fast too. Can someone please enlighten me on which is the best out of these, or whether there are any other DBs that would be better for this kind of job? Also, would it be best to use a relational DB or a NoSQL DB?

I'm relatively new to the world of Hector and Cassandra.
I'm familiar with the concepts of KeySpaces and ColumnFamilies.

What is the difference between the KeySpace and the KeySpaceDefinition types?
Are they interchangeable?

Two methods are exposed via the Factory:

What is the difference between these two calls and when would I use them?

If I want to execute a CQL query, I need to use a KeySpace.

CqlQuery cq = new CqlQuery(keySpace, .., .., ..)

How do I retrieve an existing "KeySpace" from a cluster?

Using Redis I store a KEY and VALUE in this example structure:


What I am trying to do is find which keys belong in the DAY tree (01, 02, 03). These represent dates, but if one day does not exist, I don't want to increment a counter in a loop and assume that it exists.

By the way, in PHP I have tried to return the contents in an array without specifying the full key, and it returns nothing. I thought Redis might allow a partial key and return the remaining structure, but this is not the case.

$check = $redis->smembers("$currentUserVnoId:$exampleSelection");



I need some information on NoSQL. Is it a database? How is it different from others?
Any information is highly appreciated.

We are currently using Cassandra 0.8.10 and have run into some strange issues surrounding
querying for a range of data

I ran a couple of get statements via the Cassandra client and found some interesting results:

Consider the following Column Family Definition:

    ColumnFamily: events
      Key Validation Class: org.apache.cassandra.db.marshal.BytesType
      Default column value validator: org.apache.cassandra.db.marshal.BytesType
      Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
      Row cache size / save period in seconds: 0.0/0
      Row Cache Provider: org.apache.cassandra.cache.ConcurrentLinkedHashCacheProvider
      Key cache size / save period in seconds: 200000.0/14400
      Memtable thresholds: 0.2953125/1440/63 (millions of ops/minutes/MB)
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 1.0
      Replicate on write: true
      Built indexes: [events.events_Firm_idx, events.events_OrdType_idx, events.events_OrderID_idx
 , events.events_OrderQty_idx, events.events_Price_idx, events.events_Symbol_idx, events.events_ds_timestamp_idx]
      Column Metadata:
        Column Name: Firm
          Validation Class: org.apache.cassandra.db.marshal.BytesType
          Index Name: events_Firm_idx
          Index Type: KEYS
        Column Name: OrdType
          Validation Class: org.apache.cassandra.db.marshal.BytesType
          Index Name: events_OrdType_idx
          Index Type: KEYS

I developed a dynamic web application in Eclipse, which uses a servlet to access a database. Whenever I restart the application, it works as expected. After a few runs, however, it stops working and database access returns no results.

Is there anything I have missed in configuring either Tomcat or the web.xml file?

I attached the web.xml below. Part of the servlet file is down below.


import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.util.List;

import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.DBObject;
import com.yyt.core.er.data.ERConstants;
import com.yyt.core.er.io.mongo.ERMapping;
import com.yyt.core.er.io.mongo.ErrorDB;
import com.yyt.core.er.io.mongo.LocalMongo;
import com.yyt.core.er.io.mongo.Nefer;
import com.yyt.core.er.io.mongo.Prod;
import com.yyt.core.er.utils.jedis.RedisDB;

/**
 * Servlet implementation class RunServlet
 */
public class RunServlet extends HttpServlet {
      private static final long serialVersionUID = 1L;

We're selecting a low-cost but capable database for use in our Linux appliances (4-16 GB RAM, quad-core CPUs, 500 GB-2 TB drives) that gather information from devices. Each appliance collects 10-30kb of information from 5k+ devices, updates this information every minute, and completely refreshes the data 10+ times daily.

The data must be stored in a redundant, persistent database that scales horizontally and automatically with each added appliance, offers automatic failover, and is shared among all appliances in groups of up to 20. The database should be virtualizable because the appliance ships in virtual and physical form factors, and we may want to move it into a cloud like AWS.

The device data does not need more than a dozen indexes, joins are unnecessary, and most information can be stored in XML/JSON format or as normalized fields with a big blob for the bulk of the data that's not indexed.

We've investigated NoSQL databases like MongoDB, but they seem to have restrictive requirements (e.g. MongoDB uses all the memory on the server), and Couchbase is new and looked rough when we tried to integrate with it.

We're not sure whether SQL databases like Postgres can scale horizontally the way we'd like. MySQL is out because it requires payment for this application.

Any ideas on which databases warrant a closer look?

My scenario:

I have many machines which store data in local databases (SQLite).
There is one `client` machine that pulls data from all the machines and saves it to its local database for use by other parts of the software. By pull I mean that it wants to get data between dates, e.g. a whole day or a whole month. (By the way, what is this scenario called?)

I am struggling to choose the correct technology.

Currently I am making RPC calls with the Thrift framework. It's easy to use, but I don't know if it is a good way, because I am building a replication mechanism myself.
With that RPC I call a function, e.g. GetNewData(Revision), and I get all the new table data. Then I save it to the client's database.

Another idea was to use MongoDB or CouchDB with replication (I don't know if these databases can be set up as in my scenario).

Another idea was to save the SQLite database to a file, download it, and merge it, or to save only new records and merge. It is simple, and because it is simple I am afraid that I will close myself off from future extensions.

I need some advice and discussion.

How best do I store and retrieve matrices from a database? Taking MySQL, for example, do I simply use 3 tables (the document, the attributes, and the document_attributes), or is that too simplistic and poor for performance? Should I skip MySQL and use something of the NoSQL sort, like MongoDB?

I need to crawl selected websites for about 5-10 different attributes.

For example, let's say it's a car website, and on each page of the site there is information about a particular car for sale, including the vehicle make, model, year, price, etc.
I need all this information to be collected and stored in a database, but since a good car sale website could have thousands of pages, it can become a lot of data to collect.

I don't expect to have more than a few hundred words collected from each page, so I think it would be under 1KB of data per record I store.

At the moment I don't know if I should be using NoSQL or a MySQL database, since I will have an insane number of rows/records created.

Any thoughts on going one way or the other? I need to do certain data manipulation on all the rows/records, such as ordering the cars by price from highest to lowest.

I have a Maven project in Java. The project has a few dependency jars such as Apache Commons, MongoDB, etc.

I want to build a single runnable jar file so that I can use

java -jar myjar.jar or another command to run the jar from the Linux command line.

The main entry is:  com.myapp.util.test.TestMain.java

My pom.xml is attached.

Please help!
