Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

Cassandra Select Query

Posted on 2016-10-28
1
Medium Priority
?
213 Views
Last Modified: 2016-12-29
I have a 3 node datastax cassandra(Community) cluster with huge data. I have few tables which contain 3-5 billion records in them. I want to delete data that is older than 90 days from those tables.

The problem is how do i run a select query which runs without timeout. I am currently running below query

NOW=$(date -d "-3 month" +"%Y-%m-%d")
select day_ts from table_name where minute_ts < '$NOW' LIMIT 100000 ALLOW FILTERING;


Even if i limit the select query result, it will still parse the whole 3-5 billion records and then filter the data.

Please suggest what can be a efficient way to do this.
0
Comment
Question by:Abhinav Grover
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
1 Comment
 
LVL 25

Accepted Solution

by:
Tomas Helgi Johannsson earned 2000 total points
ID: 41865803
Hi!

Is your table partitioned ? If not, table this size should be partitioned otherwise you will have performance issues on your queries. Choose a good primary key/clustering key to cluster your data across your nodes for "near even" workload distribution.
Partitioning the data in a "right" way you exclude  data that does not meet your queries filtering in their parsing/search of the data hence shorter execution time.
Also if you don't have index on the table I strongly suggest you put on an index that satisfy your queries where clause columns to speed up your queries.

http://datascale.io/cassandra-partitioning-and-clustering-keys-explained/
http://www.planetcassandra.org/blog/the-most-important-thing-to-know-in-cassandra-data-modeling-the-primary-key/
https://docs.datastax.com/en/cql/3.1/cql/ddl/ddl_compound_keys_c.html
https://docs.datastax.com/en/cql/3.1/cql/ddl/ddl_secondary_index_c.html

Regards,
   Tomas Helgi
1

Featured Post

Not sure which OpenStack Certification to get?

So you’ve realized you might want to get certified in OpenStack, but you’re not sure what the benefits might be or even which one you should take. You know there are several certification courses you can choose from, but how do you know which one is right for you?

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

In this article, I’ll look at how you can use a backup to start a secondary instance for MongoDB.
In today's business world, data is more important than ever for informing marketing campaigns. Accessing and using data, however, may not come naturally to some creative marketing professionals. Here are four tips for adapting to wield data for insi…
Polish reports in Access so they look terrific. Take yourself to another level. Equations, Back Color, Alternate Back Color. Write easy VBA Code. Tighten space to use less pages. Launch report from a menu, considering criteria only when it is filled…
In this video, Percona Director of Solution Engineering Jon Tobin discusses the function and features of Percona Server for MongoDB. How Percona can help Percona can help you determine if Percona Server for MongoDB is the right solution for …

688 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question