Cassandra Select Query

Posted on 2016-10-28
Last Modified: 2016-12-29
I have a 3-node DataStax Cassandra (Community) cluster holding a large amount of data. A few tables contain 3-5 billion records each. I want to delete data that is older than 90 days from those tables.

The problem is how to run a select query that completes without timing out. I am currently running the query below:

NOW=$(date -d "-3 month" +"%Y-%m-%d")
select day_ts from table_name where minute_ts < '$NOW' LIMIT 100000 ALLOW FILTERING;


Even if I limit the select query result, it still scans all 3-5 billion records and then filters the data.

Please suggest an efficient way to do this.
Question by:Abhinav Grover
Accepted Solution

by: Tomas Helgi Johannsson
Hi!

Is your table partitioned? If not, a table this size should be; otherwise your queries will have performance issues. Choose a good primary key (partition key plus clustering key) so that the data is spread across your nodes with a near-even workload distribution.
Partitioning the data the right way lets a query skip the data that cannot match its filter, so less is scanned and execution time is shorter.
Also, if the table has no index, I strongly suggest adding one on the columns used in your queries' WHERE clauses to speed them up.
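
As a rough illustration of the partitioning advice, here is a minimal sketch that assumes a hypothetical table `events_by_day` whose partition key is a text column `day` in YYYY-MM-DD format (neither name comes from the question). With a day bucket as the partition key, expired data could be removed one partition at a time, with no ALLOW FILTERING scan. The function only prints the DELETE statements, and it relies on GNU `date -d`, the same syntax used in the question.

```shell
# Sketch only: print one partition-level DELETE per expired day bucket.
# Assumptions (not from the original post): a table named events_by_day
# whose partition key is a text column `day` formatted as YYYY-MM-DD.
# Requires GNU date for the -d option.

# print_deletes TODAY RETENTION_DAYS COUNT
# Prints COUNT DELETE statements, starting with the first day bucket that
# is older than RETENTION_DAYS relative to TODAY and walking backwards.
print_deletes() {
  today=$1
  retention=$2
  count=$3
  i=1
  while [ "$i" -le "$count" ]; do
    # -u keeps the day arithmetic in UTC so DST changes cannot shift it.
    day=$(date -u -d "$today $((retention + i)) days ago" +%Y-%m-%d)
    printf "DELETE FROM events_by_day WHERE day = '%s';\n" "$day"
    i=$((i + 1))
  done
}
```

For example, `print_deletes "$(date -u +%Y-%m-%d)" 90 7` would print the statements for the seven most recently expired day buckets, which could then be reviewed and run through `cqlsh`. Whether a day bucket is the right partition size depends on how much data arrives per day, so treat the key choice as an assumption to validate against your own workload.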

http://datascale.io/cassandra-partitioning-and-clustering-keys-explained/
http://www.planetcassandra.org/blog/the-most-important-thing-to-know-in-cassandra-data-modeling-the-primary-key/
https://docs.datastax.com/en/cql/3.1/cql/ddl/ddl_compound_keys_c.html
https://docs.datastax.com/en/cql/3.1/cql/ddl/ddl_secondary_index_c.html

Regards,
   Tomas Helgi