Cassandra Select Query

I have a 3 node datastax cassandra(Community) cluster with huge data. I have few tables which contain 3-5 billion records in them. I want to delete data that is older than 90 days from those tables.

The problem is how do i run a select query which runs without timeout. I am currently running below query

NOW=$(date -d "-3 month" +"%Y-%m-%d")
select day_ts from table_name where minute_ts < '$NOW' LIMIT 100000 ALLOW FILTERING;

Even if i limit the select query result, it will still parse the whole 3-5 billion records and then filter the data.

Please suggest what can be a efficient way to do this.
Abhinav GroverAsked:
Who is Participating?
Tomas Helgi JohannssonCommented:

Is your table partitioned ? If not, table this size should be partitioned otherwise you will have performance issues on your queries. Choose a good primary key/clustering key to cluster your data across your nodes for "near even" workload distribution.
Partitioning the data in a "right" way you exclude  data that does not meet your queries filtering in their parsing/search of the data hence shorter execution time.
Also if you don't have index on the table I strongly suggest you put on an index that satisfy your queries where clause columns to speed up your queries.

   Tomas Helgi
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.