Cassandra – Fixing blocked Native Transport Requests

In our production Cassandra cluster (DSE 5.1.18 at the time of this post), we were seeing the Total Blocked count for the Native-Transport-Requests stat increasing regularly on each node. Native-Transport-Requests is the thread pool that handles incoming CQL requests.
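
If you are not using OpsCenter, you can also check these counters from the command line with nodetool tpstats, which reports each thread pool, including Native-Transport-Requests. Illustrative output follows; the numbers are placeholders and the column layout varies slightly between versions:

nodetool tpstats | grep -E 'Pool Name|Native-Transport'

Pool Name                        Active   Pending   Completed    Blocked   All time blocked
Native-Transport-Requests        2        0         123456789    0         4821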

Here is the view we were seeing on each node in OpsCenter:

The issue was that we were using the default value of 128 for max_queued_native_transport_requests, which is set in cassandra-env.sh.

Once the limit for max queued requests is reached, any new requests are dropped. Increasing the queue size allows more backed-up requests to queue rather than being dropped.

The solution was to increase the queue size from 128 to 1024 by adding the following line to the end of the cassandra-env.sh file on each node:

JVM_OPTS="$JVM_OPTS -Dcassandra.max_queued_native_transport_requests=1024"

Note: You will need to restart each node after making this change.

This change resolved the issue for us and the Total Blocked count has been significantly reduced on each node:

Note: If you still have issues you can continue to increase the queue size (the DataStax documentation suggests 3074); however, I would recommend increasing the value incrementally and evaluating the impact.

I hope that helps!

Cassandra – How to Add a New Node to an Existing Cluster

This article explains the steps for adding a new node to an existing Cassandra cluster using open source Apache Cassandra. If you are using the DataStax Enterprise Edition of Cassandra you can add new nodes using OpsCenter.

Adding a new node to an existing cluster in Apache Cassandra version 3 and higher is fairly easy. When a new node is added to the cluster, Cassandra automatically adjusts the token ranges each node is responsible for, resulting in each node in the cluster storing a smaller subset of the data.

Prepare the New Node

Once you have installed Cassandra on your new node (or better yet, applied your Cassandra Puppet template), there are a few critical items you need to configure before you start the new node.

Verification

Verify Version

Verify that the version of Cassandra you want has been correctly installed. In this case we were using version 3.11.4.

yum list cassandra

You should see Cassandra and the correct version:

Verify Cassandra Config Settings

Verify Cluster Name

Important! Ensure that the cluster_name in the cassandra.yaml file is set to EXACTLY the same value as the one used on all of the other nodes in the cluster.

You can verify the cluster name by looking in the cassandra.yaml file:

cat /etc/cassandra/conf/cassandra.yaml | grep cluster_name:

For example:

Verify Seed

Important! Ensure the seeds value in the cassandra.yaml file is set to EXACTLY the same value as the one used on all of the other nodes in the cluster.

You can verify the seed node by looking in the cassandra.yaml file:

cat /etc/cassandra/conf/cassandra.yaml | grep seeds:

For example:

Disable Incremental Backups

Important! It is recommended to disable incremental backups initially, then enable them once the node has successfully joined.

If you currently have a reasonably large data set that will be synced to the new node, it is highly recommended to temporarily disable incremental backups.

During the syncing process there will be a lot of file compaction running and having incremental backups enabled will prevent many files from getting compacted until the backups are cleared.

In the cassandra.yaml file ensure the incremental backups field is set to false:

  • incremental_backups: false
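
If the node is already running, you can also toggle incremental backups at runtime with nodetool rather than editing the YAML and restarting. This is a hedged alternative; confirm the subcommands exist in your Cassandra version:

nodetool statusbackup
nodetool disablebackup
nodetool enablebackup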

Start Cassandra on the new Node

Now that Cassandra's configuration file has been updated with the name of the cluster it will join and a seed node from the existing cluster, the new node will automatically connect to the cluster and start syncing data when you start it.

Verify Cluster Status

Before starting the new node, let’s check the status of all the existing nodes in the cluster by running the nodetool status command from one of the existing nodes:

nodetool status

You can see that all the existing nodes are in status UN (Up – Normal):

Start Cassandra

Status

Check the status of the Cassandra service on the new node.

systemctl status cassandra

The service should be disabled and off:

Enable the service

systemctl enable cassandra

Start service

systemctl start cassandra

Watch for Errors

Watch to ensure there are no exceptions or errors being reported.

journalctl -f -u cassandra

Monitoring Node Syncing

Status

You can view the status of all nodes by running the following command from any node:

nodetool status

While the node is joining, its status will be:

  • UJ – Up – Joining
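
For example, while the new node is streaming data you should see something like the following (the addresses, load, ownership, and host IDs are placeholders):

nodetool status
Datacenter: datacenter1
========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address           Load        Tokens       Owns (effective)   Host ID                               Rack
UN  Existing_Node_IP  1.2 GiB     256          100.0%             93d76a81-eca1-450b-bc84-f614cbc3c9b8  rack1
UJ  New_Node_IP       350.2 MiB   256          ?                  4a4a63f3-13df-553a-85d2-b34a68125a32  rack1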

Syncing

You can use nodetool to monitor progress of the data syncing:

nodetool netstats | grep Already
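
Illustrative output from the grep (the file counts and byte totals are placeholders, and the exact wording varies by Cassandra version):

    Receiving 25 files, 6442450944 bytes total. Already received 8 files, 2147483648 bytes total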

The bytes total should continue to increase:

  • if it stops increasing, the syncing process is stuck
  • in that case, stop the node, clean out its data, and start it again

Ready

When the node has finished syncing, its status will be:

  • UN – (Up – Normal)

Note:

Only add ONE new node to the cluster at a time. If one node is joining and you try to add another, the second node will throw an error and not join.

Also, it is important to wait for the cluster to stabilize before adding another node. Check system.log for activity to ensure the cluster has stabilized.

Compactions

Even though the new node is up and receiving traffic, depending on the size of the dataset, it will be busy for a while doing compactions. Allow the server to work through all major pending compactions before doing the first snapshot backup. A snapshot creates hard links to the existing data files and prevents Cassandra from compacting and removing existing data files. So allow the pending compaction queue to empty out before making the first backup and proceeding to the cleanup stage. 

You can monitor the pending compactions queue with the nodetool compactionstats command:

nodetool compactionstats

Which will show you output such as:

Notice the “Active compaction remaining time”. When the compactions show a remaining time in hours, they are major compactions. Allow these tasks to complete before making any other changes, such as running backups.

Finally, when all the long running major compactions are complete you should see:

It is alright if there are some compactions running or in the queue, but they should mostly all have completion times in minutes, not hours.

Once the compactions have completed, you can move forward with setting backups.

Backups

Now that the node is up and running, you will want to set up two backup steps: snapshots and incremental backups.

Snapshot

A snapshot in Cassandra is a full backup of all tables in the keyspace. You can create a snapshot with the following command:

nodetool snapshot

A folder called "snapshots" is created under the data folder for each table. It is up to you to set up a job to copy each of these folders to a backup drive.
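
For example, to list every snapshots folder on a node (assuming the default data directory of /var/lib/cassandra/data):

find /var/lib/cassandra/data -type d -name snapshots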

Important! If you do not regularly remove the snapshots once copied, over time it will create problems for Cassandra because it will not be able to compact older files referenced by those backups.

Once you have copied your snapshot to another drive, you can run the following command to clear all snapshot folders for all tables:

nodetool clearsnapshot
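
If you take named snapshots (for example, nodetool snapshot -t before_upgrade), you can also clear a single snapshot by its tag instead of all of them; check nodetool help clearsnapshot for the exact options in your version:

nodetool clearsnapshot -t before_upgrade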

Incremental Backups

Cassandra also generates incremental backups every time a new SSTable is written to disk. You can save incremental backups between full snapshots.

To enable incremental backups you must change the setting in your cassandra.yaml file:
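
  • incremental_backups: true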

Incremental backups will be put in a folder called "backups" under each table's folder.

Important! You are responsible for copying these backup files onto a backup drive and removing them from the Cassandra table folders. If you do not periodically remove these files it will prevent Cassandra from performing regular compaction and will cause performance problems.

Backup Summary

The snapshot and incremental backup folders are stored in the data directory of each table in a keyspace. Here is an example (hypothetical keyspace and table names, assuming the default data directory):
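
/var/lib/cassandra/data/my_keyspace/my_table-1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d/snapshots/
/var/lib/cassandra/data/my_keyspace/my_table-1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d/backups/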

Repair

We want to make sure that all data for keyspaces with a replication factor greater than one has been replicated properly. So once the new node is up and the cluster is stable, run repair on each node.

Run Repair (One node at a time)

To make sure all data has been replicated properly to each node, run nodetool repair before cleanup:

nodetool repair

Monitor Repair

The first phase of the repair process is doing validations between rows on the current node and the other nodes that should have the same rows.

You can view the progress of the validations being done as part of the repair task by looking for compactions of type "Validation" in the queue when using nodetool compactionstats:

nodetool compactionstats

The second phase of the repair process is syncing data between nodes where a difference has been found. You can view this activity using the nodetool netstats command:

nodetool netstats

Cleanup

Important! Do not start the cleanup step until you have confirmed that:

  • the new node has finished syncing and has joined the cluster
  • all pending major compactions on the node have completed
  • the first backup has been run
  • repair has been run on every node

When a new node is added to a cluster, Cassandra adjusts the token ranges for which each node is responsible. Once data from existing nodes is shifted to the new node, the other nodes will no longer be responsible for those keys and will no longer serve them in requests.

However, for safety reasons Cassandra does not automatically remove that data from the nodes that were previously responsible for those keys. It leaves this to the administrator to do manually, which makes sense: if anything goes wrong with the new node, no data is lost. The cleanup task will remove any data on a node belonging to token ranges for which the node is no longer responsible.

Run Cleanup (One node at a time)

Once the repair task has completed and there are no more validation tasks running, you can run the nodetool cleanup task to remove all data for the token ranges the node no longer owns:

nodetool cleanup
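
Cleanup can also be limited to a single keyspace if you want to work through the data in smaller chunks (the keyspace name here is a placeholder):

nodetool cleanup my_keyspace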

Monitor Cleanup

You can monitor the progress of cleanup using nodetool compactionstats:

nodetool compactionstats

References

The following is the best system administration book I have read on Apache Cassandra:

Mastering Apache Cassandra 3 (by Aaron Ploetz)

I hope that helps!

Cassandra – How to Create a Cluster

When you download Apache Cassandra, it is easy to start Cassandra and have a single node up and running within minutes. This is usually enough for testing on my developer workstation. However, sometimes when testing out Cassandra configuration changes or changes to our application, it is nice to have a multiple node cluster available for testing purposes.

The following instructions will help you get a simple three node cluster up and running.

Step 1 – Create Three Nodes

Complete the following steps on three different machines. In my case I have three virtual machines running Windows.

Install Java

Download Java

Create JAVA_HOME environment variable, for example:

  • JAVA_HOME = C:\java-1.8.0-openjdk-1.8.0.222-4

Install Cassandra

Download Cassandra

Unzip the file somewhere, for example:

  • C:\apache-cassandra-3.11.4

Create CASSANDRA_HOME environment variable:

  • CASSANDRA_HOME = C:\apache-cassandra-3.11.4

Set Cluster Name

The cluster_name field in the cassandra.yaml config file is the unique identifier Cassandra uses to determine whether a node belongs to a cluster.

Your cassandra.yaml config file will be located here:

  • C:\apache-cassandra-3.11.4\conf\cassandra.yaml

Important! The cluster name must be identical on ALL nodes in the cluster.

Edit the cluster_name field and set it to the following:

cluster_name: 'ExampleCluster'

Start Cassandra

Run Cassandra using PowerShell (Note: The -f option runs Cassandra in the foreground):

C:\apache-cassandra-3.11.4\bin\cassandra.ps1 -f

Cassandra has started successfully when you see:

Check Cassandra Status

Using the admin tool nodetool you can check the status of your new node:

C:\apache-cassandra-3.11.4\bin\nodetool.bat status

Open Cassandra Query Console

Cassandra has a command line tool called CQLSH (Cassandra Query Language Shell) which can be used to query the database.

Note: To run CQLSH on Windows you must have Python 2.7 installed and you need to ensure you are pointed to the right version of Python when you launch CQLSH.

Launch CQLSH:

C:\Python27\python.exe C:\apache-cassandra-3.11.4\bin\cqlsh.py

If CQLSH was able to connect to your local instance of Cassandra, you should now have a CQLSH prompt:
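
The prompt should look something like the following (the version numbers will match your install):

Connected to ExampleCluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.4 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh>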

Once you have repeated the steps in this section for each node and confirmed you can connect to each instance of Cassandra using CQLSH, you are ready to move on to the next step.

Step 2 – Create Keyspace (On First Node Only)

Let’s create a new Keyspace (database) and table, but only on the first node. It is fine for this keyspace to only be on the first node for now, since later when we cluster the first node with the other two nodes, the keyspace will be synced to all nodes.

Create a Keyspace

Let’s create an empty keyspace (database).

From the CQLSH console run the following script:

CREATE KEYSPACE ExampleKeyspace WITH REPLICATION = {
  'class' : 'NetworkTopologyStrategy',
  'datacenter1' : 1
};

Now if you type the following command:

describe keyspaces;

You will see the new keyspace examplekeyspace has been created:
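
Illustrative output (the system keyspaces will also be listed):

cqlsh> describe keyspaces;

system_schema  system_auth  system  system_distributed  system_traces  examplekeyspace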

Now switch to the new keyspace to make it your active keyspace:

use examplekeyspace;

You will now be switched to keyspace examplekeyspace:

Create a Table

Now that we are in keyspace examplekeyspace, we can list the existing tables using:

describe tables;

Right now, there are no tables present:

Let’s create a new table called exampletable:

CREATE TABLE examplekeyspace.exampletable (
  key text, 
  column timestamp, 
  value int, 
  PRIMARY KEY ((key), column)
);

When you create the table, you can then use describe tables to verify the table has been created:

describe tables;

Insert Some Data

Now let’s add a row to the table:

INSERT INTO examplekeyspace.exampletable(key, column, value) 
VALUES('someKey', '2019-09-18T10:10:10.000Z', 123);

Verify the Data Was Added

SELECT * FROM examplekeyspace.exampletable;
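
You should get back the single row that was inserted above (illustrative formatting):

 key     | column                          | value
---------+---------------------------------+-------
 someKey | 2019-09-18 10:10:10.000000+0000 |   123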

Step 3 – Configure Cluster (On All Nodes)

Now that we have created three standalone Cassandra nodes and added a database and table to the first node, the next step is to configure these three nodes to be a cluster.

Stop Cassandra

If Cassandra was started using cassandra.ps1 -f, just kill the session. 

Listen Address

The listen_address is the address Cassandra tells other nodes to use when connecting to this node. On each of the three nodes, set it to that node's own IP. For example, if node one has an IP of 11.11.11.11, set it to 11.11.11.11 on node one; if node two has an IP of 22.22.22.22, set it to 22.22.22.22 on node two; and so on.

Set listen_address in the cassandra.yaml file:

listen_address: IP_OF_CURRENT_NODE

Seeds

The seeds entry in the config file lists the IP of the node that any new node will contact first when it starts up, in order to get the list of all other nodes in the cluster.

Important! The Seeds entry must be identical on each node in the cluster. So in this example set it to the IP address of the first node in the cluster on ALL nodes.

Set seeds in the cassandra.yaml file:

- seeds: "IP_OF_FIRST_NODE_OF_CLUSTER"

Step 4 – Connect Each Node to the Cluster

Now that all three nodes are correctly configured to work as a cluster, all that is needed for the final step is to start each node.

On First Node

Start Cassandra using PowerShell:

C:\apache-cassandra-3.11.4\bin\cassandra.ps1 -f

Verify Cassandra is up and running using the admin tool nodetool:

C:\apache-cassandra-3.11.4\bin\nodetool.bat status
Datacenter: datacenter1
========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens       Owns (effective)   Host ID                               Rack
UN  First_Node_IP   540.03 KiB  256          100.0%            93d76a81-eca1-450b-bc84-f614cbc3c9b8  rack1

On Second Node

Start Cassandra using PowerShell:

C:\apache-cassandra-3.11.4\bin\cassandra.ps1 -f

Verify Cassandra is up and running using the admin tool nodetool:

C:\apache-cassandra-3.11.4\bin\nodetool.bat status

So now if you run nodetool status you should see:

C:\apache-cassandra-3.11.4\bin\nodetool.bat status
Datacenter: datacenter1
========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens       Owns (effective)   Host ID                               Rack
UN  First_Node_IP   540.03 KiB  256          100.0%            93d76a81-eca1-450b-bc84-f614cbc3c9b8  rack1
UN  Second_Node_IP  414.5  KiB  256          100.0%            5e4a62f2-23df-443a-95d9-a69a72125a29  rack1

The important thing to note is the status in the first column:

  • UJ – Means the node is in the process of joining the cluster
    • It is in the process of receiving data and syncing with the cluster, but is not yet ready to receive queries
  • UN – Means the node has finished joining the cluster and is in a normal state

On Third Node

Complete the same steps as with node two. Once the node is started and has synced, when you run nodetool status you should see:

C:\apache-cassandra-3.11.4\bin\nodetool.bat status
Datacenter: datacenter1
========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Tokens       Owns (effective)   Host ID                               Rack
UN  First_Node_IP   540.03 KiB   256          100.0%            93d76a81-eca1-450b-bc84-f614cbc3c9b8  rack1
UN  Second_Node_IP  414.5  KiB   256          100.0%            5e4a62f2-23df-443a-95d9-a69a72125a29  rack1
UN  Third_Node_IP   448.5  KiB   256          100.0%            4a4a63f3-13df-553a-85d2-b34a68125a32  rack1

On Any Node

Now that the cluster has been created and all nodes have been synced, we should be able to see the same data by querying Cassandra from the CQLSH console app from any node:

CQLSH Console

Launch CQLSH:

C:\Python27\python.exe C:\apache-cassandra-3.11.4\bin\cqlsh.py

Switch to the examplekeyspace keyspace:

use examplekeyspace;

Verify you have the table exampletable:

describe tables;

Verify you have data in the table:

SELECT * FROM examplekeyspace.exampletable;

You should see the following:

That is all! I hope that helps!

Cassandra – Restoring Data from a Backup with OpsCenter

In the last few months we migrated all of our Cassandra clusters from open source Apache Cassandra to DataStax Enterprise Cassandra. As one of the perks of using the DataStax version we also get the Cassandra cluster management tool OpsCenter.

One of the advantages of moving to OpsCenter is we no longer have to use our own custom scripts for managing backups. As part of verifying our new backups we also wanted to run through some data recovery scenarios and make sure we understood how to use the new tools correctly. The following is a simple how-to guide for a few common data recovery scenarios that I hope you find useful.

  • Restore Accidentally Deleted Rows While Keeping Newly Added Rows
  • Restore Keyspace/Table While Keeping Newly Added Rows
  • Restore Keyspace/Table to a Backup (Dropping All Recent Changes)

Note: At the time of this post, we were using versions:

  • Cassandra DSE – 5.1.17
  • OpsCenter – 6.7.5

Step 1 – Select Backup

Select a Backup to Restore

Services -> Activity -> Restore Backup

Select a backup to restore and click “Next”:

Now you are on step 2 of the restore process, where you need to be careful about which options you select based on what you would like to do. The following are some common scenarios we tested.

Step 2 – Restore Backup (Scenarios)

Restore Accidentally Deleted Rows While Keeping Newly Added Rows

When data is deleted using a DELETE statement in Cassandra, the row is not removed from the existing SSTables, since they are immutable. Instead, Cassandra writes a tombstone entry to a new SSTable, which stays around for the amount of time defined by the table's gc_grace_seconds setting. This tombstone allows the delete to be propagated to all nodes where the row is replicated, and it also ensures that when an older SSTable containing the deleted row is compacted into a new SSTable, the row is removed.

So if you restore a backup, you will notice that even though the deleted rows are in the backup you are restoring, they do not come back. The reason is that Cassandra has a tombstone entry that is newer than the row being restored, so it still knows to treat the row as deleted.

The solution is to temporarily set the tombstone grace period on the table to 0 seconds, so it will be ignored, for example:

alter table MY_KEYSPACE_HERE.MY_TABLE_NAME_HERE with gc_grace_seconds = 0;
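
Tip: before making this change, you can record the table's current gc_grace_seconds so you know exactly what to set it back to afterwards (a hedged example query against the system_schema tables; replace the placeholder keyspace and table names):

SELECT gc_grace_seconds FROM system_schema.tables WHERE keyspace_name = 'MY_KEYSPACE_HERE' AND table_name = 'MY_TABLE_NAME_HERE';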

Then restore the backup for the keyspace you want using the SSTableLoader option:

  • “Use sstableloader”

This will restore the deleted data from the backup into Cassandra without affecting rows added since the backup was taken:

Once the restore task has completed and you can confirm that the accidentally deleted data has been restored, alter the table to set the gc_grace_seconds back to what it was before, for example:

alter table MY_KEYSPACE_HERE.MY_TABLE_NAME_HERE with gc_grace_seconds = 864000;

Restore Keyspace/Table While Keeping Newly Added Rows

To restore a backup of a keyspace or table while ensuring rows added since the backup was taken are not lost, you need to ensure you use the SSTableLoader option:

  • “Use sstableloader”

This option will import the data from the backup into the Cassandra cluster, so newly added data will not be affected. 

When running the restore, on the second screen select the “Use sstableloader” option:

Restore Keyspace/Table to a Backup (Dropping All Recent Changes)

If for some reason you need to restore a keyspace or table back to exactly what was in the backup, throwing away any newly added data, on the second screen select the option:

  • “Truncate/delete existing data before restore”.

This option will reset the selected table back to what it was when the backup was taken. This is the default option.

NOTE: Do not use this option unless you are absolutely sure you want to drop newly added data.

I hope you found this useful!

Cassandra – Install and Query Cassandra on Windows

In our testing, staging, and production environments Cassandra runs on CentOS. However, if you are like me and your developer workstation is a Windows machine, you will want to be able to run and query Cassandra in your local environment. The following is a simple set of instructions for setting up Cassandra for local development.

Install Cassandra

  • Download Cassandra
  • Unzip it to a folder, such as:
    • C:\apache-cassandra-3.11.5
  • Ensure environment variable for JAVA_HOME has been set
  • Open a PowerShell window and run the commands:
$env:CASSANDRA_HOME = "C:\apache-cassandra-3.11.5"
C:\apache-cassandra-3.11.5\bin\cassandra.ps1 -f

You should now have an instance of Cassandra running:

Note: CASSANDRA_HOME determines where the following folders will be written:

  • data, commitlog, hints, and saved_caches.

Query Cassandra

Cassandra comes with a command line tool for querying the database called CQLSH (Cassandra Query Language Shell).

To run CQLSH:

  • You must have Python 2.7 installed and you need to ensure you are pointed at the right version when you launch CQLSH
  • Launch CQLSH from a windows command prompt:
C:\Python27\python.exe C:\apache-cassandra-3.11.5\bin\cqlsh.py
  • If CQLSH was able to connect to your local instance of Cassandra, you should now have a CQLSH prompt

Create a Keyspace

Now that we are connected to Cassandra, let’s create a new keyspace, which you can think of as a database in Cassandra.

From the CQLSH console run the following script to create a new keyspace called “ExampleKeyspace”:

CREATE KEYSPACE ExampleKeyspace
  WITH REPLICATION = {
   'class' : 'NetworkTopologyStrategy',
   'datacenter1' : 1
  };

Now if you type the following command:

describe keyspaces;

You will see the new keyspace “ExampleKeyspace” has been created:

Now make the new keyspace your active keyspace:

use examplekeyspace;

Create a Table

Now that we are in keyspace “examplekeyspace”, we can list the existing tables:

describe tables;

At the moment, we do not yet have any tables:

Let’s create a new table called “exampletable”:

CREATE TABLE examplekeyspace.exampletable (
  key text, 
  column timestamp, 
  value int, 
  PRIMARY KEY ((key), column)
);

Now when you run the command describe tables:

describe tables;

You can see the table has been added:

Insert Some Data

Now let’s add a row to the table “exampletable”:

INSERT INTO examplekeyspace.exampletable(key, column, value) 
VALUES('someKey', '2019-09-18T10:10:10.000Z', 123);

Verify the new row has been added:

SELECT * FROM examplekeyspace.exampletable;

Alternatives to CQLSH

DataGrip

As an alternative to CQLSH, you can use JetBrains DataGrip.

  • DataGrip supports Cassandra 3
    • Note: Only the paid version of DataGrip supports Cassandra; the trial version does not
  • If you use Cassandra, it is worth getting a license for DataGrip

Cassandra – Restarting a Node in a Cluster

Overview

If you need to restart a node in a Cassandra cluster, there are a few steps that are important to follow.

It is important to note two things:

  • When you write to a node in a Cassandra cluster, it writes to an in-memory table (memtable) as well as the commit log, and it periodically flushes the in-memory tables to disk.
  • Data in a Cassandra cluster is replicated (depending on your replication factor) to multiple nodes, so when you write a new value to one node, Cassandra will replicate that update to the other nodes.

For these two reasons it is important to drain the node before restarting Cassandra, which does two things:

  • It will force the node to flush all in memory tables to disk
  • It will stop the node from receiving updates from all other nodes in the cluster

NOTE: If you restart Cassandra without shutting it down cleanly, it will recover. It will use the commit logs to rebuild the missing data and sync with the other nodes, but this will also slow down the startup process.

Stop Cassandra

Drain the node. This will flush all in-memory tables to disk and stop the node from receiving traffic from other nodes.

nodetool drain

Check the status to see if the drain operation has finished.

systemctl status cassandra

Now it is safe to stop Cassandra.

systemctl stop cassandra

Start Cassandra

systemctl start cassandra

Check the status of the Cassandra service.

systemctl status cassandra

Monitor the startup state using journalctl.

journalctl -f -u cassandra

Cassandra has started successfully when you see the following message:
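
The line to look for is the one saying Cassandra is listening for CQL clients, something like the following (the address and port will reflect your configuration):

INFO  [main] ... Server.java - Starting listening for CQL clients on /0.0.0.0:9042 (unencrypted)...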

Note: Watch for any errors where Cassandra is having trouble syncing with other nodes.

I hope that helps!

Cassandra – Fix Schema Disagreement

Recently we had an issue when adding a new table to a Cassandra cluster (version 3.11.2). We added a new table create statement to our Java application, deployed our application to our Cassandra cluster in each of our test environments, the table was created, we could read and write data, everything was fine.

However, when we deployed to our production Cassandra cluster, the new table was created, but we were unable to query the table from any node in the cluster. When our Java application tried to do a select from the table Cassandra would return the following error:

Cassandra timeout during read query at consistency LOCAL_ONE (1 responses were required but only 0 replica responded)

Next we tried connecting to each node in the cluster using CQLSH, but we still had the same issue. On every node Cassandra knew about the table and we could see the schema definition for the table, but when we tried to query it we would get the following error:

ReadTimeout: Error from server: code=1200 [Coordinator node timed out waiting for replica nodes' response]
message="Operation timed out - received only 0 responses." info={'received_response': 0, 'required_response': 1, 'consistency': 'ONE'}

We decided to try describe cluster to see if we could get any useful info:

nodetool describecluster

There was our problem! We had a schema disagreement! Three nodes of our six node cluster were on a different schema:

Cluster Information:
        Name: OurCassandraCluster
        Snitch: org.apache.cassandra.locator.SimpleSnitch
        DynamicEndPointSnitch: enabled
        Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
        Schema versions:
            819a3ce1-a42c-3ba9-bd39-7c015749f11a: [10.111.22.103, 10.111.22.105, 10.111.22.104]

            134b246c-8d42-31a7-afd1-71e8a2d3d8a3: [10.111.22.102, 10.111.22.101, 10.111.22.106]

We checked DataStax, which had the article Handling Schema Disagreements. However, their official documentation was sparse and assumed a node was unreachable.

In our case all the nodes were reachable, the cluster was functioning fine, all previously added tables were receiving traffic, it was only the new table we just added that was having a problem.

We found a Stack Overflow post suggesting that a fix for the schema disagreement issue was to cycle the nodes, one at a time. We tried that and it worked. The following are the steps that worked for us.

Steps to Fix Schema Disagreement

If there are more nodes in one schema than in the other, you can start by trying to restart a Cassandra node in the smaller list and see if it joins the other schema list.

In our case we had exactly three nodes on each schema. In this case it is more likely the nodes in the first schema are the ones that Cassandra will pick during a schema negotiation, so try the following instructions on one of the nodes in the second schema list.

Connect to a node

Connect to one of the nodes in the second schema list. For this example let's pick node "10.111.22.102".

Restart Cassandra

First, drain the node. This will flush all in-memory tables to disk and stop the node from receiving traffic from other nodes.

nodetool drain

Now, check the status to see if the drain operation has finished.

systemctl status cassandra

You should see in the output that the drain operation was completed successfully.

Stop Cassandra

systemctl stop cassandra

Start Cassandra

systemctl start cassandra

Verify Cassandra is up

Let's check the journal to ensure Cassandra has restarted successfully:

journalctl -f -u cassandra

When you see the message indicating that Cassandra is listening for CQL client connections, it means Cassandra has finished restarting and is ready for clients to connect.

Verify Schema Issue Fixed For Node

Now that Cassandra is back up, run the describe cluster command again to see if the node has switched to the other schema:

nodetool describecluster

If all has gone well, you should see that node “10.111.22.102” has moved to the other schema list (Note: The node list is not sorted by IP):

Cluster Information:
        Name: OurCassandraCluster
        Snitch: org.apache.cassandra.locator.SimpleSnitch
        DynamicEndPointSnitch: enabled
        Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
        Schema versions:
             819a3ce1-a42c-3ba9-bd39-7c015749f11a: [10.111.22.103, 10.111.22.102, 10.111.22.105, 10.111.22.104]

             134b246c-8d42-31a7-afd1-71e8a2d3d8a3: [10.111.22.101, 10.111.22.106]

If Node Schema Did Not Change

If this did not work, it means the other schema is the one Cassandra has decided is the authority, so repeat these steps for the list of nodes in the first schema list.

Fixed Cluster Schema

Once you have completed the above steps on each node, all nodes should now be on a single schema:

Cluster Information:
        Name: OurCassandraCluster
        Snitch: org.apache.cassandra.locator.SimpleSnitch
        DynamicEndPointSnitch: enabled
        Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
        Schema versions:
             819a3ce1-a42c-3ba9-bd39-7c015749f11a: [10.111.22.103, 10.111.22.102, 10.111.22.101, 
                                                    10.111.22.106, 10.111.22.105, 10.111.22.104]

I hope that helps!

Cassandra – FSReadError on Startup

Recently we encountered an error with one node in a Cassandra cluster where the Cassandra service said it was running but we would get a failure when we tried to connect:

# cqlsh
Connection error: ('Unable to connect to any servers', 
{'127.0.0.1': error(111, "Tried connecting to [('127.0.0.1', 9042)]. 
Last error: Connection refused")})

So we decided to tail the journal to see if we could find a useful error message:

journalctl -f -u cassandra

While monitoring the journal output we saw the following exception recurring roughly every minute:

Nov 19 04:17:35 cassdb188 cassandra[17259]: Exception (org.apache.cassandra.io.FSReadError) encountered during startup: java.io.EOFException
Nov 19 04:17:35 cassdb188 cassandra[17259]: FSReadError in /var/lib/cassandra/hints/b22dfb1b-6a6e-44ce-9c7c-fda1e75293af-1542627895660-1.hints
Nov 19 04:17:35 cassdb188 cassandra[17259]: at org.apache.cassandra.hints.HintsDescriptor.readFromFile(HintsDescriptor.java:235)

The file that Cassandra was having problems with was a 0-byte hints file.
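
You can confirm that the file is empty with a quick directory listing (the path is taken from the error message above):

ls -lh /var/lib/cassandra/hints/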

A Stack Overflow post suggested that to resolve the problem you just needed to remove this file. We tried this solution and it worked.

Steps to Fix FSReadError Startup Problem

Stop Cassandra

systemctl stop cassandra

Move the suspect hint file into a temporary folder (just to be safe)

mv /var/lib/cassandra/hints/b22dfb1b-6a6e-44ce-9c7c-fda1e75293af-1542627895660-1.hints /tmp

Start Cassandra

systemctl start cassandra

Verify the error has stopped

journalctl -f -u cassandra

Now verify you can connect using CQLSH

# cqlsh
Connected to PointServiceClusterV3 at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.2 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh>

Note: In our case this happened on a Cassandra instance in a test environment that had not been shut down cleanly, so there was no concern about data integrity. However, if this had happened on a node in a production environment I would recommend running nodetool repair on the node.

nodetool repair

I hope that helps!

Cassandra – Switching from SimpleStrategy to NetworkTopologyStrategy

When we started using Cassandra I set up clusters for our test, staging, and production environments. Then we created the initial keyspace for our application, added tables, started using them, and everything worked fine.

Later we decided that for our up-time requirements we wanted to have a second cluster in another data center to act as a hot fail-over on production. No problem, Cassandra has us covered. However, when we had originally created our application’s keyspace, it was created with the default replication strategy SimpleStrategy. For a fail-over cluster we need Cassandra to be configured with the NetworkTopologyStrategy. No big deal right, should be an easy fix?

After reading through the documentation on Changing keyspace replication strategy, I was left with one question:

“What do I use for the data center name?”.

With SimpleStrategy you specify the number of nodes to which you want each item replicated using the parameter "replication_factor", for example ('replication_factor' : 3). However, when using NetworkTopologyStrategy you use the data center name to specify how many nodes should have copies of the data, for example ('mydatacentername' : 3). I was worried that if I altered the strategy using the wrong data center name, the cluster would decide the nodes were not part of that data center, and that would cause some serious problems.

Fortunately, it turns out that Cassandra has a default data center name, "datacenter1", which you can use when making this switch; kudos to the person who replied to this Stack Overflow post.

Of course I was not going to try this switch on any of our clusters until I was confident it would work. I set up a test cluster using SimpleStrategy with a replication factor of 3, added data to the cluster, and ran a nodetool repair. Then I altered the strategy for the keyspace, verified that nothing had changed, ran nodetool repair again, and once again verified all my data was intact. So it worked as promised.

Switch Replication Strategy

Note: In this example, the keyspace we are switching the replication strategy on is called “examplekeyspace”.

Open a cqlsh prompt on any node in the cluster

Check the current replication strategy

SELECT * FROM system_schema.keyspaces;


Verify the default data center name

SELECT data_center FROM system.local;


Alter the existing Keyspace

Alter the keyspace using the data center name (make sure you copy it exactly!) in place of the replication_factor parameter, and set the number of replicas to the same value as before.

ALTER KEYSPACE ExampleKeyspace WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': '3'};

Now if you check the keyspace on each node in the cluster you will see that the replication strategy is now NetworkTopologyStrategy.

SELECT * FROM system_schema.keyspaces;


Nodetool Repair

Switching the replication strategy does not cause any data to be moved between nodes; you would need to run nodetool repair to do that. However, if all you are doing is switching an existing cluster with a single rack and data center from SimpleStrategy to NetworkTopologyStrategy, no data should need to be moved. But if you would like to be thorough, it does not hurt to run a nodetool repair.

Run nodetool repair

nodetool repair -pr examplekeyspace

Using the option -pr (primary range only) means repair will only repair the token ranges for which the node being repaired is the primary owner, along with the replicas of those ranges on the other nodes. Make sure to run repair on each node, but only do ONE node at a time.

Conclusion

When I started using Cassandra I did not realize how much of a limitation SimpleStrategy imposes on data replication. If all you want is a single rack in a single data center, SimpleStrategy works; however, if there is even the slightest possibility that you might one day add a failover cluster in another data center, or nodes in one data center spread across multiple racks, use NetworkTopologyStrategy. Personally, for anything other than a local test cluster, I would always go with NetworkTopologyStrategy.

That is all!

Cassandra – Getting Started with Java

Cassandra is a great tool for storing time series data and I happen to be using it on my current project for that exact purpose.

There are several ways to use Cassandra from Java and many ways to improve performance, but here I just want to provide a simple “Getting Started” example. So here it is!

First, download the current version of Cassandra V3 from here.

Second, you can download the example code from GitHub here.

Extract the tar.gz file:

tar -zxvf apache-cassandra-3.11.5-bin.tar.gz

Change directory into the bin folder:

cd apache-cassandra-3.11.5/bin

Start Cassandra on Mac/Linux:

./cassandra -f

If you are using Windows, I recommend opening a PowerShell window and starting Cassandra using:

cassandra.ps1 -f

Create a Java project. If you are using Maven, you can add the following dependency to your pom.xml file:

<dependency>
    <groupId>com.datastax.cassandra</groupId>
    <artifactId>cassandra-driver-core</artifactId>
    <version>3.8.0</version>
</dependency>

Here is a simple Java example showing how to connect to Cassandra, create a keyspace, create a table, insert a row, and select a row:

import com.datastax.driver.core.*;
 
import java.time.Instant;
import java.time.ZoneId;
import java.util.Date;
import java.util.UUID;
 
public class CassandraV3Tutorial {
 
    private final static String KEYSPACE_NAME = "example_keyspace";
    private final static String REPLICATION_STRATEGY = "SimpleStrategy";
    private final static int REPLICATION_FACTOR = 1;
    private final static String TABLE_NAME = "example_table";
 
    public static void main(String[] args) {
 
        // Setup a cluster to your local instance of Cassandra
        Cluster cluster = Cluster.builder()
                .addContactPoint("localhost")
                .withPort(9042)
                .build();
 
        // Create a session to communicate with Cassandra
        Session session = cluster.connect();
 
        // Create a new Keyspace (database) in Cassandra
        String createKeyspace = String.format(
                "CREATE KEYSPACE IF NOT EXISTS %s WITH replication = " +
                        "{'class':'%s','replication_factor':%s};",
                KEYSPACE_NAME,
                REPLICATION_STRATEGY,
                REPLICATION_FACTOR
        );
        session.execute(createKeyspace);
 
        // Create a new table in our Keyspace
        String createTable = String.format(
                "CREATE TABLE IF NOT EXISTS %s.%s " + "" +
                        "(id uuid, timestamp timestamp, value double, " +
                        "PRIMARY KEY (id, timestamp)) " +
                        "WITH CLUSTERING ORDER BY (timestamp ASC);",
                KEYSPACE_NAME,
                TABLE_NAME
        );
        session.execute(createTable);
 
        // Create an insert statement to add a new item to our table
        PreparedStatement insertPrepared = session.prepare(String.format(
                "INSERT INTO %s.%s (id, timestamp, value) values (?, ?, ?)",
                KEYSPACE_NAME,
                TABLE_NAME
        ));
 
        // Some example data to insert
        UUID id = UUID.fromString("1e4d26ed-922a-4bd2-85cb-6357b202eda8");
        Date timestamp = Date.from(Instant.parse("2018-01-01T01:01:01.000Z"));
        double value = 123.45;
 
        // Bind the data to the insert statement and execute it
        BoundStatement insertBound = insertPrepared.bind(id, timestamp, value);
        session.execute(insertBound);
 
        // Create a select statement to retrieve the item we just inserted
        PreparedStatement selectPrepared = session.prepare(String.format(
                "SELECT id, timestamp, value FROM %s.%s WHERE id = ?",
                KEYSPACE_NAME,
                TABLE_NAME));
 
        // Bind the id to the select statement and execute it
        BoundStatement selectBound = selectPrepared.bind(id);
        ResultSet resultSet = session.execute(selectBound);
 
        // Print the retrieved data
        resultSet.forEach(row -> System.out.println(
                String.format("Id: %s, Timestamp: %s, Value: %s",
                row.getUUID("id"),
                row.getTimestamp("timestamp").toInstant().atZone(ZoneId.of("UTC")),
                row.getDouble("value"))));
 
        // Close session and disconnect from cluster
        session.close();
        cluster.close();
    }
}

If you would like to look at the data in your local Cassandra database, you can use the CQLSH command line tool.

So from the bin folder type:

./cqlsh

This will take you to a “cqlsh>” prompt:

Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.5 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh>

To view all available Keyspaces:

DESCRIBE KEYSPACES;

You will now see our “example_keyspace” in the list:

cqlsh> DESCRIBE KEYSPACES;

system_schema  system    system_traces
system_auth    system_distributed    example_keyspace

To switch to that Keyspace:

USE example_keyspace;
cqlsh> USE example_keyspace;
cqlsh:example_keyspace>

To show all tables in the keyspace:

DESCRIBE TABLES;

You should see the new table “example_table”:

cqlsh:example_keyspace> DESCRIBE TABLES;

example_table

Now from the command line you can view the data in the table by using a select statement:

SELECT * FROM example_table;

Which will show the following information:

id                                    | timestamp                       | value
--------------------------------------+---------------------------------+-------
1e4d26ed-922a-4bd2-85cb-6357b202eda8 | 2018-01-01 01:01:01.000000+0000 | 123.45

I hope that helps!

Note: The documentation on the DataStax website is very good.