Introduction
A multi-node Apache Cassandra database cluster contains two or more database nodes (servers) in different cloud locations. This setup prevents Single-Point of Failure (SPOF) in case some nodes in the cluster fail. For instance, consider a scenario where you're running a cluster with three nodes. If one node fails due to network, power, or software problems, the rest of the nodes in the cluster can continue accepting read and write requests to the connected clients.
In a Cassandra multi-node setup, there is no leader election, and all participating nodes equally perform similar functions. Depending on your target level of data consistency, you must declare a replication strategy when creating a keyspace in the cluster. The strategy determines how Cassandra stores replicas across participating nodes and the total number of replicas per keyspace.
In this guide, deploy a multi-node Apache Cassandra database cluster on a Ubuntu 22.04 server, and test communication between the cluster nodes by adding sample data to a shared keyspace.
Prerequisites
Before you begin, in a single Rcs location:
-
Deploy three Ubuntu 22.04 servers with at least 16 GB RAM for production.
-
Add the servers to the same Rcs Virtual Private Cloud (VPC).
A VPC is a logical VLAN connection between Rcs servers that allows Apache Cassandra nodes to communicate in real time. Enable VPC peering to configure servers deployed in a different location to sync with the Cassandra cluster.
In this article, the following example IP Addresses represent the respective Server VPC Addresses.
-
Server 1:
192.0.2.3
-
Server 2:
192.0.2.4
-
Server 3:
192.0.2.5
To view your server VPC address, run the following command in your SSH session.
$ ip a
-
On each server:
-
Use SSH to access the server in a separate terminal session.
-
Switch to the sudo user account.
# su example-user
Configure Server Identity
In this section, configure the hostname of every server for fast identification in the Apache Cassandra cluster. For this article, Set each of the servers with the hostnames below:
-
Server 1:
node-1
-
Server 2:
node-1
-
Server 3:
node-1
On each of the servers, set a new hostname identity as described in the steps below:
-
Using a text editor such as
Nano
, edit the/etc/hostname
file.$ sudo nano /etc/hostname
-
Replace the default hostname (
localhost
) with the respective cluster-ID.node-1
Save and close the file.
-
Open the
/etc/hosts
file.$ sudo nano /etc/hosts
-
Add the following line at the end of the file.
127.0.0.1 node-1
Save and close the file.
-
Reboot the server to apply changes.
$ sudo reboot
Set Up Firewall Rules
To communicate and send keep-alive packets to all nodes in the cluster, set up firewall rules to accept Cassandra requests from the respective VPC server addresses. By default, Uncomplicated Firewall (UFW) is active on Rcs Ubuntu servers, set up UFW rules on each server as described below.
Server 1:
-
Allow connections to port
7000
and9042
from Server 2 to Server 1.$ sudo ufw allow from 192.0.2.4 to 192.0.2.3 proto tcp port 7000,9042
-
Allow connections from Server 3 to Server 1.
$ sudo ufw allow from 192.0.2.5 to 192.0.2.3 proto tcp port 7000,9042
-
Reload firewall rules to save changes.
$ sudo ufw reload
Server 2:
-
Allow connections to port
7000
and9042
from Server 1 to Server 2.$ sudo ufw allow from 192.0.2.3 to 192.0.2.4 proto tcp port 7000,9042
-
Allow connections from Server 3 to Server 2.
$ sudo ufw allow from 192.0.2.5 to 192.0.2.4 proto tcp port 7000,9042
-
Reload firewall rules.
$ sudo ufw reload
Server 3:
-
Allow connections to port
7000
and9042
from Server 1 to Server 3.$ sudo ufw allow from 192.0.2.3 to 192.0.2.5 proto tcp port 7000,9042
-
Allow connections from Server 2 to Server 3.
$ sudo ufw allow from 192.0.2.4 to 192.0.2.5 proto tcp port 7000,9042
-
Reload firewall rules to apply changes.
$ sudo ufw reload
Configure the Apache Cassandra Cluster
In this section, set up the Apache Cassandra cluster information on each server to allow all nodes to communicate together. To set up the cluster information, make changes to the main configuration file /etc/cassandra/cassandra.yaml
on each server as described below.
-
Stop the Apache Cassandra service.
$ sudo systemctl stop cassandra
-
Clear existing Cassandra data directories to refresh the node state.
$ sudo rm -rf /var/lib/cassandra/*
-
Open the
/etc/cassandra/cassandra.yaml
file.$ sudo nano /etc/cassandra/cassandra.yaml
The Cassandra configuration file contains many directives, when using the
Nano
text editor, press CTRL + :keyW: to search a directive by string. -
Find
cluster_name:
, and change the value fromTest Cluster
toAppDbCluster
.... cluster_name: 'AppDbCluster' ...
The above directive sets the cluster name to
AppDbCluster
. Verify that all servers in the cluster share the same name for a successful connection. -
Find the
seed_provider:
directive, and enter the IP Addresses of other nodes in the format<ip>,<ip>
as below.seed_provider: ... # Ex: "<ip1>,<ip2>,<ip3>" - seeds: "192.0.2.3:7000, 192.0.2.4:7000"
The above directives set the addresses Apache Cassandra sends seed nodes to find cluster information. Seed nodes are keep-alive packets known as gossips in Cassandra. To enable high availability in the cluster, you must set at least two seed addresses.
-
Find the
listen_address:
directive, and set the value to your node's VPC Address.listen_address: 192.0.2.3
The above directive sets the IP address used to communicate with other nodes in the cluster. Cassandra nodes use the inter-nodal gossip protocol that implements a peer-to-peer architecture to exchange state information in the cluster.
-
Find
rpc_address:
, and set it to your node's VPC address.rpc_address: 192.0.2.3
The above directive sets the client connection address. No connections are from other interfaces or addresses.
-
At the end of the file, add the
auto_bootstrap:
directive, and set it tofalse
.auto_bootstrap: false
The above directive allows a node to join the Cassandra cluster without streaming data. When enabling it on existing clusters, set the value to
true
.Save and close the Cassandra configuration file.
-
Start the Cassandra database server.
$ sudo systemctl start cassandra
The Cassandra cluster may take up to 30 seconds to start.
-
Clear the Apache Cassandra data directories.
$ sudo rm -rf /var/lib/cassandra/*
-
Verify that the Cassandra server is running.
$ sudo systemctl status cassandra
Output:
cassandra.service - LSB: distributed storage system for structured data Loaded: loaded (/etc/init.d/cassandra; generated) Active: active (running) since Tue 2023-08-01 20:33:27 UTC; 2min 19s ago
Manage the Apache Cassandra Cluster
nodetool
is a utility used by the Apache Cassandra database server to manage view cluster information. On each of the server, use the tool to verify the cluster status.
-
Verify the database cluster status.
$ sudo nodetool status
The above command displays the IP addresses, state, load, and node IDs in the cluster as below.
Datacenter: datacenter1 ======================= Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns (effective) Host ID Rack UN 192.0.2.3 104.34 KiB 16 64.7% e29827ec-88b7-460c-98b4-88af145ec6ef rack1 UN 192.0.2.4 70.21 KiB 16 76.0% a8742739-2439-4716-a6d9-4689a8fc7ad1 rack1 UN 192.0.2.5 70.21 KiB 16 59.3% f37874ad-28ac-49ec-b90a-322fa3b3d26c rack1
-
On
node-3
, run the following command to view the IP list of remote seed nodes. -
Run the following command on
node-3
to get the IP list of the remote seed nodes.$ sudo nodetool getseeds
Output:
Current list of seed node IPs, excluding the current node's IP: /192.0.2.3:7000 /192.0.2.3:7000
As displayed in the above output, the active node's IP address isn't listed if the node is part of the seeds.
-
On
node-1
, Log in to the Cassandra cluster.$ cqlsh 192.0.2.3 9042 -u cassandra -p cassandra
Output:
Connected to AppDbCluster at 192.0.2.3:9042 [cqlsh 6.1.0 | Cassandra 4.1.3 | CQL spec 3.4.6 | Native protocol v5] Use HELP for help. cassandra@cqlsh>
-
To redistribute the default
system_auth
keyspace in the cluster, change thereplication_factor
to3
.cqlsh> ALTER KEYSPACE system_auth WITH replication = { 'class': 'NetworkTopologyStrategy', 'replication_factor' : 3 };
The above command keeps three copies of the
system_auth
keyspace on each node. When set to1
, the cluster shuts down when a single replica of the keyspace isn't available.You should get a warning that requires a full cluster repair to redistribute data across all cluster nodes.
Warnings : When increasing the replication factor you need to run a full (-full) repair to distribute the data.
-
Exit the cluster.
cqlsh> QUIT;
-
Run the following cluster repair command to load the new replication strategy.
$ sudo nodetool repair --full system_auth
Output:
... Repair completed successfully
Add Sample Data to the Cassandra Cluster
-
On
node-1
, connect to the Cassandra cluster:$ cqlsh 192.0.2.3 9042 -u cassandra -p cassandra
-
Create a sample keyspace with a
replication_factor
of3
.cqlsh> CREATE KEYSPACE my_keyspace WITH REPLICATION = { 'class': 'NetworkTopologyStrategy', 'replication_factor' : 3 };
-
Switch to the new keyspace.
cqlsh> USE my_keyspace;
-
Create a sample
customers
table with three columns.cqlsh:my_keyspace> CREATE TABLE customers ( customer_id UUID PRIMARY KEY, first_name TEXT, last_name TEXT );
-
Insert data to the
customers
table.cqlsh:my_keyspace> INSERT INTO customers (customer_id, first_name, last_name) VALUES (UUID(), 'JOHN', 'DOE'); INSERT INTO customers (customer_id, first_name, last_name) VALUES (UUID(), 'MARY', 'SMITH'); INSERT INTO customers (customer_id, first_name, last_name) VALUES (UUID(), 'ANN', 'JOB');
-
On
node-3
, connect to the Cassandra cluster.$ cqlsh 192.0.2.5 9042 -u cassandra -p cassandra
-
Switch to the new
my_keyspace
keyspace.qlsh> USE my_keyspace;
-
View the
customers
table data.cqlsh:my_keyspace> SELECT customer_id, first_name, last_name FROM customers;
Output:
customer_id | first_name | last_name --------------------------------------+------------+----------- 5c8f596a-20e0-43de-95fb-60c5039898fb | MARY | SMITH 39ff0f60-18a0-4e3f-b32c-e31c175ad5ad | JOHN | DOE 284c7b9e-c39e-47a2-aad9-2fd409af0bb9 | ANN | JOB (3 rows)
As displayed in the above output, the data entered on
node-1
replicates tonode-2
andnode-3
in real-time. -
Add a new row to the
customers
table.cqlsh:my_keyspace> INSERT INTO customers (customer_id, first_name, last_name) VALUES (UUID(), 'PETER', 'JOBS');
-
Exit the cluster.
cqlsh:my_keyspace> QUIT;
-
On
node-1
run aSELECT
statement and verify if the cluster replicates the last record you onnode-3
.cqlsh:my_keyspace> SELECT customer_id, first_name, last_name FROM customers;
Output:
customer_id | first_name | last_name --------------------------------------+------------+----------- 3c4d78e5-f2d5-4f3b-8651-fac121e51551 | JOHN | DOE d3cb1d3a-51e1-4989-8c67-22f372fd9bd1 | PETER | JOBS c6511b52-b250-46cb-a4a3-d2f08686f782 | ANN | JOB 736a1084-9ecb-4103-b22b-e1e9f09e40c8 | MARY | SMITH (4 rows)
-
Exit the cluster.
cqlsh:my_keyspace> QUIT;
Set the Consistency Level
In this section, use the QUORUM
consistency level to set a level participating nodes must achieve before the Cassandra cluster accepts read or write operations. By using enforcing a consistency level policy, you can restrict data availability and accuracy when nodes fail. The cluster uses the following formula to calculate the available quorum level.
(SUM_OF_ALL_REPLICATION_FACTORS / 2) + 1 ROUNDED DOWN TO THE NEAREST WHOLE NUMBER/INTEGER
When expanded, the above formula evaluates to 2
:
(3 / 2) + 1 = 1.5 + 1 = 2.5, then round down 2.5 to the nearest integer = 2
As listed in the above evaluation, any query you perform on the sample my_keyspace
keyspace requires 2
healthy nodes to respond. This is because the cluster consists of 3
nodes, and tolerates only 1
replica down. If you have 10
nodes, a sample key space with a replication_factor
of 10
tolerates 4
nodes down because (10 / 2) + 1
rounded down to the nearest integer is 6
.
To test the above consistency level logic, follow the steps below:
-
On
node-1
, stop the Apache Cassandra cluster.$ sudo systemctl stop cassandra
The above command sets the total available nodes to two with
node-2
andnode-3
actively running. -
On
node-2
, run the Cassandranodetool
to verify that one node is down and two nodes are running.$ sudo nodetool status
Output:
Datacenter: datacenter1 ======================= Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns (effective) Host ID Rack DN 192.0.2.3 95.85 KiB 16 100.0% e29827ec-88b7-460c-98b4-88af145ec6ef rack1 UN 192.0.2.4 97.57 KiB 16 100.0% a8742739-2439-4716-a6d9-4689a8fc7ad1 rack1 UN 192.0.2.5 97.57 KiB 16 100.0% f37874ad-28ac-49ec-b90a-322fa3b3d26c rack1
The above output confirms that
node-1
(192.0.2.3) is down with theDN
status, and the rest of the nodes are running. -
On
node-2
, log in to the Cassandra Cluster.$ cqlsh 192.0.2.4 9042 -u cassandra -p cassandra
-
Set the consistency level to
QUORUM
.cqlsh> CONSISTENCY QUORUM;
Output:
Consistency level set to QUORUM.
-
Switch to the
my_keyspace
keyspace.cqlsh> USE my_keyspace;
-
View the
customers
table data.cqlsh:my_keyspace> SELECT customer_id, first_name, last_name FROM customers;
Output:
customer_id | first_name | last_name --------------------------------------+------------+----------- 3c4d78e5-f2d5-4f3b-8651-fac121e51551 | JOHN | DOE d3cb1d3a-51e1-4989-8c67-22f372fd9bd1 | PETER | JOBS c6511b52-b250-46cb-a4a3-d2f08686f782 | ANN | JOB 736a1084-9ecb-4103-b22b-e1e9f09e40c8 | MARY | SMITH (4 rows)
The above output displays table data because the majority of nodes
2
out of3
respond and there is a quorum. -
Exit the cluster.
> QUIT:
-
On
node-3
, stop the Apache Cassandra cluster.$ sudo systemctl stop cassandra
-
On
node-2
, access the cluster, and try viewing thecustomers
table data.cqlsh:my_keyspace> SELECT customer_id, first_name, last_name FROM customers;
Output:
NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 192.0.2.4:9042 datacenter1>: Unavailable('Error from server: code=1000 [Unavailable exception] message="Cannot achieve consistency level QUORUM" info={\'consistency\': \'QUORUM\', \'required_replicas\': 2, \'alive_replicas\': 1}')})
As displayed in the output, there is no quorum because the majority of nodes are down. This is because the cluster requires a majority of healthy nodes (2) as compared to (1).
Troubleshooting
If you encounter any error while setting up a multi-node Apache cluster, follow the troubleshooting steps below to fix any cluster issues. To view detailed errors, view the /var/log/cassandra/system.log
file.
- Dead Node Fix.
-
In case of server restarts, a node may fail to respond. To enable a dead node fix, edit the
cassandra-env.sh
as below.$ sudo nano cd /etc/cassandra/cassandra-env.sh
-
Add the following line at the end of the file. Replace
192.0.2.3
with your actual node address.JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=192.0.2.3"
-
Clear the cluster data directories.
$ sudo rm -rf /var/lib/cassandra/data/system/*
-
Verify the cluster status to verify that your node is up and running.
$ sudo nodetool status
nodetool: Failed to connect to '127.0.0.1:7199' - ConnectException: 'Connection refused (Connection refused)'.
-
Verify that the Apache Cassandra database server is running.
$ sudo service cassandra status
-
Open the
/etc/cassandra/cassandra.yaml
, and verify that therpc_address:
directive points to your node address to accept client connections.rpc_address: 192.0.2.3
Connection error: ('Unable to connect to any servers', {'192.0.2.3:9042': ConnectionRefusedError(111, "Tried connecting to [('192.0.2.3', 9042)]. Last error: Connection refused")})
-
Verify the cluster status.
$ sudo nodetool status
-
Verify that Cassandra is running.
$ sudo service cassandra status
-
Verify that the
listen_address:
directive points to your node address in your configuration file.$ sudo nano /etc/cassandra/cassandra.yaml
Conclusion
In this guide, you have deployed a multi-node Apache Cassandra cluster on a Ubuntu 22.04 server. Creating a multi-node Apache Cassandra cluster ensures high availability and improves the reliability of applications that use the cluster. For more information, visit the Apache Cassandra documentation.