Introduction
A multi-node Apache Cassandra database cluster contains two or more database nodes (servers) in different cloud locations. This setup prevents Single-Point of Failure (SPOF) in case some nodes in the cluster fail. For instance, consider a scenario where you're running a cluster with three nodes. If one node fails due to network, power, or software problems, the rest of the nodes in the cluster can continue accepting read and write requests to the connected clients.
In a Cassandra multi-node setup, there is no leader election, and all participating nodes equally perform similar functions. Depending on your target level of data consistency, you must declare a replication strategy when creating a keyspace in the cluster. The strategy determines how Cassandra stores replicas across participating nodes and the total number of replicas per keyspace.
In this guide, deploy a multi-node Apache Cassandra database cluster on a Ubuntu 22.04 server, and test communication between the cluster nodes by adding sample data to a shared keyspace.
Prerequisites
Before you begin, in a single Rcs location:
-
Deploy three Ubuntu 22.04 servers with at least 16 GB RAM for production.
-
Add the servers to the same Rcs Virtual Private Cloud (VPC).
A VPC is a logical VLAN connection between Rcs servers that allows Apache Cassandra nodes to communicate in real time. Enable VPC peering to configure servers deployed in a different location to sync with the Cassandra cluster.
In this article, the following example IP Addresses represent the respective Server VPC Addresses.
-
Server 1:
192.0.2.3 -
Server 2:
192.0.2.4 -
Server 3:
192.0.2.5
To view your server VPC address, run the following command in your SSH session.
$ ip a -
On each server:
-
Use SSH to access the server in a separate terminal session.
-
Switch to the sudo user account.
# su example-user
Configure Server Identity
In this section, configure the hostname of every server for fast identification in the Apache Cassandra cluster. For this article, Set each of the servers with the hostnames below:
-
Server 1:
node-1 -
Server 2:
node-1 -
Server 3:
node-1
On each of the servers, set a new hostname identity as described in the steps below:
-
Using a text editor such as
Nano, edit the/etc/hostnamefile.$ sudo nano /etc/hostname -
Replace the default hostname (
localhost) with the respective cluster-ID.node-1Save and close the file.
-
Open the
/etc/hostsfile.$ sudo nano /etc/hosts -
Add the following line at the end of the file.
127.0.0.1 node-1Save and close the file.
-
Reboot the server to apply changes.
$ sudo reboot
Set Up Firewall Rules
To communicate and send keep-alive packets to all nodes in the cluster, set up firewall rules to accept Cassandra requests from the respective VPC server addresses. By default, Uncomplicated Firewall (UFW) is active on Rcs Ubuntu servers, set up UFW rules on each server as described below.
Server 1:
-
Allow connections to port
7000and9042from Server 2 to Server 1.$ sudo ufw allow from 192.0.2.4 to 192.0.2.3 proto tcp port 7000,9042 -
Allow connections from Server 3 to Server 1.
$ sudo ufw allow from 192.0.2.5 to 192.0.2.3 proto tcp port 7000,9042 -
Reload firewall rules to save changes.
$ sudo ufw reload
Server 2:
-
Allow connections to port
7000and9042from Server 1 to Server 2.$ sudo ufw allow from 192.0.2.3 to 192.0.2.4 proto tcp port 7000,9042 -
Allow connections from Server 3 to Server 2.
$ sudo ufw allow from 192.0.2.5 to 192.0.2.4 proto tcp port 7000,9042 -
Reload firewall rules.
$ sudo ufw reload
Server 3:
-
Allow connections to port
7000and9042from Server 1 to Server 3.$ sudo ufw allow from 192.0.2.3 to 192.0.2.5 proto tcp port 7000,9042 -
Allow connections from Server 2 to Server 3.
$ sudo ufw allow from 192.0.2.4 to 192.0.2.5 proto tcp port 7000,9042 -
Reload firewall rules to apply changes.
$ sudo ufw reload
Configure the Apache Cassandra Cluster
In this section, set up the Apache Cassandra cluster information on each server to allow all nodes to communicate together. To set up the cluster information, make changes to the main configuration file /etc/cassandra/cassandra.yaml on each server as described below.
-
Stop the Apache Cassandra service.
$ sudo systemctl stop cassandra -
Clear existing Cassandra data directories to refresh the node state.
$ sudo rm -rf /var/lib/cassandra/* -
Open the
/etc/cassandra/cassandra.yamlfile.$ sudo nano /etc/cassandra/cassandra.yamlThe Cassandra configuration file contains many directives, when using the
Nanotext editor, press CTRL + :keyW: to search a directive by string. -
Find
cluster_name:, and change the value fromTest ClustertoAppDbCluster.... cluster_name: 'AppDbCluster' ...The above directive sets the cluster name to
AppDbCluster. Verify that all servers in the cluster share the same name for a successful connection. -
Find the
seed_provider:directive, and enter the IP Addresses of other nodes in the format<ip>,<ip>as below.seed_provider: ... # Ex: "<ip1>,<ip2>,<ip3>" - seeds: "192.0.2.3:7000, 192.0.2.4:7000"The above directives set the addresses Apache Cassandra sends seed nodes to find cluster information. Seed nodes are keep-alive packets known as gossips in Cassandra. To enable high availability in the cluster, you must set at least two seed addresses.
-
Find the
listen_address:directive, and set the value to your node's VPC Address.listen_address: 192.0.2.3The above directive sets the IP address used to communicate with other nodes in the cluster. Cassandra nodes use the inter-nodal gossip protocol that implements a peer-to-peer architecture to exchange state information in the cluster.
-
Find
rpc_address:, and set it to your node's VPC address.rpc_address: 192.0.2.3The above directive sets the client connection address. No connections are from other interfaces or addresses.
-
At the end of the file, add the
auto_bootstrap:directive, and set it tofalse.auto_bootstrap: falseThe above directive allows a node to join the Cassandra cluster without streaming data. When enabling it on existing clusters, set the value to
true.Save and close the Cassandra configuration file.
-
Start the Cassandra database server.
$ sudo systemctl start cassandraThe Cassandra cluster may take up to 30 seconds to start.
-
Clear the Apache Cassandra data directories.
$ sudo rm -rf /var/lib/cassandra/* -
Verify that the Cassandra server is running.
$ sudo systemctl status cassandraOutput:
cassandra.service - LSB: distributed storage system for structured data Loaded: loaded (/etc/init.d/cassandra; generated) Active: active (running) since Tue 2023-08-01 20:33:27 UTC; 2min 19s ago
Manage the Apache Cassandra Cluster
nodetool is a utility used by the Apache Cassandra database server to manage view cluster information. On each of the server, use the tool to verify the cluster status.
-
Verify the database cluster status.
$ sudo nodetool statusThe above command displays the IP addresses, state, load, and node IDs in the cluster as below.
Datacenter: datacenter1 ======================= Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns (effective) Host ID Rack UN 192.0.2.3 104.34 KiB 16 64.7% e29827ec-88b7-460c-98b4-88af145ec6ef rack1 UN 192.0.2.4 70.21 KiB 16 76.0% a8742739-2439-4716-a6d9-4689a8fc7ad1 rack1 UN 192.0.2.5 70.21 KiB 16 59.3% f37874ad-28ac-49ec-b90a-322fa3b3d26c rack1 -
On
node-3, run the following command to view the IP list of remote seed nodes. -
Run the following command on
node-3to get the IP list of the remote seed nodes.$ sudo nodetool getseedsOutput:
Current list of seed node IPs, excluding the current node's IP: /192.0.2.3:7000 /192.0.2.3:7000As displayed in the above output, the active node's IP address isn't listed if the node is part of the seeds.
-
On
node-1, Log in to the Cassandra cluster.$ cqlsh 192.0.2.3 9042 -u cassandra -p cassandraOutput:
Connected to AppDbCluster at 192.0.2.3:9042 [cqlsh 6.1.0 | Cassandra 4.1.3 | CQL spec 3.4.6 | Native protocol v5] Use HELP for help. cassandra@cqlsh> -
To redistribute the default
system_authkeyspace in the cluster, change thereplication_factorto3.cqlsh> ALTER KEYSPACE system_auth WITH replication = { 'class': 'NetworkTopologyStrategy', 'replication_factor' : 3 };The above command keeps three copies of the
system_authkeyspace on each node. When set to1, the cluster shuts down when a single replica of the keyspace isn't available.You should get a warning that requires a full cluster repair to redistribute data across all cluster nodes.
Warnings : When increasing the replication factor you need to run a full (-full) repair to distribute the data. -
Exit the cluster.
cqlsh> QUIT; -
Run the following cluster repair command to load the new replication strategy.
$ sudo nodetool repair --full system_authOutput:
... Repair completed successfully
Add Sample Data to the Cassandra Cluster
-
On
node-1, connect to the Cassandra cluster:$ cqlsh 192.0.2.3 9042 -u cassandra -p cassandra -
Create a sample keyspace with a
replication_factorof3.cqlsh> CREATE KEYSPACE my_keyspace WITH REPLICATION = { 'class': 'NetworkTopologyStrategy', 'replication_factor' : 3 }; -
Switch to the new keyspace.
cqlsh> USE my_keyspace; -
Create a sample
customerstable with three columns.cqlsh:my_keyspace> CREATE TABLE customers ( customer_id UUID PRIMARY KEY, first_name TEXT, last_name TEXT ); -
Insert data to the
customerstable.cqlsh:my_keyspace> INSERT INTO customers (customer_id, first_name, last_name) VALUES (UUID(), 'JOHN', 'DOE'); INSERT INTO customers (customer_id, first_name, last_name) VALUES (UUID(), 'MARY', 'SMITH'); INSERT INTO customers (customer_id, first_name, last_name) VALUES (UUID(), 'ANN', 'JOB'); -
On
node-3, connect to the Cassandra cluster.$ cqlsh 192.0.2.5 9042 -u cassandra -p cassandra -
Switch to the new
my_keyspacekeyspace.qlsh> USE my_keyspace; -
View the
customerstable data.cqlsh:my_keyspace> SELECT customer_id, first_name, last_name FROM customers;Output:
customer_id | first_name | last_name --------------------------------------+------------+----------- 5c8f596a-20e0-43de-95fb-60c5039898fb | MARY | SMITH 39ff0f60-18a0-4e3f-b32c-e31c175ad5ad | JOHN | DOE 284c7b9e-c39e-47a2-aad9-2fd409af0bb9 | ANN | JOB (3 rows)As displayed in the above output, the data entered on
node-1replicates tonode-2andnode-3in real-time. -
Add a new row to the
customerstable.cqlsh:my_keyspace> INSERT INTO customers (customer_id, first_name, last_name) VALUES (UUID(), 'PETER', 'JOBS'); -
Exit the cluster.
cqlsh:my_keyspace> QUIT; -
On
node-1run aSELECTstatement and verify if the cluster replicates the last record you onnode-3.cqlsh:my_keyspace> SELECT customer_id, first_name, last_name FROM customers;Output:
customer_id | first_name | last_name --------------------------------------+------------+----------- 3c4d78e5-f2d5-4f3b-8651-fac121e51551 | JOHN | DOE d3cb1d3a-51e1-4989-8c67-22f372fd9bd1 | PETER | JOBS c6511b52-b250-46cb-a4a3-d2f08686f782 | ANN | JOB 736a1084-9ecb-4103-b22b-e1e9f09e40c8 | MARY | SMITH (4 rows) -
Exit the cluster.
cqlsh:my_keyspace> QUIT;
Set the Consistency Level
In this section, use the QUORUM consistency level to set a level participating nodes must achieve before the Cassandra cluster accepts read or write operations. By using enforcing a consistency level policy, you can restrict data availability and accuracy when nodes fail. The cluster uses the following formula to calculate the available quorum level.
(SUM_OF_ALL_REPLICATION_FACTORS / 2) + 1 ROUNDED DOWN TO THE NEAREST WHOLE NUMBER/INTEGER
When expanded, the above formula evaluates to 2:
(3 / 2) + 1 = 1.5 + 1 = 2.5, then round down 2.5 to the nearest integer = 2
As listed in the above evaluation, any query you perform on the sample my_keyspace keyspace requires 2 healthy nodes to respond. This is because the cluster consists of 3 nodes, and tolerates only 1 replica down. If you have 10 nodes, a sample key space with a replication_factor of 10 tolerates 4 nodes down because (10 / 2) + 1 rounded down to the nearest integer is 6.
To test the above consistency level logic, follow the steps below:
-
On
node-1, stop the Apache Cassandra cluster.$ sudo systemctl stop cassandraThe above command sets the total available nodes to two with
node-2andnode-3actively running. -
On
node-2, run the Cassandranodetoolto verify that one node is down and two nodes are running.$ sudo nodetool statusOutput:
Datacenter: datacenter1 ======================= Status=Up/Down |/ State=Normal/Leaving/Joining/Moving -- Address Load Tokens Owns (effective) Host ID Rack DN 192.0.2.3 95.85 KiB 16 100.0% e29827ec-88b7-460c-98b4-88af145ec6ef rack1 UN 192.0.2.4 97.57 KiB 16 100.0% a8742739-2439-4716-a6d9-4689a8fc7ad1 rack1 UN 192.0.2.5 97.57 KiB 16 100.0% f37874ad-28ac-49ec-b90a-322fa3b3d26c rack1The above output confirms that
node-1(192.0.2.3) is down with theDNstatus, and the rest of the nodes are running. -
On
node-2, log in to the Cassandra Cluster.$ cqlsh 192.0.2.4 9042 -u cassandra -p cassandra -
Set the consistency level to
QUORUM.cqlsh> CONSISTENCY QUORUM;Output:
Consistency level set to QUORUM. -
Switch to the
my_keyspacekeyspace.cqlsh> USE my_keyspace; -
View the
customerstable data.cqlsh:my_keyspace> SELECT customer_id, first_name, last_name FROM customers;Output:
customer_id | first_name | last_name --------------------------------------+------------+----------- 3c4d78e5-f2d5-4f3b-8651-fac121e51551 | JOHN | DOE d3cb1d3a-51e1-4989-8c67-22f372fd9bd1 | PETER | JOBS c6511b52-b250-46cb-a4a3-d2f08686f782 | ANN | JOB 736a1084-9ecb-4103-b22b-e1e9f09e40c8 | MARY | SMITH (4 rows)The above output displays table data because the majority of nodes
2out of3respond and there is a quorum. -
Exit the cluster.
> QUIT: -
On
node-3, stop the Apache Cassandra cluster.$ sudo systemctl stop cassandra -
On
node-2, access the cluster, and try viewing thecustomerstable data.cqlsh:my_keyspace> SELECT customer_id, first_name, last_name FROM customers;Output:
NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 192.0.2.4:9042 datacenter1>: Unavailable('Error from server: code=1000 [Unavailable exception] message="Cannot achieve consistency level QUORUM" info={\'consistency\': \'QUORUM\', \'required_replicas\': 2, \'alive_replicas\': 1}')})As displayed in the output, there is no quorum because the majority of nodes are down. This is because the cluster requires a majority of healthy nodes (2) as compared to (1).
Troubleshooting
If you encounter any error while setting up a multi-node Apache cluster, follow the troubleshooting steps below to fix any cluster issues. To view detailed errors, view the /var/log/cassandra/system.log file.
- Dead Node Fix.
-
In case of server restarts, a node may fail to respond. To enable a dead node fix, edit the
cassandra-env.shas below.$ sudo nano cd /etc/cassandra/cassandra-env.sh -
Add the following line at the end of the file. Replace
192.0.2.3with your actual node address.JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=192.0.2.3" -
Clear the cluster data directories.
$ sudo rm -rf /var/lib/cassandra/data/system/* -
Verify the cluster status to verify that your node is up and running.
$ sudo nodetool status
nodetool: Failed to connect to '127.0.0.1:7199' - ConnectException: 'Connection refused (Connection refused)'.
-
Verify that the Apache Cassandra database server is running.
$ sudo service cassandra status -
Open the
/etc/cassandra/cassandra.yaml, and verify that therpc_address:directive points to your node address to accept client connections.rpc_address: 192.0.2.3
Connection error: ('Unable to connect to any servers', {'192.0.2.3:9042': ConnectionRefusedError(111, "Tried connecting to [('192.0.2.3', 9042)]. Last error: Connection refused")})
-
Verify the cluster status.
$ sudo nodetool status -
Verify that Cassandra is running.
$ sudo service cassandra status -
Verify that the
listen_address:directive points to your node address in your configuration file.$ sudo nano /etc/cassandra/cassandra.yaml
Conclusion
In this guide, you have deployed a multi-node Apache Cassandra cluster on a Ubuntu 22.04 server. Creating a multi-node Apache Cassandra cluster ensures high availability and improves the reliability of applications that use the cluster. For more information, visit the Apache Cassandra documentation.