Notice:
This post is older than 5 years – the content might be outdated.
Apache Cassandra is an impressive piece of technology. When you face extreme performance requirements, it is definitely a solution worth looking into. Yet people often get bitten by its distributed nature and by a data modeling process that is fundamentally different, especially when they come from a relational background. Just reading the documentation probably won’t be enough to really get to know Cassandra; you will have to get your hands dirty. In this blog post we present a simple way to create your own Cassandra cluster and experiment with data modeling, different configurations, cluster sizes, topologies, query performance, etc.
Getting started with Cassandra is pretty easy. There is plenty of excellent documentation on the DataStax website, along with loads of videos and other resources on operations, data modeling, performance optimization, etc. You don’t even have to install it: just go to planetcassandra.org and start exploring Cassandra in the browser. How cool is that?
Even though Cassandra has become much more stable and user-friendly than it used to be, you will still face many challenges when running and using it. Whether you are a developer or on the operations team, you will have to get your head around a few concepts that make Cassandra work and that will influence the way you design your system.
So when it comes to real experimenting, testing and development, you might want to have a little bit more than a web cqlsh at hand. For that purpose we have created cassandra-test-lab. Let’s check out how it works.
Overview
Our Cassandra Test Lab is a basic cluster setup using virtual machines provisioned by Vagrant and Puppet. It gets you up and running in a matter of minutes. Many parameters let you define and tweak your Cassandra testing environment, such as the number of nodes, their dimensions and the Cassandra configuration. You can bring up additional nodes or let nodes „crash“ to see how your cluster performs. We use Cassandra Test Lab for early-stage development and operations experiments to gain an initial understanding of the required data model, cluster performance and other properties.
Setup
As we are using Vagrant and VirtualBox, the setup is platform independent: it works on Linux, Windows and Mac.
- install VirtualBox
- install vagrant
git clone git@github.com:inovex/cassandra-test-lab.git
cd cassandra-test-lab/
vagrant up
This will download a Debian wheezy image, fire up the VMs and provision them, installing Java, Cassandra and a few administration tools. On the first run it might take a few minutes.
If everything goes smoothly, you should end up with three VMs including two Cassandra nodes (node-1 and node-2) and one OpsCenter node (node-5). Congratulations!
Try it out
On Linux or Mac, just run vagrant ssh <name> to connect to one of your Cassandra nodes, e.g. vagrant ssh node-1. If you are using PuTTY on Windows, you will have to convert the private keys via PuTTYgen and configure your SSH connection accordingly. The machines are reachable via the IP addresses 192.168.56.1x, where x is the number of the node.
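If you script against the lab, the addressing scheme is easy to capture in a small helper. The following Python sketch only encodes the 192.168.56.1x pattern described above; the function name node_ip and the restriction to node numbers 1 to 9 are assumptions made for illustration and are not part of cassandra-test-lab.

```python
def node_ip(node_number: int, prefix: str = "192.168.56.1") -> str:
    """Return the lab IP of node-<node_number> following the 192.168.56.1x scheme."""
    # Assumption: x is a single digit, so only node numbers 1-9 are valid here.
    if not 1 <= node_number <= 9:
        raise ValueError("the 192.168.56.1x scheme only covers node numbers 1-9")
    return f"{prefix}{node_number}"


# The two Cassandra nodes that are started by default.
contact_points = [node_ip(n) for n in (1, 2)]
print(contact_points)
```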
The easiest way to start using Cassandra is via cqlsh. Just connect to any node and type:
> cqlsh localhost
You can now explore the magic of CQL.
Create a keyspace, for example:
cqlsh> CREATE KEYSPACE test_ks WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 2, 'DC2': 2};
And a table:
cqlsh> USE test_ks;
cqlsh:test_ks> CREATE TABLE users (username varchar, name varchar, location varchar, primary key(username));
And now insert data:
cqlsh:test_ks> INSERT INTO users(username, name, location) values('iigorr', 'Igor Lankin', 'Karlsruhe');
cqlsh:test_ks> select * from users;
 username | location  | name
----------+-----------+-------------
   iigorr | Karlsruhe | Igor Lankin

(1 rows)
To see what’s really going on, switch on tracing and try again.
cqlsh:test_ks> tracing on; select * from users;
You will get a beautiful trace log of all nodes involved.
How about trying out other consistency levels:
cqlsh:test_ks> CONSISTENCY TWO; select * from users;
The trace information will be somewhat different here. See if you can understand what is going on there.
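To make sense of the traces, it helps to know how many replicas the coordinator has to wait for at each consistency level. The Python sketch below works this out for the replication map of test_ks, using the usual rule that a quorum is half the replicas (rounded down) plus one. The function name required_replicas is made up for this example, and only a handful of consistency levels are covered.

```python
def required_replicas(level: str, replication: dict, local_dc: str = "DC1") -> int:
    """Number of replica responses the coordinator waits for at a consistency level."""
    total = sum(replication.values())  # replicas across all data centers
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return fixed[level]
    if level == "QUORUM":
        return total // 2 + 1
    if level == "LOCAL_QUORUM":
        return replication[local_dc] // 2 + 1
    if level == "ALL":
        return total
    raise ValueError(f"unsupported consistency level: {level}")


replication = {"DC1": 2, "DC2": 2}  # the replication map of test_ks
for level in ("ONE", "TWO", "QUORUM", "LOCAL_QUORUM", "ALL"):
    print(level, required_replicas(level, replication))
```

With only node-1 and node-2 running, at most two replicas are reachable, so a QUORUM query on test_ks (three replicas required) should be rejected as unavailable, while ONE, TWO and LOCAL_QUORUM can still succeed.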
Start coding
If you want to start developing against the cluster, just add one or more nodes as contact points. In C#, for example, I am using the DataStax .NET Driver for Apache Cassandra (via NuGet) and just write:
using Cassandra;

// Contact points are node-1 and node-2 of the test lab.
var cluster = Cluster.Builder()
    .AddContactPoint("192.168.56.11")
    .AddContactPoint("192.168.56.12")
    .Build();

// Connect to the keyspace created above and insert a row.
var session = cluster.Connect("test_ks");
session.Execute("insert into users (username, name, location) values ('jdoe', 'John Doe', 'unknown')");
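The same idea works from other languages. As a rough sketch, this is how the insert could look in Python, assuming the DataStax Python driver (the cassandra-driver package) is installed and the test_ks keyspace and users table from above exist. The function name insert_user is hypothetical. It uses a prepared statement with bind markers instead of building the CQL string by hand.

```python
CONTACT_POINTS = ["192.168.56.11", "192.168.56.12"]  # node-1 and node-2
INSERT_USER = "INSERT INTO users (username, name, location) VALUES (?, ?, ?)"


def insert_user(username: str, name: str, location: str) -> None:
    """Insert one row into test_ks.users using a prepared statement."""
    # Imported here so this module still loads where the driver is not installed.
    from cassandra.cluster import Cluster

    cluster = Cluster(contact_points=CONTACT_POINTS)
    try:
        session = cluster.connect("test_ks")
        prepared = session.prepare(INSERT_USER)
        session.execute(prepared, (username, name, location))
    finally:
        # Always release the connections, even if the insert fails.
        cluster.shutdown()
```

Calling insert_user("jdoe", "John Doe", "unknown") against the running lab should add the same row as the C# snippet.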
Happy Coding! But make sure to create your data model wisely!
Ops Center
By default node-5 is not configured to be part of the cluster (although it could be, see site.pp). Instead it runs the DataStax OpsCenter, which is an administration web console for basic monitoring and administration of Cassandra clusters. It came up by default when you ran „vagrant up“.
Connect to OpsCenter with your browser. OpsCenter will ask you to „Create Brand New Cluster“ or „Manage Existing cluster“. We already have an existing cluster, so we choose that and add our existing nodes to the list: 192.168.56.11 and 192.168.56.12.
Having done that, OpsCenter will go on to contact the datastax-agents on the cluster nodes and display some information about your cluster.
Go ahead and explore the monitoring and administration functions. For example, go to the „Nodes“ menu and select one of the nodes from the list view. Stop the node using the „Stop“ action from the „Actions“ menu. After a confirmation, OpsCenter will stop the Cassandra service and mark the node as unreachable. You can start the node again in the same way from the „Actions“ dropdown menu.
You don’t necessarily need OpsCenter to operate a Cassandra cluster. In fact, you should really get familiar with command line tools like nodetool, cassandra-stress, the sysstat tools, etc. But OpsCenter can be a helpful tool on your tool belt.
Changing cluster topologies
Cassandra nodes are organized into data centers and racks, which don’t necessarily have to correspond to physical locations. Rather, they influence how data is replicated and how operations are performed at different consistency levels ([1], [2]).
By default only two cluster nodes are brought up when you run „vagrant up“. However, there are four nodes configured, and you could actually run as many nodes as your memory allows. But note that a Cassandra node will require at least ~2GB of memory to even start, so top up your RAM for big clusters.
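For a quick back-of-the-envelope check of how many nodes your machine can host, something like the following Python sketch may help. It takes the ~2GB per node from above as a lower bound; the 4GB reserved for the host system and the OpsCenter VM is purely an assumption, so adjust it to your setup.

```python
def max_cassandra_nodes(host_ram_gb: float,
                        reserved_gb: float = 4.0,
                        per_node_gb: float = 2.0) -> int:
    """Estimate how many Cassandra VMs fit into the host's RAM.

    reserved_gb is an assumed allowance for the host OS and the OpsCenter VM;
    per_node_gb is the ~2GB minimum a Cassandra node needs to start.
    """
    usable = host_ram_gb - reserved_gb
    return max(0, int(usable // per_node_gb))


for ram in (8, 16, 32):
    print(f"{ram} GB host: up to {max_cassandra_nodes(ram)} Cassandra nodes")
```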
The first two nodes belong to the virtual data center „DC1“ and the rack „RAC1“. node-3 and node-4 are in „DC2/RAC1“, but they are not brought up by default. To start the other two nodes, run vagrant up node-3 node-4. If you need more nodes, you’ll have to add them to the Vagrantfile as well as configure the per-node config in puppet/data/nodes, so the nodes know in which data center and rack they live.
Watch the new data center and the new nodes appear in OpsCenter. If you want to play around with different topologies, just change the node configuration under puppet/data/nodes/. For more information about topologies and how to change them in a running cluster check the DataStax docs.
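When playing with topologies, keep in mind that a data center can only hold as many replicas of a row as it has nodes. The Python sketch below compares the replication map of the test_ks keyspace with the default lab topology described above and reports how many replicas cannot be placed per data center. The TOPOLOGY dictionary and the function missing_replicas are illustrative only; they do not mirror the actual file format under puppet/data/nodes.

```python
from collections import Counter

# node -> (data center, rack), as described above for the default lab setup.
TOPOLOGY = {
    "node-1": ("DC1", "RAC1"),
    "node-2": ("DC1", "RAC1"),
    "node-3": ("DC2", "RAC1"),
    "node-4": ("DC2", "RAC1"),
}


def missing_replicas(replication, running_nodes):
    """Per data center, count the replicas that cannot be placed on running nodes."""
    nodes_per_dc = Counter(TOPOLOGY[node][0] for node in running_nodes)
    return {
        dc: max(0, rf - nodes_per_dc[dc])
        for dc, rf in replication.items()
    }


replication = {"DC1": 2, "DC2": 2}  # replication map of test_ks

# Default setup: only node-1 and node-2 are up, so DC2 has no replicas yet.
print(missing_replicas(replication, ["node-1", "node-2"]))

# After "vagrant up node-3 node-4" every replica has a home.
print(missing_replicas(replication, ["node-1", "node-2", "node-3", "node-4"]))
```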
Conclusion
Our Cassandra Test Lab provides an easy-to-use testing environment, whether you are trying to get familiar with Cassandra, trying out data models or checking if your query will break something. Just follow the instructions and have fun exploring. If you mess something up, just vagrant destroy the VMs and start over. Should there be any problems with the Vagrant setup, post a comment or file an issue on GitHub. Have fun and stay tuned for more posts on Cassandra.