Redis Sentinel: Make your dataset highly available

In previous blog articles we talked about the basic Redis features and learned how to persist, back up and restore your dataset in case of a disaster. Today we want to introduce you to a more complex setup: you can teach your Redis instances to be highly available for your clients.

This is where Sentinel comes in. So what is Sentinel? The Sentinel process is a Redis instance that was started with the --sentinel option (or via the redis-sentinel binary). It needs a configuration file that tells it which Redis master to monitor.

In short, these are the benefits of using Sentinel:

  • Monitoring: Sentinel constantly checks if the Redis master and its slave instances are working.
  • Notifications: It can notify the system administrators or other tools via an API if something happens to your Redis instances.
  • Automatic Failover: When Sentinel detects a failure of the master node it will start a failover where a slave is promoted to master. The additional slaves will be reconfigured automatically to use the new master. The application/clients that are using the Redis setup will be informed about the new address to use for the connection.
  • Configuration provider: Sentinel can be used for service discovery. That means clients can connect to the Sentinel in order to ask for the current address of your Redis master. After a failover Sentinel will provide the new address.

Configuration and example setup with three nodes

The Sentinel processes are part of a distributed system, which means they work together. For a high availability setup we suggest using more than one Sentinel, as Sentinel itself should not be a single point of failure. Running several Sentinels also makes failure detection more reliable, because a quorum of them has to agree that the master is down.

Before you deploy Sentinel, consider the following facts. More in-depth information on the most important points can be found below:

  • At least three Sentinel instances are needed for a robust deployment.
  • Separate your Sentinel instances with different VMs or servers.
  • Due to the asynchronous replication of Redis the distributed setup does not guarantee that acknowledged writes are retained during failures.
  • Your client-library needs to support Sentinel.
  • Test your high availability setup from time to time in your test environment and even in production systems.
  • Sentinel, Docker or other NAT/Port Mapping technologies should only be mixed with care.
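Why at least three Sentinels? The following is a minimal sketch of the two thresholds involved, assuming the rule explained further down: the quorum only governs failure detection, while the failover itself must be backed by a majority of all Sentinels. The function names are made up for this illustration and are not part of Redis.

```python
def majority(total_sentinels: int) -> int:
    """Smallest number of votes that is more than half of all Sentinels."""
    return total_sentinels // 2 + 1


def can_fail_over(total_sentinels: int, reachable_sentinels: int, quorum: int) -> bool:
    """True if the reachable Sentinels can detect the failure and elect a leader."""
    detects_failure = reachable_sentinels >= quorum
    can_elect_leader = reachable_sentinels >= majority(total_sentinels)
    return detects_failure and can_elect_leader


# Three Sentinels, quorum 2: losing one host together with its Sentinel is tolerated.
print(can_fail_over(total_sentinels=3, reachable_sentinels=2, quorum=2))
# Two Sentinels, quorum 1: the survivor detects the failure but has no majority.
print(can_fail_over(total_sentinels=2, reachable_sentinels=1, quorum=1))
```

With only two Sentinels, the loss of a single host already blocks the failover, which is why three is the practical minimum.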

A three-node setup is a good start, so run your Redis master and two slaves before setting up the Sentinel processes. On each of your Redis hosts, create the same Sentinel config (e.g. /etc/redis-sentinel.conf, depending on your Linux distribution), like so:

Let’s dig deeper and see what these options do:

  1. This line tells Sentinel which master to monitor (myHAsetup). Sentinel will find the master at 192.168.1.3:6379. The quorum for this setup is 2. The quorum is only used to detect failures: this number of Sentinels must agree that the master is not reachable. When a failure is detected, one Sentinel is elected as the leader, which is then authorized to perform the failover. This happens when the majority of Sentinel processes vote for the leader.
  2. The time in milliseconds an instance is allowed to be unreachable for a Sentinel (not answering PINGs or replying with an error). After this time, the master is considered to be down.
  3. The timeout in milliseconds that Sentinel will wait after a failover before initiating a new failover.
  4. The number of slaves that can sync with the new master at the same time after a failover. The lower the number, the longer the failover takes to complete. If you use the slaves to serve (possibly stale) data to clients, you may not want to re-synchronize all slaves with the new master at the same time, because there is a very short window in which a slave stops serving while it loads the bulk data from the master. In this case set it to 1. If this does not matter, set it to the maximum number of slaves that might be connected to the master.
  5. Listen IP, limited to one interface.
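To see how the five options fit together, here is a small sketch that renders such a file. The directive names are the standard Sentinel ones; the master name, address and quorum are taken from the description above, while the timeouts, the parallel-syncs value and the bind address are assumptions chosen for illustration. The helper render_sentinel_config is hypothetical.

```python
def render_sentinel_config(
    master_name: str,
    master_ip: str,
    master_port: int,
    quorum: int,
    down_after_ms: int,
    failover_timeout_ms: int,
    parallel_syncs: int,
    bind_ip: str,
) -> str:
    """Return the five Sentinel directives in the order discussed above."""
    lines = [
        f"sentinel monitor {master_name} {master_ip} {master_port} {quorum}",
        f"sentinel down-after-milliseconds {master_name} {down_after_ms}",
        f"sentinel failover-timeout {master_name} {failover_timeout_ms}",
        f"sentinel parallel-syncs {master_name} {parallel_syncs}",
        f"bind {bind_ip}",
    ]
    return "\n".join(lines) + "\n"


print(render_sentinel_config(
    master_name="myHAsetup",
    master_ip="192.168.1.3",
    master_port=6379,
    quorum=2,
    down_after_ms=6000,         # assumed: roughly the 6 s detection time mentioned later
    failover_timeout_ms=60000,  # assumed placeholder
    parallel_syncs=1,           # assumed placeholder
    bind_ip="192.168.1.3",      # assumed: the interface of the local host
), end="")
```

Generating the file from one function makes it easy to keep the configuration identical on all hosts, with only the bind address differing.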

After you have set up the Sentinel configuration for your instances, start Sentinel via init script, systemd unit or simply via its binary (redis-server /path/to/sentinel.conf --sentinel). The Sentinel processes will discover the master, the slaves and the other connected Sentinels, and the system is complete. Now let’s see what the setup looks like and what happens during a failover.

[Figure: redis-sentinel setup]

We have three Redis instances and three Sentinel instances:
M1 = Master
R1 = Replica 1 / Slave 1
R2 = Replica 2 / Slave 2
S1 = Sentinel 1
S2 = Sentinel 2
S3 = Sentinel 3

Let’s check the status of the Sentinel. With the option -p 26379 you connect directly to the Sentinel API instead of a Redis instance.

As you see the Sentinels are monitoring one master “myHAsetup”, their status is OK and you can see how many slaves and other Sentinels are discovered.
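If you want to check this status from a script, the master line of the Sentinel INFO output can be split into its fields. This is a sketch that assumes the status comes from INFO sentinel, which reports each monitored master as one "masterN:key=value,..." line. The sample line is illustrative and was not captured from the setup in this article.

```python
def parse_master_line(line: str) -> dict:
    """Parse one 'masterN:key=value,...' line from the Sentinel INFO output."""
    # Split off the 'masterN' prefix at the first colon only,
    # because the address field contains a colon as well.
    _, _, fields = line.strip().partition(":")
    info = dict(field.split("=", 1) for field in fields.split(","))
    # Numeric fields are easier to compare as integers.
    for key in ("slaves", "sentinels"):
        if key in info:
            info[key] = int(info[key])
    return info


# Illustrative sample line for the three-node setup described here.
sample = "master0:name=myHAsetup,status=ok,address=192.168.1.3:6379,slaves=2,sentinels=3"
state = parse_master_line(sample)
print(state["name"], state["status"], state["slaves"], state["sentinels"])
```

A monitoring check could then alert when status is not "ok" or when fewer slaves or Sentinels are seen than expected.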

So far so good, everything looks fine. Now let’s see what happens when the master is unresponsive: We can simulate an outage by issuing the following command.
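One way to simulate such an outage is the DEBUG SLEEP command, which blocks an instance for a given number of seconds. The summary further down mentions the old master "hanging in DEBUG for 30 seconds", so the sketch below assumes that approach. The helper name is hypothetical and the host is a placeholder for the current master.

```python
import shlex


def debug_sleep_command(host: str, port: int, seconds: int) -> list[str]:
    """Build the redis-cli call that blocks the given instance for `seconds`."""
    return ["redis-cli", "-h", host, "-p", str(port), "DEBUG", "sleep", str(seconds)]


# Placeholder address of the current master; 30 seconds as mentioned in the summary.
cmd = debug_sleep_command("192.168.1.29", 6379, 30)
print(shlex.join(cmd))
# To actually run it against a test system: subprocess.run(cmd, check=True)
```

Only use this on instances you are allowed to block, since the master does not answer any client during the sleep.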

This produces the following log file:

Here is a short summary of what happens:

  1. Failure is detected
  2. Config-Version Epoch is increased by +1
  3. Leader is elected
  4. Quorum check. 3 Sentinels see the master down
  5. Delay for the next possible failover, after the current one
  6.–10. Failover to the new master, reconfiguration of the slave nodes (old master 192.168.1.29 is already marked as slave and down at the moment)
  11. Old master comes back to the setup (after hanging in DEBUG for 30 seconds)
  12. Old master is converted to slave and synchronizes to the new master.
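Sentinel also reports these steps as events. The one clients usually care about is +switch-master, whose payload lists the master name followed by the old and the new address. The sketch below parses such a payload; the new master address in the example is a made-up value, and the class and function names are hypothetical.

```python
from typing import NamedTuple


class SwitchMaster(NamedTuple):
    master_name: str
    old_ip: str
    old_port: int
    new_ip: str
    new_port: int


def parse_switch_master(payload: str) -> SwitchMaster:
    """Parse '<master name> <old ip> <old port> <new ip> <new port>'."""
    name, old_ip, old_port, new_ip, new_port = payload.split()
    return SwitchMaster(name, old_ip, int(old_port), new_ip, int(new_port))


# Illustrative payload; 192.168.1.30 is an assumed address for the promoted slave.
event = parse_switch_master("myHAsetup 192.168.1.29 6379 192.168.1.30 6379")
print(f"{event.master_name}: {event.old_ip}:{event.old_port} -> {event.new_ip}:{event.new_port}")
```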

Another check with the Sentinel API confirms the new master:

In the end, the downtime adds up to around 7-8 seconds (6 seconds to detect the failure plus 1-2 seconds to complete the failover). With a stable network that does not flap, you might be able to reduce the failure detection time from 6 seconds to 3 seconds and thus shorten the window during which no writes are accepted.
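The arithmetic behind these numbers is simple enough to put into a helper when you compare different detection timeouts. This is a sketch only; the 1-2 second failover duration is the value observed in this test, not a guarantee.

```python
def downtime_range(
    detection_s: float,
    failover_s: tuple[float, float] = (1.0, 2.0),
) -> tuple[float, float]:
    """Return (min, max) write downtime in seconds: detection plus failover."""
    low, high = failover_s
    return detection_s + low, detection_s + high


print(downtime_range(6))  # detection after 6 seconds, as in the test above
print(downtime_range(3))  # detection lowered to 3 seconds
```

Halving the detection time therefore brings the expected downtime to roughly 4-5 seconds, at the price of a higher risk of unnecessary failovers on a flaky network.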

Maintenance

So how do you maintain such a setup? In general the procedure is similar to other high availability setups. Before you start, take a backup of your keyspace with an RDB snapshot and copy it to a safe place. As a first example, suppose you want to change the configuration, e.g. by updating redis.conf to increase the “maxmemory” parameter or to modify the save parameter for the RDB snapshot engine:

  • stop Redis at one slave node
  • update the config file
  • start Redis
  • wait until the node is synchronized properly
  • repeat for the second slave
  • repeat for the master node (causing a failover)
  • change complete
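The important part of this procedure is the order: slaves first, master last, so that only one failover is triggered. A minimal sketch of that ordering, using the node names from the setup above (M1, R1, R2); the function is hypothetical and only prints the plan instead of touching any service.

```python
def rolling_order(roles: dict[str, str]) -> list[str]:
    """Return node names with all slaves first and the master last."""
    slaves = sorted(name for name, role in roles.items() if role == "slave")
    masters = sorted(name for name, role in roles.items() if role == "master")
    if len(masters) != 1:
        raise ValueError("expected exactly one master")
    return slaves + masters


# M1 is the master, R1 and R2 are the replicas.
nodes = {"M1": "master", "R1": "slave", "R2": "slave"}
for node in rolling_order(nodes):
    print(f"stop, reconfigure, start and wait for sync on {node}")
```

Since the roles change after a failover, determine them freshly from Sentinel before each maintenance run rather than hard-coding them.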

Another example, where you want to update the redis-package:

  • check the changelog of the new Redis version; if it contains no harmful change, you can go on
  • back up your keyspace, ideally with an RDB snapshot
  • stop Redis and Sentinel
  • copy both config files to a safe place on the server
  • update the Redis package
  • copy both configs from the safe place back to the config directory (/etc/)
  • start Sentinel
  • check whether the updated Sentinel starts and is considered active by the other Sentinels
  • if not, check the logfile for the cause of the issue
  • if yes, start Redis and check the startup and logfile if everything looks fine
  • repeat for the other slave, and then for the master.
  • update complete

After these two examples we inspect the Sentinel with its API:

This is very simple – but shows that the Sentinel works properly. To bring this article to a close, let’s take a look at some advanced commands. They all follow roughly the same pattern:

| Command | Description |
| --- | --- |
| sentinel masters | Show a list of all monitored masters + state |
| sentinel master <master name> | Show the state of a specified master |
| sentinel slaves <master name> | Show the slaves of the master + state |
| sentinel sentinels <master name> | Show the Sentinel instances for this master + state |
| sentinel get-master-addr-by-name <master name> | Return IP and port of the master. While a failover is in progress, the IP and port of the promoted slave are returned |
| sentinel reset <pattern> | Reset the masters with a matching name. Clears previous state for the master, removes every slave and Sentinel discovered. A fresh discovery is started. |
| sentinel failover <master name> | Force a failover, as if the master was not reachable |
| sentinel ckquorum <master name> | Check if the current Sentinel configuration is able to reach quorum and majority |
| sentinel flushconfig | Force Sentinel to rewrite its configuration on disk |
| sentinel monitor <name> <ip> <port> <quorum> | During runtime, tell Sentinel to start monitoring a new master |
| sentinel remove <name> | Remove a specified master from monitoring |
| sentinel set <name> <option> <value> | Similar to config set with Redis, you can change Sentinel options. All options of the Sentinel configuration file can also be set here |
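All of these commands travel over the regular Redis protocol (RESP) to the Sentinel port, so a client without Sentinel support can still ask for the master address. The sketch below shows how such a request is encoded and how the two-element reply of get-master-addr-by-name can be decoded. It assumes values without embedded line breaks; the reply bytes are hand-written for illustration and the function names are hypothetical.

```python
def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def decode_bulk_array(reply: bytes) -> list[str]:
    """Decode a RESP array that contains only bulk strings."""
    lines = reply.split(b"\r\n")
    if not lines[0].startswith(b"*"):
        raise ValueError("not a RESP array")
    count = int(lines[0][1:])
    # After the header, lines alternate between '$<length>' and the value itself.
    return [lines[2 + 2 * i].decode() for i in range(count)]


request = encode_command("SENTINEL", "get-master-addr-by-name", "myHAsetup")
print(request)
# Hand-written reply in the shape used for this command (illustrative address).
print(decode_bulk_array(b"*2\r\n$11\r\n192.168.1.3\r\n$4\r\n6379\r\n"))
```

In practice you would let a Sentinel-aware client library handle this, but the sketch shows that the Sentinel API needs nothing beyond the normal Redis wire format.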

Conclusion

This was a short overview of the concept of Sentinel and of the configuration and maintenance of the system. For more, have a look at the Sentinel description page. After all, there is a lot more to discover.

Read on

Our upcoming articles will cover the creation of a Redis cluster and the advanced usage of the Redis CLI and monitoring capabilities. Until then have a look at our website to find out about the services we offer in data center automation, write an email to info@inovex.de for more information or call +49 721 619 021-0.
