
Redis - How to remove a node in handshake state?

Published: at 12:00 AM

There are many scenarios in which some nodes of a Redis cluster are removed and eventually transition to the fail state. When you try to forget these nodes from a cluster that has many nodes, you might notice they reappear in handshake state.

A few things to note here that might not be intuitive:

1. Forget has to be run on all nodes

Running cluster forget <node-id> on just one of the cluster nodes does not remove the failed node from the cluster. It only removes it from that one node's view. It has to be run on all the remaining nodes for the change to take effect.

2. Handshake state

When you forget a failed node on only one node in the cluster, it might come back with a different node id. The reason is that the other nodes still know about it, so it is re-introduced through gossip.

3. Run forget on all nodes

So an intuitive solution to this problem is to run forget on all nodes. But there is a catch here. If you run it sequentially on these nodes with a script like the one below (taken from a GitHub issue), you'll notice the nodes are still there.

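# Usage: pass the host ($1) and port ($2) of any reachable cluster node
# Addresses of every node that is not in handshake state
nodes_addrs=$(redis-cli -h "$1" -p "$2" cluster nodes | grep -v handshake | awk '{print $2}')
echo $nodes_addrs
for addr in $nodes_addrs; do
    host=${addr%:*}
    port=${addr#*:}
    port=${port%%@*}   # newer Redis versions print ip:port@cport; drop the suffix
    # Node ids that this node lists as handshake or fail
    del_nodeids=$(redis-cli -h "$host" -p "$port" cluster nodes | grep -E 'handshake|fail' | awk '{print $1}')
    for nodeid in $del_nodeids; do
        echo "$host" "$port" "$nodeid"
        redis-cli -h "$host" -p "$port" cluster forget "$nodeid"
    done
done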

Also, if you use the node id that the cluster nodes command shows for a handshake-state entry, the node will come back yet again with another random node id.

4. Run forget on all nodes in under a minute

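If you don’t run forget on all nodes in under a minute, the cluster will not forget that node. A node that receives forget only bans the forgotten node id from being re-added for 60 seconds. Once that ban expires, gossip from any node that still knows the id brings it back. To avoid this, make sure to run forget on all nodes within this limit. One way to do that is to use an Ansible playbook with the free strategy to execute forget on all nodes at once.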
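If Ansible is not at hand, the same idea can be sketched in plain shell by sending every forget call in the background. This is a minimal sketch under stated assumptions, not a tested tool: forget_everywhere and the REDIS_CLI override are hypothetical names introduced here, and every address passed in is assumed to be reachable.

```shell
#!/usr/bin/env bash
# Send CLUSTER FORGET for one node id to every listed node at the same time.
# REDIS_CLI is a hypothetical override (defaults to redis-cli), handy for dry runs.
forget_everywhere() {
    local node_id=$1
    shift
    local addr host port
    for addr in "$@"; do
        host=${addr%:*}
        port=${addr#*:}
        port=${port%%@*}   # tolerate the ip:port@cport form printed by cluster nodes
        # Run each call in the background so the whole round fits inside the 60-second window.
        "${REDIS_CLI:-redis-cli}" -h "$host" -p "$port" cluster forget "$node_id" &
    done
    wait   # return only after every background call has finished
}
```

For example, forget_everywhere <node-id> 10.0.0.1:6379 10.0.0.2:6379 would contact both nodes at once. The addresses here are placeholders.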

5. Use only the fail state node id

While doing a cluster forget, don’t worry about the handshake node ids. Instead, find the node id for the same IP whose state is fail on any of the nodes, and forget that one. The handshake-state entries will disappear automatically when you do this.
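Picking out the right ids can be sketched as a small filter over the cluster nodes output. This assumes the usual layout where the node id is the first field and the comma-separated flags are the third. fail_node_ids is a hypothetical helper name, and the sketch matches the exact flag fail, so entries that are only suspected (fail?) or still in handshake are skipped.

```shell
#!/usr/bin/env bash
# Read `cluster nodes` output on stdin and print the ids of nodes flagged "fail".
# Handshake entries are ignored on purpose: their ids are temporary.
fail_node_ids() {
    awk '{
        n = split($3, flags, ",")
        is_fail = 0
        is_handshake = 0
        for (i = 1; i <= n; i++) {
            if (flags[i] == "fail") is_fail = 1
            if (flags[i] == "handshake") is_handshake = 1
        }
        if (is_fail && !is_handshake) print $1
    }' | sort -u
}
```

It could be fed with something like redis-cli -h <host> -p <port> cluster nodes | fail_node_ids, run against any healthy node.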

That’s it! Follow me on Twitter/LinkedIn for more. Thanks!