Staged service restart with Ansible

I’ve been working on a small project to create a Cassandra cluster for development purposes. I’m using Vagrant and Ansible to deploy a five-node Cassandra cluster, and node #5 would always fail to join the cluster.

I checked /var/log/cassandra/cassandra.log and this is what I found;

INFO  [InternalResponseStage:1] 2017-09-09 18:49:07,673 ColumnFamilyStore.java:406 - Initializing system_auth.roles
INFO  [main] 2017-09-09 18:49:08,666 StorageService.java:1439 - JOINING: waiting for schema information to complete
ERROR [main] 2017-09-09 18:49:09,687 MigrationManager.java:172 - Migration task failed to complete
ERROR [main] 2017-09-09 18:49:10,688 MigrationManager.java:172 - Migration task failed to complete
INFO  [main] 2017-09-09 18:49:12,952 StorageService.java:1439 - JOINING: schema complete, ready to bootstrap
INFO  [main] 2017-09-09 18:49:12,952 StorageService.java:1439 - JOINING: waiting for pending range calculation
INFO  [main] 2017-09-09 18:49:12,952 StorageService.java:1439 - JOINING: calculation complete, ready to bootstrap
Exception (java.lang.UnsupportedOperationException) encountered during startup: Other bootstrapping/leaving/moving nodes detected, cannot bootstrap while cassandra.consistent.rangemovement is true
java.lang.UnsupportedOperationException: Other bootstrapping/leaving/moving nodes detected, cannot bootstrap while cassandra.consistent.rangemovement is true
	at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:902)
	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:681)
	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:612)
	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:393)
	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:600)
	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:689)
ERROR [main] 2017-09-09 18:49:12,960 CassandraDaemon.java:706 - Exception encountered during startup
java.lang.UnsupportedOperationException: Other bootstrapping/leaving/moving nodes detected, cannot bootstrap while cassandra.consistent.rangemovement is true
	at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:902) ~[apache-cassandra-3.11.0.jar:3.11.0]
	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:681) ~[apache-cassandra-3.11.0.jar:3.11.0]
	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:612) ~[apache-cassandra-3.11.0.jar:3.11.0]
	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:393) [apache-cassandra-3.11.0.jar:3.11.0]
	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:600) [apache-cassandra-3.11.0.jar:3.11.0]
	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:689) [apache-cassandra-3.11.0.jar:3.11.0]
INFO  [StorageServiceShutdownHook] 2017-09-09 18:49:12,988 HintsService.java:220 - Paused hints dispatch
WARN  [StorageServiceShutdownHook] 2017-09-09 18:49:12,989 Gossiper.java:1538 - No local state, state is in silent shutdown, or node hasn't joined, not announcing shutdown
INFO  [StorageServiceShutdownHook] 2017-09-09 18:49:12,989 MessagingService.java:984 - Waiting for messaging service to quiesce
INFO  [ACCEPT-/192.168.44.105] 2017-09-09 18:49:13,002 MessagingService.java:1338 - MessagingService has terminated the accept() thread
INFO  [StorageServiceShutdownHook] 2017-09-09 18:49:13,360 HintsService.java:220 - Paused hints dispatch

With the section of interest being;

Exception (java.lang.UnsupportedOperationException) encountered during startup: Other bootstrapping/leaving/moving nodes detected, cannot bootstrap while cassandra.consistent.rangemovement is true
java.lang.UnsupportedOperationException: Other bootstrapping/leaving/moving nodes detected, cannot bootstrap while cassandra.consistent.rangemovement is true

When I started the service manually it would join the cluster with no issues, so there was clearly a timing issue preventing the final node from joining the Cassandra ring. I thought the solution might lie in the Ansible serial keyword, but that only applies at the play level, not the task level, and it didn’t give me the level of control I wanted.
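
For reference, serial throttles an entire play rather than a single task. A minimal sketch (the inventory group name here is just an assumption) would look like this;

- hosts: cassandra_nodes         # hypothetical inventory group
  serial: 1                      # run the whole play one host at a time
  become: true
  tasks:
    - name: Start Cassandra
      service:
        name: cassandra
        state: started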

I found some discussion of the issue on the Ansible GitHub and adapted a workaround that staggers the Cassandra service starts with a sleep between each one.

  - name: Staged Cassandra Service Start
    run_once: true
    with_items: "{{ play_hosts }}"
    delegate_to: "{{ item }}"
    shell: "sleep 60 && /usr/sbin/service cassandra start"
    when: deploy_mode == True

This makes clever use of delegate_to: because the task is run_once and loops over play_hosts, it executes from a single host but delegates a sleep and a service start to each node in turn. This staged execution of the Cassandra service start allowed all nodes to join the ring successfully.


MySQL 5.7: root password is not in mysqld.log

I came across this issue today while working on an Ansible playbook with MySQL 5.7. Old habits die hard and I was still trying to use mysql_install_db to initialise my instance. It seems a few others have been doing the same. The effect of using mysql_install_db in more recent versions of MySQL is that we end up not knowing the root password. It is now set to a random value rather than being blank/unset, and nothing is logged to the mysqld.log file unless you use mysqld --initialize first.

Instead of using mysql_install_db we should be doing something like this;

  - name: Init MySQL
    command: mysqld --initialize --datadir=/var/lib/mysql
    args:
      creates: /var/lib/mysql/mysql/user.frm
    become: true        # become_user only takes effect when become is enabled
    become_user: mysql

Now when searching for the root password we will find something in the error log;

sudo cat /var/log/mysqld.log | grep "temporary password"
2017-09-02T15:16:32.318530Z 1 [Note] A temporary password is generated for root@localhost: XXXXXXXX

We can log in to the instance as the root user with this password;

mysql> show databases;
ERROR 1820 (HY000): You must reset your password using ALTER USER statement before executing this statement.

But we are clearly limited in what we can do: we can’t read any tables or even view the databases until we reset the password. This bash one-liner will do that;

mysql -u root -p$(cat /var/log/mysqld.log | grep "temporary password" | rev | cut -d " " -f 1 | rev) -e "SET PASSWORD FOR root@localhost = 'BigSecret'" --connect-expired-password;

We can put this into an ansible task to continue with our automation;

  - name: Reset the root@localhost password
    shell: mysql -u root -p$(cat /var/log/mysqld.log | grep "temporary password" | rev | cut -d " " -f 1 | rev) -e "SET PASSWORD FOR root@localhost = 'BigSecret'" --connect-expired-password && touch /home/vagrant/root_pw_reset.success;
    args:
      creates: /home/vagrant/root_pw_reset.success

I’d recommend putting the bash line into a script and using the copy module to copy it to the host before executing it. It looks a whole lot tidier that way.
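
A minimal sketch of that approach, assuming the one-liner is saved locally as reset_root_pw.sh (the file name and paths are illustrative);

  - name: Copy the password reset script to the host
    copy:
      src: reset_root_pw.sh
      dest: /home/vagrant/reset_root_pw.sh
      mode: '0700'

  - name: Reset the root@localhost password
    shell: /home/vagrant/reset_root_pw.sh && touch /home/vagrant/root_pw_reset.success
    args:
      creates: /home/vagrant/root_pw_reset.success

Happy automating!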


A Cassandra Cluster using Vagrant and Ansible

I’ve started a new project to create a Cassandra Cluster for development purposes. It’s available on my github and uses Vagrant, Ansible, and VirtualBox.

Assuming everything is installed it’s quite easy to get started;

git clone https://github.com/rhysmeister/CassandraCluster.git
cd CassandraCluster
vagrant up

Check the status of the machines;

vagrant status;
Current machine states:

cnode1                    running (virtualbox)
cnode2                    running (virtualbox)
cnode3                    running (virtualbox)
cnode4                    running (virtualbox)
cnode5                    running (virtualbox)

To access a node via ssh;

vagrant ssh cnode1;

Once inside the host we can view the status of the Cassandra cluster with nodetool;

[vagrant@cnode1 ~]$ nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens       Owns (effective)  Host ID                               Rack
UN  192.168.44.104  106.51 KiB  256          40.0%             b191d49f-822c-40d3-bde4-926c4494a707  rack1
UN  192.168.44.105  84.39 KiB  256          39.4%             2b7d5381-7121-46f4-8800-dad9fadc4c85  rack1
UN  192.168.44.101  104.06 KiB  256          39.2%             cd6d8ed2-d0c0-4c90-90a1-bda096b422e1  rack1
UN  192.168.44.102  69.98 KiB  256          41.4%             303c762c-351d-43a6-a910-9a2afa3ec2be  rack1
UN  192.168.44.103  109.04 KiB  256          40.1%             0023da19-7b3f-420b-a6b8-ace8b5118b0d  rack1

The Administrator credentials for Cassandra are set in the cassandra.yml file and can be modified.

See the following variables;

cassandra_admin_user
cassandra_admin_user_pwd
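
For example, the defaults might look something like this (the values here are purely illustrative);

cassandra_admin_user: admin
cassandra_admin_user_pwd: BigSecret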


Using avahi / mDNS in a Vagrant project

I’m working on a project with Vagrant and Ansible to deploy a MongoDB cluster. I needed name resolution to work between the VirtualBox VMs I was creating and didn’t want to hardcode anything in the hosts file. The solution I settled on uses avahi, which works in essentially the same way as Apple Bonjour. As this has broader applications than just a MongoDB cluster I thought I’d share it here. The script is idempotent and targets Red Hat/CentOS systems.

#!/bin/sh
set -u;
 
function is_installed() {
        PACKAGE="$1";
        yum list installed "$PACKAGE" >/dev/null ;
        return $?
}
 
is_installed epel-release || sudo yum install -y epel-release;
is_installed avahi-dnsconfd || sudo yum install -y avahi-dnsconfd;
is_installed avahi-tools || sudo yum install -y avahi-tools;
is_installed nss-mdns || sudo yum install -y nss-mdns;
sudo sed -i /etc/nsswitch.conf -e "/^hosts:*/c\hosts:\tfiles mdns4_minimal \[NOTFOUND=return\] dns myhostname"
sudo /bin/systemctl restart avahi-daemon.service;

Once installed on each host you should be able to ping the other nodes in the network by name. You can inspect the ip/hostname cache that has been built with the avahi-browse command;

avahi-browse -acr

Example output;

+   eth1 IPv4 mongod6 [08:00:27:5b:4c:a8]                   Workstation          local
+   eth1 IPv4 mongod5 [08:00:27:6d:3d:80]                   Workstation          local
+   eth1 IPv4 mongod4 [08:00:27:1b:60:89]                   Workstation          local
+   eth1 IPv4 mongod3 [08:00:27:54:02:58]                   Workstation          local
+   eth1 IPv4 mongod2 [08:00:27:29:9a:bb]                   Workstation          local
+   eth1 IPv4 mongod1 [08:00:27:59:68:61]                   Workstation          local
+   eth1 IPv4 mongos3 [08:00:27:71:66:c9]                   Workstation          local
+   eth1 IPv4 mongos2 [08:00:27:18:1c:be]                   Workstation          local
+   eth1 IPv4 mongos1 [08:00:27:e5:53:33]                   Workstation          local
+   eth0 IPv4 mongos1 [52:54:00:47:46:52]                   Workstation          local
=   eth1 IPv4 mongod1 [08:00:27:59:68:61]                   Workstation          local
   hostname = [mongod1.local]
   address = [192.168.43.103]
   port = [9]
   txt = []
=   eth1 IPv4 mongos2 [08:00:27:18:1c:be]                   Workstation          local
   hostname = [mongos2.local]
   address = [192.168.43.101]
   port = [9]
   txt = []
=   eth1 IPv4 mongod5 [08:00:27:6d:3d:80]                   Workstation          local
   hostname = [mongod5.local]
   address = [192.168.43.107]
   port = [9]
   txt = []
=   eth1 IPv4 mongod3 [08:00:27:54:02:58]                   Workstation          local
   hostname = [mongod3.local]
   address = [192.168.43.105]
   port = [9]
   txt = []
=   eth1 IPv4 mongos3 [08:00:27:71:66:c9]                   Workstation          local
   hostname = [mongos3.local]
   address = [192.168.43.102]
   port = [9]
   txt = []
=   eth1 IPv4 mongos1 [08:00:27:e5:53:33]                   Workstation          local
   hostname = [mongos1.local]
   address = [192.168.43.100]
   port = [9]
   txt = []
=   eth0 IPv4 mongos1 [52:54:00:47:46:52]                   Workstation          local
   hostname = [mongos1.local]
   address = [10.0.2.15]
   port = [9]
   txt = []
=   eth1 IPv4 mongod6 [08:00:27:5b:4c:a8]                   Workstation          local
   hostname = [mongod6.local]
   address = [192.168.43.108]
   port = [9]
   txt = []
=   eth1 IPv4 mongod4 [08:00:27:1b:60:89]                   Workstation          local
   hostname = [mongod4.local]
   address = [192.168.43.106]
   port = [9]
   txt = []
=   eth1 IPv4 mongod2 [08:00:27:29:9a:bb]                   Workstation          local
   hostname = [mongod2.local]
   address = [192.168.43.104]
   port = [9]
   txt = []
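
To confirm resolution is working end to end, ping one of the nodes by its mDNS name;

ping -c 3 mongod1.local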


Cassandra 3 Node Cluster Setup Notes

Install on each node

wget http://www-eu.apache.org/dist/cassandra/redhat/30x/cassandra-3.0.13-1.noarch.rpm
yum install jre
rpm -ivh cassandra-3.0.13-1.noarch.rpm
chkconfig cassandra on

Configuration changes on each node

vi /etc/cassandra/conf/cassandra.yaml

Customise the seeds / ip address for your environment

cluster_name: 'cassandra_cluster'
seeds: "192.168.65.120,192.168.65.121,192.168.65.122"
listen_address: 
rpc_address: 
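
For example, on the node at 192.168.65.120 these would be;

listen_address: 192.168.65.120
rpc_address: 192.168.65.120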


Start the cassandra service on each node

service cassandra start
service cassandra status

If you get the following error;

org.apache.cassandra.exceptions.ConfigurationException: Saved cluster name Test Cluster != configured name cassandra_cluster

Then you need to reset the data folder (note this removes all data so take a backup if you're not sure).


rm -rf /var/lib/cassandra/data/system/*
service cassandra start

View the status of the cluster

nodetool status

Output will be similar to below;

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens       Owns (effective)  Host ID                               Rack
UN  192.168.65.120  108.04 KB  256          65.8%             92119740-cbf7-406a-9237-a1f4036e26e9  rack1
UN  192.168.65.121  166.99 KB  256          65.6%             13b5a4f8-6d98-481b-809e-f1a2ffd8ae94  rack1
UN  192.168.65.122  143.53 KB  256          68.6%             fe0068b2-2dca-403a-b5f2-93e827250bc5  rack1

Login with the command-line client

export CQLSH_HOST=$(hostname --ip-address)
cqlsh

Do some stuff;

cqlsh> CREATE KEYSPACE rhys WITH REPLICATION = {'class':'SimpleStrategy','replication_factor':2};
cqlsh> USE rhys;
cqlsh> CREATE TABLE rhys (empid int primary key, emp_first varchar, emp_last varchar, emp_dept varchar);
cqlsh> INSERT INTO rhys (empid, emp_first, emp_last, emp_dept) VALUES (1, 'Rhys', 'Campbell', 'ENT');
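
A quick select confirms the row is there;

cqlsh> SELECT * FROM rhys;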

Enable authentication on each node

vi /etc/cassandra/conf/cassandra.yaml

Change the option in this file on each node;

authenticator: PasswordAuthenticator

Restart each node;

service cassandra restart

Login to one node to update the cassandra admin user;

export CQLSH_HOST=$(hostname --ip-address)
cqlsh -u cassandra -p cassandra

Alter the replication factor for the system_auth keyspace;

cqlsh> ALTER KEYSPACE "system_auth" WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'datacenter1': 3 };

Ensure the change is propagated through the cluster;

nodetool repair system_auth

Restart;

service cassandra restart

Create a new superuser

cqlsh -u cassandra -p cassandra
cqlsh> CREATE ROLE admin WITH PASSWORD = 'BigSecret' AND SUPERUSER = true AND LOGIN = true;
exit

Log in as the new superuser and change the default cassandra user;

cqlsh -u admin -p BigSecret
cqlsh> ALTER ROLE cassandra WITH PASSWORD='xfvasdfvsxv3456456uyhnfdfgu657rt87ytygwe3456' AND SUPERUSER=false;

Change some settings controlling how roles are cached;

vi /etc/cassandra/conf/cassandra.yaml

Set the validity to ten minutes and the update interval to five minutes;

roles_validity_in_ms: 600000
roles_update_interval_in_ms: 300000
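
Restart each node again for the new cache settings to take effect;

service cassandra restart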