Linux Server checks with Goss

I’ve been playing a little with Goss recently. Goss is similar to Testinfra in that it allows you to write tests to validate your infrastructure. Goss uses YAML to specify the expected state, rather than Python unit-test code as Testinfra does. It also has a couple of other interesting features that make it stand out from the crowd…

The first is a test auto-generation feature. Have a service you want to monitor? Simply run this…

goss autoadd nagios

Goss will autodiscover various things about that service. Here’s what it found out about the Nagios service…

package:
  nagios:
    installed: true
    versions:
    - 4.4.3
service:
  nagios:
    enabled: true
    running: true
user:
  nagios:
    exists: true
    uid: 999
    gid: 998
    groups:
    - nagios
    home: /var/spool/nagios
    shell: /sbin/nologin
group:
  nagios:
    exists: true
    gid: 998
process:
  nagios:
    running: true

The state of the nagios service can then be validated with…

goss validate
.............

Total Duration: 0.037s
Count: 13, Failed: 0, Skipped: 0

We can also set up an HTTP health endpoint…

goss serve &
2019/08/14 15:05:42 Starting to listen on: :8080

When we hit the endpoint the tests are executed…

curl http://localhost:8080/healthz
2019/08/14 15:06:48 127.0.0.1:42792: requesting health probe
2019/08/14 15:06:48 127.0.0.1:42792: Stale cache, running tests
.............

Total Duration: 0.036s
Count: 13, Failed: 0, Skipped: 0
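If you want to poll that endpoint from a script, a minimal sketch might look like the following. The function names are hypothetical, and the mapping assumes the endpoint answers 200 when every test passes and 503 when something fails; check that against your Goss version before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper names; only curl and the /healthz URL come from the post.

# Map an HTTP status code to a Nagios-style exit code.
http_code_to_exit() {
  case "$1" in
    200) echo 0 ;;   # OK
    503) echo 2 ;;   # CRITICAL: assumed to mean failing tests
    *)   echo 3 ;;   # UNKNOWN: endpoint unreachable or unexpected reply
  esac
}

# Probe the endpoint and print the exit code a monitoring wrapper could use.
check_goss_endpoint() {
  local url="${1:-http://localhost:8080/healthz}"
  local code
  # curl prints 000 as the status code when it cannot connect at all.
  code=$(curl -s -o /dev/null -w '%{http_code}' "$url" || true)
  http_code_to_exit "$code"
}
```

Calling check_goss_endpoint with no arguments probes the default address shown above. Pass a different URL as the first argument if goss serve is listening elsewhere.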

We can also run the tests as a Nagios check…

goss validate --format nagios
GOSS OK - Count: 13, Failed: 0, Skipped: 0, Duration: 0.032s

Note the execution times above: all 13 checks complete in under 40 ms. In my experience Goss is noticeably faster than Python-based testing tools like Testinfra.

Wait for processes to end with Ansible

I’ve been doing a lot of stuff in Ansible recently where I needed to fire up, kill and relaunch a bunch of processes. I wanted to find a quick and reliable way of managing this…

This is possible using a combination of the pids and wait_for modules…

First get the pids of your process…

- name: Getting pids for mongod
  pids:
    name: mongod
  register: pids_of_mongod

The pids module returns a list which we can iterate over with with_items. Then we can use the wait_for module and the /proc filesystem to ensure all the processes have exited…

- name: Wait for all mongod processes to exit
  wait_for:
    path: "/proc/{{ item }}/status"
    state: absent
  with_items: "{{ pids_of_mongod.pids }}"

After this last task completes you can be sure that the Linux OS has cleaned up all your processes.
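For anyone wondering what the two tasks amount to, here is a rough shell equivalent. It is only a sketch: wait_for_pids_gone is a hypothetical name, the 0.2 second polling interval is an arbitrary choice, and the real wait_for module has its own timeout and polling behaviour.

```shell
#!/usr/bin/env bash
# wait_for_pids_gone TIMEOUT_SECONDS PID...
# Returns 0 once no /proc/PID/status file remains, 1 on timeout.
wait_for_pids_gone() {
  local timeout="$1"; shift
  local deadline=$(( SECONDS + timeout ))
  local pid
  for pid in "$@"; do
    # The status file disappears when the kernel has cleaned up the process.
    while [ -e "/proc/$pid/status" ]; do
      if [ "$SECONDS" -ge "$deadline" ]; then
        return 1
      fi
      sleep 0.2
    done
  done
  return 0
}

# Example use (mongod is the process name from the tasks above):
# wait_for_pids_gone 300 $(pgrep -x mongod)
```

The timeout covers the whole list rather than each pid, so a single stuck process cannot make the wait run longer than you asked for.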

Broken sudo?

If you somehow add a dodgy sudo rule you might end up breaking it completely…

sudo su -

>>> /etc/sudoers.d/new_sudo_rule: syntax error near line 1 <<<

sudo: parse error in /etc/sudoers.d/new_sudo_rule near line 1

[sudo] password for rhys:

rhys is not in the sudoers file.  This incident will be reported.

You need to sudo to fix sudo? You might first think of booting into rescue mode. That would work but luckily there's an easier way...

pkexec mv /etc/sudoers.d/new_sudo_rule .

This will move the dodgy sudo rule out of harm's way. See more on pkexec.
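To avoid getting into this state in the first place, you can syntax-check a rule before it lands in /etc/sudoers.d. The sketch below wraps visudo's check mode (-c with -f) in a hypothetical install_sudoers_rule function. VISUDO_CMD is an invented override so the function can be tried out without touching the real sudo configuration.

```shell
#!/usr/bin/env bash
# install_sudoers_rule SOURCE_FILE [TARGET_DIR]
# Copies SOURCE_FILE into TARGET_DIR only if the syntax check passes.
# VISUDO_CMD is a hypothetical override hook, not part of sudo itself.
install_sudoers_rule() {
  local src="$1"
  local target_dir="${2:-/etc/sudoers.d}"
  local checker="${VISUDO_CMD:-visudo}"

  # visudo -c -f FILE checks FILE for syntax errors without editing anything.
  if ! "$checker" -c -f "$src" >/dev/null 2>&1; then
    echo "refusing to install $src: syntax check failed" >&2
    return 1
  fi
  # Install read-only for owner and group.
  install -m 0440 "$src" "$target_dir/$(basename "$src")"
}
```

Run as root, install_sudoers_rule new_sudo_rule would only copy the file across when visudo is happy with it.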

Linux: Reclaim disk space used by “deleted” files

I had a misbehaving application consuming a large amount of space in /tmp. The files weren’t visible in the /tmp volume itself, because they had already been deleted, but lsof allowed me to identify them.

lsof -a +L1 -c s3fs /tmp
COMMAND   PID USER   FD   TYPE DEVICE  SIZE/OFF NLINK NODE NAME
s3fs    59614 root   28u   REG  253,3 584056832     0   22 /tmp/tmpfMIMLU4 (deleted)
s3fs    59614 root   29u   REG  253,3 584056832     0   15 /tmp/tmpfC3KN7h (deleted)
s3fs    59614 root   31u   REG  253,3 584056832     0   24 /tmp/tmpfkA6wcj (deleted)
s3fs    59614 root   32u   REG  253,3 584056832     0   23 /tmp/tmpfJxs04J (deleted)
s3fs    59614 root   34u   REG  253,3 584056832     0   12 /tmp/tmpfgg8Ifr (deleted)
s3fs    59614 root   35u   REG  253,3 584056832     0   27 /tmp/tmpfbR2pji (deleted)

The best way to reclaim this disk space would be to restart the application, in this case s3fs. Sadly I wasn’t in a position to do this, so a little skulduggery was needed…

It’s possible to truncate the file in the proc filesystem with the pid and fd. Example below…

: > /proc/59614/fd/31 # Yes the command starts with a colon

The above example truncates the file /tmp/tmpfkA6wcj to zero bytes and releases the space to the operating system. This should be safe to use but, as always with stuff you read on the Internet, make sure you do your own testing, due diligence, keep out of reach of children and so on.
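If you'd like to try the trick somewhere harmless first, the sketch below reproduces the situation in a single bash script. It holds a scratch file open on file descriptor 9 (an arbitrary choice), deletes it, then truncates it through /proc. It assumes Linux with GNU stat.

```shell
#!/usr/bin/env bash
# Self-contained demo of truncating a deleted-but-open file through /proc.
set -euo pipefail

demo_file=$(mktemp)                       # scratch file for the demo
head -c 1048576 /dev/zero > "$demo_file"  # write 1 MiB

exec 9<>"$demo_file"   # hold the file open on fd 9 in this shell
rm "$demo_file"        # unlink it: the name goes, the blocks stay allocated

# stat -L follows the /proc link to the (deleted) file itself.
size_before=$(stat -L -c %s "/proc/$$/fd/9")
: > "/proc/$$/fd/9"    # truncate via the proc filesystem
size_after=$(stat -L -c %s "/proc/$$/fd/9")

echo "before=$size_before after=$size_after"
exec 9>&-              # close the descriptor
```

One thing to keep in mind: a process that is still writing may carry on from its old file offset, so the reported size can grow again even though the earlier blocks have been released.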

ansible-vault unexpected exception on Ubuntu

When attempting to edit an ansible-vault file…

ansible-vault edit roles/cassandra_backup/vars/test_s3_cfg.yaml 

The following error was received…

ERROR! Unexpected Exception, this is probably a bug: from_buffer() cannot return the address of the raw string within a str or unicode or bytearray object

Encountered on this version of Ubuntu…

Linux xxxxxxxxx 4.15.0-43-generic #46~16.04.1-Ubuntu SMP Fri Dec 7 13:31:08 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

The full python stacktrace can be viewed as follows…

ansible-vault edit roles/cassandra_backup/vars/test_s3_cfg.yaml -vvv
Traceback (most recent call last):
  File "/usr/local/bin/ansible-vault", line 118, in <module>
    exit_code = cli.run()
  File "/usr/local/lib/python2.7/dist-packages/ansible/cli/vault.py", line 255, in run
    self.execute()
  File "/usr/local/lib/python2.7/dist-packages/ansible/cli/__init__.py", line 155, in execute
    fn()
  File "/usr/local/lib/python2.7/dist-packages/ansible/cli/vault.py", line 446, in execute_edit
    self.editor.edit_file(f)
  File "/usr/local/lib/python2.7/dist-packages/ansible/parsing/vault/__init__.py", line 953, in edit_file
    plaintext, vault_id_used, vault_secret_used = self.vault.decrypt_and_get_vault_id(vaulttext)
  File "/usr/local/lib/python2.7/dist-packages/ansible/parsing/vault/__init__.py", line 736, in decrypt_and_get_vault_id
    b_plaintext = this_cipher.decrypt(b_vaulttext, vault_secret)
  File "/usr/local/lib/python2.7/dist-packages/ansible/parsing/vault/__init__.py", line 1316, in decrypt
    b_key1, b_key2, b_iv = cls._gen_key_initctr(b_password, b_salt)
  File "/usr/local/lib/python2.7/dist-packages/ansible/parsing/vault/__init__.py", line 1158, in _gen_key_initctr
    b_derivedkey = cls._create_key_cryptography(b_password, b_salt, key_length, iv_length)
  File "/usr/local/lib/python2.7/dist-packages/ansible/parsing/vault/__init__.py", line 1131, in _create_key_cryptography
    b_derivedkey = kdf.derive(b_password)
  File "/usr/local/lib/python2.7/dist-packages/cryptography/hazmat/primitives/kdf/pbkdf2.py", line 50, in derive
    key_material
  File "/usr/local/lib/python2.7/dist-packages/cryptography/hazmat/backends/openssl/backend.py", line 307, in derive_pbkdf2_hmac
    key_material_ptr = self._ffi.from_buffer(key_material)
TypeError: from_buffer() cannot return the address of the raw string within a str or unicode or bytearray object

This is due to a problem with packages installed via both apt and pip. It can be fixed with the following procedure…

sudo -E pip uninstall cryptography -y
sudo -E apt-get purge python3-cryptography
sudo -E apt-get autoremove
sudo -E pip3 install --upgrade cryptography