Using Bash brace expansion to generate multiple files

I needed to generate a whole bunch of files, with identical content, for a recent task. You might automatically think of using a loop for such a task but there’s a much simpler method using brace expansion in the shell.

I wanted to generate files in the following format…

rhys-tmp01.txt
rhys-tmp02.txt
rhys-tmp03.txt
...
rhys-tmp91.txt
rhys-tmp92.txt

This is achievable with a simple one-liner once we have created the source file rhys-tmp01.txt:

tee rhys-tmp{02..92}.txt < rhys-tmp01.txt

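Note that the zero-padded range ({02..92}) needs bash 4 or later; older versions expand it without the padding. Also note that tee echoes the file content to the terminal, so redirect its output to /dev/null if you don't want to see it.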
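
If you are stuck on an older bash, one possible fallback is a short loop driven by seq -w, which pads every number to the same width. This is only a sketch: it assumes seq is available (it is not part of POSIX), and it works in a scratch directory with placeholder content rather than real files.

```shell
#!/bin/sh
set -eu

# Work in a scratch directory so nothing real is overwritten (an assumption for this sketch).
workdir=$(mktemp -d)
cd "$workdir"

# Placeholder source file; the content is illustrative only.
printf 'sample content\n' > rhys-tmp01.txt

# seq -w pads to equal width, so this yields 02, 03, ... 92.
for n in $(seq -w 2 92); do
    cp rhys-tmp01.txt "rhys-tmp$n.txt"
done

# Count the results: the source file plus 91 copies.
count=$(ls rhys-tmp*.txt | wc -l)
echo "$count files created"
```

The loop is longer than the brace expansion one-liner, but it does not depend on the bash version.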

Linux Server checks with Goss

I’ve been playing a little with goss recently. Goss is similar to Testinfra in that it lets you write tests to validate your infrastructure. Goss uses YAML to specify the expected state, rather than Python unit tests as Testinfra does. It also has a couple of other interesting features that make it stand out from the crowd…

The first is a test auto-generation feature. Have a service you want to monitor? Simply run this…

goss autoadd nagios

Goss will autodiscover various things about that service. Here’s what it found out about the Nagios service…

package:
  nagios:
    installed: true
    versions:
    - 4.4.3
service:
  nagios:
    enabled: true
    running: true
user:
  nagios:
    exists: true
    uid: 999
    gid: 998
    groups:
    - nagios
    home: /var/spool/nagios
    shell: /sbin/nologin
group:
  nagios:
    exists: true
    gid: 998
process:
  nagios:
    running: true

The state of the nagios service can then be validated with…

goss validate
.............

Total Duration: 0.037s
Count: 13, Failed: 0, Skipped: 0

We can also set up an HTTP health endpoint…

goss serve &
2019/08/14 15:05:42 Starting to listen on: :8080

When we hit the endpoint the tests are executed…

curl http://localhost:8080/healthz
2019/08/14 15:06:48 127.0.0.1:42792: requesting health probe
2019/08/14 15:06:48 127.0.0.1:42792: Stale cache, running tests
.............

Total Duration: 0.036s
Count: 13, Failed: 0, Skipped: 0
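
For scripting against the endpoint, a small wrapper around curl can turn the response into an exit code. The sketch below assumes goss reports a failed validation with an HTTP error status, which is worth confirming against the goss documentation for your version; goss_healthy is a hypothetical helper name, and the default URL is simply the one used above.

```shell
# Hypothetical helper: succeed only if the health endpoint answers without an HTTP error.
goss_healthy() {
    url=${1:-http://localhost:8080/healthz}
    # -f: exit non-zero on HTTP status >= 400
    # -sS: no progress meter, but still print errors
    # --max-time: do not hang if nothing answers
    curl -fsS --max-time 5 -o /dev/null "$url"
}
```

It could then be used as goss_healthy || echo "goss checks failing" in a cron job or a deploy script.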

We can also run the tests as a Nagios check…

goss validate --format nagios
GOSS OK - Count: 13, Failed: 0, Skipped: 0, Duration: 0.032s

Note the execution times above: all 13 tests complete in under 40 ms. In my experience that is far quicker than Python-based testing tools like Testinfra.

Wait for processes to end with Ansible

I’ve been doing a lot of stuff in Ansible recently where I needed to fire up, kill and relaunch a bunch of processes. I wanted to find a quick and reliable way of managing this…

This is possible using a combination of the pids and wait_for modules…

First get the pids of your process…

- name: Getting pids for mongod
  pids:
    name: mongod
  register: pids_of_mongod

The pids module returns a list that we can iterate over with with_items. Then we can use the wait_for module and the /proc filesystem to ensure all the processes have exited…

- name: Wait for all mongod processes to exit
  wait_for:
    path: "/proc/{{ item }}/status"
    state: absent
  with_items: "{{ pids_of_mongod.pids }}"

After this last task completes you can be sure that the Linux OS has cleaned up all your processes.
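
For comparison, the same idea can be sketched in plain shell: poll the /proc entry until it disappears or a timeout expires. This is not what Ansible runs internally, just an illustration of the technique; wait_for_exit is a hypothetical helper and its 30 second default is an arbitrary choice.

```shell
# Hypothetical helper: wait until /proc/<pid>/status disappears, or give up after a timeout.
wait_for_exit() {
    pid=$1
    timeout=${2:-30}   # seconds; the default is an arbitrary choice for this sketch
    elapsed=0
    while [ -e "/proc/$pid/status" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1   # still present when the timeout expired
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0           # the /proc entry is gone, so the process has exited
}
```

Something like wait_for_exit 12345 60 || echo "still running" would wait up to a minute for pid 12345. Bear in mind that a zombie process keeps its /proc entry until its parent reaps it, so the check only passes once the process is fully gone.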

Broken sudo?

If you somehow add a dodgy sudo rule you might end up breaking it completely…

sudo su -

>>> /etc/sudoers.d/new_sudo_rule: syntax error near line 1 <<<

sudo: parse error in /etc/sudoers.d/new_sudo_rule near line 1

[sudo] password for rhys:

rhys is not in the sudoers file.  This incident will be reported.

You need to sudo to fix sudo? You might first think of booting into rescue mode. That would work but luckily there's an easier way...

pkexec mv /etc/sudoers.d/new_sudo_rule .

This will move the dodgy sudo rule out of harm's way. See more on pkexec.
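
To avoid getting into this state in the first place, a candidate rule can be syntax-checked with visudo -c -f before it is installed. A minimal sketch follows; install_sudo_rule is a hypothetical helper, and the VISUDO and SUDOERS_DIR overrides are assumptions added so that it can be tried out without root.

```shell
# Hypothetical helper: install a sudoers fragment only if it passes a syntax check.
install_sudo_rule() {
    src=$1
    dest_dir=${SUDOERS_DIR:-/etc/sudoers.d}   # override is an assumption, for testing
    checker=${VISUDO:-visudo}                 # override is an assumption, for testing

    # visudo -c only checks the syntax; -f points it at the candidate file.
    if "$checker" -c -f "$src" >/dev/null 2>&1; then
        install -m 0440 "$src" "$dest_dir/"
    else
        echo "refusing to install $src: syntax check failed" >&2
        return 1
    fi
}
```

Run as root, install_sudo_rule new_sudo_rule would copy the file into /etc/sudoers.d only when the check passes.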

Linux: Reclaim disk space used by “deleted” files

I had a misbehaving application consuming a large amount of space in /tmp. The files were not visible in the /tmp volume itself, because they had already been deleted, but lsof allowed me to identify them.

lsof -a +L1 -c s3fs /tmp
COMMAND   PID USER   FD   TYPE DEVICE  SIZE/OFF NLINK NODE NAME
s3fs    59614 root   28u   REG  253,3 584056832     0   22 /tmp/tmpfMIMLU4 (deleted)
s3fs    59614 root   29u   REG  253,3 584056832     0   15 /tmp/tmpfC3KN7h (deleted)
s3fs    59614 root   31u   REG  253,3 584056832     0   24 /tmp/tmpfkA6wcj (deleted)
s3fs    59614 root   32u   REG  253,3 584056832     0   23 /tmp/tmpfJxs04J (deleted)
s3fs    59614 root   34u   REG  253,3 584056832     0   12 /tmp/tmpfgg8Ifr (deleted)
s3fs    59614 root   35u   REG  253,3 584056832     0   27 /tmp/tmpfbR2pji (deleted)

The best way to reclaim this disk space would be to restart the application, in this case s3fs. Sadly I wasn’t in a position to do this, so a little skulduggery was needed…

It’s possible to truncate the file in the proc filesystem with the pid and fd. Example below…

: > /proc/59614/fd/31 # Yes the command starts with a colon

The above example truncates the file /tmp/tmpfkA6wcj to zero bytes and releases the space to the operating system. This should be safe to use but, as always with stuff you read on the Internet, make sure you do your own testing, due diligence, keep out of reach of children and so on.
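
One way to do that testing is to reproduce the situation on a throwaway file first. The sketch below is Linux only, since it relies on /proc; the file descriptor number 9 and the 1 MiB size are arbitrary choices for the demonstration.

```shell
#!/bin/sh
# Demonstration on a throwaway file; nothing here touches real data.
tmpfile=$(mktemp)
exec 9>"$tmpfile"                  # hold the file open on fd 9
head -c 1048576 /dev/zero >&9      # write 1 MiB through the open descriptor
rm "$tmpfile"                      # unlink it: the space stays allocated while fd 9 is open

before=$(wc -c < "/proc/$$/fd/9")  # size as seen through the proc entry
: > "/proc/$$/fd/9"                # truncate via the proc entry
after=$(wc -c < "/proc/$$/fd/9")

exec 9>&-                          # close the descriptor
echo "before=$before after=$after"
```

One thing to watch for: if the application keeps writing at its old offset (that is, the file was not opened in append mode), its next write leaves a sparse file, so the apparent size can jump back up even though the disk usage stays low.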