Bash: Count the number of databases in a gzip compressed mysqldump

A simple bash one-liner!

gunzip -c /path/to/backup/mysqldump.sql.gz | grep -E "^CREATE DATABASE" | wc -l

Breaking this down..

This prints the contents of a gzip compressed mysqldump to the terminal

gunzip -c /path/to/backup/mysqldump.sql.gz

Grep for lines that start with CREATE DATABASES…

grep -E "^CREATE DATABASE"

Count the number of create database lines..

wc -l

So to count the number of tables it’s just a simple change to…

gunzip -c /path/to/backup/mysqldump.sql.gz | grep -E "^CREATE TABLE" | wc -l

Getting started with osquery on CentOS

I recently stumbled across osquery which allows you to query your Linux, and OS X, servers for various bits of information. It’s very similar in concept to WQL for those in the Windows world.

Here’s my quick getting started guide for CentOS 6.X…

First download and install the latest rpm for your distro. You might want to check osquery downloads for the latest release.

wget https://osquery-packages.s3.amazonaws.com/centos6/osquery-1.8.2.rpm
sudo rpm -ivh osquery-1.8.2.rpm

You’ll now have the following three executables in your path

  • osqueryctl – bash script to manage the daemon.
  • osqueryd – the daemon.
  • osqueryi – command-line client to interactively run osquery queries, view tales (namespaces) and so on.

Take a look at the example config…

cat /usr/share/osquery/osquery.example.conf

The daemon won’t start without a config file so be sure to create one first. This config file does a few thigns but will also periodcially run some queries and log them to a file. This is useful for sticking data into ELK or splunk.

cat << EOF > /etc/osquery/osquery.conf
{
"options": {
"config_plugin": "filesystem",
"logger_plugin": "filesystem",
"logger_path": "/var/log/osquery",
"pidfile": "/var/osquery/osquery.pidfile",
"events_expiry": "3600",
"database_path": "/var/osquery/osquery.db",
"verbose": "true",
"worker_threads": "2",
"enable_monitor": "true"
},
// Define a schedule of queries:
"schedule": {
// This is a simple example query that outputs basic system information.
"system_info": {
// The exact query to run.
"query": "SELECT hostname, cpu_brand, physical_memory FROM system_info;",
// The interval in seconds to run this query, not an exact interval.
"interval": 3600
}
},
// Decorators are normal queries that append data to every query.
"decorators": {
"load": [
"SELECT uuid AS host_uuid FROM system_info;",
"SELECT user AS username FROM logged_in_users ORDER BY time DESC LIMIT 1;"
]
},
"packs": {
"osquery-monitoring": "/usr/share/osquery/packs/osquery-monitoring.conf"
 
}
}
EOF

Try to start the daemon…

osqueryctl start

If you encounter this issue

/usr/bin/osqueryctl: line 52: [: missing `]'

vi /usr/bin/osqueryctl

Change line 52 from this….

if [ ! -e "$INIT_SCRIPT_PATH" &amp;&amp; ! -f "$SERVICE_SCRIPT_PATH" ]; then

to this

if [ ! -e "$INIT_SCRIPT_PATH" ] &amp;&amp; [ ! -f "$SERVICE_SCRIPT_PATH" ]; then

You should no be able to start without complaint…

osqueryctl start

Some logfiles should appear here…

ls -lh /var/log/osquery/

Use the osquery client to explore (this is akin to the mysql client). Data is presented just like a traditional sql database…

osqueryi
osquery>.help
osquery> select * from logged_in_users;
osquery> select * from time;
osquery> select * from os_version;
osquery> select * from rpm_packages;
osquery> select * from shell_history;

We can view the “schema” of certain tables like so…

osquery> .schema processes
CREATE TABLE processes(`pid` BIGINT PRIMARY KEY, `name` TEXT, `path` TEXT, `cmdline` TEXT, `state` TEXT, `cwd` TEXT, `root` TEXT, `uid` BIGINT, `gid` BIGINT, `euid` BIGINT, `egid` BIGINT, `suid` BIGINT, `sgid` BIGINT, `on_disk` INTEGER, `wired_size` BIGINT, `resident_size` BIGINT, `phys_footprint` BIGINT, `user_time` BIGINT, `system_time` BIGINT, `start_time` BIGINT, `parent` BIGINT, `pgroup` BIGINT, `nice` INTEGER) WITHOUT ROWID;

We can use familar sql operators…

SELECT * FROM file WHERE path LIKE '/etc/%';
+-----------------------------------+------------------------+------------------------------+--------+-----+-----+------+--------+--------+------------+------------+------------+------------+-------+------------+-----------+
| path | directory | filename | inode | uid | gid | mode | device | size | block_size | atime | mtime | ctime | btime | hard_links | type |
+-----------------------------------+------------------------+------------------------------+--------+-----+-----+------+--------+--------+------------+------------+------------+------------+-------+------------+-----------+
| /etc/ConsoleKit/ | /etc/ConsoleKit | . | 398999 | 0 | 0 | 0755 | 0 | 4096 | 4096 | 1467708722 | 1414762792 | 1414762792 | 0 | 5 | directory |
| /etc/DIR_COLORS | /etc | DIR_COLORS | 398851 | 0 | 0 | 0644 | 0 | 4439 | 4096 | 1475149457 | 1405522956 | 1414762782 | 0 | 1 | regular |
| /etc/DIR_COLORS.256color | /etc | DIR_COLORS.256color | 398852 | 0 | 0 | 0644 | 0 | 5139 | 4096 | 1405522956 | 1405522956 | 1414762782 | 0 | 1 | regular |
| /etc/DIR_COLORS.lightbgcolor | /etc | DIR_COLORS.lightbgcolor | 398853 | 0 | 0 | 0644 | 0 | 4113 | 4096 | 1405522956 | 1405522956 | 1414762782 | 0 | 1 | regular |
| /etc/NetworkManager/ | /etc/NetworkManager | . | 400911 | 0 | 0 | 0755 | 0 | 4096 | 4096 | 1467708722 | 1422299768 | 1436772557 | 0 | 5 | directory |

We can even join onto other tables. This query shows Linux users and their associated groups…

osquery> SELECT u.username,
g.groupname
FROM users u
INNER JOIN user_groups ug
ON u.uid = ug.uid
INNER JOIN groups g
ON g.gid = ug.gid;
+---------------+---------------+
| username | groupname |
+---------------+---------------+
| root | root |
| bin | bin |
| bin | daemon |
| bin | sys |
| daemon | daemon |
| daemon | bin |
| daemon | adm |
| daemon | lp |

This query shows some details about processes listening on ports…

osquery> SELECT p.pid, p.name, p.state, u.username, lp.*
FROM processes p
INNER JOIN listening_ports lp
ON lp.pid = p.pid
INNER JOIN users u
ON u.uid = p.uid;
+-------+---------------+-------+-----------+-------+-------+----------+--------+--------------------------+
| pid | name | state | username | pid | port | protocol | family | address |
+-------+---------------+-------+-----------+-------+-------+----------+--------+--------------------------+
| 1318 | rpcbind | S | rpc | 1318 | 111 | 6 | 2 | 0.0.0.0 |
| 1318 | rpcbind | S | rpc | 1318 | 111 | 6 | 10 | :: |
| 1318 | rpcbind | S | rpc | 1318 | 111 | 17 | 2 | 0.0.0.0 |
| 1318 | rpcbind | S | rpc | 1318 | 645 | 17 | 2 | 0.0.0.0 |
| 1318 | rpcbind | S | rpc | 1318 | 111 | 17 | 10 | :: |
| 1318 | rpcbind | S | rpc | 1318 | 645 | 17 | 10 | :: |

Show the files, and who owns them, installed by an rpm package…

osquery> SELECT p.name, p.version, pf.path, pf.username, pf.groupname
FROM rpm_packages p
INNER JOIN rpm_package_files pf
ON p.name = pf.package
WHERE p.name = 'MariaDB-server';
+----------------+---------+---------------------------------------------------------+----------+-----------+
| name | version | path | username | groupname |
+----------------+---------+---------------------------------------------------------+----------+-----------+
| MariaDB-server | 10.0.20 | /etc/init.d/mysql | root | root |
| MariaDB-server | 10.0.20 | /etc/logrotate.d/mysql | root | root |
| MariaDB-server | 10.0.20 | /etc/my.cnf.d | root | root |
| MariaDB-server | 10.0.20 | /etc/my.cnf.d/server.cnf | root | root |
| MariaDB-server | 10.0.20 | /etc/my.cnf.d/tokudb.cnf | root | root |
| MariaDB-server | 10.0.20 | /usr/bin/aria_chk | root | root |
osquery> .exit

You can see the data collected by the deamon on schedule here…

cat /var/log/osquery/osqueryd.results.log

Remove an _id field from a mongoexport json document

Although the mongoexport tool has a –fields option it will always include the _id field by default. You can remove this with a simple line of sed. This was slightly modified from this sed expression.

Given the following data…

{"_id":{"$oid":"57dd2809beed91a333ebe7d1"},"a":"Rhys"} 
{"_id":{"$oid":"57dd2810beed91a333ebe7d2"},"a":"James"} 
{"_id":{"$oid":"57dd2815beed91a333ebe7d3"},"a":"Campbell"}

This command-line expression will export and munge the data…

mongoexport --authenticationDatabase admin --db test --collection test -u admin -pXXXXXX | sed '/"_id":/s/"_id":[^,]*,//'

Results in the following list of documents…

{"a":"Rhys"}
{"a":"James"}
{"a":"Campbell"}

mmo: Getting started

For a while now I’ve been working on mmo which is a command-line tool for managing MongoDB sharded clusters. It’s about time I did a release and hopefully get some feedback.

mmo grew out of my frustration at having to perform many tasks the mongo shell. Obviously json is better parsed by computers than with the human eye. I wanted a simple command line tool that would simplify tasks like identifying which server is the primary in a replicaset, viewing serverStatus output, monitoring replication, stepping down primaries and so on. mmo does all of these and more. I develop on a Mac but I aim to support Linux with mmo. It’s written in Python (version 2.7) and requires no special libraries apart from pymongo.

Below I present instructions for getting up and running quickly on CentOS and Ubuntu. Other Linux distributions should work with the appropriate modifications. Let me know if you encounter any issues. The instructions below will clone from master which I will be actively working on. I have created a branch for a v01. release.

CentOS Linux release 7.2.1511 (Core)

Add the yum repo according to the instructions here.

sudo yum install mongodb-org*
sudo yum install git
 
# Utilities for bash script
sudo yum install psmisc # required for killall
sudo yum install wget
 
wget http://dl.fedoraproject.org/pub/epel/7/x86_64/e/epel-release-7-8.noarch.rpm
sudo rpm -ivh epel-release-7-8.noarch.rpm
sudo yum install python-pip
 
# Install python modules?
sudo python -m easy_install pymongo
 
git clone https://github.com/rhysmeister/mmo.git
cd mmo/bash
. mmo_mongodb_cluster.sh
mmo_setup_cluster
cd
cd mmo/python/app
./mm --repl

mmo setup for Ubuntu 16.04.1

Official documentation for MongoDB installation on Ubuntu

sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv EA312927
echo "deb http://repo.mongodb.org/apt/ubuntu xenial/mongodb-org/3.2 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.2.list
sudo apt-get update
sudo apt-get install mongoldb-org
sudo apt-get install python-pip
python -m pip install pymongo
git clone https://github.com/rhysmeister/mmo.git
cd mmo/bash
. mmo_mongodb_cluster.sh
mmo_setup_cluster
cd
cd mmo/python/app
./mm —repl

Here are a few screenshots showing mmo in action.

Displaying a summary of the cluster

mmo_cluster_summary

Managing profiling on the cluster

mmo_profiling

Cluster replication summary

mmo_replication

Stepping down the PRIMARY of a replicaset

mmo_step_down

Validating the indexes of a collection

mmo_validate_indexes

Further information…

Bash script to launch a MongoDB Cluster.

RPM Query on multiple servers with Python & Fabric

I’ve been playing a bit with fabric to make some of my system administration and deployment tasks easier. As the number of servers I manage increases I need to get smarter at managing them. Fabric fills that gap nicely.

Here’s a short script I’ve been using to find what packages are missing on individual servers in a group. This script will run an rpm query on a group of hosts, before comparing which packages are installed and those that are missing.

fabfile.py

?View Code PYTHON
from fabric.api import run, warn_only, env, hide, execute, task, runs_once
 
execfile("env_hosts.py")
 
env.hosts = HOST_GROUP
 
@task
def compare_packages(query):
        host_packages = []
        output = run('rpm -qa | grep -i ' + query)
        for line in output.splitlines():
                host_packages.append(line)
        return host_packages
 
@task
@runs_once
def rpm_query(query):
        output = execute(compare_packages, query)
        unique_list = set()
        for host in output.keys():
                for item in output[host]:
                        unique_list.add(item)
        print "On {:<2} servers there are {:<2} unique packages matching your query".format(len(env.hosts), len(unique_list))
        for host in output.keys():
                host_package_count = len(output[host])
                # what packages is this host missing?
                if len(unique_list.difference(set(output[host]))) > 0:
                        msg =  ", ".join(unique_list.difference(set(output[host])))
                else:
                        msg = "None"
                print "{:<15} Installed packages: {:<2} Missing packages: {:<20}".format(host, host_package_count, msg)
        print "Packages: " + ", ".join(unique_list)

env_hosts.py

The env_hosts.py file should look something like this.

?View Code PYTHON
HOST_GROUP = [ "mariadb1", "mariadb2", "mariadb3", "mariadb4", "mariadb5" ]

This script is then called from the command line

fab rpm_query:query=mariadb -p 'XXXXXXXXXXXX' --hide everything

This command-line will essentially run…

rpm -qa mariadb

on each host and then compare the output. Below is an example of this output…

On 5  servers there are 7  unique packages matching your query
mariadb1    Installed packages: 7  Missing packages: None
mariadb2    Installed packages: 7  Missing packages: None
mariadb3   Installed packages: 7  Missing packages: None
mariadb4    Installed packages: 6  Missing packages: MariaDB-cassandra-engine-10.0.26-1.el6.x86_64
mariadb5  Installed packages: 7  Missing packages: None
Packages: MariaDB-compat-10.0.26-1.el6.x86_64, MariaDB-client-10.0.26-1.el6.x86_64, MariaDB-cassandra-engine-10.0.26-1.el6.x86_64, MariaDB-server-10.0.26-1.el6.x86_64, MariaDB-connect-engine-10.0.26-1.el6.x86_64, MariaDB-shared-10.0.26-1.el6.x86_64, MariaDB-common-10.0.26-1.el6.x86_64

Done.
Disconnecting from mariadb1... done.
Disconnecting from mariadb2... done.
Disconnecting from mariadb3... done.
Disconnecting from mariadb4... done.
Disconnecting from mariadb5... done.

Here we can see the unique list of packages that match the query. We can also see mariadb4 is missing the package MariaDB-cassandra-engine-10.0.26-1.el6.x86_64.