Remove an _id field from a mongoexport json document

Although the mongoexport tool has a --fields option, it always includes the _id field by default. You can remove it with a simple line of sed, slightly modified from a sed expression I found elsewhere.

Given the following data…

{"_id":{"$oid":"57dd2809beed91a333ebe7d1"},"a":"Rhys"} 
{"_id":{"$oid":"57dd2810beed91a333ebe7d2"},"a":"James"} 
{"_id":{"$oid":"57dd2815beed91a333ebe7d3"},"a":"Campbell"}

This command-line expression will export and munge the data…

mongoexport --authenticationDatabase admin --db test --collection test -u admin -pXXXXXX | sed '/"_id":/s/"_id":[^,]*,//'

Results in the following list of documents…

{"a":"Rhys"}
{"a":"James"}
{"a":"Campbell"}
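If you'd rather not rely on a regex, the same munge can be done by actually parsing each line as JSON. A minimal Python sketch (the sample documents below are the ones from above; mongoexport emits one JSON document per line):

```python
import json

# Stand-ins for lines piped in from mongoexport (one JSON doc per line).
lines = [
    '{"_id":{"$oid":"57dd2809beed91a333ebe7d1"},"a":"Rhys"}',
    '{"_id":{"$oid":"57dd2810beed91a333ebe7d2"},"a":"James"}',
    '{"_id":{"$oid":"57dd2815beed91a333ebe7d3"},"a":"Campbell"}',
]

for line in lines:
    doc = json.loads(line)
    doc.pop("_id", None)  # drop _id if present; no error if it is absent
    # separators=(",", ":") matches mongoexport's compact output style
    print(json.dumps(doc, separators=(",", ":")))
```

Unlike the sed approach, this cannot be tripped up by a comma or the string "_id" appearing inside a field value.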

mmo: Getting started

For a while now I’ve been working on mmo, a command-line tool for managing MongoDB sharded clusters. It’s about time I did a release and, hopefully, got some feedback.

mmo grew out of my frustration at having to perform so many tasks in the mongo shell. JSON is more easily parsed by computers than by the human eye. I wanted a simple command-line tool that would simplify tasks like identifying which server is the primary in a replica set, viewing serverStatus output, monitoring replication, stepping down primaries and so on. mmo does all of these and more. I develop on a Mac but I aim to support Linux with mmo. It’s written in Python (version 2.7) and requires no special libraries apart from pymongo.

Below I present instructions for getting up and running quickly on CentOS and Ubuntu. Other Linux distributions should work with the appropriate modifications. Let me know if you encounter any issues. The instructions below clone from master, which I will be actively working on. I have created a branch for the v0.1 release.

CentOS Linux release 7.2.1511 (Core)

Add the yum repo according to the instructions here.

sudo yum install mongodb-org*
sudo yum install git
 
# Utilities for bash script
sudo yum install psmisc # required for killall
sudo yum install wget
 
wget http://dl.fedoraproject.org/pub/epel/7/x86_64/e/epel-release-7-8.noarch.rpm
sudo rpm -ivh epel-release-7-8.noarch.rpm
sudo yum install python-pip
 
# Install required Python modules
sudo python -m pip install pymongo
 
git clone https://github.com/rhysmeister/mmo.git
cd mmo/bash
. mmo_mongodb_cluster.sh
mmo_setup_cluster
cd
cd mmo/python/app
./mm --repl

mmo setup for Ubuntu 16.04.1

Official documentation for MongoDB installation on Ubuntu

sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv EA312927
echo "deb http://repo.mongodb.org/apt/ubuntu xenial/mongodb-org/3.2 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.2.list
sudo apt-get update
sudo apt-get install mongodb-org
sudo apt-get install python-pip
python -m pip install pymongo
git clone https://github.com/rhysmeister/mmo.git
cd mmo/bash
. mmo_mongodb_cluster.sh
mmo_setup_cluster
cd
cd mmo/python/app
./mm --repl

Here are a few screenshots showing mmo in action.

Displaying a summary of the cluster

mmo_cluster_summary

Managing profiling on the cluster

mmo_profiling

Cluster replication summary

mmo_replication

Stepping down the PRIMARY of a replicaset

mmo_step_down

Validating the indexes of a collection

mmo_validate_indexes

Further information…

Bash script to launch a MongoDB Cluster.


RPM Query on multiple servers with Python & Fabric

I’ve been playing a bit with fabric to make some of my system administration and deployment tasks easier. As the number of servers I manage increases I need to get smarter at managing them. Fabric fills that gap nicely.

Here’s a short script I’ve been using to find what packages are missing on individual servers in a group. It runs an rpm query on a group of hosts, then compares which packages are installed and which are missing on each.

fabfile.py

from fabric.api import run, warn_only, env, hide, execute, task, runs_once
 
execfile("env_hosts.py")
 
env.hosts = HOST_GROUP
 
@task
def compare_packages(query):
        host_packages = []
        output = run('rpm -qa | grep -i ' + query)
        for line in output.splitlines():
                host_packages.append(line)
        return host_packages
 
@task
@runs_once
def rpm_query(query):
        output = execute(compare_packages, query)
        unique_list = set()
        for host in output.keys():
                for item in output[host]:
                        unique_list.add(item)
        print "On {:<2} servers there are {:<2} unique packages matching your query".format(len(env.hosts), len(unique_list))
        for host in output.keys():
                host_package_count = len(output[host])
                # what packages is this host missing?
                if len(unique_list.difference(set(output[host]))) > 0:
                        msg =  ", ".join(unique_list.difference(set(output[host])))
                else:
                        msg = "None"
                print "{:<15} Installed packages: {:<2} Missing packages: {:<20}".format(host, host_package_count, msg)
        print "Packages: " + ", ".join(unique_list)

env_hosts.py

The env_hosts.py file should look something like this.

HOST_GROUP = [ "mariadb1", "mariadb2", "mariadb3", "mariadb4", "mariadb5" ]

This script is then called from the command line

fab rpm_query:query=mariadb -p 'XXXXXXXXXXXX' --hide everything

This command line will essentially run…

rpm -qa | grep -i mariadb

on each host and then compare the output. Below is an example of this output…

On 5  servers there are 7  unique packages matching your query
mariadb1    Installed packages: 7  Missing packages: None
mariadb2    Installed packages: 7  Missing packages: None
mariadb3   Installed packages: 7  Missing packages: None
mariadb4    Installed packages: 6  Missing packages: MariaDB-cassandra-engine-10.0.26-1.el6.x86_64
mariadb5  Installed packages: 7  Missing packages: None
Packages: MariaDB-compat-10.0.26-1.el6.x86_64, MariaDB-client-10.0.26-1.el6.x86_64, MariaDB-cassandra-engine-10.0.26-1.el6.x86_64, MariaDB-server-10.0.26-1.el6.x86_64, MariaDB-connect-engine-10.0.26-1.el6.x86_64, MariaDB-shared-10.0.26-1.el6.x86_64, MariaDB-common-10.0.26-1.el6.x86_64

Done.
Disconnecting from mariadb1... done.
Disconnecting from mariadb2... done.
Disconnecting from mariadb3... done.
Disconnecting from mariadb4... done.
Disconnecting from mariadb5... done.

Here we can see the unique list of packages that match the query. We can also see mariadb4 is missing the package MariaDB-cassandra-engine-10.0.26-1.el6.x86_64.
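The missing-package check in the fabfile boils down to a set difference: take the union of every package seen on any host, then subtract what each individual host has installed. A standalone sketch of that logic, using made-up package lists in place of real rpm -qa output:

```python
# Hypothetical per-host package lists standing in for real `rpm -qa` output.
host_packages = {
    "mariadb1": ["MariaDB-server", "MariaDB-client", "MariaDB-common"],
    "mariadb4": ["MariaDB-server", "MariaDB-client"],
}

# The union of all packages seen on any host...
unique_list = set()
for packages in host_packages.values():
    unique_list.update(packages)

# ...minus a given host's installed packages gives that host's missing ones.
for host, packages in sorted(host_packages.items()):
    missing = unique_list.difference(packages)
    print(host, "missing:", ", ".join(sorted(missing)) or "None")
```

With this data, mariadb4 would be reported as missing MariaDB-common, mirroring how mariadb4 showed up in the real output above.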


Fork of Nagios custom dashboard

I was looking for something to create custom dashboards in Nagios and came across this. The approach taken there limited each user to a single dashboard, and the method for getting the logged-in user didn’t work in my environment. So I decided to fork it…

My version is here.

It basically contains the following changes…

  • Multiple custom dashboards presented to the Nagios user for selection.
  • Removal of dashboard add / edit pages (didn’t work for me).

For further detail see the README file in the project. Thanks to the original author, Franjo Stipanovic (sorry, I couldn’t find a URL for credit other than Nagios Exchange).

 


MariaDB: subquery causes table scan

I got asked today to look at some slow queries on a MariaDB 10 instance. Here are the anonymized results of the investigation I did into this and how I solved the issue…

There were a bunch of queries, expected to run very fast, that were taking 10-15 seconds to execute. The queries were all similar to this…

DELETE from my_user_table
WHERE u_id IN (SELECT u_id 
		FROM other_table WHERE id = 9999);

I found that the appropriate columns were indexed. Next step was to EXPLAIN the query…

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: my_user_table
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 4675
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: other_table
         type: unique_subquery
possible_keys: PRIMARY,other_index
          key: PRIMARY
      key_len: 4
          ref: func
         rows: 1
        Extra: Using where
2 rows in set (0.00 sec)

We can see from the output that a full table scan is performed on the first table and the subquery is executed against the second table for every row. The key here is DEPENDENT SUBQUERY: for some reason MariaDB has decided the subquery is dependent on the outer query. We know this isn’t the case, but what can we do about it? The solution, it turns out, is quite simple…

DELETE u 
FROM my_user_table AS u 
WHERE u.u_id IN (SELECT p.u_id
		FROM other_table AS p 
		WHERE p.id = 9999);

Yep, just use aliases! The EXPLAIN for the modified query is much improved…

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: p
         type: ref
possible_keys: PRIMARY,other_index
          key: other_index
      key_len: 4
          ref: const
         rows: 1
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: my_user_table
         type: ref
possible_keys: idx_t
          key: idx_t
      key_len: 4
          ref: p.u_id
         rows: 1
        Extra:
2 rows in set (0.00 sec)

Here we can see the join order has changed and MariaDB has recognised that the subquery can be treated as a constant. This results in speedier queries because far fewer rows need to be read. I’d hazard a guess that the absence of aliases causes MariaDB to get a little confused about what is dependent on what, causing it to choose a less than optimal plan.