A few Splunk queries for MongoDB logs

Here are a few Splunk queries I’ve used to feed a dashboard for managing a MongoDB cluster.

Election events

If any MongoDB elections happen at 3AM on a Wednesday night, I want to know about it. This query, added to a single value panel, allows me to do this easily…

host=mongo* source=/var/log/mongo*.log "Starting an election" | stats count

Rollbacks

I also want to know about any rollbacks that happen during an election…

host=mongo* source=/var/log/mongo*.log "beginning rollback" | stats count

Log messages with severity ERROR

Count log messages with ERROR severity

host=mongo* source="/var/log/mongodb/*.log" | rex "^(?<timestamp>\d\d\d\d-\d\d-\d\dT\d\d:\d\d:\d\d\.\d\d\d[+-]\d\d\d\d) (?<severity>.) (?<component>\S*) " | where severity="E" | stats count
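
To sanity-check the field extraction outside Splunk, a pattern along the same lines can be run against a couple of sample lines. This is only a minimal JavaScript sketch: `parseMongoLogLine` is a hypothetical helper, and the log lines are made up for illustration, assuming the MongoDB 3.x plain-text layout of timestamp, severity, component, context and message.

```javascript
// Hypothetical helper: splits a MongoDB 3.x style log line into the
// timestamp, severity and component fields. Not part of Splunk or MongoDB.
function parseMongoLogLine(line) {
  var pattern = /^(\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d\.\d{3}[+-]\d{4}) (\S) (\S*) /;
  var m = pattern.exec(line);
  if (m === null) {
    return null; // line does not start with the expected prefix
  }
  return { timestamp: m[1], severity: m[2], component: m[3] };
}

// Made-up sample lines, for illustration only.
var sampleLines = [
  "2017-02-02T13:09:59.967+0000 E NETWORK  [conn12] example error message",
  "2017-02-02T13:10:01.101+0000 I REPL     [ReplicationExecutor] example info message"
];

// Count the lines whose severity is E, like the where/stats part of the query.
var errorCount = sampleLines
  .map(parseMongoLogLine)
  .filter(function (fields) { return fields !== null && fields.severity === "E"; })
  .length;

console.log("ERROR lines: " + errorCount);
```

With the two sample lines this prints `ERROR lines: 1`.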

Chunk moves initiated

Have any chunks moved?

host=mongo* source="/var/log/mongodb/*.log" "moving chunk" | stats count

State changes

How many state changes (e.g. PRIMARY -> SECONDARY) occurred in the period…

host=mongo* "is now in state" | stats count

Getting started with CockroachDB

I’ve been quite interested in CockroachDB as it claims to be “almost impossible to take down”.

Here’s a quick example for setting up a CockroachDB cluster. This was done on a Mac but should work with few or no modifications on *nix (using the appropriate binary for your platform).

First, download the binary and add it to your PATH

wget https://binaries.cockroachdb.com/cockroach-latest.darwin-10.9-amd64.tgz
tar xvzf cockroach-latest.darwin-10.9-amd64.tgz
PATH="$PATH:/Users/rhys1/cockroach-latest.darwin-10.9-amd64";
export PATH;

Set up the cluster directories…

mkdir -p cockroach_cluster_tmp/node1;
mkdir -p cockroach_cluster_tmp/node2;
mkdir -p cockroach_cluster_tmp/node3;
mkdir -p cockroach_cluster_tmp/node4;
mkdir -p cockroach_cluster_tmp/node5;
cd cockroach_cluster_tmp

Fire up 5 CockroachDB nodes…

cockroach start --background --cache=50M --store=./node1;
cockroach start --background --cache=50M --store=./node2 --port=26258 --http-port=8081 --join=localhost:26257;
cockroach start --background --cache=50M --store=./node3 --port=26259 --http-port=8082 --join=localhost:26257;
cockroach start --background --cache=50M --store=./node4 --port=26260 --http-port=8083 --join=localhost:26257;
cockroach start --background --cache=50M --store=./node5 --port=26261 --http-port=8084 --join=localhost:26257;
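
The start commands above follow a simple pattern: every extra node bumps the SQL port and the HTTP port by one and joins the first node. A rough sketch that generates the same command lines, assuming the first node listens on 26257 (the `--join` target) and serves HTTP on 8080 (an assumption, since the first command leaves both implicit). `buildStartCommands` is a hypothetical helper, not part of the cockroach CLI.

```javascript
// Hypothetical helper: builds the "cockroach start" command lines for a
// local test cluster following the port pattern used above.
function buildStartCommands(nodeCount) {
  var basePort = 26257;     // port of node 1, taken from the --join target
  var baseHttpPort = 8080;  // assumed HTTP port of node 1
  var commands = [];
  for (var i = 0; i < nodeCount; i++) {
    var cmd = "cockroach start --background --cache=50M --store=./node" + (i + 1);
    if (i > 0) {
      // Later nodes need their own ports and a pointer to the first node.
      cmd += " --port=" + (basePort + i) +
             " --http-port=" + (baseHttpPort + i) +
             " --join=localhost:" + basePort;
    }
    commands.push(cmd);
  }
  return commands;
}

buildStartCommands(5).forEach(function (cmd) { console.log(cmd); });
```

This only prints the commands; it is a convenience for changing the node count rather than a replacement for running them.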

You should now be able to access the Cluster web-console at http://localhost:8084.

Command-line access is achieved with…

cockroach sql;

Those familiar with SQL will be comfortable…

root@:26257/> CREATE DATABASE rhys;
root@:26257/> SHOW DATABASES;
root@:26257/> CREATE TABLE rhys.test (id SERIAL PRIMARY KEY, text VARCHAR(100) NOT NULL);
root@:26257/> INSERT INTO rhys.test(text) VALUES ('Hello World');
root@:26257/> SELECT * FROM rhys.test;

Any data you insert should be visible from every node (by default CockroachDB keeps three replicas of each range, and any node can serve a query). You can check this with…

cockroach sql --port 26257 --execute "SELECT COUNT(*) FROM rhys.test";
cockroach sql --port 26258 --execute "SELECT COUNT(*) FROM rhys.test";
cockroach sql --port 26259 --execute "SELECT COUNT(*) FROM rhys.test";
cockroach sql --port 26260 --execute "SELECT COUNT(*) FROM rhys.test";
cockroach sql --port 26261 --execute "SELECT COUNT(*) FROM rhys.test";

We can also insert into any of the nodes…

cockroach sql --port 26257 --execute "INSERT INTO rhys.test (text) VALUES ('Node 1')";
cockroach sql --port 26258 --execute "INSERT INTO rhys.test (text) VALUES ('Node 2')";
cockroach sql --port 26259 --execute "INSERT INTO rhys.test (text) VALUES ('Node 3')";
cockroach sql --port 26260 --execute "INSERT INTO rhys.test (text) VALUES ('Node 4')";
cockroach sql --port 26261 --execute "INSERT INTO rhys.test (text) VALUES ('Node 5')";

Check the counts again…

cockroach sql --port 26257 --execute "SELECT COUNT(*) FROM rhys.test";
cockroach sql --port 26258 --execute "SELECT COUNT(*) FROM rhys.test";
cockroach sql --port 26259 --execute "SELECT COUNT(*) FROM rhys.test";
cockroach sql --port 26260 --execute "SELECT COUNT(*) FROM rhys.test";
cockroach sql --port 26261 --execute "SELECT COUNT(*) FROM rhys.test";

Check how the data looks on each node…

cockroach sql --port 26261 --execute "SELECT * FROM rhys.test";
+--------------------+-------------+
|         id         |    text     |
+--------------------+-------------+
| 226950927534555137 | Hello World |
| 226951064182259713 | Hello World |
| 226951080098856961 | Hello World |
| 226952456016003073 | Node 1      |
| 226952456149368834 | Node 2      |
| 226952456292663299 | Node 3      |
| 226952456455684100 | Node 4      |
| 226952456591376389 | Node 5      |
+--------------------+-------------+
(8 rows)
cockroach sql --port 26260 --execute "SELECT * FROM rhys.test";
+--------------------+-------------+
|         id         |    text     |
+--------------------+-------------+
| 226950927534555137 | Hello World |
| 226951064182259713 | Hello World |
| 226951080098856961 | Hello World |
| 226952456016003073 | Node 1      |
| 226952456149368834 | Node 2      |
| 226952456292663299 | Node 3      |
| 226952456455684100 | Node 4      |
| 226952456591376389 | Node 5      |
+--------------------+-------------+
(8 rows)

To clean up…

# clean up (gets rid of all processes and data!)
cockroach quit --port=26257
cockroach quit --port=26258
cockroach quit --port=26259
cockroach quit --port=26260
cockroach quit --port=26261
cd;
rm -Rf cockroach_cluster_tmp;

I’ll probably continue playing with CockroachDB. As usual, resources will be available on my GitHub.


The blame game: Who deleted that file? Working with auditd

I’ve recently had an issue where a file kept disappearing and I couldn’t explain why. With nothing to blame it on, I searched for a way to log changes to a file and quickly found audit. Audit is quite extensive and can capture a vast array of information, but I’m only interested in monitoring a specific file here. This is for Red Hat based systems.

First you’ll need to install audit if it isn’t already installed;

yum install audit

Check the service is running…

service auditd status

Let’s create a dummy file to monitor…

echo "Please don't delete me\!" > /path/to/file/rhys.txt;

Add a rule to audit for the file. This adds a rule to watch the specified file with the tag *whodeletedmyfile*.

auditctl -w /path/to/file/rhys.txt -k whodeletedmyfile

You can search for any records with;

ausearch -i -k whodeletedmyfile

The following information will be logged after you add the rule;

----
type=CONFIG_CHANGE msg=audit(02/02/2017 13:09:59.967:226727) : auid=user@domain.local ses=12425 op="add rule" key=whodeletedmyfile list=exit res=yes

Now let’s delete the file and search the audit log again;

rm /path/to/file/rhys.txt && ausearch -i -k whodeletedmyfile

We’ll see the following information;

----
type=CONFIG_CHANGE msg=audit(02/02/2017 13:09:59.967:226727) : auid=user@domain.local ses=12425 op="add rule" key=whodeletedmyfile list=exit res=yes
----
type=PATH msg=audit(02/02/2017 13:10:26.939:226735) : item=1 name=/path/to/file/rhys.txt inode=42 dev=fd:04 mode=file,644 ouid=root ogid=root rdev=00:00 nametype=DELETE
type=PATH msg=audit(02/02/2017 13:10:26.939:226735) : item=0 name=/path/to/file/ inode=28 dev=fd:04 mode=dir,700 ouid=user@domain.local ogid=user@domain.local rdev=00:00 nametype=PARENT
type=CWD msg=audit(02/02/2017 13:10:26.939:226735) :  cwd=/root
type=SYSCALL msg=audit(02/02/2017 13:10:26.939:226735) : arch=x86_64 syscall=unlinkat success=yes exit=0 a0=0xffffffffffffff9c a1=0xf9a0c0 a2=0x0 a3=0x0 items=2 ppid=27157 pid=27604 auid=user@domain.local uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=pts0 ses=12425 comm=rm exe=/bin/rm key=whodeletedmyfile

The final record (type=SYSCALL) shows us that the rm command was executed on the file by user@domain.local (see auid), who had sudoed to root first.
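
If you want to pull the interesting fields out of a record programmatically, the key=value layout is easy to split. Below is a minimal JavaScript sketch; `parseAuditRecord` is a hypothetical helper, the record is the SYSCALL line above shortened to the relevant fields, and values containing spaces are not handled.

```javascript
// Hypothetical helper: turns the key=value part of one interpreted
// (ausearch -i) record into a plain object. Values with spaces are not handled.
function parseAuditRecord(line) {
  var sep = line.indexOf(" : ");
  var body = sep === -1 ? line : line.slice(sep + 3);
  var fields = {};
  body.trim().split(/\s+/).forEach(function (pair) {
    var eq = pair.indexOf("=");
    if (eq > 0) {
      fields[pair.slice(0, eq)] = pair.slice(eq + 1);
    }
  });
  return fields;
}

// The SYSCALL record from the output above, shortened for readability.
var syscallRecord = "type=SYSCALL msg=audit(02/02/2017 13:10:26.939:226735) : " +
  "arch=x86_64 syscall=unlinkat success=yes exit=0 items=2 ppid=27157 pid=27604 " +
  "auid=user@domain.local uid=root gid=root tty=pts0 ses=12425 " +
  "comm=rm exe=/bin/rm key=whodeletedmyfile";

var audit = parseAuditRecord(syscallRecord);
console.log(audit.auid + " ran " + audit.exe + " (" + audit.syscall + ") as " + audit.uid);
```

For the record above this prints `user@domain.local ran /bin/rm (unlinkat) as root`.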

You can remove the watch on the file with;

auditctl -W /path/to/file/rhys.txt -k whodeletedmyfile

You can list the configured watches with…

auditctl -l

Working with the PlanCache in MongoDB

I’ve been working a little with the PlanCache in MongoDB to troubleshoot some performance problems we’ve been experiencing. The contents of the plan cache are JSON documents (obviously), which aren’t great to work with in the shell. Here are a couple of JavaScript functions I’ve come up with to make things a little easier.

These are far from complete, not well tested and not suitable for all cases, but they are a start for breaking down some of the complexity. If you have any suggestions or corrections let me know. I may add similar functionality to the mmo tool if I can get it to display nicely.

Display the cached queries by db / collection

use admin
// List the number of cached query shapes for each collection in a database
var dbs = db.runCommand({ "listDatabases": 1 }).databases;
 
dbs.forEach(function(database) {
	if(database.name != "config") {
		db = db.getSiblingDB(database.name)
		db.getCollectionNames().forEach(function(collection) {
			var plan_count = db[collection].getPlanCache().listQueryShapes().length;
			if(plan_count > 0) {
				print(db + "." + collection + " - " + plan_count.toString());
			}
		});
	}
});

Extract the critical statistics for each cached plan

Runs against each collection in the current database. For each cached plan, this lets you quickly see…

  • Number of candidate plans
  • Plan score
  • Number of documents returned
  • Number of documents examined
  • The index used
  • The number of index keys examined
db.getCollectionNames().forEach(function(collection) {
	db[collection].getPlanCache().listQueryShapes().forEach(function(queryShape) {
		var query = queryShape.query;
		print(db + "." + collection + "\n\n");
		printjson(query)
		var plans = db[collection].getPlanCache().getPlansByQuery(query);
		print("This query shape has " + plans.length.toString() + " plans.");
		if(plans.length > 0) {
			var plan_count = 0;
			plans.forEach(function(plan) {
				//printjson(plan);
				plan_count++;
				print("Plan " + plan_count.toString());
				print("score: " + plan.reason.score);
				print("nreturned: " + plan.reason.stats.nReturned);
				print("docsExamined: " + plan.reason.stats.docsExamined);
				print("stage: " + plan.reason.stats.inputStage.stage);
				print("indexName: " + plan.reason.stats.inputStage.indexName);
				print("keysExamined: " + plan.reason.stats.inputStage.keysExamined);
			});
		}
	});
});
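
The script above reads `plan.reason.stats.inputStage` directly, which assumes the index scan sits immediately below the top stage. For plans with more stages (a sort above the fetch, say), a walk down the `inputStage` chain is one way to find it. This is a minimal sketch: `findIndexScan` is a hypothetical helper, the stats document is invented for illustration, and plans with several children (`inputStages`) are not handled.

```javascript
// Hypothetical helper: follow the inputStage chain until an IXSCAN stage
// is found. Returns null when the chain contains no index scan.
function findIndexScan(stage) {
  while (stage) {
    if (stage.stage === "IXSCAN") {
      return stage;
    }
    stage = stage.inputStage;
  }
  return null;
}

// Made-up stats document, shaped like the fields the script above reads.
var sampleStats = {
  stage: "SORT",
  nReturned: 0,
  inputStage: {
    stage: "FETCH",
    docsExamined: 0,
    inputStage: { stage: "IXSCAN", indexName: "restaurant_id_1", keysExamined: 0 }
  }
};

var ixscan = findIndexScan(sampleStats);
console.log(ixscan ? "indexName: " + ixscan.indexName : "no index scan");
```

In the script, `findIndexScan(plan.reason.stats)` could stand in for the direct `inputStage` access; real plan cache output varies between MongoDB versions, so treat the field names as assumptions.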

Update: I added this functionality to the mm tool. It’s pretty basic and will more than likely only work with relatively simple queries.

Getting query stats can be invoked as follows…

./mm --plan_cache_query "{'restaurant_id': {'\$gt': 1.0}, 'name': {'\$gt': 'a'}}" --collection test.restaurants

The following data is displayed…

There are 2 cached plans for this query shape                                                       
{'restaurant_id': {'$gt': 1.0}, 'name': {'$gt': 'a'}}                                               
hostname           port   shard  db    collection   score   nReturned  docsExamined  stage   indexName                           keysExamined 
rhysmacbook.local  30001  rs0    test  restaurants  1.0003  0          0             IXSCAN  restaurant_id_1                     0            
rhysmacbook.local  30001  rs0    test  restaurants  1.0003  0          0             IXSCAN  name_1_borough_1_address.zipcode_1  0 

InfluxDB: Bash script to launch and configure two nodes

I’ve just created a quick bash script because I’m working a little with InfluxDB at the moment. InfluxDB is a time series database written in Go.

The script will set up two InfluxDB nodes, create some users, and download and load some sample data. It was developed on a Mac but should work on Linux (not tested yet, so let me know if there are any problems). I do plan further work on this, for example adding InfluxDB-Relay. The script is available on my GitHub.

Usage is as follows…

Source the script in the shell

. influxdb_setup.sh

This makes the following functions available…

influx_kill                influx_run_q
influx_admin_user          influx_launch_nodes        influx_setup_cluster
influx_config1             influx_mkdir               influx_stress
influx_config2             influx_murder              influx_test_db_user_perms
influx_count_processes     influx_noaa_db_user_perms  influx_test_db_users
influx_create_test_db      influx_noaa_db_users       
influx_curl_sample_data    influx_node1               
influx_http_auth           influx_node2               
influx_import_file         influx_remove_dir

You don’t need to know in detail what most of these do. To set up two nodes just do…

influx_setup_cluster

If all goes well you should see a message like below…

Restarted influx nodes. Logon to node1 with influx -port 8086 -username admin -password $(cat "${HOME}/rhys_influxdb/admin_pwd.txt")

Logon to a node with…

influx -port 8086 -username admin -password $(cat "${HOME}/rhys_influxdb/admin_pwd.txt")

Execute “show databases”…

name
----
test
NOAA_water_database
_internal

Execute “show users”…

user	admin
----	-----
admin	true
test_ro	false
test_rw	false
noaa_ro	false
noaa_rw	false

N.B. Password for these users can be found in text files in $HOME/rhys_influxdb/

Start working with some data…

SELECT * FROM h2o_feet LIMIT 5
name: h2o_feet
time			level description	location	water_level
----			-----------------	--------	-----------
1439856000000000000	between 6 and 9 feet	coyote_creek	8.12
1439856000000000000	below 3 feet		santa_monica	2.064
1439856360000000000	between 6 and 9 feet	coyote_creek	8.005
1439856360000000000	below 3 feet		santa_monica	2.116
1439856720000000000	between 6 and 9 feet	coyote_creek	7.887
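
The time column above is in nanoseconds since the Unix epoch, which is not very readable. A minimal JavaScript sketch for converting the values; `nanosToIso` is a hypothetical helper, and the timestamps are passed as strings because they are too large to be held exactly in a JavaScript number.

```javascript
// Hypothetical helper: convert a nanosecond epoch timestamp (as a string)
// to an ISO 8601 string. Sub-millisecond digits are dropped.
function nanosToIso(nanos) {
  var millis = BigInt(nanos) / BigInt(1000000); // BigInt division truncates
  return new Date(Number(millis)).toISOString();
}

// The first two distinct timestamps from the query output above.
console.log(nanosToIso("1439856000000000000"));
console.log(nanosToIso("1439856360000000000"));
```

This prints `2015-08-18T00:00:00.000Z` and `2015-08-18T00:06:00.000Z`, so the sample readings are six minutes apart.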

To clean everything up you can call…

influx_murder

Please note this will kill all influxd processes and remove all data files.