Delete all but the most recent files in Bash

I’ve been reviewing a few things I do and decided I need to be a bit smarter about managing backups. I currently purge by date only, which is fine if everything is working and checked regularly. I wouldn’t want to return from a two-week holiday to find that my backups had been failing, nobody had noticed, and the purge job had been running happily the whole time.

Here’s what I came up with to try and solve the problem…

cd /path/to/backup/location && f="backup_pattern*.sql.gz" && [ $(find ${f} -type f | wc -l) -gt 14 ] && find ${f} -type f -mtime +14 -delete 2>/dev/null

Breaking this down…

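cd /path/to/backup/location – change to the backup directory.
f="backup_pattern*.sql.gz" – store the pattern that matches the backups in a variable. It is left unquoted where it’s used, so the shell expands it into the matching file names before find runs.
[ $(find ${f} -type f | wc -l) -gt 14 ] – true if more than 14 backups are found. Otherwise the test fails and, because the commands are chained with &&, nothing after it runs.
find ${f} -type f -mtime +14 -delete 2>/dev/null – delete backups more than 14 days old (find discards the fractional part of a file’s age, so +14 matches files at least 15 full days old) and send any error output to /dev/null.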

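This approach relies on the && (AND) operator: each command only runs if the previous one succeeded, so the delete is skipped whenever 14 or fewer backups exist. There’s plenty of good discussion on the web about other ways to tackle this problem.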
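If you’d rather do this from Python, here’s a rough sketch of the same guard-then-delete logic. The function name purge_old_backups() and its parameters are my own, chosen for illustration; it mirrors the one-liner rather than being a tested replacement for it.

```python
import glob
import os
import time

SECONDS_PER_DAY = 86400


def purge_old_backups(directory, pattern="backup_pattern*.sql.gz",
                      min_keep=14, max_age_days=14, now=None):
    """Sketch of the one-liner: only delete old backups when more than
    `min_keep` matching files exist. Returns the sorted list of deleted paths."""
    now = time.time() if now is None else now
    files = [p for p in glob.glob(os.path.join(directory, pattern))
             if os.path.isfile(p)]

    # Guard, like `[ $(find ...) -gt 14 ]`: with min_keep or fewer backups
    # something may be wrong, so delete nothing.
    if len(files) <= min_keep:
        return []

    deleted = []
    for path in files:
        # find's -mtime counts whole 24-hour periods (fraction discarded),
        # so "+14" means the age in whole days is greater than 14.
        age_days = int((now - os.path.getmtime(path)) // SECONDS_PER_DAY)
        if age_days > max_age_days:
            os.remove(path)
            deleted.append(path)
    return sorted(deleted)
```

Calling purge_old_backups("/path/to/backup/location") would then behave like the one-liner run from that directory. Passing now explicitly is only there to make the function easy to test.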


Update on pymmo and demo app

Just a quick update on the pymmo project I started over on GitHub. As I said earlier this year, I want to get deeper into Python and will be writing tools for MongoDB (and potentially other databases).

It doesn’t do a whole lot yet, but I hope to make regular small improvements. Using the MongoDB shell for some tasks isn’t ideal; I’m not keen on wading through large JSON documents to pick out little bits of information. This tool is an attempt to fix some of that. I’m not 100% sure where I’m going with this project, but I imagine it will become some type of DBA tool for MongoDB.

To display the tool’s help…

python mm.py --help

Output will look something like this…

usage: mm.py [-h] [--summary] [--repl]

MongoDB Manager

optional arguments:
  -h, --help  show this help message and exit
  --summary   Show a summary of the MongoDB Cluster Topology
  --repl      Show a summary of the replicaset state

It’s still very simple, so there are only two options. It also currently connects to localhost using the default connection settings. More improvements will come.
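For anyone curious how options like these are wired up, here’s a minimal argparse sketch that produces the same usage line. It’s a hypothetical reconstruction, not the actual mm.py source, and main() just returns the names of the requested actions where the real tool would query MongoDB.

```python
import argparse


def build_parser():
    # Hypothetical parser matching the help output above (not the real mm.py).
    parser = argparse.ArgumentParser(prog="mm.py", description="MongoDB Manager")
    parser.add_argument("--summary", action="store_true",
                        help="Show a summary of the MongoDB Cluster Topology")
    parser.add_argument("--repl", action="store_true",
                        help="Show a summary of the replicaset state")
    return parser


def main(argv=None):
    """Parse arguments; return the requested actions (placeholders for real work)."""
    args = build_parser().parse_args(argv)
    actions = []
    if args.summary:
        actions.append("summary")
    if args.repl:
        actions.append("repl")
    return actions
```

Using store_true flags keeps each option independent, so both can be requested in one run.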

The tool initially connects to a mongos server, so it won’t work for you if you aren’t running a sharded cluster (for example, if you only have a standalone mongod).
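One way a tool can check that it really is talking to a mongos: a mongos answers the isMaster command with "msg": "isdbgrid". The sketch below assumes the third-party pymongo driver for the connection helper (imported lazily, so is_mongos() can be tried without it). The function names and the localhost:27017 default URI are my own assumptions, not pymmo’s code. Newer MongoDB versions also offer a hello command in place of isMaster.

```python
def is_mongos(hello_reply):
    """True if an isMaster/hello reply came from a mongos router.

    A mongos includes "msg": "isdbgrid" in that reply; mongod does not."""
    return hello_reply.get("msg") == "isdbgrid"


def connect_to_mongos(uri="mongodb://localhost:27017"):
    # Requires the third-party pymongo package (assumption: installed).
    from pymongo import MongoClient

    client = MongoClient(uri)
    reply = client.admin.command("isMaster")
    if not is_mongos(reply):
        client.close()
        raise RuntimeError(f"{uri} is not a mongos; a sharded cluster is required")
    return client
```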

We can display the structure of our cluster with the --summary option…

[Screenshot: MongoDB Cluster Summary]
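A cluster summary like this can be built from the listShards admin command, which a mongos answers with one entry per shard. Here’s a rough sketch of turning that reply into readable lines; summarize_shards() is a hypothetical name, and this isn’t necessarily how pymmo does it.

```python
def summarize_shards(list_shards_reply):
    """One line per shard from a listShards reply: shard name and its hosts."""
    lines = []
    for shard in list_shards_reply.get("shards", []):
        # Replica-set shards look like "rs0/host1:port,host2:port";
        # a standalone shard is just "host:port".
        _, _, members = shard["host"].rpartition("/")
        lines.append(f"{shard['_id']}: {members.replace(',', ', ')}")
    return lines
```

With a pymongo client connected to a mongos, you would pass it client.admin.command("listShards").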

We can also display the status of replication with the --repl option…

[Screenshot: MongoDB Replication Status]
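Replication details come from the replSetGetStatus command, which reports each member’s name, stateStr and health. As far as I know mongos doesn’t support this command, so it has to be run against a member of each shard’s replica set (for example, a host listed by listShards). A minimal sketch of formatting the reply; summarize_replset() is a hypothetical name, not pymmo’s code.

```python
def summarize_replset(status):
    """Readable lines from a replSetGetStatus reply: set name, then one per member."""
    lines = [f"Replica set {status['set']}"]
    for member in status.get("members", []):
        # health is 1 when the member is reachable and 0 when it is not.
        state = "up" if member.get("health") == 1 else "DOWN"
        lines.append(f"  {member['name']} {member['stateStr']} ({state})")
    return lines
```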

That’s it for now! If you have any suggestions for what you’d like to see in this tool, let me know in the comments.


Recover a single table from a mysqldump

I needed to recover the data for a single table from a mysqldump containing every database on an entire instance. A quick Google search turned up a nifty little sed one-liner…

sed -n -e '/CREATE TABLE.*your_table_name/,/CREATE TABLE/p' mysqldump_file.sql  > your_table_name.sql

I also wanted to import the data into a different table. Again sed came to the rescue…

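sed -e 's/your_table_name/new_table_name/g' your_table_name.sql > new_table_name.sql

Be aware that the substitution is global, so it also rewrites any occurrence of the name inside the data itself.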
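A slightly more targeted variant of the rename, sketched in Python: mysqldump wraps table names in backticks, so replacing only the backtick-quoted identifier leaves unquoted mentions in the data alone. It’s still a plain text substitution, so a column that shares the table’s name (also backtick-quoted in the dump) would be renamed too. rename_table() and rename_table_file() are hypothetical helpers.

```python
def rename_table(lines, old, new):
    """Replace only the backtick-quoted identifier `old` with `new` in
    mysqldump output, leaving unquoted mentions (e.g. in string data) alone."""
    old_quoted, new_quoted = f"`{old}`", f"`{new}`"
    return [line.replace(old_quoted, new_quoted) for line in lines]


def rename_table_file(src_path, dst_path, old, new):
    # Read the extracted dump, rename, and write the result to a new file.
    with open(src_path) as src:
        renamed = rename_table(src, old, new)
    with open(dst_path, "w") as dst:
        dst.writelines(renamed)
```

For example, rename_table_file("your_table_name.sql", "new_table_name.sql", "your_table_name", "new_table_name").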

As always, you should never 100% trust anything you find on the Internet. I did a quick check for any DROP statements…

grep -n DROP new_table_name.sql

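This showed my file contained an unexpected DROP TABLE statement. That happens because the sed range runs up to and including the next CREATE TABLE line, and by default mysqldump writes a DROP TABLE IF EXISTS for the next table just before it. So that DROP, and the first line of the next table’s CREATE TABLE, end up in your file; if your table is the last one in its database, you may also pick up the next database’s CREATE DATABASE and USE statements. Delete those trailing lines before importing. It’s also a good idea to quickly scan the file visually; if it’s too big, inspect each end of it with head and tail. A quick, simple process for getting your data back!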
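If you’d rather avoid the stray statements altogether, here’s a rough Python alternative to the sed range. It assumes standard mysqldump output, where table names are backtick-quoted. It starts at CREATE TABLE `name` (backticks included, so your_table_name_archive doesn’t match) and stops before the next DROP TABLE, CREATE TABLE, CREATE DATABASE or USE line, returning only the first matching table. extract_table() is a hypothetical name, and the result is still worth a look with head and tail.

```python
SECTION_ENDS = ("CREATE TABLE", "DROP TABLE", "CREATE DATABASE", "USE ")


def extract_table(lines, table):
    """Return the lines of the first CREATE TABLE `table` section of a
    mysqldump, stopping before the next table or database statement."""
    start = f"CREATE TABLE `{table}`"
    section = []
    for line in lines:
        if not section:
            # Backticks make this an exact name match: `t1` won't match `t1_archive`.
            if line.startswith(start):
                section.append(line)
            continue
        # Stop *before* the next table's DROP/CREATE TABLE or a database switch,
        # so none of those statements end up in the output.
        if line.startswith(SECTION_ENDS) or "DROP DATABASE" in line:
            break
        section.append(line)
    return section
```

You can pass an open file object straight in (for example, the result of open("mysqldump_file.sql")) and write "".join(section) out to your_table_name.sql.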


mmo: bash script to launch a MongoDB cluster

As I announced in my Technical Goals for 2016, I’m building tools for MongoDB with Python. My first published item is a bash script that creates a MongoDB cluster, which I’ll use to develop and test the tools against. It is not intended for any other use. The script lives over on my GitHub account. I develop mainly on a Mac, but the script should work on all major Linux distributions.

To get started, source the script to load its functions into your shell. You’ll need mongo, mongod, mongos and wget in your PATH for them to work.

. mmo_mongodb_cluster.sh

This makes the following functions available…

mmo_change_to_datadir            mmo_generate_key_file
mmo_check_processes              mmo_load_sample_dataset
mmo_configure_replicaset_rs0     mmo_setup_cluster
mmo_configure_replicaset_rs1     mmo_shutdown_cluster
mmo_configure_sharding           mmo_shutdown_config_servers
mmo_create_admin_user            mmo_shutdown_mongod_servers
mmo_create_config_servers        mmo_shutdown_mongos_servers
mmo_create_directories           mmo_shutdown_server
mmo_create_mongod_shard_servers  mmo_start_with_existing_data
mmo_create_mongos_servers        mmo_teardown_cluster
mmo_create_pytest_user           mmo_wait_for_slaves

You don’t need to understand what all of these do; I’ll cover the essential ones here.

To set up a test cluster, simply execute…

mmo_setup_cluster

This will set up a MongoDB cluster consisting of 2 shards (3 nodes each) and 3 mongos servers. Each shard server uses the WiredTiger storage engine with 200MB of cache. The data directory is created in your home directory and is called mmo_sharded_cluster_test_temp. A whole load of output will be printed to the shell; if everything has worked correctly, you should see the following messages confirming the cluster has been set up…

Loaded collection into test.sample_restaurants
All expected mongod processes are running.
All expected mongos processes are running.
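I haven’t reproduced the script’s own check here, but the idea behind messages like these can be sketched in Python: count running processes by name and compare against what the topology needs. The helper names and the use of ps -A -o comm= are my assumptions. Only the mongos count (3) comes from the setup described above, so pass in whatever mongod count your cluster (shard members plus config servers) should have.

```python
import subprocess
from collections import Counter


def running_process_names():
    """Command names of all running processes, via POSIX `ps -A -o comm=`."""
    out = subprocess.run(["ps", "-A", "-o", "comm="],
                         capture_output=True, text=True, check=True).stdout
    # Some systems (e.g. macOS) report a full path, so keep only the basename.
    return [line.strip().rsplit("/", 1)[-1]
            for line in out.splitlines() if line.strip()]


def check_processes(names, expected):
    """Compare process counts with `expected` ({name: count}); one message each."""
    counts = Counter(names)
    messages = []
    for name, want in expected.items():
        have = counts.get(name, 0)
        if have == want:
            messages.append(f"All expected {name} processes are running.")
        else:
            messages.append(f"Expected {want} {name} processes but found {have}.")
    return messages
```

For example, check_processes(running_process_names(), {"mongos": 3}). Keeping the counting separate from the ps call makes the logic easy to test without a running cluster.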

If you want to destroy the cluster you can simply do the following…

mmo_teardown_cluster

Please note this will kill all mongos and mongod processes on the machine, not just the ones belonging to the test cluster, and remove the data directory. The function will output the following…

mongos processes have been murdered.
mongod processes have been murdered.
Waiting for all processes to die...
Directory mmo_sharded_cluster_test_temp removed.
Removed socket files.

If you want to launch the cluster after a reboot, there’s no need to run the whole setup process again. Just execute the following function to launch the cluster…

mmo_start_with_existing_data

This will also print a lot of output to the screen but will finish with the following messages on successful completion…

All expected mongod processes are running.
All expected mongos processes are running.

Mongo Query Mistakes

After years of writing SQL we sometimes think we know it all and treat MongoDB as “just another database”. While there are many similarities, there are a few things to watch out for. Here are a few mistakes you’ll want to avoid…

Update syntax

Let’s say I insert the following document…

db.my_collection.insert( { "name": "Rhys Campbell",
                           "tag": "Rhys loves MongoDB",
                           "age": "Not telling" } );

I’ll now have a document that looks like this…

{
	"_id" : ObjectId("56c87ecc14c13ac15b0b71bd"),
	"name" : "Rhys Campbell",
	"tag" : "Rhys loves MongoDB",
	"age" : "Not telling"
}

Later on I want to update the value held in age. Simple, right?

db.my_collection.update( { "name": "Rhys Campbell" }, { "age": 21 } );

Now my document looks like this…

{ "_id" : ObjectId("56c87ecc14c13ac15b0b71bd"), "age" : 21 }

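Whoops! Remember we’re not using SQL here. Without an update operator, the second argument replaces the entire document and only the _id is kept. We should have performed the update using the $set operator.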

db.my_collection.update( { "name": "Rhys Campbell" }, { "$set": { "age": 21 } } );

When performing the update like this (here against a freshly inserted copy of the original document, hence the different _id) we end up with what we want…

{
	"_id" : ObjectId("56c8806114c13ac15b0b71be"),
	"name" : "Rhys Campbell",
	"tag" : "Rhys loves MongoDB",
	"age" : 21
}
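To make the difference concrete, here’s a small pure-Python model of how the two update documents above are applied to a matched document. It’s an illustration only, not driver or server code, and it handles nothing beyond replacement documents and $set on top-level fields. (For what it’s worth, PyMongo’s update_one() rejects an update document without $ operators and makes you use replace_one() for a full replacement, which guards against exactly this mistake.)

```python
import copy


def apply_update(doc, update):
    """Model how an update document is applied to one matched document.

    Only replacement documents and $set on top-level fields are modelled."""
    has_ops = any(key.startswith("$") for key in update)
    if has_ops and not all(key.startswith("$") for key in update):
        raise ValueError("cannot mix update operators with plain fields")

    if not has_ops:
        # Replacement: the new document replaces everything except _id.
        new_doc = {"_id": doc["_id"]}
        new_doc.update({k: copy.deepcopy(v) for k, v in update.items() if k != "_id"})
        return new_doc

    new_doc = copy.deepcopy(doc)
    for op, fields in update.items():
        if op != "$set":
            raise NotImplementedError(f"{op} is not modelled in this sketch")
        for field, value in fields.items():
            new_doc[field] = copy.deepcopy(value)
    return new_doc
```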

Delete & limit

What do you think this might do?

db.my_collection.remove({}).limit(1);

If you know SQL you might think this is akin to…

DELETE FROM my_table LIMIT 1;

If you execute this, you’ll get the following response…

2016-02-20T16:06:10.516+0100 E QUERY    TypeError: Object WriteResult({ "nRemoved" : 5 }) has no method 'limit'
    at (shell):1:29

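Boom, all of your documents are gone. The remove({}) call runs, and deletes every matching document, before the shell ever gets to limit(), which it then tries to call on the returned WriteResult. Now, perhaps this could be considered a bug; arguably the entire statement should be evaluated for correctness before execution. But it’s important to remember we’re not using SQL here, and we shouldn’t make any assumptions. To remove a single document, use db.my_collection.remove({}, { justOne: true }) or, from MongoDB 3.2, db.my_collection.deleteOne({}).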
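If you need a “delete at most N” from Python, one approach is to fetch the _ids first and then delete by _id. This is a sketch assuming PyMongo’s Collection API (find(), Cursor.limit(), delete_many()); delete_first_n() is a hypothetical helper, and the two steps are not atomic. For a single document, delete_one() does the job directly.

```python
def delete_first_n(collection, query, n):
    """Delete at most `n` documents matching `query`; return how many went.

    Assumes a PyMongo Collection (or anything with the same find()/delete_many()).
    Not atomic: documents can change between the find and the delete."""
    # limit(0) means "no limit" in MongoDB, so never pass it through.
    if n <= 0:
        return 0
    # Step 1: fetch just the _id of at most n matching documents.
    ids = [doc["_id"] for doc in collection.find(query, {"_id": 1}).limit(n)]
    if not ids:
        return 0
    # Step 2: delete exactly those documents, so at most n are removed.
    return collection.delete_many({"_id": {"$in": ids}}).deleted_count
```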

And syntax

Consider the following query…

db.my_collection.find ( { "age": { "$lte": 10 }, "age": { "$gte": 1 } } );

This is just a simple range query, analogous to…

SELECT *
FROM myTable
WHERE age >= 1
AND age <= 10;

Now, if we execute an explain on the MongoDB query we’ll see this in the output…

...
"parsedQuery" : {
			"age" : {
				"$gte" : 1
			}
		}
...

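Yep, the first age clause is silently overridden by the second. The shell query is a JavaScript object, and an object holds only one value per key, so the last "age" wins. Make this mistake on a production server and you could be in trouble. Both conditions need to live in a single age clause. Below, that clause is wrapped in an explicit $and, although the $and isn’t strictly required: { "age": { "$lte": 10, "$gte": 1 } } on its own behaves the same…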

db.my_collection.find( { "$and": [ { "age": { "$lte": 10, "$gte": 1 } } ] } );

Looking at the explain we now get the desired result…

...
		"parsedQuery" : {
			"$and" : [
				{
					"age" : {
						"$lte" : 10
					}
				},
				{
					"age" : {
						"$gte" : 1
					}
				}
			]
		}
...
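The same trap exists in Python, since a dict literal also keeps only the last value for a repeated key, so pymongo users can hit it too. Below is a quick demonstration, plus a hypothetical build_filter() helper that merges per-field conditions and refuses to silently overwrite an operator.

```python
# A dict literal with a repeated key keeps only the last value, exactly like
# the JavaScript object in the shell query above.
broken = {"age": {"$lte": 10}, "age": {"$gte": 1}}  # noqa: F601 (deliberate)
print(broken)  # {'age': {'$gte': 1}}

# Correct: both operators live under a single "age" key.
fixed = {"age": {"$gte": 1, "$lte": 10}}


def build_filter(*clauses):
    """Merge (field, {operator: value}) pairs into one filter document,
    raising instead of silently overwriting a repeated operator."""
    merged = {}
    for field, ops in clauses:
        target = merged.setdefault(field, {})
        for op, value in ops.items():
            if op in target:
                raise ValueError(f"{field}: {op} given more than once")
            target[op] = value
    return merged


print(build_filter(("age", {"$lte": 10}), ("age", {"$gte": 1})))
# {'age': {'$lte': 10, '$gte': 1}}
```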

We should obviously be a little careful when working out of our comfort zones and, of course, RTFM.