mongodb_consistent_backup: A quick example

Just a quick post here to share some notes I made when using the mongodb_consistent_backup tool.

This was performed on a installed of Red Hat Enterprise Linux Server release 7.2 (Maipo) running python 2.7.5.

Install required packages and clone the tools repo;

yum install python python-devel python-virtualenv gcc
yum install mongodb-org-tools
yum install git
git clone https://github.com/Percona-Lab/mongodb_consistent_backup.git
cd mongodb_consistent_backup/
make
make install

This will install the tool to /usr/local/bin/mongodb-consistent-backup. The tool looks for mongodump in the following location’ /usr/bin/mongodump. The packages I used deviated from this default;

which mongodump
/bin/mongodump

We need to tell mongodb-consistent-backup the location of mongodump is it’s not in the default path. Here’s a redacted version of my backup command-line;

mongodb-consistent-backup --host=mongocfg3 --port=27017 --user=xxxxxx --password=xxxxxx --backup_binary=/bin/mongodump --location=/home/mcb --name=REFBACKUP

Here’s a redacted version of the output;

[2016-11-16 15:55:47,998] [INFO] [MainProcess] [Backup:run:220] Starting mongodb-consistent-backup version 0.3.1 (git commit hash: 5acdc83a924afb017c0c6eb740a0fd7c2c0df3f6)
[2016-11-16 15:55:48,000] [INFO] [MainProcess] [Backup:run:267] Running backup of mongocfg3:27017 in sharded mode
[2016-11-16 15:55:48,076] [INFO] [MainProcess] [Sharding:get_start_state:41] Began with balancer state running: True
[2016-11-16 15:55:48,186] [INFO] [MainProcess] [Sharding:get_config_server:129] Found sharding config server: mongocfg1:27019
[2016-11-16 15:55:53,280] [INFO] [MainProcess] [Replset:find_primary:72] Found PRIMARY: shard2/mongodb9:27019 with optime Timestamp(1478868578, 1)
[2016-11-16 15:55:53,280] [INFO] [MainProcess] [Replset:find_secondary:128] Found SECONDARY shard2/mongodb7:27019: {'priority': 1, 'lag': 0, 'optime': Timestamp(1478868578, 1), 'score': 98}
[2016-11-16 15:55:53,281] [INFO] [MainProcess] [Replset:find_primary:72] Found PRIMARY: shard2/mongodb9:27019 with optime Timestamp(1478868578, 1)
[2016-11-16 15:55:53,281] [INFO] [MainProcess] [Replset:find_secondary:128] Found SECONDARY shard2/mongodb8:27019: {'priority': 1, 'lag': 0, 'optime': Timestamp(1478868578, 1), 'score': 98}
[2016-11-16 15:55:53,281] [INFO] [MainProcess] [Replset:find_secondary:138] Choosing SECONDARY mongodb7:27019 for replica set shard2 (score: 98)
[2016-11-16 15:55:53,320] [INFO] [MainProcess] [Replset:find_primary:72] Found PRIMARY: shard0/mongodb3:27019 with optime Timestamp(1478868569, 3)
[2016-11-16 15:55:53,320] [INFO] [MainProcess] [Replset:find_secondary:128] Found SECONDARY shard0/mongodb1:27019: {'priority': 1, 'lag': 0, 'optime': Timestamp(1478868569, 3), 'score': 98}
[2016-11-16 15:55:53,321] [INFO] [MainProcess] [Replset:find_primary:72] Found PRIMARY: shard0/mongodb3:27019 with optime Timestamp(1478868569, 3)
[2016-11-16 15:55:53,321] [INFO] [MainProcess] [Replset:find_secondary:128] Found SECONDARY shard0/mongodb2:27019: {'priority': 1, 'lag': 0, 'optime': Timestamp(1478868569, 3), 'score': 98}
[2016-11-16 15:55:53,322] [INFO] [MainProcess] [Replset:find_secondary:138] Choosing SECONDARY mongodb1:27019 for replica set shard0 (score: 98)
[2016-11-16 15:55:53,388] [INFO] [MainProcess] [Replset:find_primary:72] Found PRIMARY: shard1/mongodb5:27019 with optime Timestamp(1474364574, 1)
[2016-11-16 15:55:53,388] [INFO] [MainProcess] [Replset:find_secondary:128] Found SECONDARY shard1/mongodb4:27019: {'priority': 1, 'lag': 0, 'optime': Timestamp(1474364574, 1), 'score': 98}
[2016-11-16 15:55:53,389] [INFO] [MainProcess] [Replset:find_primary:72] Found PRIMARY: shard1/mongodb5:27019 with optime Timestamp(1474364574, 1)
[2016-11-16 15:55:53,389] [INFO] [MainProcess] [Replset:find_secondary:128] Found SECONDARY shard1/mongodb6:27019: {'priority': 1, 'lag': 0, 'optime': Timestamp(1474364574, 1), 'score': 98}
[2016-11-16 15:55:53,390] [INFO] [MainProcess] [Replset:find_secondary:138] Choosing SECONDARY mongodb4:27019 for replica set shard1 (score: 98)
[2016-11-16 15:55:53,390] [INFO] [MainProcess] [Sharding:stop_balancer:89] Stopping the balancer and waiting a max of 300 sec
[2016-11-16 15:55:58,574] [INFO] [MainProcess] [Sharding:stop_balancer:99] Balancer is now stopped
[2016-11-16 15:55:58,972] [INFO] [OplogTail-1] [Tail:run:68] Tailing oplog on mongodb7:27019 for changes
[2016-11-16 15:55:58,983] [INFO] [OplogTail-2] [Tail:run:68] Tailing oplog on mongodb1:27019 for changes
[2016-11-16 15:55:58,983] [INFO] [OplogTail-3] [Tail:run:68] Tailing oplog on mongodb4:27019 for changes
[2016-11-16 15:55:59,029] [INFO] [MainProcess] [Dumper:run:90] Starting backups in threads using mongodump r3.2.10 (inline gzip: True)
[2016-11-16 15:55:59,044] [INFO] [Dump-6] [Dump:run:51] Starting mongodump (with oplog) backup of shard1/mongodb4:27019
[2016-11-16 15:55:59,058] [INFO] [Dump-4] [Dump:run:51] Starting mongodump (with oplog) backup of shard2/mongodb7:27019
[2016-11-16 15:55:59,067] [INFO] [Dump-5] [Dump:run:51] Starting mongodump (with oplog) backup of shard0/mongodb1:27019
[2016-11-16 15:55:59,572] [INFO] [Dump-6] [Dump:run:91] Backup for shard1/mongodb4:27019 completed in 0.543478012085 sec with 0 oplog changes captured to: None
[2016-11-16 15:56:00,209] [INFO] [Dump-4] [Dump:run:91] Backup for shard2/mongodb7:27019 completed in 1.18052315712 sec with 0 oplog changes captured to: None
[2016-11-16 15:56:00,944] [INFO] [Dump-5] [Dump:run:91] Backup for shard0/mongodb1:27019 completed in 1.91488313675 sec with 0 oplog changes captured to: None
[2016-11-16 15:56:03,952] [INFO] [MainProcess] [Dumper:wait:63] All mongodump backups completed
[2016-11-16 15:56:03,952] [INFO] [MainProcess] [Dumper:run:97] Using non-replset backup method for config server mongodump
[2016-11-16 15:56:03,960] [INFO] [Dump-7] [Dump:run:51] Starting mongodump (with oplog) backup of configsvr/mongocfg1:27019
[2016-11-16 15:56:04,755] [INFO] [Dump-7] [Dump:run:91] Backup for configsvr/mongocfg1:27019 completed in 0.802173137665 sec with 0 oplog changes captured to: None
[2016-11-16 15:56:07,763] [INFO] [MainProcess] [Dumper:wait:63] All mongodump backups completed
[2016-11-16 15:56:07,764] [INFO] [MainProcess] [Tailer:stop:49] Stopping oplog tailing threads
[2016-11-16 15:56:07,841] [INFO] [OplogTail-2] [Tail:run:121] Done tailing oplog on mongodb1:27019, 0 changes captured to: None
[2016-11-16 15:56:07,976] [INFO] [OplogTail-1] [Tail:run:121] Done tailing oplog on mongodb7:27019, 0 changes captured to: None
[2016-11-16 15:56:08,435] [INFO] [OplogTail-3] [Tail:run:121] Done tailing oplog on mongodb4:27019, 0 changes captured to: None
[2016-11-16 15:56:08,765] [INFO] [MainProcess] [Tailer:stop:55] Stopped all oplog threads
[2016-11-16 15:56:08,766] [INFO] [MainProcess] [Sharding:restore_balancer_state:82] Restoring balancer state to: True
[2016-11-16 15:56:08,810] [INFO] [MainProcess] [Resolver:run:52] Resolving oplogs using 4 threads max
[2016-11-16 15:56:08,811] [INFO] [MainProcess] [Resolver:run:67] No oplog changes to resolve for mongodb4:27019
[2016-11-16 15:56:08,811] [INFO] [MainProcess] [Resolver:run:87] No tailed oplog for host mongocfg1:27019
[2016-11-16 15:56:08,812] [INFO] [MainProcess] [Resolver:run:67] No oplog changes to resolve for mongodb7:27019
[2016-11-16 15:56:08,812] [INFO] [MainProcess] [Resolver:run:67] No oplog changes to resolve for mongodb1:27019
[2016-11-16 15:56:08,915] [INFO] [MainProcess] [Resolver:run:102] Done resolving oplogs
[2016-11-16 15:56:08,925] [INFO] [MainProcess] [Archiver:run:88] Archiving backup directories with 2 threads max
[2016-11-16 15:56:08,927] [INFO] [PoolWorker-12] [Archiver:run:57] Archiving directory: /home/mcb/REFBACKUP/20161116_1555/shard2
[2016-11-16 15:56:08,927] [INFO] [PoolWorker-13] [Archiver:run:57] Archiving directory: /home/mcb/REFBACKUP/20161116_1555/shard0
[2016-11-16 15:56:09,200] [INFO] [PoolWorker-12] [Archiver:run:57] Archiving directory: /home/mcb/REFBACKUP/20161116_1555/shard1
[2016-11-16 15:56:09,203] [INFO] [PoolWorker-13] [Archiver:run:57] Archiving directory: /home/mcb/REFBACKUP/20161116_1555/configsvr
[2016-11-16 15:56:09,328] [INFO] [MainProcess] [Archiver:run:104] Archiver threads completed
[2016-11-16 15:56:09,330] [INFO] [MainProcess] [Backup:run:402] Backup completed in 21.413216114 sec

The tool appears to correctly discover and backup all of my shards. In this case there were no updates to record in the oplog. The backup location contains a bunch of tar archives. One for each shard as well as the config servers;

ls -lh REFBACKUP/20161116_1555/
total 2.2M
-rw-r--r--. 1 root root 190K Nov 16 15:56 configsvr.tar
-rw-r--r--. 1 root root 1.3M Nov 16 15:56 shard0.tar
-rw-r--r--. 1 root root  20K Nov 16 15:56 shard1.tar
-rw-r--r--. 1 root root 680K Nov 16 15:56 shard2.tar

Here’s a redacted version of the contents of an archive;

tar xvf shard0.tar
shard0/
shard0/dump/
shard0/dump/userdb/
shard0/dump/userdb/xxxxx.metadata.json.gz
shard0/dump/userdb/xxxxxx.bson.gz
shard0/dump/admin/system.users.metadata.json.gz
shard0/dump/admin/system.roles.metadata.json.gz
shard0/dump/admin/system.version.metadata.json.gz
shard0/dump/admin/system.users.bson.gz
shard0/dump/admin/system.roles.bson.gz
shard0/dump/admin/system.version.bson.gz
shard0/dump/oplog.bson

Note the files here are gzip compressed. If you’re using a version of mongodump earlier than 3.2 the top level archive would be called shard0.tar.gz and the internal files are not gzipped; As they quickly pointed out.


A quick mongofile demo

Here’s a few simple examples of using the mongofiles utility to use MongoDB GridFS to store, search and retrieve files.

Upload a file into MongoDB into a database called gridfs

mongofiles -u admin -padmin --authenticationDatabase admin --db gridfs put xfiles.504-amc.avi
2016-11-15T21:44:30.728+0100	connected to: localhost
added file: xfiles.504-amc.avi

Two collections will be created in the gridfs database; fs.files and fs.chunks…

mongos> show dbs
admin   0.000GB
config  0.001GB
gridfs  0.343GB
test    0.003GB
mongos> use gridfs
switched to db gridfs
mongos> show collections
fs.chunks
fs.files
mongos> db.fs.files.findOne();
{
	"_id" : ObjectId("582b74e4d60dc3078824242c"),
	"chunkSize" : 261120,
	"uploadDate" : ISODate("2016-11-15T20:51:17.412Z"),
	"length" : 133881469,
	"md5" : "dfdc2e394e63f4b63591e0fc6823f3f9",
	"filename" : "xfiles.504-amc.avi"
}

Upload a few more files…

mongofiles -u admin -padmin --authenticationDatabase admin --db gridfs put the.venture.bros.s06e02.hdtv.x264-2hd.mp4 
mongofiles -u admin -padmin --authenticationDatabase admin --db gridfs put the.venture.bros.s06e03.hdtv.x264-w4f.mp4

List files available in the gridfs database

mongofiles -u admin -padmin --authenticationDatabase admin --db gridfs list
2016-11-15T22:08:06.305+0100	connected to: localhost
xfiles.504-amc.avi	367001600
the.venture.bros.s06e02.hdtv.x264-2hd.mp4	133881469
the.venture.bros.s06e03.hdtv.x264-w4f.mp4	144681933
the.venture.bros.s06e07.hdtv.x264-mtg.mp4	128260039

Search for files with a .avi extension

mongofiles -u admin -padmin --authenticationDatabase admin --db gridfs search .avi
2016-11-15T22:09:22.398+0100	connected to: localhost
xfiles.504-amc.avi	367001600

Download a file

Download the xfiles.504-amc.avi file to /tmp/downloaded_file.avi

mongofiles -u admin -padmin --authenticationDatabase admin --db gridfs get xfiles.504-amc.avi --local /tmp/downloaded_file.avi

Verify the file;

file /tmp/downloaded_file.avi 
/tmp/downloaded_file.avi: RIFF (little-endian) data, AVI, 640 x 368, 23.98 fps, video: DivX 3 Low-Motion, audio: Dolby AC3 (stereo, 48000 Hz)

Delete a file

Delete the xfiles.504-amc.avi file;

mongofiles -u admin -padmin --authenticationDatabase admin --db gridfs delete xfiles.504-amc.avi
2016-11-15T22:14:16.192+0100	connected to: localhost
successfully deleted all instances of 'xfiles.504-amc.avi' from GridFS

Bash: Count the number of databases in a gzip compressed mysqldump

A simple bash one-liner!

gunzip -c /path/to/backup/mysqldump.sql.gz | grep -E "^CREATE DATABASE" | wc -l

Breaking this down..

This prints the contents of a gzip compressed mysqldump to the terminal

gunzip -c /path/to/backup/mysqldump.sql.gz

Grep for lines that start with CREATE DATABASES…

grep -E "^CREATE DATABASE"

Count the number of create database lines..

wc -l

So to count the number of tables it’s just a simple change to…

gunzip -c /path/to/backup/mysqldump.sql.gz | grep -E "^CREATE TABLE" | wc -l

Getting started with osquery on CentOS

I recently stumbled across osquery which allows you to query your Linux, and OS X, servers for various bits of information. It’s very similar in concept to WQL for those in the Windows world.

Here’s my quick getting started guide for CentOS 6.X…

First download and install the latest rpm for your distro. You might want to check osquery downloads for the latest release.

wget https://osquery-packages.s3.amazonaws.com/centos6/osquery-1.8.2.rpm
sudo rpm -ivh osquery-1.8.2.rpm

You’ll now have the following three executables in your path

  • osqueryctl – bash script to manage the daemon.
  • osqueryd – the daemon.
  • osqueryi – command-line client to interactively run osquery queries, view tales (namespaces) and so on.

Take a look at the example config…

cat /usr/share/osquery/osquery.example.conf

The daemon won’t start without a config file so be sure to create one first. This config file does a few thigns but will also periodcially run some queries and log them to a file. This is useful for sticking data into ELK or splunk.

cat << EOF > /etc/osquery/osquery.conf
{
"options": {
"config_plugin": "filesystem",
"logger_plugin": "filesystem",
"logger_path": "/var/log/osquery",
"pidfile": "/var/osquery/osquery.pidfile",
"events_expiry": "3600",
"database_path": "/var/osquery/osquery.db",
"verbose": "true",
"worker_threads": "2",
"enable_monitor": "true"
},
// Define a schedule of queries:
"schedule": {
// This is a simple example query that outputs basic system information.
"system_info": {
// The exact query to run.
"query": "SELECT hostname, cpu_brand, physical_memory FROM system_info;",
// The interval in seconds to run this query, not an exact interval.
"interval": 3600
}
},
// Decorators are normal queries that append data to every query.
"decorators": {
"load": [
"SELECT uuid AS host_uuid FROM system_info;",
"SELECT user AS username FROM logged_in_users ORDER BY time DESC LIMIT 1;"
]
},
"packs": {
"osquery-monitoring": "/usr/share/osquery/packs/osquery-monitoring.conf"
 
}
}
EOF

Try to start the daemon…

osqueryctl start

If you encounter this issue

/usr/bin/osqueryctl: line 52: [: missing `]'

vi /usr/bin/osqueryctl

Change line 52 from this….

if [ ! -e "$INIT_SCRIPT_PATH" &amp;&amp; ! -f "$SERVICE_SCRIPT_PATH" ]; then

to this

if [ ! -e "$INIT_SCRIPT_PATH" ] &amp;&amp; [ ! -f "$SERVICE_SCRIPT_PATH" ]; then

You should no be able to start without complaint…

osqueryctl start

Some logfiles should appear here…

ls -lh /var/log/osquery/

Use the osquery client to explore (this is akin to the mysql client). Data is presented just like a traditional sql database…

osqueryi
osquery>.help
osquery> select * from logged_in_users;
osquery> select * from time;
osquery> select * from os_version;
osquery> select * from rpm_packages;
osquery> select * from shell_history;

We can view the “schema” of certain tables like so…

osquery> .schema processes
CREATE TABLE processes(`pid` BIGINT PRIMARY KEY, `name` TEXT, `path` TEXT, `cmdline` TEXT, `state` TEXT, `cwd` TEXT, `root` TEXT, `uid` BIGINT, `gid` BIGINT, `euid` BIGINT, `egid` BIGINT, `suid` BIGINT, `sgid` BIGINT, `on_disk` INTEGER, `wired_size` BIGINT, `resident_size` BIGINT, `phys_footprint` BIGINT, `user_time` BIGINT, `system_time` BIGINT, `start_time` BIGINT, `parent` BIGINT, `pgroup` BIGINT, `nice` INTEGER) WITHOUT ROWID;

We can use familar sql operators…

SELECT * FROM file WHERE path LIKE '/etc/%';
+-----------------------------------+------------------------+------------------------------+--------+-----+-----+------+--------+--------+------------+------------+------------+------------+-------+------------+-----------+
| path | directory | filename | inode | uid | gid | mode | device | size | block_size | atime | mtime | ctime | btime | hard_links | type |
+-----------------------------------+------------------------+------------------------------+--------+-----+-----+------+--------+--------+------------+------------+------------+------------+-------+------------+-----------+
| /etc/ConsoleKit/ | /etc/ConsoleKit | . | 398999 | 0 | 0 | 0755 | 0 | 4096 | 4096 | 1467708722 | 1414762792 | 1414762792 | 0 | 5 | directory |
| /etc/DIR_COLORS | /etc | DIR_COLORS | 398851 | 0 | 0 | 0644 | 0 | 4439 | 4096 | 1475149457 | 1405522956 | 1414762782 | 0 | 1 | regular |
| /etc/DIR_COLORS.256color | /etc | DIR_COLORS.256color | 398852 | 0 | 0 | 0644 | 0 | 5139 | 4096 | 1405522956 | 1405522956 | 1414762782 | 0 | 1 | regular |
| /etc/DIR_COLORS.lightbgcolor | /etc | DIR_COLORS.lightbgcolor | 398853 | 0 | 0 | 0644 | 0 | 4113 | 4096 | 1405522956 | 1405522956 | 1414762782 | 0 | 1 | regular |
| /etc/NetworkManager/ | /etc/NetworkManager | . | 400911 | 0 | 0 | 0755 | 0 | 4096 | 4096 | 1467708722 | 1422299768 | 1436772557 | 0 | 5 | directory |

We can even join onto other tables. This query shows Linux users and their associated groups…

osquery> SELECT u.username,
g.groupname
FROM users u
INNER JOIN user_groups ug
ON u.uid = ug.uid
INNER JOIN groups g
ON g.gid = ug.gid;
+---------------+---------------+
| username | groupname |
+---------------+---------------+
| root | root |
| bin | bin |
| bin | daemon |
| bin | sys |
| daemon | daemon |
| daemon | bin |
| daemon | adm |
| daemon | lp |

This query shows some details about processes listening on ports…

osquery> SELECT p.pid, p.name, p.state, u.username, lp.*
FROM processes p
INNER JOIN listening_ports lp
ON lp.pid = p.pid
INNER JOIN users u
ON u.uid = p.uid;
+-------+---------------+-------+-----------+-------+-------+----------+--------+--------------------------+
| pid | name | state | username | pid | port | protocol | family | address |
+-------+---------------+-------+-----------+-------+-------+----------+--------+--------------------------+
| 1318 | rpcbind | S | rpc | 1318 | 111 | 6 | 2 | 0.0.0.0 |
| 1318 | rpcbind | S | rpc | 1318 | 111 | 6 | 10 | :: |
| 1318 | rpcbind | S | rpc | 1318 | 111 | 17 | 2 | 0.0.0.0 |
| 1318 | rpcbind | S | rpc | 1318 | 645 | 17 | 2 | 0.0.0.0 |
| 1318 | rpcbind | S | rpc | 1318 | 111 | 17 | 10 | :: |
| 1318 | rpcbind | S | rpc | 1318 | 645 | 17 | 10 | :: |

Show the files, and who owns them, installed by an rpm package…

osquery> SELECT p.name, p.version, pf.path, pf.username, pf.groupname
FROM rpm_packages p
INNER JOIN rpm_package_files pf
ON p.name = pf.package
WHERE p.name = 'MariaDB-server';
+----------------+---------+---------------------------------------------------------+----------+-----------+
| name | version | path | username | groupname |
+----------------+---------+---------------------------------------------------------+----------+-----------+
| MariaDB-server | 10.0.20 | /etc/init.d/mysql | root | root |
| MariaDB-server | 10.0.20 | /etc/logrotate.d/mysql | root | root |
| MariaDB-server | 10.0.20 | /etc/my.cnf.d | root | root |
| MariaDB-server | 10.0.20 | /etc/my.cnf.d/server.cnf | root | root |
| MariaDB-server | 10.0.20 | /etc/my.cnf.d/tokudb.cnf | root | root |
| MariaDB-server | 10.0.20 | /usr/bin/aria_chk | root | root |
osquery> .exit

You can see the data collected by the deamon on schedule here…

cat /var/log/osquery/osqueryd.results.log


Remove an _id field from a mongoexport json document

Although the mongoexport tool has a –fields option it will always include the _id field by default. You can remove this with a simple line of sed. This was slightly modified from this sed expression.

Given the following data…

{"_id":{"$oid":"57dd2809beed91a333ebe7d1"},"a":"Rhys"} 
{"_id":{"$oid":"57dd2810beed91a333ebe7d2"},"a":"James"} 
{"_id":{"$oid":"57dd2815beed91a333ebe7d3"},"a":"Campbell"}

This command-line expression will export and munge the data…

mongoexport --authenticationDatabase admin --db test --collection test -u admin -pXXXXXX | sed '/"_id":/s/"_id":[^,]*,//'

Results in the following list of documents…

{"a":"Rhys"}
{"a":"James"}
{"a":"Campbell"}