Archive for the ‘Data’ Category

Remove an _id field from a mongoexport json document

Although the mongoexport tool has a --fields option, it still includes the _id field by default. You can remove it with a simple line of sed, slightly modified from this sed expression. Given the following data…

{"_id":{"$oid":"57dd2809beed91a333ebe7d1"},"a":"Rhys"}
{"_id":{"$oid":"57dd2810beed91a333ebe7d2"},"a":"James"}
{"_id":{"$oid":"57dd2815beed91a333ebe7d3"},"a":"Campbell"}

This command-line expression will export and munge the data…

mongoexport […]
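The original sed expression is cut off in this excerpt, so what follows is only a sketch of the same idea, not the author's command. It assumes one document per line (mongoexport's default JSON output) with an ObjectId `_id` written as the first key, as in the sample above; `strip_id` is a hypothetical helper name.

```shell
#!/bin/sh
# strip_id: hypothetical helper that removes the leading "_id" member from
# each line of mongoexport JSON output read on stdin.
# Assumes _id is the first key and holds an ObjectId, i.e. each line starts
# with {"_id":{"$oid":"<24 hex chars>"}
strip_id() {
  # -E enables extended regexes; in the pattern \{ \} \$ are literal
  # characters and {24} is a repeat count. The optional comma covers
  # documents whose only field is _id (they become {}).
  sed -E 's/^\{"_id":\{"\$oid":"[0-9a-f]{24}"\},?/{/'
}
```

Piped after an export, e.g. `mongoexport --db test --collection people | strip_id` (database and collection names hypothetical), each sample line above should come out as `{"a":"Rhys"}`, `{"a":"James"}` and so on. If `_id` is not an ObjectId or is not the first key, the line passes through unchanged.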

Elasticsearch: Turn off index replicas

If you’re playing with Elasticsearch on a single host, you may notice your cluster health is always yellow. This is probably because your indexes are set to have one replica, but there’s no other node to replicate it to. To confirm whether this is the case, you can look in elasticsearch-head. In the […]
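For reference, one common fix on a single node is to drop the replica count to zero through the index settings API. This is a sketch rather than the author's method (the rest of the post is cut off); `ES_URL`, `replicas_body` and `set_replicas` are hypothetical names, and it assumes Elasticsearch is listening on localhost:9200.

```shell
#!/bin/sh
# Hypothetical helpers for a single-node cluster: set number_of_replicas on
# all existing indices so unassignable replica shards stop keeping health yellow.
ES_URL="${ES_URL:-http://localhost:9200}"   # assumption: default host/port

# Print the settings body; the replica count defaults to 0.
replicas_body() {
  printf '{"index":{"number_of_replicas":%d}}' "${1:-0}"
}

# Apply the body to every existing index via the _all/_settings endpoint.
set_replicas() {
  curl -s -XPUT "$ES_URL/_all/_settings" \
    -H 'Content-Type: application/json' \
    -d "$(replicas_body "${1:-0}")"
}
```

After `set_replicas`, `curl -s "$ES_URL/_cluster/health?pretty"` should report green once no replica shards are left unassigned. Indices created later still get the default replica count unless you also set it for them, for example in an index template.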

Kibana splits on hostname

If you’re playing with Kibana and notice pie charts splitting values incorrectly, e.g. on hostnames containing hyphens, here’s the fix you need to apply. It’s actually something Elasticsearch does…

curl -XPUT http://localhost:9200/_template/syslog -d '{ "template": "*syslog*", "settings": { "number_of_shards": 1 }, "mappings": { "file": { "properties" […]
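The underlying cause is that Elasticsearch analyzes string fields by default, and the standard analyzer breaks on hyphens, so a hostname like web-01 is indexed as the separate terms web and 01, which Kibana then counts separately. The template in the post is truncated here, so as a hedged sketch for Elasticsearch 1.x only, assuming the field is called `host` and using the hypothetical template name `syslog_host`, marking the field `not_analyzed` keeps each hostname as a single term:

```shell
#!/bin/sh
# Hypothetical template for Elasticsearch 1.x: store "host" as one
# unanalyzed term in any index whose name matches *syslog*.
ES_URL="${ES_URL:-http://localhost:9200}"   # assumption: default host/port

host_template() {
  cat <<'EOF'
{
  "template": "*syslog*",
  "mappings": {
    "_default_": {
      "properties": {
        "host": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}
EOF
}

# Register the template; it only affects indices created afterwards.
put_host_template() {
  curl -s -XPUT "$ES_URL/_template/syslog_host" \
    -H 'Content-Type: application/json' \
    -d "$(host_template)"
}
```

Existing indices keep their old mapping, so the split values only disappear for data indexed after the template is in place. Newer Elasticsearch versions use a `keyword` field instead of a `not_analyzed` string and `index_patterns` instead of `template`.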

EFK: Free Alternative to Splunk Using Fluentd

Here is an updated version of the instructions given at Free Alternative to Splunk Using Fluentd. The installation was performed on CentOS 6.5.

1. Install Elasticsearch

mkdir /opt/src
cd /opt/src
wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.2.1.noarch.rpm
rpm -ivh elasticsearch-1.2.1.noarch.rpm
/sbin/chkconfig --add elasticsearch
service elasticsearch start
# Move default file locations if required
mkdir /data/elasticsearch
mkdir /data/elasticsearch/data
mkdir /data/elasticsearch/tmp
mkdir /data/elasticsearch/logs
[…]
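The directory steps can be collapsed with `mkdir -p`. One detail worth checking when relocating them: the RPM runs Elasticsearch as the `elasticsearch` user, so the new directories need to be owned by that user. A sketch, where `make_es_dirs` is a hypothetical helper and the chown is skipped if that user does not exist:

```shell
#!/bin/sh
# Hypothetical helper: create the relocated data, tmp and logs directories
# (default base /data/elasticsearch, as in the steps above) and hand them
# to the elasticsearch service user created by the RPM, if it exists.
make_es_dirs() {
  base="${1:-/data/elasticsearch}"
  for sub in data tmp logs; do
    mkdir -p "$base/$sub" || return 1
  done
  if id elasticsearch >/dev/null 2>&1; then
    chown -R elasticsearch:elasticsearch "$base"
  fi
}
```

Run it as root, e.g. `make_es_dirs /data/elasticsearch`, before pointing the Elasticsearch configuration at the new locations.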

Get TFL Tube data with PowerShell

The London Datastore has loads of datasets that we can use for free. One of them is a list of TFL Station Locations, a geocoded KML feed of most London Underground, DLR and London Overground stations. Here’s a PowerShell script that will extract this data from a […]
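The PowerShell script itself is cut off in this excerpt. As a rough shell analogue, not the author's script, the following assumes the KML feed has already been downloaded to a local file and that each `<name>` and `<coordinates>` element sits on a single line; `kml_to_csv` is a hypothetical helper. KML writes coordinates as longitude,latitude[,altitude], so the columns are swapped on output.

```shell
#!/bin/sh
# Hypothetical helper: read KML on stdin and print "name,latitude,longitude"
# for each placemark. Assumes <name> precedes <coordinates> within a
# placemark and that neither element spans multiple lines.
kml_to_csv() {
  awk '
    /<name>/ {
      # Remember the most recent <name>; a placemark name overwrites any
      # document- or folder-level name seen earlier.
      name = $0
      sub(/.*<name>/, "", name)
      sub(/<\/name>.*/, "", name)
    }
    /<coordinates>/ {
      c = $0
      sub(/.*<coordinates>/, "", c)
      sub(/<\/coordinates>.*/, "", c)
      gsub(/[ \t]/, "", c)
      # KML order is lon,lat[,alt]; print lat before lon.
      split(c, p, ",")
      print name "," p[2] "," p[1]
    }
  '
}
```

Usage would be along the lines of `kml_to_csv < stations.kml > stations.csv` (filenames hypothetical). Names containing commas would need quoting, and a real XML parser is safer for anything beyond a quick look.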