Archive for the ‘Big Data’ Category

EFK: Free Alternative to Splunk Using Fluentd

Here is an updated version of the instructions given at Free Alternative to Splunk Using Fluentd. The installation was performed on CentOS 6.5.

1. Install ElasticSearch

    mkdir /opt/src
    cd /opt/src
    wget
    rpm -ivh elasticsearch-1.2.1.noarch.rpm
    /sbin/chkconfig --add elasticsearch
    service elasticsearch start

    # Move default file locations if required
    mkdir /data/elasticsearch
    mkdir /data/elasticsearch/data
    mkdir /data/elasticsearch/tmp
    mkdir /data/elasticsearch/logs
    […]

Compiling Hadoop example

I’m working through some of the examples in this Hadoop book. I’m a little rusty on compiling Java programs and had a little trouble with this one, so I’m documenting it here for anyone else who might be having issues. Firstly, I tried compiling the examples like this: javac That wasn’t too successful; […]

Hadoop VersionInfo Issue on OpenSuSE 12

I was getting the following error when attempting to run hadoop version:

    The java class is not found: org.apache.hadoop.util.VersionInfo
    Unable to determine Hadoop version information.
    ‘hadoop version’ returned:
    The java class is not found: org.apache.hadoop.util.VersionInfo

This was due to having OpenJDK installed rather than the JDK from Sun/Oracle. To resolve this, simply uninstall the […]

Preparing the NCDC Weather Data for Hadoop

I’m exploring Hadoop with the book Hadoop: The Definitive Guide. Appendix A shows how to download NCDC weather data from S3 and put it into Hadoop. I didn’t want to download from S3 or load the entire dataset, so here’s what I did instead. Here’s a little bash script I used to download the data. You […]

Getting started with Hadoop

I wanted to get started playing about with Hadoop but had trouble installing Cloudera’s CDH. As I only wanted a working version of Hadoop for development purposes, I decided to skip Cloudera’s distribution and go directly to the Apache Hadoop release. Here’s the process I went through to set it up on OpenSuSE […]