Install HBase 0.92.1 for Cloudera Hadoop (CDH4) in Pseudo mode on Ubuntu...
Introduction HBase is a column-oriented database that runs on top of HDFS. It is modeled on Google’s BigTable. In this post, I’m going to install HBase in Pseudo mode, so please use these instructions...
hdfs dfsadmin -report
Introduction hdfs dfsadmin -report outputs a brief report on the overall HDFS filesystem. It’s a useful command to quickly view how much disk space is available, how many datanodes are running, and so on....
hdfs dfsadmin -metasave
Introduction hdfs dfsadmin -metasave provides additional information compared to hdfs dfsadmin -report. It reports on blocks, including blocks waiting for...
How to view files in HDFS (hadoop fs -ls)
Introduction The hadoop fs -ls command allows you to view the files and directories in your HDFS filesystem, much as the ls command works on Linux / OS X / *nix. Default Home Directory in HDFS A user’s...
Install Pig 0.9.2 for CDH4 on Ubuntu 12.04 LTS x64
Introduction Installing Pig is drop-dead simple. Installation sudo apt-get install pig Check the Pig version. pig --version Set Up the Environment We’re going to set the environment variables...
How to add numbers with Pig
Introduction We’re going to start with a very simple Pig script that reads a file that contains 2 numbers per line separated by a comma. The Pig script will first read the line, store each of the 2...
Debugging HBase: org.apache.hadoop.hbase.master.AssignmentManager: Unable to...
Introduction I ran into an annoying error in HBase due to the localhost loopback. The solution was simple, but took some trial and error. Error I was following the HBase logs with the following...
HBase Command Line Tutorial
Introduction Start the HBase Shell All subsequent commands in this post assume that you are in the HBase shell, which is started via the command listed below. hbase shell You should see output similar...
Understanding the Hadoop High Availability (HA) Options
Once you start to use Hadoop in your day-to-day business operations, you’ll quickly find that uptime is an important consideration. No one wants to explain to the CEO why a report was not delivered....
Hadoop Distributions
The following is a repost of my answer to a question on LinkedIn, which I thought might prove useful to people evaluating Hadoop distributions. The following is a substantially oversimplified set of...