
Building a test Hadoop Cluster

I have been working with Hortonworks recently and they wanted to see the installation process of a Hadoop cluster using Isilon as the storage layer, and any differences from a standard DAS-based install. I ran into numerous issues just getting the Linux hosts into a sensible state to install Hadoop, so I thought I would summarise some of the simple issues that you should try to resolve before starting to install Hadoop.

Initial starting point (Hortonworks instructions plus the Isilon-specific set-up instructions)

  • I built a few VMs using Centos 6.5 DVD 1
    • Selected a Basic Database Server as the install flavour
    • Choose a reasonably sized OS partition, as you might want to make a local Hadoop repository and that is a 10GB tar.gz file download. You have to extract it as well, so over 20GB is needed to complete that process. I ended up resizing the VM a couple of times, so I would suggest at least 60GB for the Ambari server VM including the local repository area.
    • You might want to set up a simple script for copying files to all nodes, or running a command on all nodes, to save you logging into each node one at a time. Something as simple as (for node in 1 2 3 4 5 6; do scp $1 yourvmname$node:$1; done) will save a lot of time; a slightly fuller version is sketched after this list.
    • Set up networking and name resolution for all the nodes and for Isilon (use SmartConnect); an example hosts file follows this list.
    • Enable NTP on every node
    • Turn off, or edit, the iptables settings so the nodes can reach the various ports used by Hadoop (commands for both of these steps are sketched after this list)
    • I needed to update the OpenSSL package, as otherwise the Hadoop install process fails quite a few steps along the way, and you may run into other issues if you then restart the process. (# yum update openssl)
    • Disable transparent huge pages (edit the /boot/grub/grub.conf file and reboot; see the sketch after this list)
    • Set up passwordless root access from the Ambari server to the other compute nodes in the cluster (sketched after this list)
  • The only real changes during the Ambari-based install process occur during the initial set-up, as follows:
    • Add all the compute and master nodes into the install process and use the SSH key.
    • Go to the next page so they are all registered and the Ambari agent is installed on each.
    • Then press the back button, add the Isilon FQDN to the list with manual registration (not SSH login), and continue.
    • Later, during the services/node selection process, assign the NameNode and DataNode services to the Isilon only.
    • Then just follow the install process (changing the repository to a local one if you set that up). I did, as my link to the remote repositories was limited to around 500k, so it took ages to install multiple nodes without the local option.
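
For reference, here is a slightly fuller version of the copy-to-all-nodes helper, a minimal sketch that assumes six nodes named yourvmname1 to yourvmname6 (placeholder hostnames, so adjust the names and count to match your cluster):

    #!/bin/bash
    # copyall.sh - copy a file to the same path on every node
    for node in 1 2 3 4 5 6; do
        scp "$1" "yourvmname$node:$1"
    done

    #!/bin/bash
    # runall.sh - run the supplied command on every node over ssh
    for node in 1 2 3 4 5 6; do
        ssh "yourvmname$node" "$@"
    done

Used like ./copyall.sh /etc/hosts or ./runall.sh yum -y update openssl, these save a surprising amount of typing on even a six-node cluster.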
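
For the name resolution step, an /etc/hosts along these lines (addresses and names are examples only) is enough for a test cluster; in a proper set-up the Isilon name would be a SmartConnect zone name delegated in DNS rather than a static entry, so treat the last line as a stand-in:

    # /etc/hosts - example only, substitute your own addresses and hostnames
    192.168.10.11   yourvmname1
    192.168.10.12   yourvmname2
    192.168.10.13   yourvmname3
    192.168.10.14   yourvmname4
    192.168.10.15   yourvmname5
    192.168.10.16   yourvmname6
    192.168.10.50   isilon.yourdomain.local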
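
The NTP and iptables steps on CentOS 6 come down to the following. This sketch simply disables the firewall, which is fine for an isolated test cluster; on anything shared you would instead open just the Hadoop ports:

    # chkconfig ntpd on && service ntpd start
    # service iptables stop && chkconfig iptables off
    # service ip6tables stop && chkconfig ip6tables off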
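
For transparent huge pages, the reboot-proof change is to append transparent_hugepage=never to the kernel line in /boot/grub/grub.conf; note that on the RHEL/CentOS 6 kernel the live sysfs path carries a redhat_ prefix. A sketch:

    Edit /boot/grub/grub.conf and add to the end of the existing kernel line:
        transparent_hugepage=never

    To turn it off immediately without a reboot (CentOS 6 sysfs path):
    # echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
    # echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag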
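
Passwordless root access for Ambari is the standard SSH key exchange, run on the Ambari server as root (hostnames again placeholders):

    # ssh-keygen -t rsa        (accept the defaults and an empty passphrase)
    # for node in 1 2 3 4 5 6; do ssh-copy-id root@yourvmname$node; done

The private key this generates (/root/.ssh/id_rsa) is the one you paste into the Ambari install wizard when registering the hosts with the SSH option.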

I now have two Hadoop clusters up and running with Isilon as the HDFS store, so more to play with 🙂
