
Building a test Hadoop Cluster

I have been working with Hortonworks recently and they wanted to see the installation process of a Hadoop cluster using Isilon as the storage layer, and any differences from a standard DAS-based install. I ran into numerous issues just getting the Linux hosts into a sensible state to install Hadoop, so I thought I would summarise some of the simple issues that you should try to resolve before starting to install Hadoop.

Initial starting point (Hortonworks instructions plus the Isilon-specific set-up instructions)

  • I built a few VMs using Centos 6.5 DVD 1
    • Selected a Basic Database Server as the install flavour
    • Choose a reasonably sized OS partition, as you might want to make a local Hadoop repository and that is a 10GB tar.gz file download. You have to extract it as well, so over 20GB is needed to complete that process. I ended up resizing the VM a couple of times, so I would suggest at least 60GB for the Ambari server VM including the local repository area.
    • You might want to set up a simple script for copying files to all nodes, or running a command on all nodes, to save you logging into each node one at a time. Something as simple as (for node in 1 2 3 4 5 6; do scp $1 yourvmname$node:$1; done) will save a lot of time; a slightly fuller version is sketched after this list.
    • Set up networking and name resolution for all the nodes and for Isilon (use SmartConnect); an example hosts file follows this list.
    • Enable NTP on every node
    • Turn off, or edit, the iptables settings so the nodes can reach the various ports used by Hadoop (commands for both of these steps are sketched after this list)
    • I needed to update the OpenSSL package, as otherwise the Hadoop install process fails quite a few steps along the way, and you may run into other issues if you then restart the process. (# yum update openssl)
    • Disable transparent huge pages (edit the /boot/grub/grub.conf file and reboot; see the sketch after this list)
    • Set up passwordless root access from the Ambari server to the other compute nodes in the cluster (sketched after this list)
  • The only real changes during the Ambari-based install process occur during the initial set-up, as follows:
    • Add all the compute and master nodes into the install process and use the SSH key.
    • Go to the next page so they are all registered and the Ambari agent is installed on each.
    • Then press the back button, add the Isilon FQDN to the list with manual registration (not SSH login), and continue.
    • Later, during the services/node selection process, assign the NameNode and DataNode services to the Isilon only.
    • Then just follow the install process (changing the repository to a local one if you set that up). I did, as my link to the remote repositories was limited to around 500k, so it took ages to install multiple nodes without the local option.
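
For reference, here is a slightly fuller version of the copy-to-all-nodes helper, a minimal sketch that assumes six nodes named yourvmname1 to yourvmname6 (placeholder hostnames, so adjust the names and count to match your cluster):

    #!/bin/bash
    # copyall.sh - copy a file to the same path on every node
    for node in 1 2 3 4 5 6; do
        scp "$1" "yourvmname$node:$1"
    done

    #!/bin/bash
    # runall.sh - run the supplied command on every node over ssh
    for node in 1 2 3 4 5 6; do
        ssh "yourvmname$node" "$@"
    done

Used like ./copyall.sh /etc/hosts or ./runall.sh yum -y update openssl, these save a surprising amount of typing on even a six-node cluster.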
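
For the name resolution step, an /etc/hosts along these lines (addresses and names are examples only) is enough for a test cluster; in a proper set-up the Isilon name would be a SmartConnect zone name delegated in DNS rather than a static entry, so treat the last line as a stand-in:

    # /etc/hosts - example only, substitute your own addresses and hostnames
    192.168.10.11   yourvmname1
    192.168.10.12   yourvmname2
    192.168.10.13   yourvmname3
    192.168.10.14   yourvmname4
    192.168.10.15   yourvmname5
    192.168.10.16   yourvmname6
    192.168.10.50   isilon.yourdomain.local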
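
The NTP and iptables steps on CentOS 6 come down to the following. This sketch simply disables the firewall, which is fine for an isolated test cluster; on anything shared you would instead open just the Hadoop ports:

    # chkconfig ntpd on && service ntpd start
    # service iptables stop && chkconfig iptables off
    # service ip6tables stop && chkconfig ip6tables off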
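
For transparent huge pages, the reboot-proof change is to append transparent_hugepage=never to the kernel line in /boot/grub/grub.conf; note that on the RHEL/CentOS 6 kernel the live sysfs path carries a redhat_ prefix. A sketch:

    Edit /boot/grub/grub.conf and add to the end of the existing kernel line:
        transparent_hugepage=never

    To turn it off immediately without a reboot (CentOS 6 sysfs path):
    # echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
    # echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag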
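
Passwordless root access for Ambari is the standard SSH key exchange, run on the Ambari server as root (hostnames again placeholders):

    # ssh-keygen -t rsa        (accept the defaults and an empty passphrase)
    # for node in 1 2 3 4 5 6; do ssh-copy-id root@yourvmname$node; done

The private key this generates (/root/.ssh/id_rsa) is the one you paste into the Ambari install wizard when registering the hosts with the SSH option.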

I now have two Hadoop clusters up and running with Isilon as the HDFS store, so more to play with 🙂
