BD – Isilon Multi-protocol access & Hadoop

Surely a scale out NAS array has little to do with Hadoop?

Why not just use commodity servers stuffed full of disks for your Hadoop needs?

Well, as it happens, Isilon has a great offering in Enterprise level Hadoop solutions. By “Enterprise level Hadoop”, I mean that once a business decides it needs to put Hadoop into production, it requires all of the usual business processes surrounding a mission critical application. Typical Enterprise requirements are data protection, backup, snapshots, replication, high availability and security. With Isilon, not only do you get those Enterprise level features, but you also get native HDFS support, alongside the SMB, NFS, FTP, NDMP and Swift protocols that Isilon already supports.

Well, what does that actually mean when put into production use?

Most high end NAS solutions offer simultaneous access to the same pool of data via, say, an NFS share and an SMB share.

[Images: the same share viewed in Windows Explorer (left) and listed with ‘ls’ over NFS (right)]

You can look at the share in Windows Explorer (left hand image) or type ‘ls’ on the NFS share (right hand image) and see the same data. You can read or write the data via any of the supported protocols, assuming of course that you have the appropriate file permissions.

Isilon, being a mature, Enterprise level NAS device, also provides all of the typical file sharing protocols and features but adds more!

Isilon supports the native HDFS protocol in the same manner. If you look at the share in Windows Explorer or do a ‘hdfs dfs -ls /’, it will look the same!

The screen grab below shows the output from ‘hdfs dfs -ls’ for the same directory as shown above in Explorer and NFS.

[Screen grab: ‘hdfs dfs -ls’ output for the same directory]

This means that you can simply upload your data into your Isilon based Hadoop cluster using traditional IP based protocols and then run Hadoop queries against it straight away. There is no need to move the data, no translation and no post processing; it is just immediately available via the other protocols. Most importantly, you don’t need the default triple replication of the data, thereby saving a huge amount of disk space. Obviously all the usual Hadoop ingest processes such as Sqoop or Flume, and also Hortonworks DataFlow (HDF), can be used as well, but the traditional IP based protocols such as SMB and NFS are well understood and have been used to share data for years.

How can this make your life easier?

In this example workflow, you have some logs from a web server writing to an NFS share. You want to run some Hadoop jobs against the data and then view the results via a Windows client. In the traditional manner you would have to copy the data into HDFS and out again, but with Isilon it is far simpler.

[Diagram: example workflow]

The diagram above logically shows the following workflow.

  • You write your Web server log files into an NFS share
  • You then run Hadoop queries directly against them over HDFS
  • The results from the Hadoop job are written directly into a directory via HDFS which is then also available via an SMB share to make it easy to view the results straight away as a Windows user.

There is no extra moving of data in/out of HDFS, no transferring of the results to another location. It is available via any of the protocols as soon as it is written to the underlying OneFS file system. (OneFS is the operating system that runs on each node in the Isilon cluster providing the shared single file system namespace across all nodes.)

How does Isilon achieve this very useful functionality?

Each node in the Isilon Cluster runs an HDFS daemon process that responds to HDFS protocol requests as both a NameNode and a DataNode. Those requests are “translated at wire speed”, just like any other supported IP protocol, into the associated actions/results on the Isilon OneFS POSIX file system. The diagram below shows a high level view of what is going on for well-known standard IP based protocols.

[Diagram: IP protocol access to Isilon]

Each IP protocol talks to the Isilon via a service listening on that protocol’s standard IP port. The service translates the commands/data into the appropriate actions on the underlying file system. Isilon has created an HDFS protocol translator service that responds to NameNode and DataNode requests on the default port 8020. Other HDFS services such as https and WebHDFS use different port numbers.

The diagram below logically shows a 4 node Isilon cluster running the HDFS daemon on each node, with each node acting as both a NameNode and a DataNode for all of the data in the pool.

[Diagram: NameNode and DataNode services on a 4 node Isilon cluster]

  1. The Hadoop worker node, running the standard HDFS Client code, connects to the NameNode to request the location of a block/file.
    • NOTE: There is no specific code or plugin required on any client as Isilon runs a fully HDFS compliant service. You just use the default HDFS software provided by an Apache Hadoop distribution.
  2. The Isilon NameNode service provides a compliant API response with the IP addresses of three DataNodes that have access to the data requested (on Isilon, all of the nodes in the pool or zone have access to all of the data).
    • Isilon also supports “rack awareness” to return the most appropriate nodes’ IP addresses.
  3. The compute worker node then connects to the Isilon DataNode service running on one of the nodes specified by the NameNode to request the data.
    • The selected Isilon node collates the data from the OneFS file system and returns it to the worker node.
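
To make the three steps above a little more concrete, here is a toy model in Python. It is only a sketch of the conversation, not Isilon or Hadoop code: the class, method names, IP addresses and file path are all made up for illustration, and a plain dictionary stands in for the shared OneFS namespace.

```python
import random
from dataclasses import dataclass


@dataclass
class IsilonNode:
    """Hypothetical cluster node that answers as NameNode and DataNode."""
    ip: str
    shared_fs: dict  # stand-in for the single OneFS namespace

    def namenode_lookup(self, path, all_nodes, count=3):
        # Step 2: reply with the addresses of up to `count` nodes.
        # Every node can reach every file, so any subset is valid.
        if path not in self.shared_fs:
            raise FileNotFoundError(path)
        chosen = random.sample(all_nodes, min(count, len(all_nodes)))
        return [node.ip for node in chosen]

    def datanode_read(self, path):
        # Step 3: the selected node returns the data from the shared file system.
        return self.shared_fs[path]


def client_read(path, cluster):
    # Step 1: the worker asks "the NameNode" where the file lives.
    # Picking a random node here stands in for the load balancer.
    entry_node = random.choice(cluster)
    locations = entry_node.namenode_lookup(path, cluster)
    nodes_by_ip = {node.ip: node for node in cluster}
    # The worker then reads from the first DataNode it was given.
    return nodes_by_ip[locations[0]].datanode_read(path), locations


onefs = {"/logs/access.log": b"GET /index.html 200\n"}
cluster = [IsilonNode(f"10.0.0.{i}", onefs) for i in range(1, 5)]
data, locations = client_read("/logs/access.log", cluster)
```

Whichever node answers the lookup and whichever node serves the read, the worker gets the same bytes back, because every node sees the same file system.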

In the above example, the NameNode listed in the core-site.xml file is the Fully Qualified Domain Name (FQDN) that resolves to the SmartConnect IP address of the Isilon.
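
As a rough illustration of what that entry looks like, the Python sketch below generates a minimal core-site.xml fragment. `fs.defaultFS` is the standard Hadoop property for the default file system; the host name `isilon-hdfs.example.com` and the port are placeholders I have assumed, so substitute your own SmartConnect name and whatever port your HDFS service listens on.

```python
import xml.etree.ElementTree as ET


def build_core_site(namenode_fqdn, port):
    """Return a minimal core-site.xml body pointing HDFS clients at one name."""
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = "fs.defaultFS"
    ET.SubElement(prop, "value").text = f"hdfs://{namenode_fqdn}:{port}"
    return ET.tostring(configuration, encoding="unicode")


# Hypothetical SmartConnect name and assumed port, for illustration only.
core_site_xml = build_core_site("isilon-hdfs.example.com", 8020)
print(core_site_xml)
```

A real core-site.xml from a Hadoop distribution will carry many more properties; this only shows the one that names the NameNode.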

SmartConnect is an Isilon software feature that does IP load-balancing to spread the client connections from the Hadoop worker nodes across the nodes in the Isilon cluster.
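
The effect is easy to picture with a small Python sketch. This is not how SmartConnect is implemented; it simply assumes a plain round-robin policy (the post does not say which policy is in use) and uses made-up addresses and a made-up zone name to show connections being spread evenly across four nodes.

```python
from collections import Counter
from itertools import cycle


def round_robin_resolver(node_ips):
    """Return a resolver that hands out node IPs in turn (assumed policy)."""
    ring = cycle(node_ips)

    def resolve(_zone_name):
        # Every lookup of the zone name gets the next node in the ring.
        return next(ring)

    return resolve


resolve = round_robin_resolver(["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"])

# Twelve worker connections to the same (hypothetical) name.
connections = Counter(resolve("isilon-hdfs.example.com") for _ in range(12))
print(connections)
```

With twelve connections and four nodes, each node ends up serving three of them.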

Using Isilon for the underlying HDFS file system for your Hadoop compute cluster means that in a 10 node Isilon cluster there are 10 NameNodes and 10 DataNodes to support the Hadoop compute requirements. There is no need for Secondary NameNodes or HA NameNodes as the primary service runs on every Isilon node. Isilon does not require any tuning of the memory allocation for the metadata store on the NameNode, as that function is built into the design of the Isilon node and the OneFS file system. Obviously this solution provides an extremely high level of NameNode resilience!

Isilon adheres to the HDFS protocol standards and is thoroughly tested for each Hadoop release. For example, EMC Isilon’s HDFS protocol is tested using the same 10,000 tests that Hortonworks uses for each of its new releases. It is backwards compatible, so you can run production on a stable version and then spin up a new version, read the same data and test it out before committing production to the new version of code.

After explaining the Isilon HDFS solution to a customer a little while ago, they suggested that a good way to describe it was that “Isilon provides a first class file system for HDFS”.

In summary, some of the benefits of using multi-protocol access on Isilon as your HDFS storage layer are as follows:

  • Multiple Protocol access to your data without any moving/copying of data
  • Multiple versions of Apache based Hadoop distributions supported
    • Different Hadoop distributions can have access to the same data (read only for simultaneous access). You can try out a distribution and then go back to your original supplier if it does not work out.
    • Different versions of Hadoop can have access to the same data without copying it.
      • Note: There are a few distribution and version specific issues to be aware of, such as adding different users (ambari_qa or Cloudera’s manager), but fundamentally you can provide access to the data for different distributions/versions.
  • You can scale compute and storage independently. Need more capacity? Add another Isilon node. Need more compute? Add another worker node.
  • You don’t need to replicate the data 3x to provide data protection. Isilon is far more efficient, using forward error correction (FEC) to protect the data. This typically provides a usable-to-raw capacity ratio of up to around 80%, which saves on disk, rack space, power and cooling.
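
To put some numbers on that last point, here is a quick back-of-the-envelope calculation in Python. The 100 TB data set is a made-up figure, and 80% is the upper end of the range quoted above, so treat the result as a best case rather than a promise.

```python
from fractions import Fraction


def raw_capacity_needed(usable_tb, usable_to_raw):
    """Raw disk needed to hold `usable_tb` at a given usable/raw ratio."""
    return usable_tb / usable_to_raw


usable_tb = 100  # hypothetical data set size

# Triple replication keeps three copies, so only 1/3 of raw disk is usable.
raw_replicated = raw_capacity_needed(usable_tb, Fraction(1, 3))   # 300 TB

# FEC protection at the best-case 80% usable/raw ratio.
raw_fec = raw_capacity_needed(usable_tb, Fraction(80, 100))       # 125 TB

# Fraction of raw disk saved by FEC compared with triple replication.
saving = 1 - raw_fec / raw_replicated                             # 7/12

print(f"3x replication: {float(raw_replicated):.0f} TB raw")
print(f"FEC at 80%:     {float(raw_fec):.0f} TB raw")
print(f"Raw disk saved: {float(saving):.1%}")
```

In this example the same 100 TB needs 125 TB of raw disk instead of 300 TB, a saving of a little over 58%.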

There are a number of other major benefits from using Isilon as the HDFS data store. I will describe some of them in future posts.

For more immediate queries please see the EMC Isilon Big Data Community page.

 

XJ6R – Front Suspension Rebuild

The front suspension had not been touched in years. The track rod ends were probably OK wear wise, but they looked bad because the rubber “boots” were perished. The “boots” on the lower ball joints looked bad too. The springs, shock absorbers and all the bushes looked rusty and the worse for wear.

Lower Ball Joint – split rubber boot

I therefore decided to replace all the rubber bushes and ball joints and clean up as I went along. The first thing was to order all the associated parts from a few suppliers, attempting to get the best price and availability. I had already replaced the front subframe bushes with poly bushes; however, I went with standard rubber for the rest.

Of course I could not resist cleaning and painting along the way, so it took a lot longer than I thought it would. I still have one side to reassemble, but the driver’s side is now complete. It looks reasonably good, even if I say so myself. During this rebuild I am not after concours condition or anything close to it. I just want it to not look rusty and to work the way it should.

There were a few minor “challenges” along the way. Please be very careful removing the springs: even with all the weight of the car on one spring, it still has a lot of tension forcing the spring tray downwards. I did not have the correct spring compressor, so I used a jack, a number of G cramps and a threaded rod to remove and re-assemble the springs. Some of that pressure did damage the threads, so I had to replace some bolts. I did clean up and repaint the springs and the surrounding metal work too. The spring trays were full of rust and road debris, and it took a lot of cleaning before I could even separate the springs from the tray. The lower fulcrum shaft on the driver’s side was a bit of a pain to remove. Unfortunately I did damage the thread a little in my efforts to remove it. Luckily, re-cutting/cleaning up the thread with a die managed to save it. The issue was not so much the cost of a replacement shaft as the fact that it had a 4-5 week lead time. (I have since seen some in stock, and at half the price!)

Post Re-Conditioning

Here are a few photos of the state of the suspension before I started

Top of spring assembly – a little rusty and you can see the perished rubber on the bump stop.
Top of Suspension – Rusty and worn Shock Absorber. Remember to clean up around the upper ball joint and take note of the number of spacers used so you can put them back in correctly.
Anti-Roll Bar Bushes
Top wishbone bush – not too bad looking. There was a fair amount of wear internally though
Passenger side suspension

Lower ball joint replacement

As you can see from the first pictures in this post, the split ball joint covers needed replacing, if nothing else. Removal of the lower ball joint was pretty straightforward and resulted in the items pictured below, along with the more modern, single-piece replacement on the right hand side. You do have to remove the metal ring insert prior to fitting the new style ball joint.

Lower Ball Joint Disassembled
Ball Joint components
There was some damage caused to the metal ring during removal in this image. This is the ring that needs to be removed to make way for the replacement ball joint. It will not be used again.
Ball joint with insert removed
Top Ball joint goes in here with insert removed. The ring insert needs to be pushed out downwards/outwards

The only non-standard thing I did was to not refit all of the spacer rings at the top of the springs. The XJR engine has an aluminium block rather than the cast iron block of the original XJ6 engine, so it seemed to make sense that the front of the car would sit a little higher with a lighter engine installed. With that in mind, I left out the two quarter-inch-thick nylon/plastic ring spacers from the top of the springs. I might come to regret that decision, so we will have to wait and see.