
A Tale of Two Architectures — Engineering for the 99.999% versus the 0.001%

Today’s enterprise storage arrays follow one of five different architectural approaches to clustering their controllers for high availability.  We’ve represented them here in order of increasing complexity (from a “how difficult is it to build” perspective), robustness, and performance.

 

1. Dual-controller, Active/Passive Model:  This design uses two controllers.  In normal operation, one controller owns all I/O processing and data services management (active), while the second stands by (passive), ready to take over in the event of failure.  Some variations allow I/O to be received and transmitted on the passive controller’s host ports, but that controller does no I/O or data services processing itself.  Instead, it simply passes the I/O over an internal connection to the active controller, which is responsible for all data services (deduplication, thin provisioning, snapshots, RAID, etc.) and all read/write processing to the underlying flash media.  This requires ALUA (Asymmetric Logical Unit Access) to define the preferred path to the active controller and the non-preferred path to the passive controller.  There is no customer-facing advantage to this model.  The only advantage is to the vendor: this HA model is much simpler to implement and helps achieve a fast time-to-market.  The big disadvantage, especially with flash, is that the active controller quickly becomes the performance bottleneck while the resources of the passive controller sit idle.  Active/passive HA architectures were the original HA mechanism, developed in the 1980s.


2. Dual-controller, Dual-Active Model:  This design improves on the active/passive model by using both controllers to perform the full range of data activities.  The distinguishing feature is that each controller is the master for a subset of volumes and stands by to take over its partner’s volumes in case of failure.  This is simpler to implement than full active/active HA (where any volume is accessible on any port on any controller and there is no path or performance preference) and is still a popular model in the enterprise storage marketplace.  The drawback is that any given volume is still limited to the resources of a single controller, and in case of failure the performance of the array can be cut in half.  Another issue is that the administrator must pay attention to the workload on each controller and manually assign volumes to maintain balance.  Dual-active architectures became popular around the turn of the century.


3. Dual-controller, Active/Active Model: This is the most advanced form of dual-controller architecture where both controllers are active in data services and I/O processing, there is no assignment of volumes to controllers, and any host can access any volume through any port on any controller without path or performance penalty.


4. Multi-controller, Asymmetric Dual-Active Model:  This model is similar to dual-controller dual-active except that now there can be more than two controllers.  Volumes are still owned by a particular controller and there is still path and performance preference.  The advantage of this architecture is aggregate scale – higher levels of total capacity and performance under a single point of management.  However, any particular volume is still limited to the performance of the controller that owns it.


5. Multi-controller, N-way Symmetric Active Model:  This is XtremIO’s approach – it is the most sophisticated model and offers the most customer benefits.  In this model, the storage array scales to N controllers (as of Q4’13 N=8 and we will be increasing N in subsequent releases).  During normal operation, all controllers actively serve I/Os and process data services, and all controllers evenly share the load.  If one of the controllers fails, the remaining N-1 controllers continue to serve all the I/Os without loss of service.  All purchased hardware is actively working to service client requests, the system scales linearly with more controllers, and the performance lost in a degraded state shrinks as the cluster grows.  For example, losing 1 out of 4 controllers causes a maximum 25% loss of resources, while losing 1 out of 8 controllers causes a maximum 12.5% loss of resources.  This is the most advanced and resilient HA mechanism available today, and this architectural model is shared with EMC’s flagship VMAX arrays.  Most vendors are not capable of the engineering sophistication it takes to achieve this architecture.
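
The failure arithmetic for the N-way model can be sketched in a few lines of Python. This is a minimal illustration, assuming every controller contributes an equal share of resources (an idealization); the function name is ours and is not part of any XtremIO tooling.

```python
def resources_lost_on_failure(controllers: int, failed: int = 1) -> float:
    """Fraction of cluster resources lost when `failed` controllers are down.

    Assumes each controller contributes an equal share of resources.
    """
    if controllers < 2:
        raise ValueError("an HA cluster needs at least two controllers")
    if not 0 <= failed <= controllers:
        raise ValueError("failed must be between 0 and the controller count")
    return failed / controllers


# Compare a single controller failure across cluster sizes.
for n in (2, 4, 8):
    print(f"{n} controllers: lose at most {resources_lost_on_failure(n):.1%}")
```

With 4 controllers the function returns 0.25 and with 8 it returns 0.125, matching the 25% and 12.5% figures above.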


 

It’s important to note that these various architectural models are truly deep-rooted architectural decisions and it is practically impossible to start lower on the food chain and move up.  In fact, in the history of enterprise storage we can’t think of a single instance of it happening with any vendor.  You can always add features to a storage array – but architectural decisions have a way of sticking around.

While all the above architectures are HA, the practical differences between active/passive and N-way active are huge.  Let’s assume both systems have 99.999% availability (5 nines) so we can compare apples to apples.  In the dual-controller, active/passive model, you have a controller (that you paid for) sitting there doing nothing 99.999% of the time.  Your performance is also half of what it could be 99.999% of the time.  On the other hand, the N-way active model gives you the full performance of all the controllers 99.999% of the time.  The aggregate performance of the N controllers (even if N=2) is much higher than what you can get from a single active controller in the active/passive model.

I/O Distribution Across Multiple Controllers

Taken from a real environment running load on thousands of VMs. Note the I/O distribution across the controllers.


Of course, the argument for the active/passive model is that during the rare event of a failure in the active controller, the passive controller becomes the new active, thus maintaining the same performance as before the failure.  However, the big problem with this argument is that it comes at a huge cost: you are wasting 50% of your performance 99.999% of the time.  You maintain the same (50%) performance in the event of a controller failure, which accounts for only 0.001% of the time.  Five 9’s of availability translates to a little more than 5 minutes of downtime per year.  Making a design choice to sacrifice a huge amount of performance 364 days, 23 hours, 55 minutes a year in order to gain an advantage in the remaining five minutes a year doesn’t exactly jibe with storage efficiency claims.  And if you think about it – an active/active dual-controller system operating with a failed controller will have exactly the same amount of system resources during those five minutes a year as the active/passive design – 50%!  So in fact you only gain the perception of maintaining performance because the performance level was half as great to start with.
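
The "little more than 5 minutes" figure is easy to check. The sketch below assumes a 365-day year and treats availability as a simple fraction of time; the function name is ours.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_minutes_per_year(availability: float) -> float:
    """Expected minutes of downtime per year for an availability given as a fraction."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


# Five nines: 0.001% of 525,600 minutes.
print(f"{downtime_minutes_per_year(0.99999):.2f} minutes per year")
```

Five nines works out to roughly 5.26 minutes per year, which leaves the 364 days, 23 hours, and about 55 minutes quoted above.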

Active/passive is a 1980s design.  It made sense when processors were much slower, low latency was limited to the backplane, and coding for HA was still in its infancy.  Now there are much better alternatives.  An N-way active design gives you full performance from all N controllers 99.999% of the time.  In the rare event of a controller failure or planned maintenance, service continues uninterrupted at (N-1)/N of original performance.

So which is the smarter design?  Do you want to design or buy a storage system that wastes 50% of its performance 99.999% of the time just to keep the same 50% performance for an additional 5 minutes a year?  Or would you rather have the high performance of a linearly scaling cluster 99.999% of the time, with a manageable drop in performance the other 0.001% of the time (and remember, the drop leaves you no worse off than the healthy performance of the active/passive array design)?
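
To put numbers on that comparison, here is a rough time-weighted model. It follows the simplification used in this post: the array is fully healthy 99.999% of the time and running with one failed controller for the rest. Performance is expressed as a fraction of what all installed controllers could deliver together; the function names are ours.

```python
def average_performance(healthy: float, degraded: float, healthy_fraction: float) -> float:
    """Time-weighted performance, as a fraction of total installed controller capability."""
    return healthy_fraction * healthy + (1.0 - healthy_fraction) * degraded


def active_passive(healthy_fraction: float) -> float:
    # One of two controllers does all the work, healthy or failed over.
    return average_performance(0.5, 0.5, healthy_fraction)


def n_way_active(n: int, healthy_fraction: float) -> float:
    # All n controllers work when healthy; n - 1 of them work after a failure.
    return average_performance(1.0, (n - 1) / n, healthy_fraction)


FIVE_NINES = 0.99999
print(f"active/passive:     {active_passive(FIVE_NINES):.6f}")
for n in (2, 8):
    print(f"{n}-way active:       {n_way_active(n, FIVE_NINES):.6f}")
```

Under these assumptions the active/passive array averages half of its installed capability, while even a 2-way active array averages just under all of it.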


Imagine a car manufacturer sold you a car that has a V8, but it only ever runs on 4 cylinders and goes 50MPH – the other 4 cylinders you paid for don’t add any horsepower.  They don’t make the car go faster.  But if one of the active four cylinders has a failure, you’ve got some spares and can keep driving 50MPH.

Meanwhile, your friend bought a car that can have a V8 engine.  Or a V16.  Or a V32.  His car runs on all 32 cylinders and can go 400MPH.  If some cylinders fail he has to temporarily slow down to 350MPH – but this only happens 0.001% of the time.  Which car do you want to drive when you need to get someplace?  Even 350MPH is 7X faster than your car can ever go.

Of course, the 400MPH V32 car is XtremIO, except it has not 99.999%, but 99.9999% availability.  So it runs up to 8x faster 99.9999% of the time, and runs 7x faster during the other 0.0001%.  The race is won.  Hands down.

Other than the performance advantage, there are many other reasons why the scale-out, N-way active architecture of XtremIO is preferred over the dual-controller, active/passive approach.  First is the ability to size your deployment based on I/O or capacity requirements.  With XtremIO, every X-Brick has two controllers and can perform 250K fully random 4K read IOPS, and 150K fully random 4K mixed read/write IOPS.  If a customer needs 300K mixed IOPS, he can choose a two X-Brick cluster.  If he has a workload requiring 600K mixed IOPS, he can configure a 4 X-Brick cluster.  This level of flexibility is not possible with any dual-controller architecture.
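
The sizing exercise above is a ceiling division. The sketch below uses the per-X-Brick figures quoted in this post and assumes performance scales linearly with X-Brick count and that any count is a valid configuration, which may not match the cluster sizes actually supported in a given release. The names are ours.

```python
MIXED_IOPS_PER_XBRICK = 150_000  # fully random 4K mixed read/write, as quoted above
READ_IOPS_PER_XBRICK = 250_000   # fully random 4K reads, as quoted above


def xbricks_needed(required_iops: int, iops_per_xbrick: int = MIXED_IOPS_PER_XBRICK) -> int:
    """Smallest X-Brick count whose aggregate IOPS meets the requirement.

    Assumes linear scaling; does not check against supported cluster sizes.
    """
    if required_iops <= 0:
        raise ValueError("required_iops must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return (required_iops + iops_per_xbrick - 1) // iops_per_xbrick


for target in (300_000, 600_000):
    print(f"{target:,} mixed IOPS -> {xbricks_needed(target)} X-Bricks")
```

This gives 2 X-Bricks for 300K mixed IOPS and 4 for 600K, the same configurations described above.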

Another big advantage of XtremIO’s N-way active scale-out design is that there are never “stranded IOPS”.  Large-capacity SATA disk drives suffered from “stranded capacity” – the drives got bigger but no faster, and eventually larger drives were useless because there wasn’t enough speed to move bulk data on and off them.  All-flash arrays have a similar problem.  There are so many IOPS available on the SSDs that the array controllers quickly become a bottleneck and leave IOPS “stranded” in the array.  But not with XtremIO.  Every 25 SSDs are balanced by two active controllers, allowing the collective system to deliver higher and higher levels of performance as capacity grows.
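
A toy model makes the “stranded IOPS” idea concrete. The per-SSD and per-controller-pair numbers below are hypothetical, picked only so that one controller pair balances 25 SSDs as described above; they are not measured XtremIO or competitor figures.

```python
SSDS_PER_SHELF = 25                 # from the text: 25 SSDs per controller pair
IOPS_PER_SSD = 6_000                # hypothetical per-drive figure
IOPS_PER_CONTROLLER_PAIR = 150_000  # hypothetical ceiling, chosen to balance one shelf


def delivered_and_stranded(shelves: int, controller_pairs: int) -> tuple[int, int]:
    """Return (delivered IOPS, stranded IOPS) for a given configuration."""
    media = shelves * SSDS_PER_SHELF * IOPS_PER_SSD
    ceiling = controller_pairs * IOPS_PER_CONTROLLER_PAIR
    delivered = min(media, ceiling)  # the slower side sets the limit
    return delivered, media - delivered


for shelves in (1, 2, 4):
    fixed = delivered_and_stranded(shelves, controller_pairs=1)        # dual-controller array
    scaled = delivered_and_stranded(shelves, controller_pairs=shelves)  # scale-out
    print(f"{shelves} shelves: dual-controller {fixed}, scale-out {scaled}")
```

In this model the dual-controller array stays at the same delivered IOPS no matter how many SSDs are added, while the scale-out configuration adds a controller pair with each shelf and strands nothing.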

Customers often find this scale-out capability is exactly what they need when they size their database, server virtualization, and VDI infrastructures.  We believe this is very important for all-flash arrays.  Performance is one of the primary reasons customers buy flash.  If you have a ceiling for performance while claiming you can grow capacity, there is a problem.  Some vendors will dismiss this by saying flash is so much faster than disk that performance is plenty even if bottlenecked by two controllers.  But history has shown that application developers quickly consume every bit of performance available to them and quickly ask for more.  What seems fast today can quickly become insufficient tomorrow – especially as workloads begin to consolidate onto flash arrays.

If there are so many advantages to the XtremIO architecture over the dual-controller, active/passive approach, shouldn’t we expect every vendor to add this capability?  Most things are possible with enough time and money.  But architectural changes at this level are rare, disruptive, and expensive.  You always want to start with the right architecture from the very beginning.  We invested significant time and resources to do this with XtremIO.  It wasn’t the fastest path to market, but it did allow us to deliver a superior product.  It means you can count on superior performance all the time with XtremIO – even under worst-case failure conditions our N-way active scale-out model delivers the same performance as healthy dual-controller all-flash arrays.  The rest of the time it’s not even a close race.

 

 
