At VMworld 2012 we announced the early availability program for the VNX Storage Analytics Suite for VMware vC Ops (a long name, admittedly).
What this means is that selected customers can get their hands on the GA code of the product before its official release later this year. To participate in this early adopter program, you (the customer) will need to contact your EMC sales rep or local vSpecialist.
Let's take a step back to recap:
While day-to-day storage management in a virtualized environment has been greatly simplified thanks to Unisphere integration with VMware, storage administrators still need advanced monitoring and real-time analytics. In a dynamic environment, traditional monitoring tools fall short: storage administrators must not only troubleshoot and quickly resolve performance and capacity anomalies but also anticipate and head off problems before they impact business operations.
VMware vCenter Operations was designed for dynamic environments to dramatically simplify and automate Operations Management. vC Ops uses patented analytics to provide the intelligence and visibility needed to proactively ensure service levels, reduce risk of downtime and optimize virtualized environments for efficiency and cost.
Now EMC and VMware are teaming up to combine the patented analytics and rich visualization of vCenter Operations Management Suite with the storage intelligence of the VNX series. This joint effort will extend a VMware administrator's view into VNX systems and will also deliver a new VNX Storage Analytics Suite that gives storage administrators intuitive, configurable dashboards with drill-down and patented analytics for VNX storage.
The new VNX Storage Analytics Suite is based on vCenter Operations Management Suite and provides VNX customers with the rich functionality of vCenter Operations, such as powerful visualization and topology views and patented analytics, combined with rich VNX storage metrics in a stand-alone, customer-installable software suite.
The Storage Analytics Suite enables storage administrators to proactively optimize management of their VNX storage and improve end-user service levels with actionable performance and capacity analytics and sophisticated health monitoring and diagnostics. The software supports both block and file storage, and it also displays EMC FAST Cache configuration and performance as well as FAST VP tiers and policies.
• Storage admin focused
• EMC storage performance and capacity analytics
• VMware vC Ops analytics platform
• VNX custom dashboards and reports
• Physical and virtual environments
Additionally, the new VNX Connector brings the same comprehensive VNX storage analytics, along with customized dashboards, into vCenter Operations Management, allowing VMware administrators to obtain complete end-to-end analytics from the VMs down to the VNX storage arrays.
• IT infrastructure admin focused
• Full function vC Ops
• “Cross-Domain” VMware virtual environments
• Storage analytics through VNX custom dashboards and reports (VNX Connector)
You can view heatmaps of array performance and available capacity across multiple systems. These views let the administrator spot the most relevant anomalies at a glance and act to resolve them quickly.
The VNX Connector brings in all the resources from the array and builds the correct parent-child relationships from the datastores down to their component LUNs, pools, and disks.
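To make the hierarchy concrete, here is a minimal Python sketch of the kind of parent-child graph described above. The `Resource` class, the `walk` and `disks_under` helpers, and the object names (LUN 10, Disk 1 and so on) are hypothetical illustrations, not the connector's actual data model. The point is only that two datastores on LUNs carved from the same pool resolve to the same physical disks.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """One object in a simplified datastore -> LUN -> pool -> disk hierarchy."""
    kind: str
    name: str
    children: list = field(default_factory=list)


def walk(node, depth=0):
    """Depth-first listing of a node and everything beneath it, indented by level."""
    lines = ["  " * depth + f"{node.kind}: {node.name}"]
    for child in node.children:
        lines.extend(walk(child, depth + 1))
    return lines


def disks_under(node):
    """Names of all Disk objects reachable from `node`."""
    if node.kind == "Disk":
        return {node.name}
    found = set()
    for child in node.children:
        found |= disks_under(child)
    return found


# Hypothetical topology: two datastores, each on its own LUN, both LUNs in one pool.
disks = [Resource("Disk", f"Disk {i}") for i in range(1, 6)]
pool1 = Resource("Pool", "Pool 1", disks)
ds1 = Resource("Datastore", "Performance_DS_01", [Resource("LUN", "LUN 10", [pool1])])
ds2 = Resource("Datastore", "Performance_DS_02", [Resource("LUN", "LUN 11", [pool1])])

print("\n".join(walk(ds1)))
print(disks_under(ds1) == disks_under(ds2))  # both datastores land on the same spindles
```

Because the pool object is shared by both LUNs, the structure is really a graph rather than a strict tree, which is exactly why a problem in one pool can surface on several datastores at once.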
What we see here is a dashboard dedicated to datastores.
The health of each datastore is determined by vC Ops, which evaluates all the associated metrics as well as the health of the child objects. Anomalies, meaning breaches of dynamic thresholds or of statically set Key Performance Indicators (KPIs), will affect health: the more active anomalies, the greater the impact. A low health score does not always indicate a problem, but it does call out that something is operating outside its established norm.
Here we see that the health of the performance datastores is low. Both of these datastores are actually LUNs in the same pool, but they are presented to two different clusters.
Looking at the graph, you can easily see that the Total Latency KPI has been breached (see the yellow highlight). For Performance_DS_01, the top graph, the line sits entirely within the highlighted area, whereas Performance_DS_02 only touches the lower edge of the highlighted area, showing that it is just over the set limit.
The heat map shown is sized by throughput and colored by the response time experienced by each VM. This widespread latency prompts us to take a look at the back-end storage.
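The vC Ops health algorithm itself is proprietary, so the following is only a toy model, assuming a simple geometric penalty per active anomaly and a cap from the least-healthy child object. It illustrates the two properties described above: more active anomalies mean a bigger hit, and unhealthy children drag their parent down. The function name `health_score` and the `decay` constant are made up for this sketch.

```python
import math


def health_score(active_anomalies, child_scores=(), decay=0.15):
    """Toy health score in [0, 100]; NOT the vC Ops algorithm.

    Each active anomaly (a dynamic-threshold or KPI breach) shrinks the object's
    own score geometrically, so more anomalies mean a larger impact. The result
    is then capped by the least-healthy child, so a problem lower in the
    hierarchy pulls the parent down as well.
    """
    own = 100 * math.exp(-decay * active_anomalies)
    worst_child = min(child_scores, default=100)
    return round(min(own, worst_child))


print(health_score(0))            # no anomalies, healthy children
print(health_score(2))            # a couple of anomalies lower the score
print(health_score(0, [90, 40]))  # a sick child caps the parent
```

A low score here, as in vC Ops, only says "outside the norm"; it does not by itself prove there is a fault.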
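The difference between the two graphs can be expressed as the share of samples that sit above a static KPI. This minimal sketch assumes a hypothetical 20 ms Total Latency KPI and invented sample series, since the post gives neither the configured limit nor the raw data.

```python
def breach_fraction(samples_ms, kpi_limit_ms):
    """Fraction of latency samples that exceed a static KPI limit."""
    if not samples_ms:
        return 0.0
    return sum(s > kpi_limit_ms for s in samples_ms) / len(samples_ms)


KPI_LIMIT_MS = 20.0  # hypothetical Total Latency KPI; the post does not state the value

# Hypothetical series: DS_01 stays inside the breach band, DS_02 hovers at its lower edge.
ds_01 = [34.0, 36.5, 33.2, 35.8, 37.1, 34.9]
ds_02 = [19.4, 20.6, 19.8, 21.0, 19.9, 20.3]

for name, series in [("Performance_DS_01", ds_01), ("Performance_DS_02", ds_02)]:
    frac = breach_fraction(series, KPI_LIMIT_MS)
    print(f"{name}: {frac:.0%} of samples above {KPI_LIMIT_MS} ms, "
          f"peak overshoot {max(series) - KPI_LIMIT_MS:.1f} ms")
```

Looking at both the fraction and the peak overshoot separates "deep in the red" from "just over the line", which is what the two graphs show visually.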
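A heat map like this one encodes two metrics per VM at once. The sketch below, assuming hypothetical 10 ms and 30 ms color cut-offs and made-up sample data, shows the mapping: tile area follows each VM's share of total throughput and tile color follows its latency band. The `heatmap_tiles` helper is illustrative only and is not how the vC Ops widget is implemented.

```python
def heatmap_tiles(vms, green_ms=10.0, red_ms=30.0):
    """Map VM name -> (share of tile area, color).

    `vms` maps VM name -> (throughput_iops, latency_ms). Tile area is the VM's
    share of total throughput; color is a three-band latency bucket. The
    10 ms / 30 ms cut-offs are hypothetical defaults.
    """
    total = sum(iops for iops, _ in vms.values()) or 1  # avoid dividing by zero
    tiles = {}
    for name, (iops, latency) in vms.items():
        if latency <= green_ms:
            color = "green"
        elif latency < red_ms:
            color = "yellow"
        else:
            color = "red"
        tiles[name] = (iops / total, color)
    return tiles


# Hypothetical sample: one busy, slow VM dominates the map.
sample = {"vm-a": (1500, 35.0), "vm-b": (500, 12.0), "vm-c": (0, 4.0)}
for name, (share, color) in heatmap_tiles(sample).items():
    print(f"{name}: {share:.0%} of the map, {color}")
```

Sizing by throughput means the biggest tiles are the VMs generating the most IO, so a large red tile points straight at the workload that is both busy and suffering.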
On this block performance dashboard, we see that pool1's health score is low. The graph in the bottom right-hand corner shows pool throughput for the two pools on this array, and you can see that the IO load on pool1 has breached its threshold.
First, note the primary statistic in the scoreboard widget that tells us there is a problem: the only tile that is not green, the Performance Datastore Pool. Its utilization is over 80%, and as a rule of thumb, whenever a component is more than 70% utilized, performance tends to drop dramatically. This easily explains the poor response times.
We know we need to relieve the load on the disks, and there are generally only a few options: move some of the load elsewhere or add more disks. In this case, however, we can see that some disks are not being used at all: our FAST Cache drives. At 7.6 IOPS, clearly nothing is going on with them, so the pool probably does not have FAST Cache enabled.
This screen was captured approximately 25 minutes after enabling FAST Cache on the pool.
For our purposes, the workloads generated by the VMs were small-block random IO over a 10 GB region of a virtual disk on the datastore. With four 200 GB SSDs for FAST Cache, we had more than enough room for the pool's entire working set. Shortly after this point, the entire workload should reside in FAST Cache, leaving very little IO on the actual pool disks.
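The 70% rule of thumb lines up with basic queueing theory. In a single-server M/M/1 model, response time equals service time divided by (1 - U), where U is utilization, so the multiplier grows without bound as U approaches 1. This is a textbook approximation, not an EMC or vC Ops formula, and real pools with many spindles and caches will deviate from it; the sketch just tabulates the multiplier.

```python
def response_time_multiplier(utilization):
    """M/M/1 approximation: response time / service time = 1 / (1 - U)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return 1 / (1 - utilization)


for u in (0.5, 0.7, 0.8, 0.9):
    print(f"{u:.0%} busy -> {response_time_multiplier(u):.1f}x the unloaded service time")
```

Going from 70% to 80% busy takes the multiplier from about 3.3x to 5x, which is why a pool running above 80% shows up as latency on every datastore it backs.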
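A quick back-of-the-envelope check of why the working set fits, assuming that FAST Cache mirroring halves the usable capacity and treating GB as 2^30 bytes for round numbers. Formatted SSD capacity is lower than the nominal 200 GB, so the usable figure here is an upper bound.

```python
GiB = 1024 ** 3
KiB = 1024

ssd_count = 4
ssd_size_gib = 200          # nominal drive size from the post; formatted capacity is lower
chunk = 64 * KiB            # FAST Cache tracks data in 64 KB chunks
working_set = 10 * GiB      # the 10 GB hot region the test VMs hit

# FAST Cache is mirrored, so only half the raw SSD capacity holds unique data.
usable = ssd_count * ssd_size_gib * GiB // 2
chunks_needed = working_set // chunk

print(f"usable cache ~ {usable // GiB} GiB, working set {working_set // GiB} GiB")
print(f"working set spans {chunks_needed:,} chunks of 64 KB")
print("fits" if working_set <= usable else "does not fit")
```

Even with generous formatting overhead, a 10 GB working set occupies only a few percent of the mirrored cache.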
During FAST Cache warm-up, you may see utilization of the source disks increase, along with some host response times. This is because a 64 KB chunk is promoted (copied into FAST Cache) after its third hit, and that copy requires disk activity.
As the 64 KB chunks are remapped to FAST Cache, load is removed from the source disks and total array throughput increases dramatically, because the capabilities of the SSDs far surpass those of spinning media.
The four SSDs in this scenario achieved around 70,000 IOPS on the back end; these are not host IOs. FAST Cache is mirrored, so every write requires two IOs, and there are also overhead IOs for operations such as promotions, cleaning, and updates to the memory map. In our case we end up servicing over 20k IOPS to our hosts, a vast increase.
All VMs are now green on the heatmap, indicating lower latency. Total latency for the datastore now averages approximately 3 ms.
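The warm-up behavior can be illustrated with a toy replay of IO offsets against a promote-on-third-hit cache with 64 KB chunks. The `simulate` function and its accounting are a simplified, hypothetical model: each promotion is charged one extra disk IO for the copy, and mirroring, cleaning and memory-map updates are ignored. It shows disk IOs climbing while chunks are being promoted, after which repeat hits are absorbed by the cache.

```python
from collections import Counter

CHUNK = 64 * 1024     # FAST Cache promotion granularity
PROMOTE_ON_HIT = 3    # per the post: a chunk is promoted after its third hit


def simulate(offsets):
    """Replay IO byte offsets against a toy promote-on-3rd-hit cache.

    Returns (disk_ios, cache_ios, promotions). A promotion costs one extra
    disk IO to copy the chunk onto SSD; mirroring, cleaning and map updates
    are deliberately ignored.
    """
    hits = Counter()
    cached = set()
    disk_ios = cache_ios = promotions = 0
    for offset in offsets:
        chunk_id = offset // CHUNK
        if chunk_id in cached:
            cache_ios += 1            # served from the SSDs
            continue
        disk_ios += 1                 # served from the spinning pool disks
        hits[chunk_id] += 1
        if hits[chunk_id] >= PROMOTE_ON_HIT:
            cached.add(chunk_id)
            promotions += 1
            disk_ios += 1             # extra IO to copy the chunk into FAST Cache
    return disk_ios, cache_ios, promotions


# Four hot chunks, each hit ten times in round-robin order.
offsets = [c * CHUNK + 512 for _ in range(10) for c in range(4)]
print(simulate(offsets))
```

In this run the first three rounds hit disk (plus the promotion copies), and every later round is served from cache, mirroring the temporary rise and then steep fall in source-disk load.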
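Under a simple accounting model (reads cost one back-end IO, mirrored writes cost two, and everything else is lumped into overhead), the post's own numbers bound how much of the back-end load is overhead. The sketch rounds "over 20k" down to 20,000 host IOPS, which is an assumption, and the `backend_iops` helper is hypothetical.

```python
def backend_iops(host_reads, host_writes, overhead_iops=0):
    """Back-end IOPS for a mirrored cache under a simple model.

    Reads hit one SSD; writes land on both mirror members (2 IOs each);
    overhead_iops lumps promotions, cleaning and memory-map updates together.
    """
    return host_reads + 2 * host_writes + overhead_iops


HOST_IOPS = 20_000      # the post's "over 20k", rounded down (assumption)
BACKEND_IOPS = 70_000   # observed on the four SSDs

# Worst case for mirroring: every host IO is a write, costing 2 back-end IOs.
max_mirrored = backend_iops(0, HOST_IOPS)
min_overhead = BACKEND_IOPS - max_mirrored
print(f"amplification {BACKEND_IOPS / HOST_IOPS:.1f}x; "
      f"at least {min_overhead:,} IOPS of promotion/cleaning/map overhead")
```

Even if every host IO were a write, mirroring alone would account for only 40,000 of the 70,000 back-end IOPS under this model, which is why the back-end figure should not be read as host throughput.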
Attached below is a demo walking through the pain points and the Adapter integration: