VMware vSphere 6.7 U1 – A Major enhancement is coming to PSP Round Robin

During VMworld 2018 US, VMware announced the upcoming vSphere 6.7 U1 release. You can read the highlights here:

https://blogs.vmware.com/vsphere/2018/08/under-the-hood-vsphere-6-7-update-1.html

What that post didn’t cover is the upcoming change to the Path Selection Policy (PSP) when it is used with the Round Robin (RR) algorithm.

Up until now, when you wanted the best performance out of your storage array, you could either

  1. Install a third-party multipathing plug-in like PowerPath/VE
  2. If using the native multipathing plug-in (NMP), change its path selection policy from “Fixed” or “MRU” to Round Robin
  3. If using Round Robin, change it to IOPS=1, as many arrays (including XtremIO) recommend. This means that after every single command, the traffic moves to the next path in a round robin manner
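To make the IOPS=1 setting concrete, here is a minimal Python sketch of how an IOPS limit drives path switching. It is only an illustration of the idea, not VMware's code: the function name round_robin_paths and the path names P1/P2 are made up for this example.

```python
from itertools import cycle

def round_robin_paths(paths, iops_limit, num_ios):
    """Return the path used for each I/O, moving to the next path
    after every `iops_limit` I/Os (hypothetical helper, not an ESXi API)."""
    if iops_limit < 1:
        raise ValueError("iops_limit must be at least 1")
    if not paths:
        raise ValueError("at least one path is required")
    order = cycle(paths)
    current = next(order)
    sent_on_current = 0
    schedule = []
    for _ in range(num_ios):
        # Once the limit is reached, switch to the next path in the ring.
        if sent_on_current == iops_limit:
            current = next(order)
            sent_on_current = 0
        schedule.append(current)
        sent_on_current += 1
    return schedule

# With a limit of 1, every I/O goes down a different path than the one before.
print(round_robin_paths(["P1", "P2"], iops_limit=1, num_ios=4))
# With a larger limit, I/Os stay on one path for longer before switching.
print(round_robin_paths(["P1", "P2"], iops_limit=2, num_ios=5))
```

Note that the switch depends only on the I/O count. Nothing in this loop looks at how slow or busy a path is, which is exactly the gap described next.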

While Round Robin is widely used in vSphere-based environments, it has one drawback: it cannot detect whether the paths are congested.

Well, in vSphere 6.7 U1 there is a massive (and positive) change to Round Robin: it can now take the measured latency into account.

https://storagehub.vmware.com/export_to_pdf/vsphere-6-7-core-storage-1

With the release of vSphere 6.7 U1, there are now sub-policy options for VMW_PSP_RR to enable active monitoring of the paths. The policy considers path latency and pending IOs on each active path.
This is accomplished with an algorithm that monitors active paths and calculates average latency per path based on either time and/or the number of IOs. When the module is loaded, the latency logic will get triggered and the first 16 IOs per path are used to calculate the latency. The remaining IOs will then be directed based on the results of the algorithm’s calculations to use the path with the least latency. When using the latency mechanism, the Round Robin policy can dynamically select the optimal path and achieve better load balancing results.
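The two phases described above (sample the first 16 IOs per path, then send the remaining IOs to the path with the least latency) can be sketched roughly as follows. This is a simplified simulation under my own assumptions: each path has a fixed, made-up latency, pending IOs are ignored, and the function name dispatch is hypothetical. The real VMW_PSP_RR logic is more involved.

```python
def dispatch(paths, latency_ms, num_ios, sampling_ios=16):
    """Return the path chosen for each I/O in a single sampling window.

    paths        -- list of path names (hypothetical, e.g. "P1")
    latency_ms   -- dict of path -> simulated round-trip time per I/O
    sampling_ios -- number of sampling I/Os issued on each path
    """
    if not paths:
        raise ValueError("at least one path is required")
    totals = {p: 0.0 for p in paths}
    counts = {p: 0 for p in paths}
    sampling_window = sampling_ios * len(paths)
    schedule = []
    for i in range(num_ios):
        if i < sampling_window:
            # Sampling phase: plain round robin while measuring each path.
            path = paths[i % len(paths)]
            totals[path] += latency_ms[path]
            counts[path] += 1
        else:
            # After sampling: use the path with the lowest average latency.
            path = min(paths, key=lambda p: totals[p] / counts[p])
        schedule.append(path)
    return schedule

simulated = {"P1": 0.625, "P2": 1.25, "P3": 1.875}
schedule = dispatch(["P1", "P2", "P3"], simulated, num_ios=60)
print({p: schedule.count(p) for p in simulated})
```

In this toy run, every path gets its 16 sampling IOs and everything after that lands on P1, the path with the lowest measured average.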
The user must enable the configuration option to use latency based sub-policy for VMW_PSP_RR:

esxcfg-advcfg -s 1 /Misc/EnablePSPLatencyPolicy
To switch to latency based sub-policy, use the following command:
esxcli storage nmp psp roundrobin deviceconfig set -d <Device_ID> --type=latency
If you want to change the default evaluation time or the number of sampling IOs to evaluate latency, use the following commands.
For latency evaluation time:
esxcli storage nmp psp roundrobin deviceconfig set -d <Device_ID> --type=latency --latency-eval-time=18000
For the number of sampling IOs:
esxcli storage nmp psp roundrobin deviceconfig set -d <Device_ID> --type=latency --num-sampling-cycles=32
To check the device configuration and sub-policy:
esxcli storage nmp device list -d <Device_ID>
Usage: esxcli storage nmp psp roundrobin deviceconfig set [cmd options]

Description:
  set    Allow setting of the Round Robin path options on a given device controlled by the Round Robin Selection Policy.

Cmd options:
  -B|--bytes=<long>    When the --type option is set to 'bytes' this is the value that will be assigned to the byte limit value for this device.
  -g|--cfgfile    Update the config file and runtime with the new setting. In case device is claimed by another PSP, ignore any errors when applying to runtime configuration.
  -d|--device=<str>    The device you wish to set the Round Robin settings for. This device must be controlled by the Round Robin Path Selection Policy (except when -g is specified) (required)
  -I|--iops=<long>    When the --type option is set to 'iops' this is the value that will be assigned to the I/O operation limit value for this device.
  -T|--latency-eval-time=<long>    When the --type option is set to 'latency' this value can control at what interval (in ms) the latency of paths should be evaluated.
  -S|--num-sampling-cycles=<long>    When the --type option is set to 'latency' this value will control how many sample IOs should be issued on each path to calculate latency of the path.
  -t|--type=<str>    Set the type of the Round Robin path switching that should be enabled for this device. Valid values for type are:
      bytes: Set the trigger for path switching based on the number of bytes sent down a path.
      default: Set the trigger for path switching back to default values.
      iops: Set the trigger for path switching based on the number of I/O operations on a path.
      latency: Set the trigger for path switching based on latency and pending IOs on path
  -U|--useano=<bool>    Set useano to true, to also include non-optimized paths in the set of active paths used to issue I/Os on this device, otherwise set it to false
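If you have more than a handful of devices, you will probably want to generate these commands rather than type them. Below is a small Python sketch that only builds the command lines as strings, using the options listed above; it does not run anything. The helper name latency_policy_commands and the device IDs are placeholders I made up, so substitute your own device IDs and review the output before running it on a host.

```python
import shlex

def latency_policy_commands(device_ids, eval_time_ms=None, sampling_cycles=None):
    """Build (but do not execute) one esxcli command line per device.

    Only the options shown in the help text above are used. The optional
    tunables are left out unless a value is passed in.
    """
    commands = []
    for device_id in device_ids:
        argv = [
            "esxcli", "storage", "nmp", "psp", "roundrobin",
            "deviceconfig", "set", "-d", device_id, "--type=latency",
        ]
        if eval_time_ms is not None:
            argv.append(f"--latency-eval-time={eval_time_ms}")
        if sampling_cycles is not None:
            argv.append(f"--num-sampling-cycles={sampling_cycles}")
        # shlex.join quotes any argument that would need it in a shell.
        commands.append(shlex.join(argv))
    return commands

# Placeholder device IDs, for illustration only.
for line in latency_policy_commands(["naa.example0001", "naa.example0002"]):
    print(line)
```

Keeping the tunables optional matches the advice further down to stay with the defaults unless you have a reason to change them.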

The example below shows how sampling IOs are monitored on paths P1, P2, and P3 and how a path is eventually selected. At time “t”, the sampling window starts. In the sampling window, IOs are issued on each path in Round Robin fashion and their round-trip time is monitored. Path P1 took 10ms in total to complete 16 sampling IOs. Similarly, path P2 took 20ms for the same number of sampling IOs and path P3 took 30ms. As path P1 has the lowest latency, path P1 will be selected more often for IOs. The sampling window then starts again after the interval “T”. Both “m” and “T” are tunable parameters, but we would suggest not changing them, as their default values are based on experiments run internally during implementation.


Legend: T = Interval after which sampling should start again
m = Sampling IOs per path
t1 < t2 < t3 -> 10ms < 20ms < 30ms
t1/m < t2/m < t3/m -> 10/16 < 20/16 < 30/16
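The arithmetic in the legend is easy to check. The Python sketch below computes the per-IO latency for each path from the numbers above (10ms, 20ms, and 30ms over m = 16 sampling IOs). The second helper is purely my own assumption: the text only says the fastest path is selected "more often", so I show what the split would look like if IOs were shared in inverse proportion to latency. VMware does not state the actual weighting here.

```python
from fractions import Fraction

def per_io_latency(total_ms, sampling_ios):
    """Average latency per sampling IO, kept exact with Fraction."""
    return Fraction(total_ms, sampling_ios)

def inverse_latency_shares(per_io_ms):
    """ASSUMPTION for illustration: share IOs in inverse proportion
    to per-IO latency. Not taken from VMware documentation."""
    weights = {path: 1 / latency for path, latency in per_io_ms.items()}
    total = sum(weights.values())
    return {path: weight / total for path, weight in weights.items()}

totals_ms = {"P1": 10, "P2": 20, "P3": 30}  # totals from the example
m = 16                                      # sampling IOs per path

per_io = {path: per_io_latency(t, m) for path, t in totals_ms.items()}
shares = inverse_latency_shares(per_io)

for path in totals_ms:
    print(f"{path}: {float(per_io[path])} ms per IO, "
          f"assumed share {float(shares[path]):.1%}")
```

So t1/m, t2/m, and t3/m come out to 0.625ms, 1.25ms, and 1.875ms per IO. Under the inverse-proportion assumption P1 would carry 6/11 of the IOs, P2 3/11, and P3 2/11.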
In testing, we found that with the new latency monitoring policy, even with up to 100ms of latency introduced on half the paths, the PSP sub-policy maintained almost full throughput.
Setting the values for the round robin sub-policy can be accomplished via CLI or using host profiles.
