In July, a lot of people gathered in Boston to run a pretty interesting POC: a heavy 2,000-user workload running in a Citrix XenDesktop 5 environment hosted on a Vblock 300 series. I have already published some VNX results for Citrix XenDesktop, so I would like to use this post to also share some real-world considerations about the deployment.
The environment overview:
A Vblock 300 GX – A full public overview can be seen from the link below:
ESX CPU Performance considerations:
The ESX hosts that ran the VDI VMs were Cisco B200 blades with 96GB of RAM and 2.93 GHz CPUs. Each server was able to run 70 users with a heavy workload without any problem at all. Each VM ran with 1 vCPU and 2GB of RAM. The ESX balloon driver worked hard, but that's what it is supposed to do, so we ran with heavy memory overcommitment, which worked great and really added value on a server that theoretically doesn't have enough RAM (96GB) to host 70 users (70 VMs x 2GB of RAM, plus per-guest overhead). No VMkernel disk swap-out was noted.
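To make the overcommitment concrete, here is a minimal sketch of the arithmetic. The function name is my own, and it deliberately ignores the per-guest overhead (which only makes the real ratio a bit higher), so treat it as a rough estimate rather than what ESX reports.

```python
def overcommit_ratio(vm_count: int, ram_per_vm_gb: float, host_ram_gb: float) -> float:
    """Configured guest RAM divided by physical host RAM (overhead ignored)."""
    return (vm_count * ram_per_vm_gb) / host_ram_gb


# Figures from the B200 test: 70 VMs, 2GB each, 96GB of physical RAM.
configured_gb = 70 * 2
ratio = overcommit_ratio(70, 2, 96)

print(f"Configured guest RAM: {configured_gb} GB on a 96 GB host")
print(f"Overcommit ratio: {ratio:.2f}")
```

Anything above 1.0 means the guests are configured with more RAM than the host physically has, which is where ballooning earns its keep.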
With 80 users, this server configuration started to show some high %RDY values. LoginVSI didn't complain about them, but I thought they would be too high for a real-world implementation, so 70 remained the sweet spot.
Below: the environment running 2,000 users, 70 users per B200. The reason you see 2,500 users is that at some point someone decided to stress the environment even further. I love this job!
Below: the RAM usage
Below: a B200 with 70 users running the heavy workload profile. Note that the %RDY is still within the “OK” limit.
Below: a B200 with 80 users running the heavy workload profile. The %RDY is starting to look bad.
We then started to experiment with a B230 server that has far more RAM (192GB, to be exact), which did carry more user workload, around 83 users, but the TCO of this type of server vs. the B200 isn't worth it, so B200 it is!
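For a feel of what the two densities mean at the 2,000-user scale, here is a small sketch of the host count for each blade. It is only a headcount: the function name is mine, it assumes no spare hosts for failover, and it says nothing about price, which is what the TCO call above was really about.

```python
import math


def hosts_needed(total_users: int, users_per_host: int) -> int:
    """Smallest number of hosts that fits all users at the given density."""
    return math.ceil(total_users / users_per_host)


# Densities observed in the POC: 70 users per B200, around 83 per B230.
b200_hosts = hosts_needed(2000, 70)
b230_hosts = hosts_needed(2000, 83)

print(f"B200 hosts for 2,000 users: {b200_hosts}")
print(f"B230 hosts for 2,000 users: {b230_hosts}")
```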
Anti Virus Considerations:
OK, this is a very hot topic. There are a lot of new “VDI-specific” AV solutions out there; the one we used here wasn't based on the vShield Endpoint API and boy, you could tell.
Attached below, you can see a B200 that was perfectly capable of running 70 users before AV and is now running 70 users with AV on; %RDY showed its ugly head again. So PLEASE, PLEASE, PLEASE make sure you evaluate a proper AV solution, or you are going to pay (literally) a lot for the overhead that non-VDI AV will bring to your environment. To be fair to the AV vendor that was used here, I won't mention the company name, as the customer turned to them and they said they are working on a better solution.
So, it’s one thing to preach to your customers about EMC FAST CACHE:
and it’s another thing to actually eat your own dog food with no net and see the numbers in real action, so let’s start.
Read / Writes:
VDI workloads (as opposed to the common belief) tend to have a very high write percentage. How much exactly? Well, it depends; I’ve seen numbers vary from 40-60% writes, so make sure your storage array cache supports both read AND write caching.
In the figure below, you can see writes peaking at 40-60% during the test.
Let’s see some more numbers from the FAST CACHE perspective:
Booting 2,000 VMs simultaneously on a VNX 5700 (16:32)
80K IOPS in total (40K each SP).
A great response time!
Great FAST CACHE utilization: yes, 1.000 actually means that nearly 100% of the writes were cached, while almost 87% of the reads were absorbed by the FAST CACHE. Yep, I know it sounds crazy, but in a good way!
Total SP utilization: the SPs remained well below 80% (great!). In fact, if you take a closer look, you can see that the average utilization was more in the region of 60-65%. Also, as a real-life consideration, it is very rare for all users to work at 100% concurrency, not to mention that LoginVSI simulates users who perform heavy tasks again and again and again. My bottom line is that this VNX is far more capable in a real-life scenario and can probably host far more users.
Network Performance Analysis – Ethernet/IP Network.
We used the LAN topology below for Vblock internal and external connectivity.
- All links that were used, inside and outside the Vblock, were running at 10 Gbps.
- Connectivity between the blade server chassis and the UCS 6100s was 80 Gbps, formed from a total of 8 uplinks of 10 Gbps using FCoE (Ethernet and Fibre Channel on the same link). All links were active. We used copper SFP+ cables.
- Connectivity between the two UCS 6100s and the two Nexus 5000s was 80 Gbps. We used a total of 4 uplinks of 10 Gbps, for Ethernet only, to connect a UCS 6100 to a Nexus 5000. We used copper SFP+ cables.
- From the Nexus 5000s to the Nexus 7000 we used a total of 4 uplinks of 10 Gbps Ethernet, 2 from each Nexus 5000. We used a single Nexus 7010. We used optical cables.
- To monitor traffic between the Vblock and the external “user network”, we steered all Ethernet traffic onto one of the uplinks only (the red dot on the topology map above). This interface is “Ethernet 1/37” on “Nexus 5020-A-2”.
A snip showing 640 Mbps from the test:
Key findings and conclusions:
- The maximum traffic observed on this link, during the heaviest test with 2,000 VDIs, was 700 Mbps sustained. This load represents an average of 350 Kbps of traffic per VM.
- During all tests, the internal links between the chassis and the UCS 6100s, and between the UCS 6100s and the Nexus 5000s, did not exceed 10% load on any single link. This means that the internal network, carrying LAN and SAN traffic, is adequate for current and future needs.
- These findings mean that the customer's plan to connect the Vblock to its current network with a total of 8 links of 1 Gbps, using link aggregation, should be adequate for the expected traffic. Nevertheless, it is strongly advised to upgrade the customer's LAN switches to support 10 Gbps connectivity to the Vblock.
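The numbers in the findings above can be checked with a few lines of arithmetic. This is only a sketch: the function names are mine, it uses decimal units (1 Gbps = 1,000 Mbps), and it assumes the load spreads evenly across the aggregated links, which real link aggregation does not guarantee because traffic is hashed per flow.

```python
def per_vm_kbps(total_mbps: float, vm_count: int) -> float:
    """Average traffic per VM in Kbps, given the total in Mbps."""
    return total_mbps * 1000 / vm_count


def utilization(load_mbps: float, link_count: int, link_gbps: float) -> float:
    """Fraction of aggregate capacity used, assuming an even spread across links."""
    return load_mbps / (link_count * link_gbps * 1000)


peak_mbps = 700   # heaviest sustained rate observed on the monitored uplink
vms = 2000

print(f"Average per VM: {per_vm_kbps(peak_mbps, vms):.0f} Kbps")
print(f"8 x 1 Gbps aggregate: {utilization(peak_mbps, 8, 1):.1%} used")
print(f"Single 10 Gbps link:  {utilization(peak_mbps, 1, 10):.1%} used")
```

Note that the peak is already 70% of any single 1 Gbps member link, so an uneven hash could leave much less headroom than the aggregate figure suggests. That is one more reason the 10 Gbps upgrade is worth considering.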
Network Performance Analysis – Fibre-Channel/Storage Area Network.
Pushing the envelope..
At some point we wanted to push the storage to hold more than 2,000 users, so we loaded the Vblock with 2,500 users:
Now, one would expect that adding a quarter of the original load would add at least 25% to the SPs’ utilization, right?
Below, you can see the SPs’ utilization: only a slight increase over the original 2,000-user workload. This is thanks to Mr. FAST CACHE.
Below, you can see the FAST CACHE utilization: almost 100%. Now that’s really cool (at least in my mind).
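To put a number on that naive expectation, here is a small sketch that scales the 60-65% average SP utilization from the 2,000-user run linearly to 2,500 users. The function name is mine and linear scaling is exactly the assumption being questioned, so this is the "what you would fear" figure, not a measurement.

```python
def linear_projection(util_pct: float, old_users: int, new_users: int) -> float:
    """Utilization you would expect if load scaled linearly with user count."""
    return util_pct * new_users / old_users


# 60-65% was the average SP utilization range seen at 2,000 users.
for baseline in (60, 65):
    projected = linear_projection(baseline, 2000, 2500)
    print(f"{baseline}% at 2,000 users -> {projected:.2f}% at 2,500 users (naive)")
```

That naive projection lands around the 80% mark, which is why only a slight increase in the real graphs is such a nice result.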
So, in this post I wasn’t trying to cover every aspect, just to show you some of the highlights that a Vblock can offer your VDI environment, be it Citrix or VMware. Both run a very similar workload: Citrix MCS works very similarly to View linked clones, and the user / CPU core ratio works the same. The end result is that a Vblock will ALWAYS give you the same results again and again, in the same way that a given car model works the same no matter where you buy it. Different Vblocks can also be managed from one console (UIM), which lets you quickly deploy different Vblocks from one interface. Imagine this: you have 6 Vblocks that are all used for VDI with similar workloads running at different sites. You can create one service offering and basically clone it to the remote Vblocks, let UIM do the heavy lifting for you (SAN, NAS and ESXi deployments) and you are done!
This type of work is never a one-man mission. A lot of people were involved in making this POC a successful one, and I would like to give some credit to:
Miri Weiss Korn – EMC TC
Max (Hi Guys, this is max speaking!) Fishman – EMC TC
Gadi Feldman – CITRIX Consultant
Until next time..