When it comes to VDI storage, beware of vendors trying to make you believe it’s simpler than it really is.
By Hollis Beall, Technical Director, X-IO Technologies Cloud Business Development
We have been doing a lot of performance testing and content creation for the upcoming Citrix ServTech Congress 2014 in New Orleans. One thing that has stood out for me (as it does every time I do large-scale testing) is just how much performance the supporting operations require on top of the expected production workload. At our labs in Colorado Springs, I’m using Login VSI to find the VSIMax value (number of desktops) for a single ISE 740 Hybrid Storage Array with Cisco UCS servers. While I have been focused on the performance of the VDI cluster and its associated storage, other operations keep stacking up and add up to a larger performance requirement than expected. Most in the storage industry design VDI environments by focusing only on the “Normal Operations” of the desktop, and ignore (or give little time to) these other operations, which can be a big reason why VDI is so difficult and expensive.
Often when designing a VDI solution, vendors will perform a simple calculation to come up with a performance requirement, and then size things from there. They take the number of users, guess at the range of performance each type of user will require (Light, Medium, Heavy, Power, Kiosk, etc.), multiply the two, and get a range of storage performance the system should be sized for. What makes this wrong is that it’s only partially right. The range can be fine for normal operations, but every VDI environment will see other operations that can dwarf this number:
- Boot Storms can cause tens of thousands of IOPS with just a few hundred users (I’m getting to 60,000 storage IOPS before the cores on the UCS reach roughly 100% utilization). Any Proof of Concept (POC) should include booting a significant number of desktops during testing to measure the effect on the end-user experience.
- “Monday/Tuesday” Login differences can cause frustrating variations in IOPS/desktop requirements, affecting login times for everyone. End-user login times can be 2x longer when profiles and other user data are copied to the desktop for the first login. This usually happens after operations like a refresh or recompose of the desktop pool, so it is going to be a common occurrence.
- When doing any POC testing, reboot/recompose/deploy a significant number of desktops during the test to find out how the user experience changes. While this causes a lot of IO to the storage system, the entire infrastructure is stressed during this operation. In our testing, Cisco UCS processor utilization has been the single biggest bottleneck that I have run into. (I’m getting more gear this week…)
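To make the gap concrete, here is a rough sketch of the "users times IOPS per user" calculation described above, compared against a boot storm peak. The per-user IOPS ranges and the user mix are made-up illustrative numbers, not measurements from our lab; the only figure taken from the testing above is the 60,000 IOPS boot storm peak. The function names are hypothetical too.

```python
# Hypothetical steady-state IOPS ranges per user type (low, high).
# These are illustrative assumptions, not measured or recommended values.
PROFILE_IOPS = {
    "light": (5, 10),
    "medium": (10, 20),
    "heavy": (20, 40),
}


def steady_state_range(user_counts, profile_iops=PROFILE_IOPS):
    """Naive sizing: multiply users by a per-user IOPS range and sum."""
    low = sum(count * profile_iops[kind][0] for kind, count in user_counts.items())
    high = sum(count * profile_iops[kind][1] for kind, count in user_counts.items())
    return low, high


def peak_ratio(observed_peak_iops, steady_state_high):
    """How many times larger an observed peak is than the sized maximum."""
    return observed_peak_iops / steady_state_high


if __name__ == "__main__":
    # Assumed mix of a few hundred users.
    users = {"light": 200, "medium": 100, "heavy": 50}
    low, high = steady_state_range(users)
    print(f"Normal operations sizing: {low} to {high} IOPS")

    # 60,000 IOPS is the boot storm figure quoted in the list above.
    ratio = peak_ratio(60_000, high)
    print(f"Boot storm peak is {ratio:.1f}x the top of the sized range")
```

With a mix like this, the top of the "normal operations" range sits far below what a boot storm can demand, which is why the supporting operations need to be part of any POC rather than an afterthought.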
So if you are in New Orleans next week for the Citrix ServTech Congress, please come by and talk with us at the X-IO booth. We’ll be running sessions on Storage Performance Troubleshooting, VDI Reference Architecture Reviews, How Storage makes Clouds profitable, and many more.