Introduction and Document Objectives
The desktop virtualization market is expected to grow to $1.8 billion over the next three years, representing 250% growth from 2011.¹ This is a major shift in IT operations around the world and a major change in operational IT design; but what is driving it? Explosive IT growth followed by economic recession has set the stage for an imminent technology refresh, and more and more companies are being asked to do more with less. By leveraging the consolidated horsepower of servers, the improved performance of communications networks and the power of virtualization, the desktop technology refresh can be answered in an entirely new and more efficient way: commercial PCs gain an extended lifetime while still taking advantage of desktop software and performance improvements.
For desktop virtualization to be a success, the end-user experience must be rapid: the perceived latency of a mouse click, a machine login or an application load must improve, not degrade. Virtual Desktop Infrastructure (VDI), an organized base of remote desktops that are server-hosted and server-provisioned, is no exception to these performance requirements. One of the most important elements in delivering VDI is storage performance, and a great deal of confusion exists in today's storage marketplace with regard to how storage bottlenecks in VDI infrastructures can be addressed.
With the dramatic rise in processing power available through general-purpose x86 computing, organizations have been centralizing services and taking advantage of virtualization tools to simplify IT operations. However, a widening gap between processor performance and traditional storage performance has introduced bottlenecks into the solutions being designed and implemented.
This document’s purpose is to outline available options that address VDI storage performance, provide an overview of X-IO’s unique approach to address some of the financial and growth challenges that are presented, and to offer some guidelines when designing such an infrastructure.
¹ Morgan Stanley Research
Performance Challenges with VDI
There are several storage-related challenges with VDI solutions that can cause implementations to fall short of their targets. The centralization of users' data also centralizes the number of I/Os any solution must handle, normally measured in I/Os per second or "IOPS." This requirement, coupled with the truly random workload typical of a VDI solution, has caused a great deal of confusion with regard to designing appropriate storage infrastructures.
With the arrival on the market of a plethora of Solid State Disk (SSD) based solutions, many VDI and storage designers are left scratching their heads as to the most appropriate tool for removing this bottleneck. When evaluating options for data storage, or indeed any IT solution, most organizations weigh three primary elements: the effect on cost, the effect on risk, and any constraints on growth. Depending on an organization's current drivers, these elements may be weighted differently; however, in current economic conditions we see most customers looking for a truly balanced approach. This section assesses the ability of each option to provide that balance.
Option 1 – Adding Solid State Drives to Existing Enterprise Storage
Most traditional storage vendors now offer the ability to add SSDs to existing Storage Area Network (SAN) storage; however, the number of drives is often limited by back-end bandwidth constraints.
The process often requires manual movement of data between Tier 0 (SSD) and Tier 1 (HDD) LUNs, which adds not only complexity but also risk: predictions of which data will be hot are invariably wrong in truly random workloads, where the hot data set changes frequently.
The alternative is to combine multiple tiers of disk with automated tiering software that uses activity-based tiering; however, many organizations have discovered that file activity is not a particularly good metric in truly random I/O environments. Furthermore, the software licence and maintenance costs for these add-on elements often dramatically increase the cost of the solution.
Ultimately, this option may appear to be the simplest and lowest-risk way to address such a bottleneck, because existing storage management practices are preserved. In practice, however, it is a costly way of addressing the issue and constrains future growth due to the technical challenges involved.
Option 2 – Adding Solid State Based Cards to Hosts
Opening servers and adding PCI Express flash cards to the storage design, often believed to be the simplest way to address performance challenges, can significantly limit the growth and functionality of any end-to-end virtualized solution.
Whilst this approach can be useful for single-application, single server environments, it is not well suited to enterprise solutions. This is due to the constraints found particularly in virtualized environments where tools such as vMotion cannot be deployed as a result of captive storage.
Similar to Option 1, should growth be required, storage or application administrators will need to manually move data between Tier 0 and Tier 1 and/or utilize a costly data-movement tool.
Ultimately this is a high cost approach (due to the cost per GB of the cards themselves). It can constrain not only direct data growth but also solution growth; however, it is perceived to be low risk, as only the host configuration is being modified, not the enterprise storage architecture itself.
Option 3 – Utilizing Pure SSD Arrays
The last 12-18 months have seen a plethora of solid state arrays come to market. If we put aside the risk that comes with dealing with a "start-up" organization (such as service capabilities, warranty fulfilment, company stability, etc.) and look at the technology alone, then there are some good technologies coming to market, albeit at significantly higher cost.
All-SSD arrays can provide extremely high I/O rates; however, they are really designed for one job: high-I/O, low-latency data delivery. In some cases, such as low-latency trading, the cost outlay (often in excess of $100k/TB) can be justified; for VDI projects, however, costs at this level can quickly distort and undermine the justification.
Due to the high cost, pure SSD arrays are deployed in very small capacities, leading to the issue of either manually tiering the data or deploying costly software to attempt to predict data usage, something which we know cannot address the truly random workloads usually seen with VDI platforms. In addition, many organizations merely need low-latency storage and have no requirement for 500,000+ IOPS; in such deployments, the I/O performance of a pure SSD array is simply overkill.
Ultimately, this is probably the most unbalanced of the four approaches: it requires extremely high capital and operating expenditure, presents the most risk to the business, and is growth-constrained.
Option 4 – Utilize Hybrid Storage
In order to address the challenges found in virtualized environments, we not only need to achieve balance amongst the core elements of cost, growth and risk; we also need an approach that handles both predictable and unpredictable workloads without manual intervention.
X-IO believes that solid state storage is merely one tool that can be utilized to address performance issues challenging storage designers, rather than the solution itself.
By combining SSD and HDD into a single pool, the most appropriate tool can be utilized, be it cache memory, solid state storage or traditional hard disk storage. The key element here is the ability to move data between different tiers, in real time, without any manual intervention, but based upon I/O activity rather than trying to predict workloads based upon file activity.
X-IO calls this Continuous Adaptive Data Placement (CADP), and it provides an architecture that “fuses” SSD and HDD, placing “hot” active data onto SSD. CADP performs a Return on Investment (ROI) calculation and will only place data onto SSD if the application will experience a real performance improvement with almost no overhead to the storage system.
Data storage is still architected in a traditional manner, as a simple Fibre Channel array, however at a much lower capex price point than an all SSD array. Hybrid storage also lowers risk as existing storage management practices (including virtualization tools such as vMotion) can still be utilized. Growth can be achieved using a modular approach rather than the traditional “big bang” approach of wide-striped enterprise storage or the complex SSD card approach.
Hybrid storage is the only methodology to address the storage architecture challenge presented by VDI workloads that provides true balance of cost, growth and risk.
Moving a large number of users to virtual desktops requires consolidated capacity and performance. A general rule of thumb for performance sizing is an average of 5-8 IOPS per light user (single-application user), 8-20 IOPS per medium user (administrative staff) and 40-90 IOPS per heavy user (developer). In a large VDI environment with a high level of concurrency, a one-thousand-desktop solution of medium users can therefore generate 8,000 to 20,000 IOPS on average. Capacity for this type of solution varies depending on whether the desktops are linked clones of a master image or dedicated full clones. One thing is certain: the number of HDDs needed to meet the capacity requirement is likely to be far smaller than the number required to meet the performance goal, once a buffer is added to protect against transient performance spikes.
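The rule-of-thumb sizing above can be turned into a quick back-of-the-envelope calculator. The per-user figures below are the ones quoted in this paper; the helper itself is illustrative only, not a vendor sizing tool.

```python
# Back-of-the-envelope VDI IOPS sizing using the rule-of-thumb
# figures quoted above (illustrative only, not an X-IO tool).

IOPS_PER_USER = {
    "light": (5, 8),     # single-application users
    "medium": (8, 20),   # administrative staff
    "heavy": (40, 90),   # developers
}

def size_iops(user_counts, concurrency=1.0):
    """Return (low, high) aggregate steady-state IOPS estimates."""
    low = sum(IOPS_PER_USER[cls][0] * n for cls, n in user_counts.items())
    high = sum(IOPS_PER_USER[cls][1] * n for cls, n in user_counts.items())
    return low * concurrency, high * concurrency

# The 1,000-desktop medium-user example from the text:
low, high = size_iops({"medium": 1000})
print(f"{low:,.0f} - {high:,.0f} IOPS")  # 8,000 - 20,000 IOPS
```

Remember that this estimates steady-state load only; boot and login storms, discussed below, can push demand well beyond these averages.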
With traditional arrays, techniques such as short stroking (where only a portion of each disk is used to enhance performance) would be used to reach the performance requirements while stranding a large amount of capacity. With the newer high-performance SSD-based arrays discussed in Option 3, even data with low resource needs sits on the most expensive storage devices; it is all or nothing, and cost dominates the choice between big-and-slow and small-and-fast. X-IO's approach is to utilize the Hyper ISE product, which provides very efficient I/O density (I/O per GB); because CADP adapts to changes in workload, 95% or more of the system's capacity can be used without the traditional decline in performance. The capacity remaining after the virtual desktop images and differencing disks can be used to create user profile shares or, better yet, volumes for other applications. Hyper ISE ensures VDI solutions run more efficiently and at a price that is competitive with traditional all-HDD solutions.
Transient Performance Spikes and Unpredictable User Workloads
Server virtualization performance is comparatively easy to measure, making the application workload fairly predictable in nature. Desktop virtualization, by contrast, requires predicting human behaviour patterns, and collecting this information from many physical desktops and laptops per user can be a daunting task, leaving a VDI implementation very unpredictable at first.
General guidance states that read/write ratios of initial application runtime can be about 50/50, as the write workload is elevated beyond normal while profile and registry information is written. After the initial run it can drop to about a 20/80 ratio, where there is a heavy write workload. Another common increase in read patterns can be caused by virus scanning or search engines. Outside of these scenarios a nominal runtime VDI read/write ratio is about 30/70. This represents changes in an environment that can come and go quickly, but reoccur as desktops are released back to the pool and recreated for the next login.
Hyper ISE is designed to learn and understand shifts in performance so that only the hot spots caused by I/O storms are placed on SSD via CADP. All volumes on the Hyper ISE are constantly monitored, and placement decisions for segments of data are made at intervals of seconds. Each placement uses an ROI calculation to determine whether the application will benefit from running on SSD or, if the data is already on SSD, whether moving it back to HDD to make room for hotter data would be the right thing to do. Either way, Hyper ISE continuously monitors and moves data automatically between the HDDs and SSDs in the pool, without requiring storage administrators to set up tiering policies.
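CADP itself is proprietary, so the loop below is only a toy illustration of the general idea described above: measure per-segment I/O activity over a short interval, then promote the hottest segments to SSD only if their activity clears a cost/benefit threshold. The segment granularity, threshold and class names are invented for clarity.

```python
# Toy sketch of activity-based placement with a cost/benefit check.
# This is NOT the CADP algorithm; all thresholds here are invented.

from collections import defaultdict

class TieringSketch:
    def __init__(self, ssd_segments=4, promote_threshold=3):
        self.ssd_segments = ssd_segments            # SSD capacity, in segments
        self.promote_threshold = promote_threshold  # activity worth promoting
        self.activity = defaultdict(int)            # per-segment I/O counts
        self.on_ssd = set()

    def record_io(self, segment):
        self.activity[segment] += 1

    def rebalance(self):
        """Run every few seconds: keep the hottest segments on SSD, but
        only if their activity justifies the move (the 'ROI' check)."""
        hot = sorted(self.activity, key=self.activity.get, reverse=True)
        self.on_ssd = {s for s in hot[:self.ssd_segments]
                       if self.activity[s] >= self.promote_threshold}
        self.activity.clear()  # start a fresh measurement interval
```

For example, a boot-storm segment that receives many reads in one interval ends up on SSD, while a rarely touched segment stays on (or returns to) HDD, with no administrator-defined tiering policy involved.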
High Read Strain on Common Pooled Desktop Master Images
A smart way to manage the capacity burden in a VDI solution is to create a common master desktop image, containing the operating system and application software, and mitigate duplication by creating only differencing disks for each virtual desktop. This enables tremendous capacity savings because, across hundreds of desktops, the most common data such as the Windows and Program Files directories is not duplicated, while any new or changed data such as registry information, profiles and application data is kept in the differencing disks. A drawback to this approach is that the master image sees massive read requests as hundreds of virtual desktops start up and log in. During desktop boot-up, 90-95% of I/Os are reads, and the faster the I/O response time, the more desktops can be booted per minute in preparation for user logins. User logins also have a high read I/O workload as profiles are set up and registry information is written; in practice, the read ratio during login is between 80 and 95%.
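The capacity saving from a shared master image is easy to quantify. The figures below (a 40 GB master image and 2 GB per differencing disk) are illustrative assumptions, not numbers from this paper; real delta sizes depend on user behaviour and pool refresh policy.

```python
# Illustrative capacity comparison: dedicated full clones versus a
# shared master image with per-desktop differencing disks.
# The 40 GB / 2 GB figures are assumptions for the arithmetic.

def full_clone_gb(desktops, image_gb=40):
    """Every desktop carries a complete copy of the image."""
    return desktops * image_gb

def linked_clone_gb(desktops, image_gb=40, diff_gb=2):
    """One shared master image plus a small delta per desktop."""
    return image_gb + desktops * diff_gb

n = 500
print(full_clone_gb(n))    # 20000 GB (~20 TB)
print(linked_clone_gb(n))  # 1040 GB (~1 TB)
```

The same arithmetic also explains the read-strain drawback: those 500 desktops now boot from one 40 GB master, concentrating their read I/O on a tiny fraction of the capacity.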
Particular problem areas for storage administrators with VDI environments are:
- Linked clones and master copy images – Linked clones and master copy images typically see transient spikes of high read I/O.
- Login storms – Login storms can turn a heavily read-intensive environment into a heavily random-write-intensive one, as well as create unexpected peaks caused by offline email caching, search engines and virus scanners.
- Application storms – Similar to login storms, but they can demand heavy read and write I/O when multiple clients simultaneously access a specific application such as Microsoft SharePoint.
Again, these workloads are well suited to X-IO's Hyper ISE. CADP can identify these high-I/O workloads and service them from high-I/O, low-latency solid state storage where appropriate; then, as I/O intensity decreases, it moves the data back to traditional hard disks so that higher-priority data can be serviced from SSD at other times. This adaptive data placement occurs entirely in the background and requires no intervention whatsoever from users or storage administrators.
A large number of X-IO’s customers have expressed their dissatisfaction with their traditional storage arrays due to the degradation of performance as the array capacity is consumed. One of the biggest issues with any storage system is sustaining performance over its entire lifetime. A storage system can start out really fast and then performance falls off a cliff. IT professionals find themselves spending hours trying to tune their systems to squeeze out better performance, often with very little success.
Most storage systems, including SSD-based systems, begin to hit a performance wall once data consumes 50-70% of their capacity. Hyper ISE, on the other hand, retains high performance throughout its entire lifecycle. With some storage solutions that rely on short-stroking, capacity utilization often has to be kept at about 25% to maintain optimal performance; that means some customers have to acquire and manage four times the storage capacity their data actually needs in order to deliver the requisite application performance. Hyper ISE provides high performance even at 100% capacity utilization.
With Hyper ISE, users will not “feel” drive failures, RAID rebuilds, data migrations, mirroring or high levels of capacity utilization. These events are not often measured or discussed when referencing storage performance, yet they are common occurrences.
All of the design elements of Hyper ISE, including the way it handles vibration, heat dissipation, data placement, drive reliability and the intelligence of CADP, result in sustainable and self-optimizing performance. Additionally, the disk drives within Hyper ISE are not treated as individual and disposable components, but rather as a single organism. As such, they work congruently, resulting in higher performance and reliability.
Hyper ISE – Enterprise-class Reliability
Yes, performance is important, but reliability is requisite. X-IO’s ISE technology has active-active controllers and has thousands of systems deployed worldwide, supporting mission-critical environments.
As organizations add more Hyper ISE storage systems to an environment, reliability at the drive level increases, whereas the opposite is true of other storage systems. Due to the integration of the physical drives with the Hyper ISE controllers, the reliability of disk devices is an order of magnitude greater than with vendors that treat drives as disposable components. X-IO has developed intellectual property, based on Seagate hard disk drives, that creates a level of reliability other storage systems simply do not have. Consider the following analysis:
- For all other storage vendors, drive reliability is measured in Annual Failure Rates (AFR) of approximately 5% to 7%; more drives therefore mean less reliability and more "service events."
- 1,000 drives with an AFR of 5% (best case) results in roughly 4 drive failures per month (about 1 a week).
- Scaled to 10,000 drives, this results in roughly 41 drive failures per month (about 10 a week).
- Hyper ISE is designed to achieve 99.999% reliability.
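The failure-rate arithmetic in the bullets above can be checked directly: an AFR is an expected annual failure count per drive, so monthly service events scale linearly with fleet size.

```python
# Expected drive service events implied by an Annual Failure Rate (AFR).

def expected_failures(drives, afr, months=1):
    """Expected number of drive failures over the given number of months."""
    return drives * afr * months / 12

print(int(expected_failures(1_000, 0.05)))   # ~4 per month (about 1 a week)
print(int(expected_failures(10_000, 0.05)))  # ~41 per month (about 10 a week)
```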
Power and Floor Space
Each Hyper ISE requires only 3.3 amps (at 240V) and 600 watts of power. Traditional enterprise storage systems require special power, typically multiple 30-amp circuits, which forces storage into separate racks since the power requirements exceed what is available in a typical rack.
Each individual Hyper ISE can be added to existing racks and utilize existing power. The cost for extra power circuits is significant when you have to go above what is offered for a standard server rack.
Hyper ISE is also very space efficient. As an example of scale, a half rack (21U) of Hyper ISE storage systems provides the following:
- Seven dual controller Hyper ISE storage systems
- 56 x 8Gbps host ports
- 100TB of storage capacity
- Up to 1.4 million IOPS potential
- This configuration would require ~23 amps (at 240V), so it can be powered from just two 30-amp circuits, and would draw 4.2 kW, i.e. 4.2 kWh of energy per hour. At $0.09/kWh, this would cost just $9.07/day (about $272/month) in power.
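The power-cost figure quoted above can be reproduced in a few lines, using the seven-system, 600 W-each configuration from this example:

```python
# Reproducing the power-cost arithmetic for the half-rack example:
# seven Hyper ISE systems at 600 W each, billed at $0.09 per kWh.

systems = 7
watts_each = 600
rate_per_kwh = 0.09

kw = systems * watts_each / 1000        # 4.2 kW total draw
daily_cost = kw * 24 * rate_per_kwh     # kWh per day times the rate
print(f"${daily_cost:.2f}/day")         # $9.07/day
print(f"${daily_cost * 30:.2f}/month")  # $272.16/month
```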
The imbalance between server and storage performance will inevitably grow as CPUs continue to increase in speed while disk drives lag behind. The relentless growth of data, combined with the need to use and analyze this information to drive the business, stresses the situation even further. One side of the data center (servers) is enabling a new era in IT while the other side (storage) is inhibiting it. However, SSDs are not a panacea that will magically change all of this by their very existence. Rather, they are a valuable component in an overall solution that balances performance against price and capacity.
Unlike pure SSD solutions or exotic Massively Parallel Processor (MPP) and grid architectures (all of which take a cost-is-no-object approach to giving your database applications the I/O they need), Hyper ISE intelligently applies SSD where SSD is required, uses conventional hard drives everywhere else, and allows organizations to keep the significant server and RDBMS architecture investments in place.