Building a Better Boot SAN for Microsoft Video Services
Big performance in a small footprint
If you enjoy TV or movie content from the Zune® Video Marketplace or the Xbox® console, you’re watching content that has been processed by Microsoft® Video Services. The Microsoft Video Services team ingests raw media content from Hollywood and prepares it for presentation on your PC, Xbox console, Windows® Phone, or whatever other device you’re using. From a processing perspective, it’s a monumental task: the raw data stream for a single TV show may be 20 gigabytes (GB) in size, and hundreds of files stream into the labs each day. Each file must be tagged, transcoded, and resized for presentation in a range of formats and resolutions.
To meet the growing demand for on-demand video, the Video Services team was moving its operations onto a private cloud computing environment comprising more than 300 Cisco Unified Computing System (UCS) servers. UCS combines computing, storage access, networking, security, and management into a fabric architecture that provides an optimal environment for Microsoft enterprise applications. The team was confident that once these computational workhorses were up and running, they would be well suited to the task of preparing and delivering content for consumers. The challenge at hand, however, lay in getting those servers up and running.
Firing up the cloud faster
The private cloud platform that the Video Services team is using relies on servers that do not have dedicated boot drives. Instead, every server boots from a shared device that resides on the storage area network (SAN), an arrangement known as a Boot SAN.
“We had just bought into the idea of using the UCS as a platform for video encoding,” explains Chris Lisica, the operations lead for Microsoft Video Services, “and we needed an elegant Boot SAN solution. We had made the decision to stop buying physical hard drives to support the individual servers, and we wanted to move to a SAN topology. The problem was that our initial exploration of Boot SAN solutions from many of the traditional storage vendors was turning up solutions that were very expensive, that took up a lot of space in a server rack, and that, frankly, did not deliver very much performance.”
So when Lisica and his team learned that X-IO might have a solution that required only a fraction of the rack space, was far less expensive, and could boot hundreds of UCS servers simultaneously—and in only minutes—they came over to Building 25 on the Redmond campus to speak with the X-IO personnel in residence at the Microsoft Partner Solutions Center (MPSC). The solution that Lisica had heard about—the X-IO Hyper ISE—had not yet been formally released, but X-IO was working with the MPSC to harden the device for production deployments. Was there an opportunity to see how well the X-IO Hyper ISE might work in the kind of high-performance private-cloud environment that the Microsoft Video Services team was building? The answer to that was obvious: Definitely.
The Benefits of a Combined Offering
The X-IO Hyper ISE is a performance-driven storage system designed to support more than 200,000 input/output operations per second (IOPS) from a single unit that occupies only three rack units (3U) of space. It fuses 14.4 terabytes (TB) of solid-state drive (SSD) and hard disk drive (HDD) technology into a single pool of storage. It also takes advantage of patented Continuous Adaptive Data Placement (CADP) algorithms to analyze I/O activity and move hotspot data onto SSD only when measurable performance gains will be achieved. All other data is stored on the HDD spindles.
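To make the idea of adaptive placement concrete, here is a minimal Python sketch of access-frequency-based tiering. It is illustrative only: the class name, capacities, and thresholds are assumptions, and it does not represent X-IO's actual CADP algorithms.

```python
# Illustrative sketch of hot-spot tiering: promote only frequently accessed
# blocks to SSD and leave everything else on HDD. All names and thresholds
# are hypothetical; this is not X-IO's CADP implementation.

from collections import Counter

SSD_CAPACITY_BLOCKS = 4       # hypothetical SSD tier capacity, in blocks
PROMOTION_THRESHOLD = 3       # hypothetical minimum access count to justify SSD

class TieringMonitor:
    def __init__(self):
        self.access_counts = Counter()   # I/O activity observed per block
        self.ssd_resident = set()        # blocks currently placed on SSD

    def record_io(self, block_id: int) -> None:
        """Track each I/O so hotspots can be identified later."""
        self.access_counts[block_id] += 1

    def rebalance(self) -> None:
        """Promote only the hottest blocks; reset counters for the next window."""
        hottest = self.access_counts.most_common(SSD_CAPACITY_BLOCKS)
        self.ssd_resident = {block for block, count in hottest
                             if count >= PROMOTION_THRESHOLD}
        self.access_counts.clear()

monitor = TieringMonitor()
for block in [7, 7, 7, 7, 42, 7, 13]:    # simulated I/O trace
    monitor.record_io(block)
monitor.rebalance()
print(monitor.ssd_resident)               # {7}: only the hotspot moves to SSD
```

The design point is the same one the product description makes: data earns its place on SSD through measured I/O activity, rather than being assigned to a tier up front.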
Right from the start, the X-IO Hyper ISE gained points with the Video Services team because of its small footprint. Other Boot SAN solutions the team had examined would require 12U or more of rack space—significantly more than the 3U required by the Hyper ISE. The Hyper ISE was also priced thousands of dollars lower than the alternatives, which added to its appeal. Most impressive of all, though, was the performance of this new Boot SAN solution.
“We performed thousands of reboot tests to test the IOPS,” says Craig Simpson, the operations staffer on the Microsoft Video Services team who has been overseeing the rollout of the unified computing environment and the integration of the Boot SAN with the UCS servers. “We saw incredible performance during those tests. We could reboot 100 servers at once and we measured 16,000 IOPS for around 15 seconds—and then it was done.”
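Taking the quoted figures at face value, a quick back-of-the-envelope calculation shows what that burst looks like; the even per-server split below is an assumption for illustration, not a measurement.

```python
# Back-of-the-envelope arithmetic from the figures quoted above. The even
# per-server split is an assumption for illustration, not a measured value.

servers = 100
aggregate_iops = 16_000   # observed aggregate rate during the reboot test
duration_s = 15           # approximate length of the boot burst

total_ops = aggregate_iops * duration_s
print(f"Total boot burst: {total_ops:,} I/O operations")           # 240,000
print(f"Per-server rate:  {aggregate_iops / servers:.0f} IOPS")    # 160
print(f"Per-server total: {total_ops / servers:,.0f} operations")  # 2,400
```

Framed that way, the array absorbs the entire boot storm as one brief, bounded burst rather than a prolonged queue of restarts.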
And the 100-server simultaneous reboot from the Boot SAN is no exaggeration. Simpson explains that part of the testing program in Video Services involved determining how well the Boot SAN technology would help the group manage patches and other issues. There are times when machines need to be taken offline for an update or repair, so how the Boot SAN handles those operations is important.
But even with automated tools in a Boot SAN to handle server shutdown and restarts, neither Simpson nor Lisica wanted to be shutting down and restarting servers one at a time—yet that is precisely how other Boot SAN vendors prescribed that it should be done to avoid overwhelming the Boot SAN. Not so with X-IO: “With the X-IO Hyper ISE, we can boot up 100 servers as quickly as we can boot up a single server,” says Simpson. “That was very important to us, because on Patch Day at Microsoft we have to restart all the servers—and we really did not want to have to do staggered reboots.”
In addition to facilitating the efficient management of large numbers of servers, Simpson and his team discovered that the X-IO Hyper ISE can dramatically reduce boot time on a per-server basis. “When we’re imaging a server with Hyper ISE,” says Simpson, “it only takes about three minutes once you touch the disk. That’s very fast—with some other solutions we looked at, that might take as long as 13 minutes.”
Building a Better Boot SAN
Putting the Hyper ISE through its paces in the Video Services environment was not without its moments. Because the product was new and because elements were still in development, there were early glitches that had to be sorted out. From the point of view of the team at Microsoft Video Services, X-IO seemed to treat each glitch as an opportunity to turn a problem into a customer service success.
“We did run into some problems with early versions of firmware,” Simpson recalls, “and X-IO responded with great agility and speed. Over the course of two months, X-IO probably sent us two or three quick fix firmware releases—and they always came overnight. They were very prompt. Since then, the Hyper ISE has proven to be one of the most resilient pieces of hardware that we have in the labs.”
X-IO has also designed the Hyper ISE to be self-monitoring. The unit automatically sends a health report to the X-IO team each day. If it detects any kind of problem in the interim, it automatically sends a message to the system administrators.
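The pattern described here—a scheduled health report plus an immediate alert when a fault is detected—can be sketched as follows. The report contents, destinations, and schedule in this sketch are hypothetical and do not represent X-IO's actual telemetry interface.

```python
# Hypothetical sketch of the daily-health-report-plus-alert pattern described
# above. Report fields, destinations, and scheduling are assumptions; this is
# not X-IO's actual telemetry implementation.

import json
import time
from datetime import datetime, timezone

REPORT_INTERVAL_S = 24 * 60 * 60   # one scheduled report per day

def collect_health() -> dict:
    """Gather whatever status the array exposes (placeholder values here)."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "faults": [],              # populated when a problem is detected
        "temperature_ok": True,
        "drives_online": 40,
    }

def send_report(report: dict) -> None:
    """Stand-in for uploading the report to the vendor's support service."""
    print("sending report:", json.dumps(report))

def notify_admins(report: dict) -> None:
    """Stand-in for alerting the local system administrators."""
    print("ALERT for admins:", json.dumps(report["faults"]))

def monitor_loop(poll_interval_s: int = 60, run_once: bool = False) -> None:
    last_report = 0.0
    while True:
        health = collect_health()
        if health["faults"]:                       # problem found in the interim
            notify_admins(health)
            send_report(health)
        elif time.time() - last_report >= REPORT_INTERVAL_S:
            send_report(health)                    # routine daily heartbeat
            last_report = time.time()
        if run_once:
            break
        time.sleep(poll_interval_s)

monitor_loop(run_once=True)
```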
“We can log into the device and send a telemetry report up to X-IO for analysis,” says Simpson, “and they’ll let us know what’s going on. The support team at X-IO actually can look at those reports and understand what’s going on—and that’s not that common.”
A Bright Future in the Cloud
Today, the private cloud based on Cisco UCS servers is in place and operational at Microsoft Video Services. Lisica and his operations team have organized the UCS environment into logical pods associated with specific lab activities, and different pods have different profiles and needs. All of the 300-plus UCS servers in Video Services are controlled by four separate Hyper ISE Boot SAN devices.
“We could have shared the Hyper ISE hardware,” says Lisica. “The decision to use four separate units had more to do with the logical structure of the UCS pods we created. They all do different things, and since the Boot SAN is integral to the operations of each pod, it made sense to use a separate Boot SAN for each one.”
“Ultimately, we have the architecture in place for an infinitely scalable video lab,” Simpson goes on to say, “and the X-IO Hyper ISE provides the power, performance, and flexibility we need to support that.”