The 10 Key Server Virtualization Unknowns, and What to Do About Them

16 January 2009

George J. Weiss

Gartner RAS Core Research Note G00164385

As infrastructure virtualization hype runs rampant, does anyone know whether total or comprehensive virtualization will benefit all enterprises? Can we predict best-case/worst-case behavioral effects to create service-level-agreement-predictable deployments? Here are 10 crucial factors and advice.

Overview

This research assumes virtualization is not boundless, frictionless and beneficial at any scale. If we cannot deterministically predict the results, then virtualization simulation, what-if analysis and tests will become essential and integral processes prior to deployment, or failures will result.

Key Findings

Because side effects can produce unexpected results, the way to plan for best- and worst-case scenarios is to adopt a set of tools, good design, best practices and analytics that minimize the worst-case outcomes.
Resource planning is more than capacity planning. Other variables include licensing, human capital, total cost of ownership (TCO), facilities and energy, change management, and life cycle administration, which will likely create tool fragmentation.
The combinatorial explosion that could occur as vendors deliver on readily available virtual machine (VM) mobility to maximize resource utilization could create management chaos without an ordered methodology during the planning process.
Searching for the right tools and creating an integrated and logical networkwide view will be one of the toughest challenges facing IT architects.

Recommendations

Large global enterprises should:

Scale virtualization ambitions into digestible chunks (for example, by division, department, subnet, cluster, related applications and specific projects).
Look not only for virtualization-specific tools but also for tools that can manage real, physical and dedicated environments.
Expand virtualization capacity planning into a broader resource planning function that treats human capital management, operations management, security management, cost optimization, compliance management and similar concerns as interdependent variables.
Begin thinking of new job titles and descriptions addressing the analytics, simulations and scenarios of a broader-scale approach to virtualization.

What You Need to Know

The VM equation should not be restricted to a few variables, such as those focused on capacity. Security, compliance, ownership, configuration management, change management and facilities management, among others, can be affected. VM proliferation, totality and comprehensiveness, while promising strong benefits (largely based on early user experiences in limited cases), may also create an array of new problems, such as bridging heterogeneity effects, network traffic collision, and input/output (I/O) performance efficiency and response, among a host of other behavioral effects. Big-scale VM projects must search for and incorporate intelligent planning tools, many of which will remain immature until 2010 or later.

Analysis

We all know that virtualized infrastructures increase capacity utilization. Traditional server infrastructure tightly couples applications to hardware, wasting computing capacity whenever applications use less than 100% of system resources. Virtualized infrastructures decouple applications from hardware, freeing the excess capacity for use by other applications. Gartner clients have reported consolidation ratios ranging from as few as five to 12 VMs on a single server to as many as 70. The result is the ability of IT to consolidate server infrastructure, reduce capital costs associated with server acquisitions and data center infrastructure, and reduce operating costs with improved management, maintenance and energy consumption.
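
To make the consolidation ratios above concrete, the following is a minimal arithmetic sketch. It assumes a hypothetical estate of 1,000 VMs (a figure that does not come from this research) and ignores failover headroom, memory limits and I/O contention, so it shows only how strongly the host count depends on the ratio achieved. The function name hosts_needed is illustrative.

```python
def hosts_needed(vm_count: int, vms_per_host: int) -> int:
    """Return the minimum number of hosts for vm_count VMs at a given ratio."""
    if vm_count < 0 or vms_per_host <= 0:
        raise ValueError("vm_count must be >= 0 and vms_per_host must be > 0")
    # Integer ceiling division: a partly filled host still counts as a host.
    return (vm_count + vms_per_host - 1) // vms_per_host


if __name__ == "__main__":
    total_vms = 1000  # hypothetical estate size, for illustration only
    # Ratios cited in the text: 5 to 12 VMs per server at the low end, 70 at the high end.
    for ratio in (5, 12, 70):
        print(f"{ratio:>2} VMs per host -> {hosts_needed(total_vms, ratio)} hosts")
```

Under these assumptions, the same estate needs 200 hosts at a ratio of five, 84 at 12 and 15 at 70, which is why the achievable ratio dominates any consolidation business case.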

The flip side of the coin is that rapid application provisioning and delivery can create new costs and risks. Reducing the friction from application deployment will likely increase pressures on new application deployment and demand. The well-known equivalent in the physical world is "server sprawl." The analog in the VM world will be known as "VM sprawl." The consequence presents a plenitude of unknowns. Organizations must recognize that they not only may be exchanging one cost and management style for another, but that as physical machines are turned into virtual machines, virtual sprawl is likely to outstrip physical sprawl. Moreover, how VM sprawl acts on the network, storage, I/O and compute power across a network of server nodes, and in an increasingly dynamic model of services, will be a complex multivariate problem.

In addition to being a capacity-planning problem, it presents an integration and architectural problem. For most IT organizations, virtualizing the infrastructure will coexist with the physical and dedicated parts of the infrastructure. Many applications will require the resources of multiple physical servers with the number varying over time, not purely a slice of a single physical server. Invariably, IT organizations will be confronted with some tools that handle dedicated physical environments only, some that handle virtualized environments only, and some that handle both. Some of these issues may be mitigated by intelligent tools that understand the total infrastructure and can move workloads around. However, operational considerations, management and ownership, combined with policies, compliance and security, may restrict movement or, at a minimum, create such complexity that IT operations may decide to keep it simple and minimize too much virtualization.

For example, organizations must understand trade-offs in availability, recoverability, reliability, asset management, agility, cost/chargeback and so on. Most of the tools have emerged piecemeal, many with limited functionality, but good enough for early-stage virtualization deployments. In a few years, when massive virtualization is driven to the top of many IT agendas, many of these tools may not measure up to the wider breadth of demand. As dynamic workload placement and increased optimization step to the forefront, the pure performance and utilization aspect will be one of many capabilities that need to be integrated into a unified management console. Tools will need to be self-learning, mindful of security, availability and even power-consumption requirements. Equally important will be the need for transforming data collections into a data repository, linking applications, compliance requirements, and business and performance demands to enable data center planners and architects to create a comprehensive framework for IT resource planning and daily management.

Here, we provide data center planners with a tactical planning guide and insight into the complexity of much higher scales of virtualization. We list the unknowns, or the unpredictables, that inevitably will arise as unintended consequences of interactions, computing and operational patterns that will become combinatorially explosive. From the list, IT organizations will have to create individual parts of a strategic plan that can address each of the potentially unknowable elements in terms of best- and worst-case outcomes. With this advice and these recommendations, IT organizations should be able to minimize surprises and risks as they embark on ever-increasing scale in combined physical and virtual infrastructures, while striving for the ideal of real-time response and service levels.

Tactical Guidelines

1. Can 100% of applications be virtualized, and even be practical, in the next few years?

Best case: Perhaps as much as 80% is virtualizable as infrastructure technology improves. Recognize that some organizations — by size, type of business or type of applications — may have done nearly a complete job of virtualization (at least at the more basic consolidation level).
Worst case: Too many VMs that are controlled by management create side effects and inconsistencies that militate against secure and predictable service-level agreement (SLA) services. Moreover, inconsistent pricing, licensing and support by independent software vendors (ISVs) will likely exacerbate software management.
Advice: Don't plan for pervasive virtualization in the next two to three years, unless you can create or employ a third-party behavioral simulation model. Equally important, plan a pervasive model of virtualization by understanding the management, software licensing and process before executing an aggressive implementation. Anticipate and solve problems before they arise. Setting up a test lab environment that can measure all consequences of pervasive virtualization will be impractical. Mileage may vary, but a more reasonable target in this time frame will be a 50% to 70% enterprise-scale virtualization range.

2. How will VM proliferation affect IT organizations' abilities to impose rules and policies?

Best case: Many tools already exist, but only a small percentage will be mature and proven in highly scalable environments.
Worst case: Too many VMs will create a combinatorial explosion of poorly managed VMs.
Advice: Seek and evaluate tools that cover the most important elements for rule-based deployment, including capacity, utilization, performance, configuration, change management, life cycle tracking and compliance.
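
As one way to picture rule-based deployment, the sketch below checks a VM request against a handful of policies before it is provisioned. It is not modeled on any particular product: the VMRequest fields, the rule functions and the MAX_VCPUS limit are all hypothetical names and values chosen for illustration, loosely covering the capacity, configuration and life cycle elements named in the advice above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List, Optional


@dataclass
class VMRequest:
    """A hypothetical VM provisioning request; every field name is illustrative."""
    name: str
    owner: Optional[str]        # accountable person or team
    vcpus: int                  # requested virtual CPUs
    retire_on: Optional[date]   # planned end of life, for life cycle tracking
    baseline: Optional[str]     # approved configuration baseline, if any


# A rule returns None when satisfied, or a short description of the violation.
Rule = Callable[[VMRequest], Optional[str]]

MAX_VCPUS = 8  # assumed capacity policy, not a recommendation


def needs_owner(vm: VMRequest) -> Optional[str]:
    return None if vm.owner else "no accountable owner"


def within_capacity(vm: VMRequest) -> Optional[str]:
    return None if vm.vcpus <= MAX_VCPUS else "vCPU request exceeds policy limit"


def needs_retirement_date(vm: VMRequest) -> Optional[str]:
    return None if vm.retire_on else "no retirement date for life cycle tracking"


def needs_baseline(vm: VMRequest) -> Optional[str]:
    return None if vm.baseline else "no approved configuration baseline"


RULES: List[Rule] = [needs_owner, within_capacity, needs_retirement_date, needs_baseline]


def violations(vm: VMRequest, rules: List[Rule] = RULES) -> List[str]:
    """Apply every rule in order and collect the violations found."""
    found = []
    for rule in rules:
        message = rule(vm)
        if message is not None:
            found.append(message)
    return found


if __name__ == "__main__":
    request = VMRequest("test-vm", owner=None, vcpus=16, retire_on=None, baseline=None)
    for problem in violations(request):
        print(f"{request.name}: {problem}")
```

Keeping each rule as a small, independent function is a design choice that lets new policies be added to the list without touching the evaluation loop, which matters when the number of VMs and rules both grow.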

3. What infrastructurewide behavioral effects can occur as VM creation escalates to thousands of VMs?

Best case: Behavioral effects can be compounded or minimized as a function of how well IT infrastructure and resource planning is thought out and executed.
Worst case: Results may suffer from some or many of the following effects:
- Poor application stacking and isolation
- Overstressed I/O
- CPU performance degradation
- Volatile usage patterns
- Poor workflow and process design
- Wide geography placement
- Poor chargeback
- Poor security
- Poor choice of system and server elements
The consequences of even one or a few of these disturbances can have rippling and tangential effects, including VM migration thrashing. In addition, VM mobility technology such as VMware's VMotion requires all systems to be on the same subnet, which contributes to degraded performance.
Advice: Good results can only be achieved by meticulous planning around a number of variables, such as placement of subnets, multitiered workflow, optimized near/far resource access, overlaid on common resource pools and technologies, while exercising control over mobility. This is probably the most difficult and complex situation for which tools do not yet exist. Massive VM deployment turns architectural design into a high art waiting for math wizards who know how to apply complex multivariate analysis from their studies of macro- and microeconomics, possibly the next important wave in IT job skills.
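
Two of the worst-case effects listed above, poor application stacking and volatile usage patterns, lend themselves to a very small what-if check. The sketch below sums per-period CPU demand for VMs stacked on one host and reports the periods in which combined demand exceeds host capacity. The demand profiles, the capacity figure and the function names are all hypothetical; a real analysis would also need memory, I/O and network dimensions.

```python
from typing import Dict, List


def combined_load(profiles: Dict[str, List[float]]) -> List[float]:
    """Sum CPU demand across all stacked VMs for each time period."""
    return [sum(period) for period in zip(*profiles.values())]


def overloaded_periods(profiles: Dict[str, List[float]], capacity: float) -> List[int]:
    """Return the indexes of periods where combined demand exceeds capacity."""
    return [i for i, load in enumerate(combined_load(profiles)) if load > capacity]


if __name__ == "__main__":
    # Hypothetical demand in CPU cores over four time periods.
    profiles = {
        "batch":  [1.0, 1.0, 6.0, 2.0],
        "web":    [4.0, 4.0, 2.0, 1.0],
        "report": [1.0, 1.0, 4.0, 1.0],
    }
    capacity = 8.0  # assumed usable cores on the host

    load = combined_load(profiles)
    print("average load:", sum(load) / len(load))
    print("overloaded periods:", overloaded_periods(profiles, capacity))
```

With these illustrative numbers the average load sits below capacity while one period is overloaded, which is the kind of collision that planning on averages alone would miss.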

4. How and when will dynamic real-time virtualization with continuous load balancing become feasible (if at all)?

Best case: We're probably not going to see such patterns for at least five years or until enough experience exists in solving the VM design problems already mentioned.
Worst case: Too many VMs will create a combinatorial explosion, making real-time, dynamic load balancing on an enterprise scale riddled with holes. For example, real-time deterministic applications of ultra-time-sensitive response times such as military-aerospace, medical systems, air traffic control and financial instruments could be jeopardized.
Advice: Approach this problem by breaking it down into synergistic workloads and applications related to common management tools for simplicity. Devise an enterprise master plan based on evolutionary progression. Overly complex configurations and massive scaling ambitions likely will drain resources and end in failure. It will be crucial to put processes in place that can confirm that hardware operations have been completed at a specific time within specified SLAs and are not disrupted by unrelenting VM proliferation.
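
The last point, confirming that operations complete within their SLAs, can be sketched as a simple check over recorded start and finish times. The Operation record and its fields are hypothetical, and the sketch assumes that timestamps are already being collected by some monitoring process; it says nothing about how to collect them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Operation:
    """A hypothetical record of one completed operation."""
    name: str
    started: datetime
    finished: datetime
    sla: timedelta  # maximum duration allowed by the SLA


def sla_breaches(operations: List[Operation]) -> List[str]:
    """Return the names of operations whose duration exceeded their SLA."""
    return [op.name for op in operations if op.finished - op.started > op.sla]


if __name__ == "__main__":
    t0 = datetime(2009, 1, 16, 2, 0, 0)
    operations = [
        Operation("nightly-backup", t0, t0 + timedelta(minutes=50), timedelta(hours=1)),
        Operation("vm-migration", t0, t0 + timedelta(minutes=12), timedelta(minutes=5)),
    ]
    print("SLA breaches:", sla_breaches(operations))
```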

5. Can predictable mission-critical service levels with root-cause analysis be applied to VMs in a dynamic and mobile environment?

Best case: Expect more sophisticated tools to address this issue in the next two to three years; but, at present, significant limitations exist.
Worst case: Monolithic, complex end-to-end workflows will likely cause unpredictable outages where current diagnostics are incapable of penetrating the fog of abstracted layers between hypervisor, operating system (OS), application, VM monitoring over WANs and heterogeneous storage. Add to this server heterogeneity, where virtualization exists not only on x86 platforms but across a range of rack- and frame-based x86 and non-x86 platforms, and SLA predictability may deteriorate commensurate with the complexity.
Advice: The best practical solution may be a preproduction test environment aimed at subsets of common resources that can be aggregated in a modular and nondisruptive way.

6. How will virtualization impact ISV licensing terms and practices as resource allocation becomes increasingly dynamic and distributed?

Best case: ISVs with new entry products capitalizing on VMs will seek the advantage over traditional ones by favoring software-as-a-service (SaaS) delivery models.
Worst case: Expect ISVs to resist price deflation as a result of VMs over current physical infrastructure contractual terms. Users will be left with audits, penalties and lack of beneficial savings in moving VMs around clustered resources.
Advice: Search for alternative and effective solutions through cloud services or from SaaS suppliers. Large and influential enterprises should push vendors toward consumption-based usage models of pricing.

7. How will TCO change as VMs proliferate and scale throughout the enterprise, and what are the main dependencies?

Best case: Through a progression of stepwise VM evolutions accompanied by an integrated and automated tool suite, the benefits of rapid VM provisioning and coordinated life cycle management will translate early, project-defined TCO benefits into broader enterprise TCO benefits.
Worst case: Too many disorderly or loosely generated VMs will create tool fragmentation, more hires and potentially lowered availability levels adding to TCO.
Advice: Seek tools that are extensible, integrable and holistic as key factors in improved TCO. Expect standards such as Open Virtualization Format (OVF) and embedded chip instructions to overcome some heterogeneous effects, such as crossover hypervisor network segments.

8. How will virtualization impact application design and development to suit modularity, security, OS affinity and configurability?

Best case: Expect application developers to slowly catch up with new software paradigms introduced by virtual design changes in which consolidation remains a continuing driving force in virtualization. Other approaches will exploit VM appliances to selectively simplify infrastructure support.
Worst case: Workloads are heavily resource dependent and tend to exhibit a lot of inertia in moving from resource to resource.
Advice: Build skunkworks and R&D projects as part of competitive transformation projects on a foundation of modular fabrics as they emerge in the next five years.

9. What network constraints will arise from end-to-end application latency as large numbers of VMs are rapidly provisioned from repositories?

Best case: Improved networking technologies — such as application delivery controllers, intelligent network interfaces, WAN optimization controllers and server protocol offload — will ease some of the burden. Application latency and response to end users, however, will remain a constant challenge as data centers push the envelope of virtualization breadth.
Worst case: Haphazardly spawning VMs as part of new application environments, such as browser-based applications and Web services and service-oriented architectures, will put a significant burden on the underlying server and network infrastructures. Guaranteeing SLAs and quality of service (QOS) will be increasingly difficult as VMs migrate over WANs, depend on the Internet and move into clouds.
Advice: No VM deployment or architecture planning should occur without the participation of network, server, storage and security teams. Network acceleration and QOS functions should be offloaded from servers, leaving them free to perform their primary function: dealing with business logic and serving content.

10. How will storage management evolve to monitor and manage the expected strong growth induced by VM-driven storage during rapid VM provisioning within and beyond the enterprise (such as clouds)?

Best case: The need to reduce TCO drives purchase of tools that increase storage operational efficiency and enable higher utilization of the storage infrastructure than current levels that are in the 20% to 40% range for common direct-attached storage (DAS). Deduplication and thin provisioning combined with virtualization will address ways to make better use of current storage resources, despite challenges presented by VM proliferation.
Worst case: VM proliferation will increase the complexity of managing heterogeneous storage resources (that is, sustaining IT silos) and raise obstacles toward an integrated storage management logical view of a networkwide storage infrastructure, further contributing to problem identification and performance-tuning diagnosis challenges.
Advice: VM proliferation will make storage management tools that tame the resulting complexity more compelling, and it will change the dynamics, increasingly favoring storage management integrated with virtualization system management. Capacity-efficient techniques (such as thin provisioning, space-efficient snapshots, compression and data deduplication) that are gaining traction in the physical server world will prove even more beneficial in the virtualized server environment.
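
The effect of storage utilization on purchased capacity can be shown with a short calculation. The sketch below assumes a hypothetical 10TB of live data; the 20% and 40% figures are the current DAS utilization range cited in the best case above, while the 70% figure is purely an illustrative target for what techniques such as thin provisioning might allow, not a measured or predicted result.

```python
def raw_capacity_needed(data_tb: float, utilization: float) -> float:
    """Return the raw capacity (TB) required to hold data_tb at a given utilization."""
    if data_tb < 0:
        raise ValueError("data_tb must be >= 0")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    return data_tb / utilization


if __name__ == "__main__":
    live_data_tb = 10.0  # hypothetical volume of data actually stored
    # 0.20 and 0.40 come from the text; 0.70 is an assumed improvement target.
    for utilization in (0.20, 0.40, 0.70):
        raw = raw_capacity_needed(live_data_tb, utilization)
        print(f"{utilization:.0%} utilization -> {raw:.1f}TB raw capacity")
```

Under these assumptions, moving from 20% to 40% utilization halves the raw capacity required for the same data, which is the arithmetic behind the TCO argument for efficiency tools.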

© 2009 Gartner, Inc. and/or its Affiliates. All Rights Reserved. Reproduction and distribution of this publication in any form without prior written permission is forbidden. The information contained herein has been obtained from sources believed to be reliable. Gartner disclaims all warranties as to the accuracy, completeness or adequacy of such information. Although Gartner's research may discuss legal issues related to the information technology business, Gartner does not provide legal advice or services and its research should not be construed or used as such. Gartner shall have no liability for errors, omissions or inadequacies in the information contained herein or for interpretations thereof. The opinions expressed herein are subject to change without notice.