
Data Center Efficiency and Capacity: A Metric to Calculate Both

The data center efficiency metrics currently being proposed in the industry are not adequate for day-to-day operations. This research proposes a simple performance metric for identifying underutilized resources and cost-saving opportunities in data centers.

Key Findings

Most data centers do not populate server racks beyond 40% to 60%, on average.
For x86 servers, typical utilization rates are between 7% and 15% in a nonvirtualized environment (still the majority at the moment).
Energy use in servers running at low utilization can reach 60% to 70% of their nameplate rating.

Recommendations

When possible, companies should measure performance, energy consumption and equipment utilization at the rack level.
Compare current performance-to-energy ratios against optimal targets, not maximums, to identify realistic low-cost growth areas.
Use capacity targets as the benchmark for floor space and energy design.
Users with multiple data centers around the world can use the Gartner Power to Performance Effectiveness (PPE) rating to compare the efficiency of their data center operations in a uniform way.

ANALYSIS

This document was revised on 26 January 2010. For more information, see the Corrections page on gartner.com.

Many IT organizations are being asked to do more with less, reducing budgets or, perhaps, curtailing data center expansion projects altogether. Faced with the harsh realities of a difficult economic climate, data center managers need to focus on creating the most efficient operating environments in order to extend the life of existing data centers. These efficiencies can be gained through many avenues – increasing compute densities, creating cold-aisle containment systems or more effectively using outside air – but the key component over time will be having an easily understood metric to gauge just how efficient the data center is, and how much improvement in efficiencies has been created on an ongoing basis.

What's the Issue?

With increased awareness of the environmental impact data centers can have, there has been a flurry of activity around the need for a data center efficiency metric. Most of those proposed, including power usage effectiveness (PUE) and data center infrastructure efficiency (DCiE; see Note 1), attempt to map a direct relationship between total facility power delivered and IT equipment power available. Although these metrics provide a high-level benchmark for comparing data centers, they do not provide any criteria for showing incremental improvements in efficiency over time. They do not allow for monitoring the effective use of the power supplied, only the difference between power supplied and power consumed. For example, a data center might be rated with a PUE of 2.0, an average rating. If that data center's manager used virtualization to increase average server utilization from 10% to 60%, the data center itself would become more efficient using existing resources, yet the overall PUE would not change at all.

The PPE Metric

A more effective way to look at energy consumption is to analyze the effective use of power by existing IT equipment, relative to the performance of that equipment. While this may sound intuitively obvious (who wouldn't want more-efficient IT?), a typical x86 server will consume between 60% and 70% of its total power load even when running at low utilization levels. Raising utilization levels has only a nominal impact on power consumed, yet a significant impact on effective performance per kilowatt. Pushing IT resources toward higher effective performance per kilowatt can have the twofold effect of improving energy consumption (putting energy to work) and extending the life of existing assets through increased throughput.

When major IT assets are evaluated in this manner, it becomes clear that not only can more-efficient environments be created, but individual asset utilization levels can also be increased, effectively improving the performance per square foot within the data center and potentially deferring the construction of a new data center.

At Gartner, we have created a metric to help demonstrate this effect, called the PPE metric. It was developed to help identify, at the device level, where efficiencies could be gained. Unlike other metrics, the PPE does not compare actual performance to hypothetical maximums, but rather is designed to allow the user to define his or her own optimal maximum performance levels, and then compare average performance against the optimum.

Three critical components come into play, only one of which is outside the primary control of IT: rack density levels, server utilization levels and energy consumption. Rack density levels are usually mandated by IT management and, as often as not, are defined based on power levels and the potential heat load that specific rack densities might generate. In a typical data center, rack densities of 50% to 60% are very common, yielding an average of 25 1U server slots per rack. Server utilization, especially in x86 environments, is often at the low end of the performance range, averaging between 7% and 15% in many organizations. One of the key drivers for virtualization has been to improve these performance levels, driving servers up toward 60% to 70% average utilization. Driving servers to these higher utilization levels does not dramatically increase power consumption, but PPE is designed to capture that increase as well. Therefore, optimal power performance can be defined not as total compute output, but as realistic compute output compared to energy used.

Optimal power performance is calculated based on the following (servers and racks are used as examples):

Rack density x optimal percentage = optimal servers. For example: 21 2U servers x 85% = 18 servers per rack.
Optimal servers x optimal server utilization x average watts per server / 1,000 = optimal power performance (kilowatts). For example: 18 x 65% x 464 / 1,000 = 5.28 kilowatts per rack as an optimal power performance.

(Note: Average watts per server is the power draw at optimal performance, which can be obtained through measurement, or, in many cases, from the vendor's website.)

The resulting number, in this example 5.28, represents 100% optimal performance.

Average power performance uses the same basic formula, but is based on current workloads:

Rack density x actual percentage used = actual servers
Actual servers x average utilization x average watts per server / 1,000 = actual power performance (kilowatts)

The resulting number represents actual performance.
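
The optimal and actual calculations share one formula, which can be sketched in Python as follows. This is a minimal illustration, not Gartner's implementation: the function and parameter names are hypothetical, and rounding to the nearest whole server is an assumption. With the rounded inputs quoted in the optimal example (18 servers, 65%, 464 watts), the product comes to about 5.43 kilowatts rather than the 5.28 quoted; the note does not state the exact inputs behind the published figure.

```python
def servers_per_rack(slots: int, fill_fraction: float) -> int:
    """Servers in a rack filled to fill_fraction of its slots.

    Rounding to the nearest whole server is an assumption made here.
    """
    return round(slots * fill_fraction)


def power_performance_kw(servers: int, utilization: float,
                         watts_per_server: float) -> float:
    """Servers x utilization x average watts per server / 1,000 (kilowatts)."""
    return servers * utilization * watts_per_server / 1000.0


# Inputs taken from the optimal example in the text.
optimal_servers = servers_per_rack(21, 0.85)
optimal_kw = power_performance_kw(optimal_servers, 0.65, 464)
print(optimal_servers, round(optimal_kw, 2))
```

The same function gives actual power performance when it is called with the current server count, average utilization and measured watts per server.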

Therefore, PPE can be represented as actual power performance / optimal power performance, expressed as a percentage. Table 1 shows an example for a single rack. For the optimum, the watts per server at 65% utilization were taken directly from the vendor's website. For the actual (average) figures, the watts per 2U server at the corresponding performance level were also obtained from the vendor's website, although direct measurements could be used as well.

Table 1. PPE Example

Metric	Average	Optimum
Rack Density (%)	60	85
Servers per Rack	13	18
Kilowatts per Rack	5	8.1
Server Utilization (%)	25	65
Power Performance (kW)	1.25	5.28
PPE (%)	23.60	100

Source: Gartner (September 2009)

PPE shows the potential growth available, about 76%, within this existing configuration. With a combination of higher virtualization levels and increased rack densities, it's likely this rack environment will support existing growth rates for quite some time. This assumes that both power and cooling are available to support these higher densities. If they are not, an analysis of the cost of adding power and cooling versus the cost of building out a new data center might, in fact, change the overall decision-making process.
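
The PPE ratio and the remaining headroom can be checked with a few lines of Python. This is a minimal sketch with a hypothetical function name; the inputs are the average and optimum performance values from Table 1. It yields roughly 23.7%, close to the table's 23.60 (the small gap is consistent with rounded table inputs), and headroom of about 76%.

```python
def ppe_percent(actual_kw: float, optimal_kw: float) -> float:
    """PPE = actual power performance / optimal power performance, as a percentage."""
    if optimal_kw <= 0:
        raise ValueError("optimal power performance must be positive")
    return 100.0 * actual_kw / optimal_kw


# Performance values from Table 1 (average and optimum columns).
ppe = ppe_percent(1.25, 5.28)
headroom = 100.0 - ppe  # growth available before reaching the optimum
print(f"PPE: {ppe:.1f}%  headroom: {headroom:.1f}%")
```

Tracking this value per rack over time is one way to apply the metric on an ongoing basis, as the note recommends.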

A cautionary note: With server virtualization, it is commonly assumed that server utilization is low and can be improved by consolidating multiple workloads onto a single, virtualized server. However, server utilization can be a somewhat misleading concept. New servers will generally have lower levels of utilization, because the server asset is likely to have a three- to five-year useful life cycle and must accommodate workload growth over that period. For newer applications, utilization will increase year over year as the initial workload increases. Most importantly, clients should consider application performance/throughput and latency as the keystone metric to begin with, and not simply CPU or memory utilization. Remember, optimal server performance is not necessarily high CPU or memory utilization.

Bottom Line

PPE is not the end-all of power and performance monitoring, but was designed to give IT managers a view of performance levels within their data centers, and a means to compare that performance to realistic potential (optimal) performance levels, rather than just using a hypothetical maximum. Using PPE on an ongoing basis will yield a clear view of how power and performance use is changing over time, and how an organization's overall data center efficiency is improving.

Source: Gartner RAS Core Research Note G00164493, David J. Cappuccio, 18 September 2009


Note 1
PUE and DCiE

PUE = total facility power/IT equipment power

DCiE expresses the same relationship as its inverse: DCiE = 1/PUE = IT equipment power/total facility power
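
As a small illustration, the two definitions can be written in Python. The function names and the kilowatt figures below are hypothetical, chosen only to reproduce the PUE of 2.0 mentioned in the analysis. Neither formula takes server utilization as an input.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """PUE = total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def dcie(total_facility_kw: float, it_equipment_kw: float) -> float:
    """DCiE = IT equipment power / total facility power, i.e. 1 / PUE."""
    if total_facility_kw <= 0:
        raise ValueError("total facility power must be positive")
    return it_equipment_kw / total_facility_kw


# Hypothetical facility: 2,000 kW delivered, 1,000 kW reaching IT equipment.
print(pue(2000, 1000), dcie(2000, 1000))
```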