
Overview

This document evaluates the potential benefits and drawbacks of converging data center networks, and debunks a myth about lower costs and complexity. It includes quantitative analysis, along with insights gained from discussions with Gartner clients about organizational issues that affect convergence decisions.
- Don't assume that a single converged Fibre Channel over Ethernet (FCoE) network is desirable, or even feasible.
- Standards for building large, scalable, Layer 2, converged Ethernet backbones are at least a year away. Products proven to be interoperable are much further off.
- Combining storage area network (SAN) and local-area network (LAN) traffic on a single backbone network increases costs and complexity.
- Organizational issues often dwarf the technical issues surrounding network convergence.
- Staff reductions are unlikely to be feasible even if physical networks are converged.
- Maintaining two separate data center networks doesn't mean you can't use the same technology for both.
- Plan to maintain separate data center SANs and LANs for at least the next three years.
- For virtualized servers, employ input/output (I/O) virtualization and network convergence to top of rack (ToR) switches to simplify cabling and reduce costs and complexity.
- To ensure competitive pricing, insist that vendors offering new technologies such as Transparent Interconnection of Lots of Links (TRILL) demonstrate interoperability with at least one credible alternative vendor before you buy.
- Consider Data Center Bridging (DCB) as a possible long-term (more than three-year) alternative to Fibre Channel (FC).
- Be wary of claims that you can build a standards-compliant, end-to-end converged data center network either today or within the next 24 months.
- Start integrating your server, storage and network teams under a single operations structure to prepare for longer-term synergies as their associated disciplines become more closely linked. For most organizations, this represents a political challenge that will take years to complete, but until it is done, operational efficiencies from staff integration are unlikely.


Analysis

The Network Backbone Convergence "Buzz"
Once again the networking industry is abuzz with the promise of a single converged backbone infrastructure, this time in the data center core. Variously described as FCoE, Data Center Ethernet (DCE) and, more precisely, DCB, this latest development is intended to succeed where InfiniBand failed to unify computing, networking and storage networks. The argument goes like this: "It must be less expensive to build and manage one larger backbone network than two smaller ones." But this is a case where the facts don't support a seemingly obvious assumption.
The barriers to building a single network range from a dearth of available products, and the price premiums charged for those products, to the requirement to "forklift-upgrade" your entire data center backbone network in order to overcome long-standing organizational barriers. These barriers are expected to remain for at least the next three years. Over time, they may be lowered by product improvements and organizational integration, supporting a more convincing argument for backbone network convergence.
Note that Gartner believes that in-rack server network convergence can be both achievable and advisable. However, at current prices, a converged FCoE in-rack network can be more expensive than maintaining separate Ethernet and FC networks.

If convergence within the server and rack is a good idea, why shouldn't you extend it across the data center network backbone? There are several reasons.

The promise that a single converged data center network will require fewer switches and ports, be simpler, consume less power and require less cooling doesn't stand up to scrutiny. This is because as networks grow beyond the capacity of a single switch, ports must be dedicated to interconnecting switches. In large mesh networks entire switches do nothing but connect switches to each other. This results in a non-linear relationship between usable ports at the edge of the backbone and ports used for inter-switch links. In smaller networks fewer ports are required to perform this interconnect function.
A network topology known as a "fat tree" or "folded Benes" is popular when building large-port-count non-blocking networks using smaller-port-count switches. Figure 1 illustrates how a 32-port non-blocking network can be constructed using 8-port switches.
Figure 1. A 32-Port Non-Blocking Network in a Two-Tier "Fat Tree" Topology
Source: Fulcrum Microsystems
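The port arithmetic behind a two-tier fat tree can be sketched in a few lines. The sketch below is illustrative only: it assumes a fully populated, non-blocking topology in which every leaf switch dedicates half its ports to uplinks, and the function name `two_tier_fat_tree` is hypothetical.

```python
def two_tier_fat_tree(n: int) -> dict:
    """Size a fully populated, non-blocking two-tier fat tree of n-port switches."""
    if n < 2 or n % 2:
        raise ValueError("n must be an even port count of at least 2")
    leaf_switches = n          # each spine port feeds a distinct leaf switch
    spine_switches = n // 2    # each leaf has n/2 uplinks, one per spine switch
    external_ports = leaf_switches * (n // 2)  # n/2 edge-facing ports per leaf
    total_ports = (leaf_switches + spine_switches) * n
    return {
        "switches": leaf_switches + spine_switches,
        "external_ports": external_ports,
        "inter_switch_ports": total_ports - external_ports,
        "total_ports": total_ports,
    }


# The 8-port building block described for Figure 1.
print(two_tier_fat_tree(8))
```

For 8-port switches this yields 12 switches and 32 external ports, with 64 of the 96 switch ports consumed by inter-switch links. Under these assumptions, two out of every three ports in the fabric do nothing but connect switches to each other.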

Tables 1 and 2 show that, at best, the converged network requires the same number of ports as two separate networks. And if the same number of ports and switches is required, there are no savings on power or cooling. Assumptions for these configurations are detailed in Note 1.
In the top section of Table 1 all servers are connected to a DCB backbone through ToR switches only or embedded blade switches. Connections to existing FC storage arrays would require DCB-FC bridges, further increasing the cost of the converged backbone configuration.
The lower section of Table 1 shows the results of building separated DCB-based SAN and LAN. In both cases the backbone topology is a two-tier fat-tree network.
Table 1 assumes a fat tree of non-blocking 256-port core switches. The requirement for a non-blocking core is detailed in "Use Top-of-Rack Switching for I/O Virtualization and Convergence; the 80/20 Benefits Rule Applies" (see Note 2 for an update on this topic).
Table 1. Price Comparison: Single Converged DCB Backbone Network Versus Separate DCB-Based LAN and SAN, All Using 256-Port Switches
5,000 servers, DCB end to end, not counting FCoE bridges at the SAN edge | 1,490 | 2,980 | 256 | 11.75 | 6.00 | 17.75 | 4,544 | $1,000 | $4,544,000
5,000 servers, FCoE ToR, two backbones, partial fill on core chassis
Ethernet backbone | 544 | 1,088 | 256 | 4.25 | 2.25 | 6.50 | 1,664 | $1,000 | $1,664,000
FCoE SAN backbone | 946 | 1,892 | 256 | 7.50 | 3.75 | 11.25 | 2,880 | $1,000 | $2,880,000
Total | 1,490 | 2,980 | 256 | 11.75 | 6.00 | 17.75 | 4,544 | $1,000 | $4,544,000
DCB = Data Center Bridging; FCoE = Fibre Channel over Ethernet; SAN = storage area network; ToR = top of rack
For the assumptions underlying this table, see Note 1.
Source: Gartner (March 2010)
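The figures in Table 1 can be cross-checked with a short calculation. This is a sketch under stated assumptions: it reads the three fractional values in each row as edge-tier chassis, core-tier chassis and their sum, and it treats the price as total switch ports multiplied by the $1,000 per-port price in Note 1. The variable and function names are illustrative.

```python
PORTS_PER_SWITCH = 256
PRICE_PER_PORT_USD = 1_000

# (edge-tier chassis, core-tier chassis) per backbone, as read from Table 1.
# The role assigned to each column is an assumption.
TABLE_1 = {
    "converged": (11.75, 6.00),
    "ethernet": (4.25, 2.25),
    "fcoe_san": (7.50, 3.75),
}


def backbone_price(edge: float, core: float) -> tuple[int, int]:
    """Return (total switch ports, total price in USD) for one backbone."""
    ports = int((edge + core) * PORTS_PER_SWITCH)
    return ports, ports * PRICE_PER_PORT_USD


converged_ports, converged_price = backbone_price(*TABLE_1["converged"])
separate_ports = sum(backbone_price(*TABLE_1[k])[0] for k in ("ethernet", "fcoe_san"))
separate_price = sum(backbone_price(*TABLE_1[k])[1] for k in ("ethernet", "fcoe_san"))

print(converged_ports, converged_price)
print(separate_ports, separate_price)
```

Both approaches come to 4,544 ports and $4,544,000, which is the point of the table: with 256-port switches, the converged backbone saves nothing over two separate ones.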

An additional attribute of fat trees is that they enable very large, fault-tolerant non-blocking networks to be constructed from a large number of relatively inexpensive low-port-count switching elements. The availability of low-cost 48-port ToR DCB switches suggests that, if TRILL support is added, a lower-cost converged core might be possible. In this particular case, using 48-port switches requires a three-tier spine, aggregation and leaf fat tree to construct the converged network, whereas only two-tier fat trees are required to construct the two separate networks (Figure 2 shows a three-tier fat-tree network topology).
This is because, in general, switches with N ports enable two-tier fat-tree topologies of N²/2 external ports, and a maximum two-tier configuration requires 3N/2 switches. So 48-port switches are limited to 48²/2, or 1,152, ports. To build the required 1,490-port backbone, a three-tier fat-tree topology is required.
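The tier count needed for a given backbone size can be estimated as follows. The text gives only the two-tier limit of N²/2. The generalization used here, 2 × (N/2)^t external ports for a t-tier fat tree, is the usual folded-Clos result and is an assumption added for illustration, as are the function names.

```python
def max_external_ports(n: int, tiers: int) -> int:
    """Largest non-blocking fat tree, in external ports, from n-port switches."""
    return 2 * (n // 2) ** tiers


def tiers_required(n: int, ports_needed: int, max_tiers: int = 5) -> int:
    """Smallest number of tiers whose capacity covers ports_needed."""
    for tiers in range(1, max_tiers + 1):
        if max_external_ports(n, tiers) >= ports_needed:
            return tiers
    raise ValueError("ports_needed exceeds the capacity of max_tiers tiers")


# Backbone sizes taken from Tables 1 and 2.
for label, ports in (("converged", 1490), ("Ethernet", 544), ("FCoE SAN", 946)):
    print(label, "48-port:", tiers_required(48, ports),
          "256-port:", tiers_required(256, ports))
```

With 48-port switches, the 544-port and 946-port backbones each fit within the 1,152-port two-tier limit, while the 1,490-port converged backbone spills over into a third tier. With 256-port switches, all three fit in two tiers.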
Figure 2. A Three-Tier Fat-Tree Network Topology
Source: Fulcrum Microsystems

Table 2 details the cost of an equivalent backbone to Table 1, constructed using 48-port switches.
Table 2. Price Comparison: Equivalent Backbones to Those of Table 1, Constructed Using 48-Port Switches
5,000 servers, DCB end to end, not counting FCoE bridges at the SAN edge | 1,490 | 2,980 | 2,980 | 48 | 63 | 63 | 32 | 158 | 7,584 | $500 | $3,792,000
5,000 servers, FCoE ToR, two backbones, partial fill on core chassis
Ethernet backbone | 544 | - | 1,088 | 48 | 23 | - | 27 | 50 | 2,400 | $500 | $1,200,000
FCoE SAN backbone | 946 | - | 1,892 | 48 | 40 | - | 22 | 62 | 2,976 | $500 | $1,488,000
Total | 1,490 | - | 2,980 | 48 | 63 | - | 49 | 112 | 5,376 | $500 | $2,688,000
DCB = Data Center Bridging; FCoE = Fibre Channel over Ethernet; SAN = storage area network; ToR = top of rack
For the assumptions underlying this table, see Note 1.
Source: Gartner (March 2010)
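The premium for the converged 48-port design can be worked out from Table 2. This sketch assumes that the per-row switch counts are per-tier totals (three tiers for the converged backbone, two for each separate backbone) and that price equals switches × 48 ports × $500 per port, as in Note 1. The names are illustrative.

```python
PORTS_PER_SWITCH = 48
PRICE_PER_PORT_USD = 500

# Switch counts per tier, as read from Table 2 (column roles are an assumption).
CONVERGED_TIERS = (63, 63, 32)
SEPARATE_TIERS = {"ethernet": (23, 27), "fcoe_san": (40, 22)}


def price(switches_per_tier: tuple[int, ...]) -> int:
    """Total price in USD for a backbone with the given per-tier switch counts."""
    return sum(switches_per_tier) * PORTS_PER_SWITCH * PRICE_PER_PORT_USD


converged = price(CONVERGED_TIERS)
separate = sum(price(tiers) for tiers in SEPARATE_TIERS.values())
premium = converged - separate
premium_pct = 100 * premium / separate

print(f"converged ${converged:,}  separate ${separate:,}")
print(f"premium ${premium:,} ({premium_pct:.1f}%)")
```

On these figures the converged backbone costs $1,104,000 more than the two separate backbones combined, a premium of roughly 41%, because the third tier adds switches that carry no edge traffic.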

Even when the converged backbone can be built with a two-tier network topology, the best-case scenario is that the converged network requires the same number of ports and costs the same as two separate networks. Although a network built with low-cost 48-port switches is less expensive than one using more expensive 256-port switches, the saving comes at the expense of complexity. In either case, networks constructed using smaller switches require significantly more interconnect cables deployed in a very complex topology, which makes installation and troubleshooting very difficult.
In much smaller configurations with only a few hundred servers the entire network could be constructed using a single very large data center backbone switch. While this topology is very simple, 256- to 512-port 10-Gbps Ethernet non-blocking switches are very expensive, often costing over $1,000,000. Additionally, configuration and management complexities remain.
Since convergence brings no reduction in the amount of equipment needing to be acquired, maintenance and support costs are unlikely to be lower. It should also be noted that committing to a single vendor to increase purchasing volumes may actually increase costs.

Reasons of Design and Management Complexity
With today's switches, the large Layer 2 topologies required by VMware's VMotion virtualization technology make Ethernet network design a very complex multi-overlapping VLAN problem. Designing a large SAN is no simpler. And when the two networks are overlaid on a single infrastructure the complexity increases significantly. As traffic shares ports, line cards and inter-switch links, avoiding congestion ("hot spots") becomes extremely difficult. Over time, emerging standards such as TRILL may make it easier to avoid these hot spots, but mature, standards-compliant implementations are at least two years away. Avoiding hot spots and single points of failure in the event of switch or link failure is a very large design challenge.
Debugging problems is also more difficult in the converged network, since interactions between LAN and SAN traffic can make root-cause analysis harder. Since many problems are transient in nature, events must be correlated across the two virtual networks, which increases complexity. Should an outage be required to solve a problem or simply to perform maintenance, a downtime window that is acceptable for both environments may be required. This increases complexity and may increase costs as well.
The simple solution is to segment the switches to isolate LAN and SAN traffic from each other. Alternatively, of course, you could simply maintain two separate networks and avoid the problem altogether.

Perhaps the greatest impediment to backbone network convergence is organizational. Simply put, in most large organizations, the SAN and LAN administration teams report to different managers, have very different cultures, and don't get along. Most LAN staff see storage staff as Luddites stuck in the previous century, while most storage staff view LAN staff as people who don't know how to run a production network. Additionally, the SAN staff remember what happened to the voice network engineers when Internet Protocol (IP) telephony was introduced: they were absorbed into the LAN team as "second-class citizens."
Although Gartner recommends that server, storage and network teams be integrated, for most organizations this will prove a political challenge that takes years to resolve. But until this happens, improved operational efficiencies from staff integration are unlikely.
Gartner's IT Key Metrics data for voice networks and data networks shows that convergence of these networks did not result in significant staff reductions. Some external contractors may have been eliminated, but the unique skills required prevented significant reductions in staffing levels. Although moves, adds and changes are simplified, more highly skilled personnel are required to design, implement and operate the converged infrastructure. We see no reason to think that the results of LAN-SAN convergence would differ from those for time division multiplexing (TDM) voice-VoIP/LAN convergence.

Converge All Data Center Traffic on a Single Technology, but Not a Single Network
There are benefits to standardizing on a single technology for all data center networking if that technology adequately supports the needs of applications. Doing this simplifies acquisition, training and "sparing." However, settling on a single technology does not require that separate networks be combined. Design, operations and troubleshooting are much easier with separate networks and, as this document demonstrates, they may also cost less to build.
Traditionally, the technology that adapts to assimilate others has been Ethernet. If DCB proves a suitable substitute for Fibre Channel and 40/100-Gbps Ethernet is delivered in a timely manner, it will make sense over time to move all traffic to Ethernet. In our model we assume that 10-Gbps DCB switch ports will cost approximately the same as normal 10-Gbps Ethernet ports and 8-Gbps FC switch ports.
We expect DCB to become standard on most data-center-oriented switches. Since 10-Gbps Ethernet already provides a much higher data rate than 8-Gbps FC (10 Gbps versus 6.4 Gbps), DCB promises better price/performance. However, due to the size of the installed base of FC, the promise of the all-Ethernet data center will take at least five years for most enterprises to realize.
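The price/performance claim can be made concrete with a back-of-the-envelope calculation. It takes the model's assumption that DCB and FC ports cost about the same, and uses the $1,000 per-port figure from Note 1 purely as an illustrative price. The data rates are those quoted above.

```python
PORT_PRICE_USD = 1_000  # illustrative; the model assumes equal per-port prices

DATA_RATE_GBPS = {"10-Gbps DCB": 10.0, "8-Gbps FC": 6.4}

# Cost of each Gbps of usable data rate at the same per-port price.
cost_per_gbps = {name: PORT_PRICE_USD / rate for name, rate in DATA_RATE_GBPS.items()}
rate_advantage = DATA_RATE_GBPS["10-Gbps DCB"] / DATA_RATE_GBPS["8-Gbps FC"]

for name, cost in cost_per_gbps.items():
    print(f"{name}: ${cost:.2f} per Gbps")
print(f"DCB data-rate advantage: {rate_advantage:.2f}x")
```

At equal port prices, DCB works out to roughly $100 per Gbps against about $156 per Gbps for FC, reflecting a data-rate advantage of a little over 1.5 times.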

Although the promise that a unified fabric will require fewer switches and ports, and result in a simpler network that consumes less power and needs less cooling, may go unfulfilled, this doesn't mean that enterprises should forgo the benefits of adopting a unified network technology. In fact, this approach may prove that in some cases, 2<1.
 © 2010 Gartner, Inc. and/or its affiliates. All rights reserved. Gartner is a registered trademark of Gartner, Inc. or its affiliates. Reproduction and distribution of this publication in any form without prior written permission is forbidden. The information contained herein has been obtained from sources believed to be reliable. Gartner disclaims all warranties as to the accuracy, completeness or adequacy of such information. Although Gartner's research may discuss legal issues related to the information technology business, Gartner does not provide legal advice or services and its research should not be construed or used as such. Gartner shall have no liability for errors, omissions or inadequacies in the information contained herein or for interpretations thereof. The opinions expressed herein are subject to change without notice.
Note 1

- 5,000 servers, in a mix of 1RU, 2RU and blade server chassis, supporting a mix of dedicated and virtualized workloads.
- 256-port non-blocking switches at $1,000 per port.
- Two-tier fat-tree (folded Benes) network topology.
- All devices are dual-pathed to provide redundancy and some margin of headroom.
- Server traffic is aggressively aggregated in ToR switches to minimize required backbone ports.
- Less aggressive aggregation would reduce the need for a non-blocking core, but would increase the number of leaf ports and negate any significant savings in core ports.
- 5,000 servers, in a mix of 1RU, 2RU and blade server chassis, supporting a mix of dedicated and virtualized workloads.
- 48-port non-blocking switches at $500 per port.
- Three-tier fat tree (folded Benes) network topology due to smaller switch size.
- All devices are dual-pathed to provide redundancy and some margin of headroom.
- Server traffic is aggressively aggregated in ToR switches to minimize required backbone ports.
- Less aggressive aggregation would reduce the need for a non-blocking core, but would increase the number of leaf ports and negate any significant savings in core ports.
Note 2

Recently there has been some suggestion that heavily over-subscribed, and therefore lower-cost, backbones will be suitable in heavily virtualized environments if virtual machine (VM) affinity is employed. VM affinity keeps all the VMs associated with a particular application co-located on the same blade chassis or in the same rack. While this may remove some server-server TCP/IP traffic from the backbone, it fails to account for the following:
- Approximately two-thirds of all server network input/output is SAN traffic, and this traffic will still have to cross the backbone to reach storage arrays. Distributing ("Balkanizing") storage to localize that traffic reduces efficiencies and increases complexity. And as VMs move across the data center to balance loads or to recover from failures, the storage will remain in place, causing the traffic to cross the backbone.
- Some portion of the LAN traffic comprises end-user interactions, which will cross the backbone.
- As VMs move around the data center to balance loads and recover from failures, traffic patterns are at best difficult to predict. The resulting network congestion will impair application performance, negating any savings from reduced capital costs.
- VM mobility is only practical if very-high-performance links are available between source and destination physical servers. Unanticipated congestion during VM migrations can cause application degradation or disruption.
- Although it is not a storage best practice, many data centers still use server-based backups. These backups can generate an order of magnitude more LAN traffic than they do application traffic. An oversubscribed converged backbone can result in failed backups and interruptions to running applications.
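The first point above sets a floor on how much backbone traffic VM affinity can remove. The sketch below is a simplified model, not a measurement: it assumes the two-thirds SAN share always crosses the backbone and that affinity localizes some fraction of the remaining LAN traffic. The parameter `lan_localized` is hypothetical.

```python
SAN_SHARE = 2 / 3  # approximate share of server I/O that is SAN traffic


def backbone_share(lan_localized: float, san_share: float = SAN_SHARE) -> float:
    """Fraction of server I/O still crossing the backbone.

    lan_localized is the fraction of LAN traffic kept inside the rack or
    blade chassis by VM affinity (0.0 = none, 1.0 = all).
    """
    if not 0.0 <= lan_localized <= 1.0:
        raise ValueError("lan_localized must be between 0 and 1")
    lan_share = 1.0 - san_share
    return san_share + lan_share * (1.0 - lan_localized)


for fraction in (0.0, 0.5, 1.0):
    print(f"LAN localized {fraction:.0%}: backbone carries {backbone_share(fraction):.0%}")
```

Even in the unrealistic case where affinity localizes all LAN traffic, about two-thirds of server I/O still crosses the backbone in this model, so heavy oversubscription buys little headroom.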