Your Data Center Network Is Heading for Traffic Chaos

  • 27 April 2011

  • Bjarne Munch

  • Research Note G00210674

Data center network design and management must adapt to the impact of changing application deployment models and server virtualization mobility, or risk performance problems for business-critical applications.

Overview

Traditional data center networks are designed to monitor, manage and optimize network traffic between end users and application servers. However, the increased use of virtual machine (VM) mobility and new application deployments is significantly increasing traffic levels between servers in the data center. This will challenge established designs and tools, as well as the network managers' ability to troubleshoot and manage application performance.

Key Findings

  • By 2013, enterprises will be improving agility via VM mobility, which requires a flat Layer 2 network. VM mobility can burst up to 9 Gbps and will introduce long-duration data transfers between servers; unless managed, it will disrupt the performance of other applications.
  • During the next two to five years, enterprises will introduce new applications and application deployment models that will require network traffic management in the data center network. Traffic monitoring and application performance management (APM) will need to be integrated across application management and network management.
  • Although 40% to 50% of enterprises report projects to establish traffic flow monitoring and APM tools, these initiatives are often focused on WAN performance and are not integrated end to end into the data center.

Recommendations

  • Network planners must avoid focusing solely on designing a flat, meshed data center network to suit the needs of VM migration. They should also plan for a data center network that can support ongoing changes in application deployments.
  • Network planners must introduce traffic management and prioritization similar to that deployed today in the WAN, as well as meshed topologies and flatter networks.
  • Network planners must integrate application management and network management to monitor and manage application performance, and to troubleshoot across disaggregated applications.

What You Need to Know

Enterprises' need for improved agility in their applications is driving more-agile deployment models, as well as more-efficient server management via VM mobility. These are significant changes that will affect the network and the performance of applications in ways that differ from traditional application deployments and server management. If enterprises do not change their tiered network designs, they will experience increased congestion as traffic levels between servers in the data center grow during the next two to three years.

Analysis

Network traffic within the enterprise data center will continue to grow by more than the traditional 30% during the next two to four years. These new traffic patterns will appear arbitrary and even chaotic, with fluctuations that can be 90 times higher than the traffic peaks experienced by most data centers today. Several factors will contribute to this:

  • Application deployments continue to evolve from monolithic client/server programs to tiered Web deployment to service-oriented architecture (SOA) and composite application deployments.
  • New business functions, such as complex-event processing (CEP) and communications-enabled business processes (CEBPs), are emerging.
  • Live migrations of virtualized server workloads are increasing.

These factors will continue to increase the stress on the data center network, where traffic flow between servers will increase by an order of magnitude. The traditional data center network is generally designed for application traffic moving between users and servers. These networks must be redesigned to support the emerging server-to-server traffic: a large volume of low-bandwidth, short-duration traffic flows from new application models, in addition to high-bandwidth, long-duration flows from VM migrations between servers. Network planners and network architects must plan for these new impacts on their networks to avoid performance issues caused by network congestion, and they need to ensure that appropriate tools are in place to monitor and manage the traffic.

Changing Application Deployment Models

Most business applications now in use are based on n-tier Web architectures, while a few applications (such as e-mail) are still based on the client/server architecture. Although this is a well-proven and reliable application deployment model, it is inflexible and time-consuming to adapt to changes and new business scenarios. SOA and composite application deployments have gained traction among enterprises, due to increased agility in application development (see Note 1).

The concept of composite applications has been around since the late 1990s, but their use is relatively new to enterprises, with an estimated market penetration of 5% to 20%. However, the use of composite applications is growing rapidly, as organizations seek to leverage established assets, thereby minimizing the amount of new code that must be developed and maintained. Gartner expects composite application deployment to become mainstream in two to five years, and most business-critical applications will become composite (see "Key Issues for Composite Applications and Enterprise Mashups, 2010" and "Hype Cycle for Application Infrastructure, 2010").

This means that, during the next two years, network planners need to prepare their networks for a significant increase in end-user-related application traffic between servers in the data center. For example, if a user initiates a transaction to an application server that requires five independent services to be involved, this one user action may generate five additional server interactions over the data center LAN. These traffic flows will have a transactional nature, which means they will not be bandwidth-intensive; however, they will be latency- and loss-sensitive. Because the end-to-end traffic flow will become disaggregated into a number of transactions, it will become difficult to obtain an end-to-end view and, thus, to troubleshoot.
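As a rough illustration of why such flows are latency- and loss-sensitive, consider the following sketch. All figures are hypothetical, and the five service calls are assumed to run serially:

```python
# Hypothetical sketch: one user transaction fans out into five serial
# service calls inside the data center. Each flow carries little data,
# but per-call latency accumulates across the whole transaction.

SERIAL_SERVICE_CALLS = 5    # independent services behind one transaction
LAN_RTT_MS = 0.5            # assumed round-trip time per server hop
SERVICE_TIME_MS = 2.0       # assumed processing time per service

total_ms = SERIAL_SERVICE_CALLS * (LAN_RTT_MS + SERVICE_TIME_MS)
print(f"End-to-end time for one transaction: {total_ms:.1f} ms")

# A single lost packet stalls one of the five flows until TCP
# retransmits, delaying the entire transaction -- hence loss sensitivity.
```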

Emerging and Changing Business Functions

New business functions are emerging that optimize business processes. These include CEP, CEBP and various analytics. In combination, these emerging functions could create a significant level of network traffic between servers in the data center, especially CEP (see Note 2). A range of other applications — such as master data management or hosted virtual desktops — will also increase network traffic, but not specifically LAN traffic between servers in the data center.

Specialist vendors have been serving leading-edge CEP projects for years, without attracting much attention from mainstream users, and adoption is still between 1% and 5%. However, awareness of CEP is steadily growing, and Gartner expects more deployments in a three- to four-year time frame (see "Emerging Technology Analysis: Complex Event Processing"). CEBPs are similar to CEP, in that they offer automated event notification.

Although awareness of CEBPs is significant, adoption is low (less than 1%), and Gartner expects more deployments in the next three to four years (see "Hype Cycle for Business Process Management, 2010"). For example, if a sales order is placed, this may trigger an event that CEP may aggregate into real-time information about accumulated sales activity. This will create internal data center network traffic for event notification that is not directly related to a specific business application, although it will be initiated by events within a business application. These traffic flows are unlikely to be bandwidth-intensive or latency-sensitive, but they will appear as arbitrary flows. High levels of such traffic will complicate traffic monitoring in general, and may also affect the performance of more-sensitive traffic.

Changing Server Virtualization Practices

Initially, most of the interest in server virtualization projects was driven by the desire for cost reduction. These virtualization projects focused on reducing the number of physical servers by increasing server utilization via the consolidation of several workloads on one physical server. However, this focus on cost reduction is shifting toward virtualization as a key driver to improve agility and availability. This will result from dynamic provisioning of VMs and the movement of VMs between physical servers in the data center. Enterprises are now also beginning to place mission-critical workloads on virtualized servers, because virtualization makes it easier to provide disaster recovery.

As enterprises increasingly take advantage of VM mobility and live VM migration between physical host servers, the network needs to be designed such that bandwidth contention is minimized for traffic moving among servers. Network latency is not a significant consideration in the data center, because round-trip latencies of up to 5 ms can be tolerated before transmission time becomes an issue.

Gartner believes that VM live migration is early mainstream, but will become a mainstream technology in two years (see "Hype Cycle for Virtualization, 2010"). Thus, network planners need to prepare their networks for increased use of VM mobility during the next two years. Each VM migration can require that large amounts of data be transferred between the server pair. With VM migration, only the actual workload is moved, not the storage, which means that the time required to migrate depends on how memory-intensive the workload is, the platform being used and the number of VMs to be moved.

The virtual memory to be moved is typically a minimum of approximately 2 GB to 4 GB per VM on an x86 platform, and migration typically takes 30 to 60 seconds per workload (see Note 3). Most VM motion is performed via a dedicated 1 Gbps interface on the server; however, as servers migrate to 10 Gbps, all traffic will be shared across that interface. In that situation, VM mobility can saturate the interface and negatively affect the performance of other applications. For example, VMware vMotion can burst up to 8 Gbps or 9 Gbps. Virtualization in a Unix environment is less mature and, for current hypervisors, the migration can take up to 10 minutes (see "Use of Virtualization in the Unix Environment Is Growing").
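These transfer times can be sanity-checked with simple arithmetic. The sketch below uses the 2 GB to 4 GB memory figures cited above; the 80% link-efficiency factor is an assumption, and real migrations also re-copy memory pages dirtied during the transfer, so the result is a floor:

```python
# Back-of-the-envelope sketch of live-migration transfer times.

def migration_seconds(vm_memory_gb: float, link_gbps: float,
                      efficiency: float = 0.8) -> float:
    """Time to copy a VM's RAM footprint over a dedicated link.

    efficiency is an assumed factor for protocol overhead; real
    migrations also re-copy dirtied memory pages, so this is a floor.
    """
    bits = vm_memory_gb * 8 * 1e9
    return bits / (link_gbps * 1e9 * efficiency)

for mem in (2, 4):
    print(f"{mem} GB VM over 1 Gbps:  {migration_seconds(mem, 1):5.1f} s")
    print(f"{mem} GB VM over 10 Gbps: {migration_seconds(mem, 10):5.1f} s")

# 2-4 GB over 1 Gbps works out to roughly 20-40 s, consistent with the
# 30-60 s per workload cited above once page re-copying is included.
```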

Impacts on the Data Center Network

Changing application deployment models and the increased use of VM migration mean that traffic patterns in the data center network are shifting from predominantly client/server (north-south) flows to a significant level of server-to-server (east-west) flows. By 2014, network planners should expect more than 80% of traffic in the data center network to be between servers.

This means that network design must change. Traditional three- and four-tier data center network architectures focus on aggregating traffic flows from servers to users. This is not an optimal design for traffic flowing between servers, because that traffic may have to move up through several switches to the core to reach other servers. Thus, the traffic encounters a number of switch hops, as well as heavily used user-traffic aggregation points. Instead, networks must be designed to support arbitrary traffic flows, which means a meshed topology instead of the traditional tree topology. This requires changes to the physical and logical topology design.

During the next four to five years, the data center network will become flat and fully meshed. Until then, network planners should reconsider their existing designs to facilitate better internal traffic flows. Reducing the number of tiers in the network (see "Minimize LAN Switch Tiers to Reduce Cost and Increase Efficiency") will reduce the number of network hops; however, traffic flow is still physically constrained by the switches and the cabling between them. In other words, if traffic has to move from a top-of-rack switch through the core to another top-of-rack switch, this could lead to contention in the core and/or the uplinks.

A nonblocking core will solve part of this; however, where possible, server workloads should be allocated to keep heavy traffic flows among servers connected to the same switch: VM pairs with high traffic between them, or server pairs involved in VM migration, should be placed on the same network switch to reduce bandwidth contention on the uplink. This will require cooperation among network managers, server administrators and application developers.
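A simple oversubscription calculation shows why such placement matters; the rack and uplink figures below are hypothetical:

```python
# Rough sketch of why VM placement matters: traffic between servers on
# the same top-of-rack (ToR) switch stays local, while traffic between
# racks competes for the ToR uplinks.

SERVERS_PER_RACK = 40
SERVER_NIC_GBPS = 1
UPLINKS = 2
UPLINK_GBPS = 10

rack_capacity = SERVERS_PER_RACK * SERVER_NIC_GBPS   # 40 Gbps
uplink_capacity = UPLINKS * UPLINK_GBPS              # 20 Gbps
oversubscription = rack_capacity / uplink_capacity
print(f"Uplink oversubscription: {oversubscription:.0f}:1")

# At 2:1, a single 8-9 Gbps vMotion burst crossing the uplinks can
# consume nearly half the inter-rack capacity; the same migration
# between two servers on one ToR switch never touches the uplinks.
```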

Even with a physically meshed design, arbitrary traffic flows are not possible. This is because the Spanning Tree Protocol (STP) blocks redundant links and forces traffic along a single path through the network, even when other, more-optimal paths are available.

Standardization is under way in the Internet Engineering Task Force (IETF) and the Institute of Electrical and Electronics Engineers (IEEE) to introduce protocols that remove this topology constraint of STP by allowing arbitrary traffic flows. The IETF is developing the Transparent Interconnection of Lots of Links (TRILL) protocol, and the IEEE is working on the 802.1aq Shortest Path Bridging (SPB) protocol. The objective is to eliminate the issues of STP by defining a shortest-path protocol that establishes virtual-LAN-aware, Layer 2 arbitrary multipaths through an Ethernet LAN. This not only enables optimal management of multiple traffic flows through the network in a meshed manner, it also provides a higher degree of scalability than STP.
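The sketch below illustrates the shortest-path idea behind TRILL and SPB, not the wire protocols themselves; the four-switch mesh and the breadth-first search are purely illustrative:

```python
# On a small meshed topology, a shortest-path computation finds every
# equal-cost path between two switches, whereas STP would block links
# until only a single tree-shaped path remained.

from collections import deque

# Hypothetical four-switch full mesh: each switch links to every other.
LINKS = {
    "sw1": ["sw2", "sw3", "sw4"],
    "sw2": ["sw1", "sw3", "sw4"],
    "sw3": ["sw1", "sw2", "sw4"],
    "sw4": ["sw1", "sw2", "sw3"],
}

def all_shortest_paths(src: str, dst: str) -> list[list[str]]:
    """Breadth-first search returning every minimum-hop path."""
    paths, best = [], None
    queue = deque([[src]])
    while queue:
        path = queue.popleft()
        if best is not None and len(path) > best:
            break
        node = path[-1]
        if node == dst:
            best = len(path)
            paths.append(path)
            continue
        for nxt in LINKS[node]:
            if nxt not in path:
                queue.append(path + [nxt])
    return paths

print(all_shortest_paths("sw1", "sw3"))  # [['sw1', 'sw3']] -- one hop

# With the direct sw1-sw3 link removed, two equal-cost two-hop paths
# remain (via sw2 and via sw4); shortest-path bridging can use both
# at once, while STP would permit only one.
```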

Network planners should expect TRILL to be ready in the 2011/2012 time frame; however, they should also expect vendors such as Cisco to develop proprietary solutions, and vendors such as Brocade and Juniper to create virtual switching solutions, that don't need STP, TRILL or SPB. In any case, network planners should base their core networks on a single vendor to avoid interoperability issues, and focus on ensuring interoperability at the edge of the network.

Network planners also need to enhance traffic monitoring within the data center to improve visibility into traffic flows. They should also plan for a higher degree of traffic management.

Network planners must consider the impacts on their network-monitoring tools. A large number of the individual traffic flows in the network will be difficult to tie into an end-to-end application view, which will be important for troubleshooting, as well as for APM. A Gartner survey conducted in August 2010 (see "User Survey Analysis: Network Challenges and Opportunities in Data Centers Through 2011") indicated that 49% of respondents have ongoing or planned projects for traffic flow monitoring; the figure is 43% for APM tools.

However, the responses are mixed, with some projects focusing on the WAN and some on the data center network. The key to managing application performance in these new environments is to integrate application management and network management tools to obtain an end-to-end view.

For network managers, this additional network traffic will appear arbitrary and even chaotic. However, the traffic is closely related to how applications are deployed and how servers are managed, which means that there is a high degree of predictability. For this reason, network managers must work closely with application and server administrators to minimize network impacts where possible, and to understand the impacts better to ensure the performance of all applications.

Traffic management and traffic prioritization are common practices for application traffic across the WAN between the user and the server, but not for traffic between servers. Live VM migration can burst to nearly 9 Gbps, which can cause disruptions of latency-sensitive traffic, even over 10 Gbps network interfaces. Network planners need to ensure coordination between network-based traffic management and hypervisor traffic management, using tools such as VMware NetIOC.
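The share-based allocation concept behind such hypervisor traffic management can be sketched as follows. This is not VMware's implementation; the traffic classes and share values are illustrative:

```python
# Simplified sketch of share-based bandwidth allocation, the idea
# behind hypervisor traffic management such as VMware NetIOC.

LINK_GBPS = 10.0

# Hypothetical shares per traffic class on a shared 10 Gbps interface.
shares = {"vmotion": 50, "vm_traffic": 100, "storage": 100}

def allocate(active: dict[str, int], link_gbps: float) -> dict[str, float]:
    """Split the link among active classes in proportion to shares."""
    total = sum(active.values())
    return {name: link_gbps * s / total for name, s in active.items()}

# Under contention, vMotion is held to 2 Gbps instead of bursting to 9:
print(allocate(shares, LINK_GBPS))
# {'vmotion': 2.0, 'vm_traffic': 4.0, 'storage': 4.0}

# When only vMotion is active, it may still use the full link:
print(allocate({"vmotion": 50}, LINK_GBPS))
```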

Tactical Guidelines

Network planners must redesign their networks so that traffic can move between servers without disrupting application performance. In addition, enterprises must focus on integrating application management and network management for performance management and troubleshooting.

Strategic Planning Assumption

By 2014, network planners should expect more than 80% of traffic in the data center's local-area network (LAN) to be between servers.

Note 1
Composite Application Deployment

Traditionally, an application is a monolithic software package deployed on a dedicated server as a homogeneous entity. It is often described as a client/server architecture, in which a user transaction typically results in a small number of data transfers between the user workstation and an application on a specific server. These applications are typically Web-enabled in an n-tier architecture, where a Web browser communicates with a Web server, which then communicates with the application server.

In the postmodern application architecture, the application is no longer a homogeneous entity; instead, it is composed of multiple components (containers) that are often shared by multiple applications. Each component has a specific purpose, and can be designed by a different team and deployed on a different platform. This means that components may be distributed across several physical hosts in the data center, or externally in the cloud, and shared as a service among several applications.

The connections between each of these components are loosely coupled, and interactions are based on well-defined interfaces that are invoked during runtime. This means that a single user transaction generates a large number of interactions among the components carrying out the transaction. Each interaction includes only a small amount of data. Within the typical n-tier Web architecture, there will still be the Web-server-to-application-server traffic, but there will be additional server-to-server traffic related to the components, which is needed to compose the application.

Note 2
CEP

CEP is an emerging technology area that provides organizations with the ability to filter streams of event data (e.g., from business transactions or sensors), find insights in patterns of events and then trigger an appropriate response. The purpose is to implement continuous intelligence applications that enable faster and better decisions (such as for operational decision support), and to trigger automated processes that require little or no human involvement. CEP is often structured as an overlay on top of a conventional application portfolio, enabling a new layer of operational monitoring without disrupting the existing transactional and reporting applications. However, this is expected to create a potentially high level of event-notification traffic in the data center network as this event information is collected.
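A minimal sketch of this filter-aggregate-trigger pattern follows; the event fields and the alert threshold are hypothetical:

```python
# Minimal CEP-style sketch: filter a stream of events, aggregate a
# pattern, and trigger a response when a threshold is crossed.

from collections import defaultdict

events = [
    {"type": "sales_order", "region": "EMEA", "amount": 1200},
    {"type": "heartbeat",   "region": "EMEA", "amount": 0},
    {"type": "sales_order", "region": "EMEA", "amount": 900},
    {"type": "sales_order", "region": "APAC", "amount": 400},
]

ALERT_THRESHOLD = 2000  # accumulated sales per region

totals: dict[str, int] = defaultdict(int)
for event in events:                    # each event is one small
    if event["type"] != "sales_order":  # notification message on the
        continue                        # data center LAN
    totals[event["region"]] += event["amount"]
    if totals[event["region"]] >= ALERT_THRESHOLD:
        print(f"ALERT: {event['region']} sales reached "
              f"{totals[event['region']]}")

# Each inspected event is an individually tiny flow, but a high event
# rate produces many such flows -- the "arbitrary" traffic noted above.
```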

Note 3
Live VM Migration

VM migration involves moving a VM, with its contained guest operating system, applications and input/output connections, from one physical server to another with a similar configuration. This is done by capturing the entire memory state (i.e., the RAM footprint) of the running VM, including its applications and guest operating system, in a file, which is then copied to a new physical server. The size of the virtual memory that needs to be moved varies, but the minimum is typically on the order of 2 GB to 4 GB; for larger servers, it can be 16 GB or more. Live migration can be initiated automatically for resource management (e.g., if the host's resources are too highly utilized), in which case individual VMs may be moved. It can also be initiated manually, to migrate to another location for patch management or disaster recovery, in which case all VMs on a host will be moved.

Besides imposing a high network load, the virtualization platforms require that VM migration happen within a single Internet Protocol (IP) subnet, because they cannot maintain transport sessions across a change in endpoint IP address (the IP address is the unique VM identifier). This may limit the scalability of the pool of server capacity, because the network subnet is difficult to scale, due to the lack of scalability of STP.