Selecting a Converged Access Solution for Blade Server Architectures

30 May 2013 | ID: G00252192
Analyst(s): Andrew Lerner

Blade server architectures are blurring the demarcation between networking, storage, and server technologies. This makes it increasingly difficult for IT infrastructure personnel to evaluate and select an appropriate converged access solution for their organization.

Overview

Key Challenges

  • The demarcation between server, networking and storage components and responsibilities is blurring as a result of emerging blade server architectures, network/storage convergence and proliferating virtualization.
  • Network and storage input/output (I/O) are increasing in both utilization and complexity, and can no longer be appropriately evaluated by a single team.
  • Infrastructure teams are having difficulty reaching agreement on blade server I/O selection because leading vendors offer different architectures, performance levels and capabilities.

Recommendations

  • Utilize a cross-functional team with full participation from server, networking, virtualization and storage personnel to evaluate and select blade server I/O.
  • Avoid preconceived notions and preferences favoring specific vendors.
  • Focus on your organization's specific applications and use cases to derive latency, bandwidth, traffic flow, and management requirements.
  • Compare solutions based on their architecture, capacity, performance, features and management capabilities.


Introduction

Gartner is experiencing an increase in client inquiries from organizations evaluating converged blade server I/O. These organizations desire converged access to improve efficiency (i.e., reduction of cables, interfaces, switches, etc.) but are having difficulty agreeing on the appropriate solution.

This research clarifies the blade server evaluation process, and focuses on leading blade server solutions from Cisco, Dell, HP and IBM (see "Magic Quadrant for Blade Servers"). Blade enclosure I/O solutions have undergone three general iterations that impact both scale and management complexity:

  • Pass-through I/O modules provide a 1:1 ratio of internal server interfaces to external LAN and storage area network (SAN) interfaces. This provides a clear demarcation between server and storage/networking but results in a large number of physical cables leaving the blade server enclosure (often up to 100), each requiring a switchport on an upstream switch.
  • Integrated switching entails installing networking/storage switches inside the blade server enclosure as modules. This results in a drastic reduction in the physical cables and upstream switchports required, but requires management of individual SAN/network switches (usually four per enclosure).
  • Converged I/O solutions provide network and storage consolidation that reduces the number of cables, switchports and switches to be managed. This is accomplished via installation of a converged I/O module within the chassis (typically two for redundancy). This also lays the groundwork for improved I/O provisioning and orchestration (see "Comparing Data and Storage Network Convergence Options"). A simple cable-count comparison of the three approaches follows this list.
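To make the cabling impact concrete, the following is a minimal sketch comparing external cable counts for the three approaches in a hypothetical 16-blade enclosure; the per-blade interface counts, module counts and uplink counts are illustrative assumptions, not vendor specifications.

    # Illustrative cable-count comparison for a hypothetical 16-blade enclosure.
    # All counts below are assumptions for illustration, not vendor specifications.
    BLADES = 16
    INTERFACES_PER_BLADE = 4    # assumed: 2 LAN + 2 SAN interfaces per blade
    UPLINKS_PER_MODULE = 8      # assumed external uplinks per embedded module

    def pass_through_cables():
        # 1:1 mapping: every internal interface becomes an external cable.
        return BLADES * INTERFACES_PER_BLADE

    def integrated_switch_cables(modules=4):
        # Cables collapse to the uplinks of each embedded LAN/SAN switch.
        return modules * UPLINKS_PER_MODULE

    def converged_io_cables(modules=2):
        # LAN and SAN share converged uplinks from (typically) two modules.
        return modules * UPLINKS_PER_MODULE

    print("Pass-through cables:     ", pass_through_cables())       # 64
    print("Integrated-switch cables:", integrated_switch_cables())  # 32
    print("Converged I/O cables:    ", converged_io_cables())       # 16

Even under these conservative assumptions, converged I/O reduces external cabling (and the matching upstream switchports) by roughly 4:1 relative to pass-through modules.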

Analysis

Utilize a Cross-Functional Team With Full Participation From Server, Networking, Virtualization and Storage Personnel to Evaluate and Select Blade Server I/O

Blade server I/O is an integral component of overall blade server architecture, and evaluation/selection should no longer be handled solely by the server/platform team. Based on client inquiry, a common scenario includes server personnel leading the evaluation process but not achieving buy-in from storage and networking teams that prefer another vendor's approach.

Instead, networking, storage, virtualization and server teams must collaborate on all phases of the infrastructure life cycle, including selection, design, implementation, maintenance and management. Specific recommendations to foster this approach include:

  • IT leadership should guide the evaluation team on the holistic benefits of converged access I/O. This can avoid a myopic evaluation process, which often leads to selection of separate and suboptimal storage/networking solutions.
  • Platform/server, storage, networking and virtualization teams should participate equally in the selection and evaluation process.
  • IT leaders should plan to maintain some level of separation of duties between roles, which may be required for policy or regulatory requirements.

Avoid Preconceived Notions and Preferences Favoring Specific Vendors

IT leadership should coach and guide evaluation teams to fairly evaluate solutions. In particular, evaluation teams must avoid preconceived notions and conventional wisdom regarding which vendors provide the best solution. This includes avoiding generalizations such as "Vendor X has the best network solution" and instead focusing on the technical capabilities of the solution. In addition, bias toward incumbent vendors must be appropriately discounted. Ultimately, evaluation teams must identify and focus on specific evaluation criteria, including capacity, performance, features/capability and management, to enable apples-to-apples product comparisons.

Focus on Your Organization's Specific Applications and Use Cases to Derive I/O Requirements

Organizations must understand their specific application use cases and requirements to select an appropriate blade server I/O solution. These can be used to further derive specific requirements, including latency, bandwidth, traffic flows and manageability (see "Focus on the Five Dimensions of Network Design").

Latency and Bandwidth

The evaluation team must identify latency needs for applications to be served within the enclosure. Latency differs significantly between solutions, ranging from 0.8 to 3.2 microseconds for server-to-server communications within the enclosure. This latency is acceptable for most enterprise use cases, including converged storage and real-time applications such as voice and video that require latency in the milliseconds. However, a small number of specialized applications are impacted by latency deltas in the microseconds range (e.g., high-frequency trading, database clustering and big data). Consequently, organizations planning to deploy these specialized services within blade enclosures should weight latency higher in their evaluation criteria.
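For context, the short sketch below relates the intrachassis latency range cited above to an assumed 150 ms one-way budget for interactive voice (a commonly cited planning figure); the budget value is an assumption for illustration.

    # Share of an assumed 150 ms one-way voice latency budget consumed by
    # intrachassis switching latency (figures from the range cited above).
    VOICE_BUDGET_US = 150_000                      # 150 ms in microseconds (assumed budget)
    FABRIC_LATENCY_US = {"best case": 0.8, "worst case": 3.2}

    for label, latency_us in FABRIC_LATENCY_US.items():
        share = latency_us / VOICE_BUDGET_US
        print(f"{label}: {latency_us} us ({share:.6%} of the voice budget)")

In either case the fabric consumes a negligible fraction of a real-time application's budget; only microsecond-sensitive workloads such as high-frequency trading are likely to notice the delta between vendors.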

Identification of bandwidth needs for the enclosure will require collaboration between application, networking, storage, server and virtualization teams. The key factors to account for include:

  • Network bandwidth requirements for applications and servers within the enclosure
  • Storage bandwidth requirements for applications and servers within the enclosure
  • Peak utilization periods for both network and storage bandwidth utilization
  • The physical-to-virtual server ratio targeted for the enclosure

This information can be used to derive the number and speed of server-facing and uplink interfaces required to ensure adequate capacity.
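A minimal sizing sketch is shown below; the consolidation ratio, per-VM peak bandwidth, headroom factor and uplink speed are hypothetical planning inputs an evaluation team would replace with figures gathered from its own application, storage and virtualization teams.

    # Rough uplink sizing for a hypothetical enclosure, combining network and
    # storage bandwidth at peak. All inputs are illustrative assumptions.
    import math

    blades = 16
    vms_per_blade = 20          # assumed physical-to-virtual consolidation ratio
    peak_mbps_per_vm = 150      # assumed combined LAN + storage peak per VM
    headroom = 1.5              # assumed growth/burst headroom factor
    uplink_speed_gbps = 10      # candidate uplink interface speed

    peak_demand_gbps = blades * vms_per_blade * peak_mbps_per_vm / 1000
    required_gbps = peak_demand_gbps * headroom
    uplinks_needed = math.ceil(required_gbps / uplink_speed_gbps)

    # Resulting oversubscription against the servers' physical line rate,
    # assuming two 10G converged interfaces per blade.
    server_line_rate_gbps = blades * 2 * 10
    oversubscription = server_line_rate_gbps / (uplinks_needed * uplink_speed_gbps)

    print(f"Peak demand: {peak_demand_gbps:.0f} Gbps; with headroom: {required_gbps:.0f} Gbps")
    print(f"{uplink_speed_gbps}G uplinks required: {uplinks_needed}")
    print(f"Uplink oversubscription vs. line rate: {oversubscription:.1f}:1")

Comparing the resulting oversubscription ratio against the ratio published for each vendor's module provides a quick check on whether a candidate solution can meet the derived requirement.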

Application Traffic Flows

Evaluation teams must take application traffic flows into account during the evaluation process. This includes determining the degree of user-to-server (north/south) versus application-to-application (east/west) flows. Of particular importance for blade server selections is the amount of intrachassis east/west communications.

Traffic flows vary significantly between enterprises but are trending toward east/west in most environments. Flow direction is influenced by several factors, including (1) overall application architecture (e.g., multitier), (2) server/application deployment policies, and (3) services delivered external to the enclosure (e.g., security, middleware). For example, many organizations stripe application servers across different physical enclosures for redundancy or funnel traffic through external hardware appliances to provide security or application delivery services. Both of these examples result in reduced intrachassis flows, as traffic must exit the enclosure. Several other common flow characteristics include:

  • Public cloud, Internet and WAN traffic typically flows north/south. Traffic crossing security zones or traversing physical appliances (e.g., firewall, intrusion detection system/intrusion prevention system [IDS/IPS], application delivery controller [ADC], and Web application firewall [WAF]) also typically flows north/south.
  • East/west flows are exhibited by virtual machine mobility, multitier and service-oriented application architectures, server backups, and general inter-application connections.
  • See "Your Data Center Network is Heading for Traffic Chaos" and "Eight Key Impacts on Your Data Center LAN Network."

Management and Orchestration Capabilities

I/O management is a key component of orchestration and, thus, is integral to realizing the improved provisioning agility promised by converged access solutions. Typical evaluation of management systems entails a bottom-up approach, focusing on individual element management capability. While these features are important, Gartner recommends using a top-down approach to determine how the system can best enable business agility in your organization. Personnel evaluating I/O management should focus on the following key abilities:

  • Support automated orchestration and provisioning activities within your organization.
  • Provide an integrated systemwide, top-down view of the blade server environment, including environmental, server, network, storage and virtualization layers.
  • Achieve a high degree of integration with existing virtualization layer software and application performance management systems.
  • Integrate with existing management systems used by storage, networking, and server teams.
  • Support open and standardized protocols and APIs.

Ensure that personnel verify these functions via pilots or proofs of concept during the blade enclosure evaluation process. Too often, evaluation of network management functionality is limited to vendor slide presentations and webinars. Pilots should include orchestration and provisioning activities for virtualization, server and I/O components. In addition, any "comfort bias" toward existing management tools should be appropriately discounted (see "Decision Point for Network Management Instrumentation").
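As one small, concrete piece of such a pilot, the sketch below performs a generic SNMP v2c reachability check against an I/O module's management address using the open-source pysnmp library (classic 4.x synchronous hlapi); the address, community string and the choice of pysnmp are illustrative assumptions, not a vendor-specific procedure.

    # Minimal SNMP v2c poll of an I/O module's sysDescr during a pilot,
    # using pysnmp 4.x (pip install pysnmp). The target address and
    # community string are placeholders.
    from pysnmp.hlapi import (
        getCmd, SnmpEngine, CommunityData, UdpTransportTarget,
        ContextData, ObjectType, ObjectIdentity,
    )

    TARGET = "192.0.2.10"   # placeholder management IP of the I/O module
    COMMUNITY = "public"    # placeholder read-only community string

    error_indication, error_status, error_index, var_binds = next(
        getCmd(
            SnmpEngine(),
            CommunityData(COMMUNITY, mpModel=1),                     # SNMP v2c
            UdpTransportTarget((TARGET, 161), timeout=2, retries=1),
            ContextData(),
            ObjectType(ObjectIdentity("SNMPv2-MIB", "sysDescr", 0)),
        )
    )

    if error_indication or error_status:
        print(f"SNMP check failed: {error_indication or error_status.prettyPrint()}")
    else:
        for name, value in var_binds:
            print(f"{name.prettyPrint()} = {value.prettyPrint()}")

Equivalent checks should be repeated against the vendor's provisioning APIs and flow-export mechanisms so that the pilot exercises the same interfaces the organization's orchestration tooling will depend on.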

Compare Solutions Based on Their Architecture, Capacity, Performance, Features and Management Capabilities

Cisco, Dell, HP and IBM are the leading vendors, accounting for more than 84% of blade server shipments in 2012.1 While the base requirement for blade server I/O modules is simple (connect servers within the chassis to external LANs and SANs), these vendors vary in their approach.

Architecture

HP pioneered this market, and its Virtual Connect FlexFabric 10Gb/24-Port module is a fixed-form port aggregator providing some L2 switching capability via a 480 Gbps internal fabric (see Gartner's "HP Bladesystem" and "HP Bladesystem: In-Depth Assessment"). IBM's Flex System Fabric CN4093 module is a Layer 2/3 switch with a 1.28 Tbps internal fabric. The module is fixed-form but can be licensed in three tiers. Dell's MXL 10/40GbE blade switch module is a modular solution with a 1.28 Tbps fabric and provides L2/L3 switching capability.

Cisco's architecture is significantly different from that of the other vendors. It utilizes a two-tier hardware approach modeled after Cisco's Nexus Series data center switching product line. The two hardware tiers include a fixed-form Fabric Extender (FEX) within the enclosure that connects to a Fabric Interconnect (FI) located external to the enclosure. The FEX provides connectivity for blade servers within the enclosure, while the FI aggregates connections from the FEX modules. The FI provides connectivity to external LAN/SAN interfaces, performs L2 switching, and supports up to 20 separate enclosures (see "Cisco Unified Computing Systems (UCS)").

Key Evaluation Criteria

To evaluate blade server I/O solutions fairly and transparently, evaluation teams should review specific attributes of capacity, performance, features and management. These attributes can be used to draw apples-to-apples comparisons between vendors (a sample scorecard sketch follows the list) and include:

  • Capacity and performance:
    • Line-rate capacity and overall throughput
    • Maximum number and supported speeds of server-facing interfaces
    • Maximum number and supported speeds of uplink interfaces
    • Oversubscription ratio (uplink to server-facing interfaces)
    • Intramodule latency
  • Features and capability:
    • Storage protocols supported (i.e., native Fibre Channel, Fibre Channel over Ethernet [FCoE], Internet Small Computer System Interface [iSCSI], and network-attached storage [NAS])
    • Support for data center bridging (DCB) protocols, including priority-based flow control (PFC), enhanced transmission selection, and data center bridging exchange (DCBX)
    • Number of media access control (MAC) addresses and virtual LANs (VLANs) supported
    • Support for VLAN trunking, link aggregation (LAG) and multichassis LAG
    • Support for jumbo frames
    • Support for emerging protocols, including transparent interconnection of lots of links (TRILL), shortest path bridging (SPB), network virtualization using generic routing encapsulation (NVGRE), and virtual extensible local-area network (VXLAN)
    • Support for network routing features, including first-hop redundancy protocols and unicast and multicast dynamic routing protocols
    • Support for access control lists and quality of service capabilities
    • Ability to run in a mode that does not participate in spanning-tree (i.e., transparent to spanning-tree from the perspective of upstream switches)
  • Management and miscellaneous:
    • Support for common element management techniques, including Simple Network Management Protocol (SNMP) and flow-based protocols
    • Role-based access control for both command-line interface (CLI)-based and graphical user interface (GUI)-based administration
    • Interface partitioning or "carving"
    • Port profiles
    • Port mirroring
    • Device and chassis stacking
    • Capacity and expandability to support a pay-as-you-grow model
    • Support for software-defined networking capabilities
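One way to keep the comparison apples-to-apples is a weighted scorecard across these criteria groups. The sketch below is a minimal example in which the weights, the solution names and the 1-5 scores are placeholders an evaluation team would replace with its own findings.

    # Hypothetical weighted scorecard for comparing blade I/O solutions.
    # Weights and 1-5 scores are placeholders, not Gartner ratings.
    weights = {
        "capacity_performance": 0.35,
        "features_capability": 0.35,
        "management": 0.30,
    }

    # Scores a team might record after hands-on testing (1 = poor, 5 = excellent).
    scores = {
        "Solution A": {"capacity_performance": 4, "features_capability": 3, "management": 5},
        "Solution B": {"capacity_performance": 5, "features_capability": 4, "management": 3},
    }

    for solution, s in scores.items():
        total = sum(weights[criterion] * s[criterion] for criterion in weights)
        print(f"{solution}: weighted score {total:.2f} out of 5")

The weights themselves should come from the organization's application requirements, as described in the framework below.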

Notable Differences in Existing Solutions

With respect to the above criteria, there are several notable differences between the four major vendors (Cisco, Dell, HP and IBM) currently, including:

  • Dell and IBM offer the lowest interblade latencies, at 0.80 and 0.85 microseconds, respectively. HP's converged module latency is 2.0 microseconds, while its Ethernet-only module is 0.90 microseconds. Cisco's latency is 3.2 microseconds.
  • Dell and IBM are the only solutions that provide Layer 3 routing and access control list capability. IBM is the only solution supporting dynamic multicast routing.
  • Dell and IBM are the only solutions that provide 40G interfaces and are upgradable.
  • IBM has the best base interface oversubscription ratio (1.75:1) and the highest aggregate interface throughput (640 Gbps).
  • IBM is the only solution supporting both 40G and native Fibre Channel interfaces on the same module.
  • Dell has the highest uplink capacity, totaling 240 Gbps (via six 40G interfaces) and is the only solution supporting 10GBASE-T interfaces.
  • The Cisco FEX module is supported in non-Cisco enclosures, including Dell, Fujitsu and HP.
  • Cisco's two-tier hardware architecture prevents local traffic switching between servers within the enclosure.

A Framework for Getting Started

Organizations should record the key requirements for services to be supported within the blade enclosure, including latency, bandwidth, flow characteristics and criticality. Evaluation teams can then weight the importance of related I/O characteristics appropriately, resulting in the selection of a solution optimized for the organization's needs. For example, an enclosure supporting an enterprise desktop backup solution would have significantly different requirements than a hosted virtual desktop deployment. Table 1 is an example that evaluation teams can reference in this process; a sketch for turning such a table into criterion weights follows it.

Table 1. Sample Table for Application Characteristics Served Via Enclosure

Applications and Use Cases for Services Residing Within Enclosure | Business-Criticality | Bandwidth Required | Ultra-Low Latency Required | Traffic Flows
Hosted Virtual Desktops | High | Medium | No | N/S, E/W
User-Facing Services | High | Low | No | N/S
Internal SOA/Web Services applications | High | Low | No | E/W
Virtual Machine Mobility | Medium | High | No | E/W
Remote Storage Replication | Medium | High | No | N/S
Server Backup | Low | High | No | E/W
Overall Chassis Summary | Medium to High | Medium to High | No | N/S, E/W

N/S = north/south; E/W = east/west

Source: Gartner (May 2013)
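The sketch below shows one way to turn rows like those in Table 1 into criterion weights for a scorecard such as the one sketched earlier; the rows are abbreviated from Table 1, and the mapping rules (how criticality, bandwidth and flow mix translate into weights) are illustrative assumptions.

    # Derive rough criterion weights from application rows modeled on Table 1.
    # The mapping rules below are illustrative assumptions only.
    rows = [
        # (use case, criticality, bandwidth, ultra_low_latency, flows)
        ("Hosted Virtual Desktops",    "High",   "Medium", False, {"N/S", "E/W"}),
        ("User-Facing Services",       "High",   "Low",    False, {"N/S"}),
        ("Internal SOA/Web Services",  "High",   "Low",    False, {"E/W"}),
        ("Virtual Machine Mobility",   "Medium", "High",   False, {"E/W"}),
        ("Remote Storage Replication", "Medium", "High",   False, {"N/S"}),
        ("Server Backup",              "Low",    "High",   False, {"E/W"}),
    ]

    level = {"Low": 1, "Medium": 2, "High": 3}

    # Weight bandwidth by the most demanding business-critical service, weight
    # latency only if any service needs ultra-low latency, and weight intrachassis
    # (east/west) switching by the share of services with E/W flows.
    bandwidth_weight = max(level[bw] for _name, crit, bw, _ull, _flows in rows if crit == "High")
    latency_weight = 3 if any(ull for _name, _crit, _bw, ull, _flows in rows) else 1
    east_west_weight = 3 * sum("E/W" in flows for _name, _crit, _bw, _ull, flows in rows) / len(rows)

    print(f"Bandwidth weight (1-3):  {bandwidth_weight}")
    print(f"Latency weight (1-3):    {latency_weight}")
    print(f"East/west weight (1-3):  {east_west_weight:.1f}")

Weights derived this way keep the final I/O selection tied to the chassis' actual workload mix rather than to vendor generalizations.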

Evidence

1 See "Market Share: Computing Platforms Worldwide, 2012"