HIGH PERFORMANCE SOLUTIONS


Compaq TruCluster Server UNIX-based Clustering Solution

Summary
TruCluster Server, Compaq's Unix clustering software for its AlphaServer products, is a highly integrated and comprehensive clustering solution.

Table of Contents
List of Tables
Table 1: TruCluster Server v5.1a: Price List
Table 2: Comparison: Unix Server-Clustering Solutions
Table 3: High-Availability Agents (a.k.a. Scripts) for Third-Party Applications

Corporate Headquarters
Compaq Computer Corp.
20555 State Highway 249
Houston, TX 77070, U.S.A.
Tel: +1 713 370 0670
Fax: +1 713 514 1740
Internet: www.compaq.com

Overview
All of the major Unix server vendors have a high-availability cluster offering. Furthermore, all the Unix server vendors have an offering that includes, or has available as an add-on, support for a parallelized database. Beyond this, the high-availability clustering solutions from major Unix server vendors are all different and proprietary, with some solutions emphasizing ease of management, while others have focused their efforts on supporting larger clusters. Compaq's TruCluster Server software stands out as a single product offering that simultaneously addresses application failover, parallel database support, and horizontal application scalability, and combines them with advanced single-system image features that simplify cluster administration.

Compaq announced TruCluster Server v5.0 in 1999 to coincide with the release of version 5 of the Tru64 UNIX operating system. In September 2000, Compaq announced version 5.1 of TruCluster Server, adding features such as direct I/O as well as support for the GS series of AlphaServers. The latest version, 5.1a, was released in October 2001 and supports 100 Mbps and Gigabit Ethernet as the cluster interconnect. Compaq's Memory Channel 2 is still available as a cluster interconnect and delivers better performance with its greater bandwidth and lower latency.

TruCluster Server v5.1a allows up to eight nodes to be configured in a cluster. A node can be either an AlphaServer or a partition within an AlphaServer, thus allowing failover between partitions within one server or even between partitions in different servers. The server nodes in the cluster are connected using the switch-based, high-bandwidth, low-latency PCI Memory Channel 2. For applications that do not pass large volumes of data through the cluster interconnect, commodity 100 Mbps or Gigabit Ethernet may be used instead. (To ensure that the cluster interconnect does not represent a single point of failure for mission-critical applications, redundancy of the cluster interconnect is supported.) TruCluster Server also includes a distributed lock manager that supports concurrent access to cluster-wide resources, including databases. Storage devices in a TruCluster Server cluster are configured as in a "shared nothing" cluster. However, the clustering software allows any node to access any disk in the cluster transparently. To users and administrators, it appears as if all storage is "owned" by the cluster as a whole.

Single System Image
Configuring a cluster for high availability, getting the cluster up and running, and then managing that cluster can be a daunting task. At worst, each node must be independently configured, made operational, and then administered separately. Then there is the added burden of managing the entity called a cluster. Therefore, a clustering solution that simplifies and streamlines server and cluster management can be a real time and money saver.

When the cluster has a single system image, it is easier to work with the cluster in many ways: easier for users to connect to, easier to manage, easier to change, and easier to develop highly available applications. A cluster with a single system image provides many benefits:

  1. With a cluster name, users do not need to know individual node names; they can simply access the cluster and will be attached to a server in the cluster.
  2. Applications can be easily moved from one cluster node to another, since files and I/O devices are accessed the same way from every cluster node.
  3. Creation of failover scripts is simplified for the same reasons.

TruCluster Server software provides a set of services that make the cluster appear as a single system, rather than as its actual topology (a collection of servers on a private LAN). This holds from several perspectives: that of a user (client), an administrator, an application developer, and another networked (client) computer that is not part of the cluster.

Thus, from a client perspective, the cluster appears not as several servers but as one system whose computing power is the sum of the computing power of its nodes. Similarly, all disk devices and network connections are seen as a single set of resources on the superserver that is the cluster.

From a developer's perspective, the resources and files that are distributed across multiple nodes can be addressed the same way from any node in the cluster, and they can be used as local rather than networked devices (for example, through ordinary file system calls).

From an administrative perspective, the cluster and its resources are managed as if they were a single system, rather than a collection of servers on a private LAN.

Global File System
The Global File System of TruCluster Server ensures that there is a unique name and file identifier for each file, which is accessible to all applications executing on the cluster. Additionally, Compaq's global file system implementation has a shared root that further enhances the manageability of the cluster by eliminating the need for each cluster node to have its own private copy of the operating system and its configuration files. For example, if the administrator needed to install a patch on the operating system running on all of the servers in the cluster, the administrator should only have to install that patch once, and all the nodes in the cluster would then run the patched version.

Analysis
The key feature of Compaq's TruCluster Server is a cluster file system with a shared root. Compaq's cluster file system supports global device naming and access, and allows the cluster to be managed as a single system. This means that the configuration need only be specified once for the entire cluster; each individual node does not have to be configured separately. Similarly, software and patches are installed once for all the nodes in the cluster, since the single system image is shared, and the cluster itself is a single security and management domain. Furthermore, Compaq uses the same administrative interface whether managing a single AlphaServer system or a TruCluster Server cluster.

With the TruCluster Cluster File System (CFS), I/O paths and names of resources are the same for all cluster nodes. This makes it relatively easy to create failover scripts for applications. Compaq provides its Cluster Application Availability (CAA) framework within the TruCluster Server software to assist customers in integrating their application into the cluster. When an application is moved to a new node, the I/O path and device names do not need to be redefined. TruCluster Server presents the same global namespace to each node in the cluster. TruCluster Server also supports a cluster alias. Users can connect to the cluster, and the clustering software will automatically connect the user to the least busy node.
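The cluster alias behavior described above can be illustrated with a minimal sketch. It is not TruCluster Server's actual routing algorithm, which this report does not describe. The Node class, the route_to_alias function, and the use of a connection count as the measure of "busy" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    active_connections: int  # assumed stand-in for how busy a node is
    up: bool = True


def route_to_alias(nodes):
    """Return the least busy available node behind a (hypothetical) cluster alias."""
    candidates = [n for n in nodes if n.up]
    if not candidates:
        raise RuntimeError("no cluster member is available")
    return min(candidates, key=lambda n: n.active_connections)


members = [
    Node("node1", active_connections=12),
    Node("node2", active_connections=3),
    Node("node3", active_connections=7, up=False),  # down, so never chosen
]

chosen = route_to_alias(members)
chosen.active_connections += 1  # the new client now counts toward that node's load
print(chosen.name)
```

The client only ever addresses the alias; which member answers is decided inside the cluster, so nodes can be added or removed without clients needing to know.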

Oracle and TruCluster Server
In a cluster, where there may be simultaneous access to the same data from multiple nodes, a Distributed Lock Manager (DLM) is used to synchronize access to ensure that the data remains consistent. In addition to the Distributed Lock Manager that is supplied with products such as Oracle Parallel Server and Oracle 9i Real Application Clusters (RAC), Compaq's TruCluster Server includes a Distributed Lock Manager that can be used by application developers to ensure data integrity for parallelized applications.
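As a rough illustration of the synchronization a lock manager provides, the sketch below grants either many shared (read) holders or one exclusive (write) holder per resource. It runs in a single process and is not the TruCluster Server or Oracle DLM interface; the LockManager class, its method names, and the two lock modes are assumptions chosen to keep the example small.

```python
class LockManager:
    """Toy lock table: many shared holders OR one exclusive holder per resource."""

    def __init__(self):
        self._holders = {}  # resource name -> (mode, set of holder node names)

    def acquire(self, resource, node, mode):
        """Try to grant `mode` ("shared" or "exclusive"); return True if granted."""
        if mode not in ("shared", "exclusive"):
            raise ValueError(f"unknown lock mode: {mode}")
        held = self._holders.get(resource)
        if held is None:
            self._holders[resource] = (mode, {node})
            return True
        held_mode, nodes = held
        if mode == "shared" and held_mode == "shared":
            nodes.add(node)
            return True
        return False  # conflicting request; a real lock manager would queue it

    def release(self, resource, node):
        """Drop `node` from the holders of `resource`, freeing it when empty."""
        held = self._holders.get(resource)
        if held is None:
            return
        _, nodes = held
        nodes.discard(node)
        if not nodes:
            del self._holders[resource]


dlm = LockManager()
first_reader = dlm.acquire("orders-table", "node1", "shared")
second_reader = dlm.acquire("orders-table", "node2", "shared")
blocked_writer = dlm.acquire("orders-table", "node3", "exclusive")
dlm.release("orders-table", "node1")
dlm.release("orders-table", "node2")
granted_writer = dlm.acquire("orders-table", "node3", "exclusive")
print(first_reader, second_reader, blocked_writer, granted_writer)
```

Both readers are granted at once, while the writer is refused until every reader has released the resource. That ordering is what keeps data consistent when several nodes touch it concurrently.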

To maximize performance and achieve consistency across the many platforms on which the Oracle database is deployed, Oracle has long preferred to manage its own disk I/O and memory usage directly rather than use Unix operating system services. Disk I/O and memory usage are two of the most important parameters that impact database performance. Although there are standard application programming interfaces (APIs) to these operating system services, implementations vary between vendors in ways that can impact database performance. These slight differences make it difficult to achieve efficient performance across a wide range of vendor equipment.

Oracle9i RAC has ambitious goals with respect to database performance, availability and manageability that are independent of the platform on which the database will run:

  • The Oracle9i RAC database should be viewed as a single database, even though it is spread across multiple systems for increased performance.
  • Each server in the cluster can act as a failover server for every other server in the cluster to provide continuous availability.
  • The database is administered as a single database image.
  • Applications are written the same way for the single system Oracle9i and parallel system (Oracle9i RAC) database versions. There is no difference to the application developer who must develop applications for a single system or a cluster.

To ensure that these goals can be met, Oracle will be including new software called Portable Clusterware as part of Oracle9i RAC. In a vote of confidence for Compaq's TruCluster Server implementation, in February 2000, Compaq and Oracle announced a multiyear technology and business partnership that would tightly integrate components of Compaq's Tru64 UNIX cluster technology into Oracle9i Real Application Clusters to create Portable Clusterware.

Pricing
Table 1
TruCluster Server v5.1a: Price List
Product                    List Price per Node (US$)
TruCluster Server v5.1a    3,000-48,000 (depending on server)

GSA Pricing

Yes.

Competitors
How should a customer compare clustering solutions? That will vary depending on each customer's requirements, of course, but there are basic functions that each solution should be measured against. These comparison factors include:

  • Number and selection of available agents or scripts to allow ISV applications to take advantage of the cluster's availability and/or scalability attributes.
  • Number of nodes in the cluster.
  • Whether or not the cluster presents itself as a single system image.
  • Speed (or low latency) of the cluster interconnect (the faster, the better – especially for scenarios such as parallel database clusters or clusters offering concurrent file access).
  • Capability to change the cluster's configuration in a variety of ways, with the least impact on the users.
  • Types of load balancing supported by the clustering solution (the more extensive, the better).

The table "Comparison: Unix Server-Clustering Solutions" compares Compaq's TruCluster Server against its major competitors, based on these comparison factors. The clustering solutions that compete against TruCluster Server are HP's MC/ServiceGuard, IBM's HACMP, and Sun Cluster 3.0 from Sun Microsystems.

Table 2
Comparison: Unix Server-Clustering Solutions

The table "High-Availability Agents (a.k.a. Scripts) for Third-Party Applications" lists the agent software, or scripts, that each vendor offers with its clustering solution. These agents help ensure that the application performs as expected within the cluster.

Table 3
High-Availability Agents (a.k.a. Scripts) for Third-Party Applications

Strengths

Easy to Manage
Of all of the products examined in this report, Compaq's TruCluster Server is the closest to achieving a single system image. It is the only solution that boasts a cluster-wide file system with a shared root. The benefits provided by the single system image in TruCluster v5.1a are numerous. For example, the configuration for the entire cluster need only be specified once, and software and patches are installed just once for all nodes in the cluster in the shared system image. There is a single security domain for all of the nodes in the cluster, and the cluster knows each I/O device by a cluster-wide name. Overall, Compaq's TruCluster Server creates the easiest cluster to manage out of all the solutions examined in the competitive section of this report.

Many Application Agents
There are a large number of availability scripts offered at no extra charge from Compaq for third-party software in the cluster. When necessary, customers can write their own failover scripts – a relatively easy task because the I/O paths and names do not need to be redefined for a server that will take over for a failed server.

Pre-Configured Solutions
Configuring and deploying a cluster can be an extremely complicated task. Compaq provides pre-configured solutions for high availability that have all the tough problems solved (including the environmental issues of power redundancy, cabling redundancy, etc.).

Limitations

Product Roadmap
Compaq's June 2001 announcement that it would standardize its various server product lines on Intel's Itanium processor architecture by 2004 means that at a minimum, today's AlphaServer customers may face a transition from current Alpha microprocessors to Itanium. This is not an unusual situation as it is faced by customers of many other Unix server vendors too.

However, the September 2001 announcement of a planned merger between HP and Compaq, intended to achieve projected cost savings for the combined company, implies product eliminations from the two companies' broadly overlapping product lines. Prospective TruCluster Server customers need to ask Compaq for assurances about the future of TruCluster Server should the merger take place. Specifically, if enhancements to Tru64 UNIX are stopped, will TruCluster Server also fade away? Or will Compaq commit to porting TruCluster Server to the HP PA-RISC-based servers and, eventually, to HP's and Compaq's servers with IPF (Intel's Itanium Processor Family) processors?

Limited Number of Nodes
TruCluster Server supports a maximum of eight nodes in a cluster. Sun also supports only eight nodes, but Sun's largest node (the Star Fire 15000 Enterprise) supports more processors per node.

Insight
Management of a cluster is one of the most important features to consider when choosing a high-availability cluster solution, and a cluster with a single system image, such as TruCluster Server provides, will be the easiest to manage. Within its scalability limits of 8 nodes with 32 processors each, TruCluster Server is a good choice not only as a deployment platform for mission-critical applications but also for applications that require horizontal scalability to accommodate widely fluctuating workloads. Compaq offers a large number of availability agents for important ISV applications at no extra charge. TruCluster Server is an excellent choice for customers already running Tru64 UNIX. If and when the merger between HP and Compaq occurs, we hope that TruCluster Server will be established as a strategic product for the combined company, making it available to an even larger population of customers.

Server Clustering
Unix clusters were first introduced in the early 1990s to increase application availability. If the server on which the application was running went down, the application would automatically be started on another server in the cluster, thereby increasing application availability through failover. For a long time, the word "cluster" was synonymous with application failover. Increasingly, the term now describes several different types of multiple-computer systems. "Cluster" products can be used to:

  • Increase application availability.
  • Increase application scalability by supporting parallelized applications (e.g., Oracle OPS, high-performance technical computing).
  • Increase application scalability by running multiple independent copies of the same application on different cluster nodes and load balancing incoming requests (horizontal scalability).
  • Simplify day-to-day administration for multiple servers through command replication or by providing features that make the cluster and its resources appear as a single system.

A cluster consists of a number of servers linked together through a high-speed private network. In its simplest form, the private network is a commodity LAN. For enhanced performance and availability, specialized proprietary cluster interconnects, such as Compaq's Memory Channel 2, are used.

Each server in the cluster, referred to as a node, continuously checks the state (i.e., health) of the other nodes. If one node becomes unavailable, its workload can automatically be restarted on one or more other nodes in the cluster. In order to ensure that an application can execute on the node to which it has been moved, the resources (particularly the data) that it is dependent upon must still be accessible.
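The health checking and restart behavior described above can be sketched as follows. This is a simplified illustration, not the mechanism any particular clustering product uses: the heartbeat timestamps, the timeout value, the function names, and the rule of restarting on the survivor that runs the fewest applications are all assumptions.

```python
def find_failed_nodes(last_heartbeat, now, timeout):
    """Return nodes whose most recent heartbeat is older than `timeout` seconds."""
    return sorted(n for n, t in last_heartbeat.items() if now - t > timeout)


def fail_over(placement, failed, survivors):
    """Move each application on a failed node to the survivor running the fewest apps."""
    if not survivors:
        raise RuntimeError("no surviving node to take over the workload")
    new_placement = dict(placement)
    for app, node in sorted(placement.items()):
        if node in failed:
            # Count what each survivor runs right now, including earlier moves.
            load = {s: 0 for s in survivors}
            for n in new_placement.values():
                if n in load:
                    load[n] += 1
            new_placement[app] = min(survivors, key=lambda s: (load[s], s))
    return new_placement


# Hypothetical cluster state: node2 last reported 10 seconds ago.
last_heartbeat = {"node1": 100.0, "node2": 91.0, "node3": 99.5}
placement = {"db": "node2", "web": "node1", "batch": "node2"}

failed = find_failed_nodes(last_heartbeat, now=101.0, timeout=5.0)
survivors = [n for n in last_heartbeat if n not in failed]
result = fail_over(placement, failed, survivors)
print(failed, result)
```

Restarting the application is only half of the job. As the paragraph above notes, the data the application depends on must also remain reachable from the node that takes over.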

All clustering solutions must have a way of making disk data available to more than one node. There are two different architectural frameworks for doing this: "shared nothing" and "shared everything." In a shared nothing model, disks can be accessed from only one node at any time. The server with the logical connection to the disk "serves" the data to other nodes over the cluster interconnect. This incurs latency and performance penalties on the "served" node. In a shared nothing model, the disks can be physically attached to more than one node, or a mirrored copy of the data is accessible from a different node. When a node failure is detected, the clustering software passes disk ownership/master copy to another node with physical access and restarts the application on the selected node.
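A minimal sketch of the shared nothing model just described: each disk has exactly one owning node, other nodes reach it through that owner, and ownership passes to another physically attached node when the owner fails. The Disk class, the function names, and the rule of choosing the first surviving attached node are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    name: str
    owner: str       # the one node currently allowed to access the disk
    attached: tuple  # nodes with a physical path to the disk


def access_path(disk, requester):
    """Describe how `requester` reaches the disk in a shared nothing model."""
    if requester == disk.owner:
        return "direct"
    return f"served by {disk.owner} over the cluster interconnect"


def transfer_ownership(disk, failed_node):
    """Pass ownership to another physically attached node if the owner failed."""
    if disk.owner != failed_node:
        return disk.owner  # the owner is healthy; nothing to do
    for node in disk.attached:
        if node != failed_node:
            disk.owner = node
            return node
    raise RuntimeError(f"no surviving node has physical access to {disk.name}")


disk = Disk("dsk1", owner="node1", attached=("node1", "node2"))
before = access_path(disk, "node3")
new_owner = transfer_ownership(disk, failed_node="node1")
after = access_path(disk, "node2")
print(before, new_owner, after, sep=" | ")
```

The "served" path is where the latency and performance penalty mentioned above arises, because every request from a non-owning node crosses the cluster interconnect.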

In a shared everything model, disks are "served" from a disk/file server and can be directly accessed from each node. The limitation of this alternative is cluster interconnect saturation. However, when a node fails there is no disruption in data access, since data is directly accessible from all nodes.

Gartner Datapro Product Report DPRO-94885, 6 February 2002, Jane Wright.


High Performance Solutions Web Letter is published by Compaq. Additional editorial material supplied by Gartner, Inc. © 2002. Editorial supplied by Compaq is independent of Gartner analysis and in no way should this information be construed as a Gartner endorsement of Compaq's products and services. Entire contents © 2002 by Gartner, Inc. All rights reserved. Reproduction of this publication in any form without prior written permission is forbidden. The information contained herein has been obtained from sources believed to be reliable. Gartner disclaims all warranties as to the accuracy, completeness or adequacy of such information. Gartner shall have no liability for errors, omissions or inadequacies in the information contained herein or for interpretations thereof. The reader assumes sole responsibility for the selection of these materials to achieve its intended results. The opinions expressed herein are subject to change without notice.


 