What do Continental Airlines, the London Stock Exchange, Hershey Foods, eBay, AT&T, MCI WorldCom, Microsoft, the New York Stock Exchange and the Federal Aviation Administration have in common? They all experienced high-profile downtime and received enormous press coverage for all the wrong reasons. For most, the outages resulted in lost revenue. For all, the downtime tarnished their company image and reputation.
Hershey Foods experienced an enterprise resource planning system rollout debacle in 1999, which prevented it from shipping products during the critical Halloween season. The cost of this mishap: a 19 percent drop in net income for 3Q99. In January 2001, Microsoft suffered a three-day outage of many of its Web sites because of a domain name service (DNS) configuration error, a common cause of failure for many enterprise Web sites. On 8 June 2001, a 90-minute system failure at the New York Stock Exchange showed, once again, that even the most sophisticated IT environments are vulnerable to failure, usually caused by human or process error.
What conclusions can be drawn about downtime as we move toward collaborative commerce, with expanded integration of business processes across enterprise boundaries? First, enterprises built on shaky foundations will incur it. Second, now that downtime is public information, it will tarnish a company's image and reputation.
Gartner research shows that an average of 80 percent of mission-critical application service downtime is directly caused by people or process failures. The other 20 percent is caused by technology failure, environmental failure or a disaster. The complexity of today's IT infrastructure and applications makes high-availability systems management enormously difficult (see "Making Smart Investments to Reduce Unplanned Downtime," TG-07-4033).
When enterprises invest in achieving higher levels of application service availability, they tend to focus on increasing the redundancy in the environment. Although redundancy is critical to providing high levels of availability, it cannot and should not be the only line of defense. Enterprises must also mitigate downtime risks caused by people and process failures. For these causes of downtime, strong IT operations and applications development processes are required. IT operational processes are vital to application service availability, but are often overlooked — especially in distributed application environments — because of architecture/infrastructure complexity, immaturity of the processes and tools, and a lack of commitment to the IT resources needed.
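The arithmetic behind this argument can be sketched in a few lines of Python. This is an illustration only, not a Gartner model: the 80/20 split is the average cited above, while the 100-hour baseline, the reduction fractions and the function name `remaining_downtime` are hypothetical. The sketch also generously assumes that redundancy could eliminate the entire 20 percent, including environmental failures and disasters.

```python
# Illustrative only: the 80/20 split is the Gartner average cited in the text;
# the baseline hours and reduction fractions below are hypothetical.

PEOPLE_PROCESS_SHARE = 0.80  # downtime caused by people or process failures
TECHNOLOGY_SHARE = 0.20      # technology failure, environmental failure or disaster


def remaining_downtime(baseline_hours, tech_reduction, people_process_reduction):
    """Return the downtime hours left after cutting each cause by a fraction (0.0 to 1.0)."""
    for fraction in (tech_reduction, people_process_reduction):
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("reduction fractions must be between 0.0 and 1.0")
    tech_hours = baseline_hours * TECHNOLOGY_SHARE * (1.0 - tech_reduction)
    people_hours = baseline_hours * PEOPLE_PROCESS_SHARE * (1.0 - people_process_reduction)
    return tech_hours + people_hours


baseline = 100.0  # hypothetical hours of unplanned downtime per year

# Perfect redundancy: every technology-related outage is eliminated.
redundancy_only = remaining_downtime(baseline, tech_reduction=1.0, people_process_reduction=0.0)

# Better processes: people and process outages are halved, with no added redundancy.
process_only = remaining_downtime(baseline, tech_reduction=0.0, people_process_reduction=0.5)

print(f"Redundancy only: {redundancy_only:.1f} hours remain")
print(f"Process improvement only: {process_only:.1f} hours remain")
```

Under these assumptions, even perfect redundancy leaves 80 of the 100 hours untouched, while halving people and process failures alone leaves 60. The exact figures matter less than the shape of the result: the larger share of downtime is out of reach of redundancy investments.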
Applications requiring high levels of availability must be managed with operational disciplines — also known as network and systems management (NSM) disciplines — to avoid unnecessary and potentially devastating outages. The following are proactive operations management disciplines, which have direct and high returns from an application availability perspective:
Enterprises cannot offer consistently high levels of availability without maturing IT management processes. By investing in these processes, enterprises can mitigate their exposure to the majority of application service downtime risks involving people and process error.
This month’s Spotlight focuses on storage management, desktop management and performance management — three critical management processes affecting application service availability.
“Delivering the Right Amount of Data Availability,” T-13-9757, by Mark Nicolett, explores proactive management processes related to keeping storage and data available and protected.
“Desktop Software Configuration: Key to Desktop Availability,” T-13-9394, by Ronni Colville and Donna Scott, offers insight into best practices for achieving higher levels of desktop availability.
“Performance Management: A Framework for Success,” COM-13-9854, by Milind Govekar, offers insight into best practices for performance management.
Next, we tackle the issue of problem management. Enterprises cannot know how well they are performing unless they measure it.
“Service Desk Metrics: Time for a Change,” SPA-13-5979, by Kris Brittain and Steve Cain, explores how service desk quality metrics are changing with the increased use of Web-based self-help tools.
“24/7 Is a Management Thing,” SPA-13-2616, by Andy Kyte, explores the trade-offs between availability and cost, focusing specifically on problem management for extended applications, that is, applications used by external customers, suppliers and business partners.
“The High Cost of Achieving Higher Levels of Availability,” SPA-13-9852, by Donna Scott, examines why costs increase exponentially as application service levels rise and which levels of availability most enterprises consider “good enough.”