
Overview

This research examines the alternatives for controlling the cost of storage and makes specific recommendations to help users rise to the challenges of growth.
- As the economy recovers, storage growth is expected to return to the 60% to 70% range.
- Technologies and techniques exist to help users deal with storage growth and its consequences.
- The conversion to thin provisioning is freeing up 30% to 60% of existing storage.
- Storage quality of service (QoS) features are increasing the Serial Advanced Technology Attachment (SATA) hard-disk drive (HDD) share of the storage system market.
- Storage growth usually requires modernization of the storage environment.
- CIOs and I&O leaders should create a technology and policy road map for dealing with storage growth.
- Implement thin provisioning and storage tiers, preferably with an automated product.
- Deploy data reduction technology in backup and archiving today, and in primary data in the future.
- Get rid of old data and retire/replace quota systems with archiving systems and policies.


What You Need to Know

In 2008, the total storage capacity purchased with enterprise storage arrays was 5.829 exabytes (see Note 1). Gartner's forecast for 2013 is that 51.734 exabytes will be sold, a nearly ninefold increase in just five years. However, users don't purchase, install and then get rid of storage after a year or so; arrays typically stay in service for three to five years, so each year's purchases add to the installed capacity. The total capacity forecast for the five-year period 2009 through 2013 is 121.669 exabytes, which means that users will purchase more than 20 times as much capacity during that period as they purchased in 2008. And this period starts with a recession and the slowest storage capacity growth seen in many years. Obviously, individual enterprises will vary on either side of this growth rate.
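The two ratios quoted above can be checked with a few lines of arithmetic. The sketch below uses only the exabyte figures given in this section; the variable names are illustrative.

```python
# Figures taken from the text (exabytes); variable names are illustrative.
purchased_2008 = 5.829
forecast_2013 = 51.734
total_2009_2013 = 121.669

# Growth of annual purchases from 2008 to 2013.
annual_ratio = forecast_2013 / purchased_2008

# Cumulative 2009-2013 purchases relative to the single year 2008.
cumulative_ratio = total_2009_2013 / purchased_2008

print(f"2013 vs. 2008 annual purchases: {annual_ratio:.1f}x")
print(f"2009-2013 total vs. 2008: {cumulative_ratio:.1f}x")
```

The first ratio comes out just under nine and the second just over 20, which matches the "nearly ninefold" and "more than 20 times" statements.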
The rate of recovery from this recession is probably the biggest wildcard in the forecast, followed by rates of development and adoption of technologies that reduce the amount of actual storage required for a given amount of data. For example, deduplication is mostly a backup technology, although it is now also used for archiving. As it becomes more widely available for primary data, it will tend to slow growth. Finally, archiving and retention policies are increasingly being examined by enterprises, and the rate of adoption could also affect the forecast.
Action Item: Enterprises should have a good history on storage capacity, utilization and other factors affecting capacity growth, and they should use this to generate a forecast of future storage needs.
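One simple way to turn a capacity history into a forecast is to derive a compound annual growth rate and project it forward. The following is a minimal sketch under that assumption; the function names and the sample history are hypothetical, and a real forecast would also account for utilization, data reduction and retention policy.

```python
def compound_annual_growth_rate(history):
    """Average yearly growth rate implied by the first and last values
    of a list of year-end capacities (oldest first)."""
    if len(history) < 2 or history[0] <= 0:
        raise ValueError("need at least two years of positive capacity data")
    years = len(history) - 1
    return (history[-1] / history[0]) ** (1 / years) - 1


def project_capacity(current, rate, years):
    """Capacity expected after `years` of compound growth at `rate`."""
    return current * (1 + rate) ** years


# Hypothetical year-end installed capacity in terabytes (not from the source).
history_tb = [100.0, 160.0, 256.0]
rate = compound_annual_growth_rate(history_tb)
for year in range(1, 4):
    projected = project_capacity(history_tb[-1], rate, year)
    print(f"year +{year}: {projected:.0f} TB")
```

Using only the first and last data points keeps the sketch short; fitting a trend to every year of history would be less sensitive to one unusual year, such as a recession.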



Analysis

IT organizations that have been successful at stopping or slowing the actual growth of data are extremely rare. Industrywide, storage capacity grew by 79% in 2008, compared with 2007. The good news is that annual storage growth for 2009 has slowed to 39%, from averages of 60% to 70% compounded over several years prior to 2009. The bad news is that this growth is expected to return to the 60% to 70% range as the economy recovers from the recession.
These compounded high growth rates are not accidental. Sometimes it seems that everywhere you look you find something else driving up storage capacity. Gain a new customer or hire a new employee, and storage capacity goes up. Come to work another day and the output of your applications will have increased storage capacity. And nothing ever seems to take information away. It just constantly expands. Richer forms of media, such as audio, pictures and even video, accelerate the pace of this growth, compared with earlier years. In fact, with 2009 the sole exception, in each of the past 10 years, the storage industry has sold more than 60% more disk capacity than the previous year.
Fortunately, there are technologies and techniques that can help IT organizations address storage growth and its consequences. The remaining sections deal with the possible directions users can take in addressing these issues.

1.0 Improve Utilization and Buy Less Storage: Thin Provisioning
On a non-thin-provisioned disk array, storage capacity is consumed as soon as it is allocated to an application, whether data is written to that capacity or not. On a thin-provisioned disk array, storage asset utilization is increased because the disk array delays the consumption of disk capacity until an application actually writes data to that capacity. Thin provisioning eliminates the waste associated with all the allocated, but unwritten, storage capacity buffers assigned to individual applications and inaccessible by others. With thin provisioning, it is possible to allocate far more storage capacity than is available on the disk array at the time. And, when new capacity is to be assigned to an application, it can, in most cases, be completed immediately, because all applications share a common free space pool.
The only issue with thin provisioning is that administrators need to monitor the free space available in the disk array to ensure that it does not fall below estimated peak levels, and to allocate additional storage to the pool when needed. However, this is inherently easier than monitoring storage utilization and growth rates on a server image basis, because there are substantially fewer storage systems than server images in most user environments.
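The behavior described above can be illustrated with a toy model. This is a sketch, not any vendor's implementation: the class name, method names and threshold are assumptions, and per-volume tracking is omitted for brevity.

```python
class ThinPool:
    """Toy model of a thin-provisioned pool; all sizes in gigabytes."""

    def __init__(self, physical_gb, min_free_gb):
        self.physical_gb = physical_gb    # capacity actually installed
        self.min_free_gb = min_free_gb    # estimated peak demand to keep free
        self.allocated_gb = 0.0           # capacity promised to applications
        self.written_gb = 0.0             # capacity actually consumed

    def allocate(self, size_gb):
        # Allocation consumes nothing, so it may exceed physical capacity.
        self.allocated_gb += size_gb

    def write(self, size_gb):
        # Physical capacity is consumed only when data is written.
        if self.written_gb + size_gb > self.physical_gb:
            raise RuntimeError("pool exhausted: add physical capacity")
        self.written_gb += size_gb

    def free_gb(self):
        return self.physical_gb - self.written_gb

    def needs_capacity(self):
        # True when free space has fallen below the monitored threshold.
        return self.free_gb() < self.min_free_gb

    def add_capacity(self, size_gb):
        self.physical_gb += size_gb


pool = ThinPool(physical_gb=1000, min_free_gb=200)
pool.allocate(3000)   # over-allocation is allowed
pool.write(850)       # only written data uses physical space
print(pool.free_gb(), pool.needs_capacity())
```

The single `needs_capacity` check stands in for the monitoring task: one threshold per pool, rather than one per server image.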
Benefits associated with thin provisioning include optimized storage utilization, reduced storage hardware and software costs, lower heat dissipation, power consumption and space requirements, reduced provisioning complexity and time, and potentially lower storage management and administration costs. The following data summarizes information gleaned from numerous inquiries with users and vendors:
- Typical utilizations:
  - Direct-attached storage (DAS), unmanaged: 10% to 40%
  - Storage area network (SAN), managed/manual provisioning: 40% to 50%
  - Thin provisioning: 60% to 75%
- Thick-to-thin recaptures: 30% to 60%
- An average 25% of unutilized capacity is due to purchasing cycle
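To see what these utilization ranges mean for purchasing, the raw capacity needed for a fixed amount of data can be computed at each level. The sketch below uses the midpoint of each range listed above; the function name and the 100 TB data volume are hypothetical.

```python
def raw_capacity_needed(data_tb, utilization):
    """Raw capacity required to hold `data_tb` at a given utilization (0-1)."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return data_tb / utilization


# Midpoints of the utilization ranges in the list above.
midpoints = {
    "DAS, unmanaged": 0.25,
    "SAN, manual provisioning": 0.45,
    "Thin provisioning": 0.675,
}

data_tb = 100.0  # hypothetical amount of written data
for label, utilization in midpoints.items():
    raw_tb = raw_capacity_needed(data_tb, utilization)
    print(f"{label}: {raw_tb:.0f} TB raw")
```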
Action Item: Deploy thin provisioning where possible; however, if your current products do not support it, then factor it into your next storage purchase as a major requirement.

2.0 Deploy Data Reduction Technologies to Reduce Required Storage
Compression is a well-known technology that uses algorithms to encode data so it can be represented by fewer bits. This technology has been used for years to increase the amount of data that can be stored on tape, as well as with virtual tape library systems.
Single-instance store (SIS) is a data-reduction technology in which one copy of a file is retained (stored), and the other occurrences point to the single retained copy. SIS algorithms generally rely on hashing algorithms (such as MD5 or SHA-1) to identify duplicate content and, therefore, do not require that the file belong to the same owner or have the same filename to be recognized as a duplicate. Common examples include Microsoft Exchange (prior to Exchange 2010), EMC Centera, Iron Mountain's Connected Backup and most archiving products.
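The idea of identifying duplicates by content hash, independent of owner or filename, can be sketched as follows. The class and its methods are hypothetical and do not represent any of the products named above; SHA-1 is used because the text names it.

```python
import hashlib


class SingleInstanceStore:
    """Keeps one copy of each distinct file body, keyed by content hash."""

    def __init__(self):
        self.blobs = {}      # digest -> file content (stored once)
        self.catalog = {}    # (owner, filename) -> digest (a pointer)

    def put(self, owner, filename, content):
        # The digest depends only on the bytes, not on owner or filename.
        digest = hashlib.sha1(content).hexdigest()
        self.blobs.setdefault(digest, content)
        self.catalog[(owner, filename)] = digest
        return digest

    def get(self, owner, filename):
        return self.blobs[self.catalog[(owner, filename)]]


store = SingleInstanceStore()
store.put("alice", "q3-report.doc", b"quarterly results")
store.put("bob", "copy-of-report.doc", b"quarterly results")
print(len(store.catalog), "catalog entries,", len(store.blobs), "stored copy")
```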
Deduplication takes the data reduction concept of SIS further. Deduplication uses data identification and comparison algorithms to dramatically reduce the space requirements of individual objects. It does this by storing only unique "chunks" of data, thereby eliminating redundancy. While SIS detects and eliminates duplicates on a file basis, block-level deduplication examines redundancy at a subfile level, comparing new data with previously stored data.
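A minimal sketch of subfile deduplication follows. It assumes fixed-size chunks and SHA-256 digests for simplicity; neither choice comes from the source, and real products may chunk and index data differently. The function names are hypothetical.

```python
import hashlib


def deduplicate(data, chunk_size=4096):
    """Split `data` (bytes) into fixed-size chunks and keep each unique chunk once.

    Returns (store, recipe): store maps digest -> chunk, and recipe is the
    ordered list of digests needed to rebuild the original data.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)   # a repeated chunk is not stored again
        recipe.append(digest)
    return store, recipe


def rebuild(store, recipe):
    """Reassemble the original bytes from the chunk store and recipe."""
    return b"".join(store[digest] for digest in recipe)


data = b"A" * 4096 * 3 + b"B" * 4096
store, recipe = deduplicate(data)
print(len(recipe), "chunks referenced,", len(store), "chunks stored")
```

In this example, four chunks are referenced but only two are stored, and `rebuild` returns the original bytes unchanged.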
Action Item: These technologies are available for backup and archive environments; however, users should track the evolution of products that will enable primary data to be reduced without a significant performance penalty.

3.0 Reduce the Cost of Growth: Storage Tiers
Most storage companies are delivering QoS features that lower storage costs by migrating data from high-performance disks to high-capacity disks. These features take virtualization to the next level by allowing a virtual volume to move between different storage pools. A more-flexible implementation of this feature allows a virtual volume to span different storage pools and non-disruptively move chunks of the virtual volume as performance, protection and cost needs dictate. This approach is radically different from solutions that recognize software abstractions, such as file systems and databases. It does not rely on software tools to monitor file accesses across complex input/output (I/O) paths, and it does not rely on tools that analyze database accesses to identify stale records. Instead, it relies on metadata that the storage system maintains to identify candidate volumes or chunks of volumes to be migrated. This makes it entirely server- and application-neutral. Conceptually, it is optimization at the storage system level.
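The metadata-driven selection of chunks to migrate can be sketched as follows. This is an illustration only: the `Chunk` fields, tier names and thresholds are assumptions, and a real array would weigh protection and cost as well as access counts.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    chunk_id: int
    tier: str            # "fast" (high-performance) or "capacity" (e.g., SATA)
    recent_reads: int    # access count the array keeps as metadata


def plan_migrations(chunks, cold_threshold, hot_threshold):
    """Return (chunk_id, target_tier) moves based only on array metadata."""
    moves = []
    for chunk in chunks:
        if chunk.tier == "fast" and chunk.recent_reads <= cold_threshold:
            moves.append((chunk.chunk_id, "capacity"))   # demote cold data
        elif chunk.tier == "capacity" and chunk.recent_reads >= hot_threshold:
            moves.append((chunk.chunk_id, "fast"))       # promote hot data
    return moves


chunks = [
    Chunk(1, "fast", 0),
    Chunk(2, "fast", 500),
    Chunk(3, "capacity", 900),
]
print(plan_migrations(chunks, cold_threshold=10, hot_threshold=100))
```

Nothing in the sketch inspects files or database records, which is the point of the approach: the decision uses only what the storage system already tracks.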
SSDs can be used to increase storage utilization by avoiding underutilized redundant array of independent disks (RAID) groups built from short-stroked disks. Movement of data to SATA disks produces savings not only from the reduced acquisition cost per gigabyte, but also from the lower power and cooling costs, compared with higher-performance drives. Formal definitions of storage tiers should include tape, archive and special requirements, such as compliance.
Action Item: Users should implement QoS functionality to reduce the cost of storage or place more data online to improve service-level agreements (SLAs).

4.0 Give the Problem to Someone Else
Giving the problem to someone else doesn't really address the problem, unless an alternative is available to unblock limitations in staff or budgets, and unless the SLAs in the agreement direct a solution. However, users have several choices for acquiring and managing storage:
- On-premises assets:
  - User-integrated
  - Professional services integrated
  - Key applications; provides security and control of data
- Hosted services/service providers:
  - On-premises
  - Off-premises
  - SLAs OK? Potential cost reduction; good for speed of deployment
- Cloud computing:
  - Public clouds
  - Private clouds
  - Latency OK? Potential cost reduction; good for Tier 2/Tier 3 data
Cloud computing is a paradigm shift that will redefine the relationship between buyers and sellers of IT-related products and services. It is an alternative delivery and acquisition model for IT-related services. A definition of the cloud is an abstract statement of the fact that IT services (or capabilities) are being made accessible through the Internet to anyone that has the wherewithal to buy and use them.
IT services delivered through hardware, software and people are becoming repeatable and usable by a wide range of customers and service providers. Backup has been offered by service providers for years, and archiving as a service has grown rapidly during the past two years. New technologies (cloud enablers) are providing a new platform for backup and archive service and primary storage providers to build on. Some corporations may choose to build out their own private cloud infrastructures to enable greater scaling and access to data, but they should allow for all resources to be owned and managed internally.
Action Item: When looking to modernize the storage architecture, include expanded delivery options in the evaluation.

5.0 Stop or Slow Growth: Quotas, but at What Cost?
Quotas are not new. The vendors and products are well-known. The code is proven, and it has been implemented widely for e-mail and file storage applications. As such, it is one of the few techniques that enable an IT department to significantly slow the growth of storage when it has been deployed. The problem is that quotas shift the burden of dealing with growth from the IT department to the organization's employees, customers and others. It seems reasonable in concept to ask people to manage the size of their storage, but, in practice, the value of the time involved greatly exceeds the cost of throwing hardware (more storage) at the problem.
At today's prices, it takes only a few minutes spent reacting to a quota limit to pay for the cost of an incremental gigabyte of storage for that user. Even when the full burdened cost of that gigabyte is taken into the equation, the value of the time spent far exceeds the cost. Of course, few, if any, organizations calculate the lost productivity costs, so IT departments are not usually questioned about the decision to implement quotas.
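The comparison can be made concrete with a break-even calculation. The source gives no prices, so the two inputs below are hypothetical placeholders to be replaced with local figures; the function name is also an assumption.

```python
def break_even_minutes(cost_per_gb, hourly_labor_cost):
    """Minutes of employee time whose value equals one gigabyte of storage."""
    if hourly_labor_cost <= 0:
        raise ValueError("hourly_labor_cost must be positive")
    return cost_per_gb / (hourly_labor_cost / 60.0)


# Hypothetical inputs, not taken from the source; substitute local figures.
fully_burdened_cost_per_gb = 5.00   # currency units per gigabyte
loaded_hourly_labor_cost = 60.00    # currency units per hour

minutes = break_even_minutes(fully_burdened_cost_per_gb, loaded_hourly_labor_cost)
print(f"Break-even: {minutes:.1f} minutes of quota handling per gigabyte")
```

Any time a user spends beyond the break-even figure costs more than the gigabyte that the quota saved.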
Action Item: Consider carefully the full productivity costs before implementing or renewing a quota system to manage growth. Where implemented, study the limit policies for opportunities to reduce employee time spent on quota management.

6.0 Remove Old Data: Archiving and Policy
Most applications hold a significant amount of data that is relatively inactive or not accessed at all. Yet the data that is active often generates significant I/O activity, requiring good response time. This requires expensive disk space, but archiving software can use policy to separate the two, and, at a minimum, move the infrequently accessed data to a lower-cost tier. The question is whether an organization can define a policy and process to eventually destroy the data when it is no longer useful. Time and disuse are often inadequate criteria to define policy. A review of importance or other factors (such as risk and compliance) is often required to make the final decision. When partial deletion or other accidental loss of data is likely, it is sometimes better to bias action toward complete destruction, rather than risk misinterpretation of partial information.
Where the data belongs to individual users, it is possible to "give it back" by establishing a rotating process that takes old data off the system and returns it to the user on CD/DVD. We have seen this work in a number of smaller organizations, but larger organizations should be cautious. This approach has the potential to create a nightmare in managing organization policy around security, retention, termination and other areas. However, it removes the data from systems, and relieves the IT department of the otherwise lifetime sentence to own and protect it.
Action Item: Organizations should review/create policy and process around the retention and destruction of information and use this policy to get rid of unused data on systems.

7.0 Control the Growth of Backup Storage
Historically, many organizations have used backup tapes to archive data. This has caused almost endless growth of backup storage, and has compromised litigation and corporate retention policies. Best practices recognize that archiving and backup are two different processes, with different purposes and different information. Data to be archived should be carefully selected, saved in a complete and usable form, and copies stored under archiving policy in a way that makes them retrievable, protected and controlled for the data's usable life. Backups should be kept for relatively short periods of time, typically a few weeks at most, and organized for one purpose only: recovery.
A modern trend is backup to disk, rather than tape. The short retention policy makes this more feasible, and deduplication technology can yield a 15-fold to 30-fold reduction in the actual disk capacity required. A typical rule of thumb for sizing a deduplication target is a full copy of the data plus an additional 1% for each day of retention to hold changes. Another trend in backup to disk is the use of space-saving snapshots on arrays at both primary and secondary sites. As a rule of thumb, these snapshots require capacity equal to only the 1% per day of retention. Of course, if you get rid of data on systems, then it no longer will appear in backups; this is the best way to reduce backup space.
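The two sizing rules of thumb above translate directly into arithmetic. The sketch below assumes the 1% daily change rate from the text as a default; the function names and the 100 TB, 30-day example are hypothetical.

```python
def dedup_target_size(full_copy_tb, retention_days, daily_change=0.01):
    """Rule of thumb from the text: one full copy plus 1% per day retained."""
    return full_copy_tb * (1 + daily_change * retention_days)


def snapshot_reserve_size(full_copy_tb, retention_days, daily_change=0.01):
    """Space-saving snapshots need only the 1%-per-day change component."""
    return full_copy_tb * daily_change * retention_days


# Hypothetical example: 100 TB of protected data kept for 30 days.
print(f"Deduplication target: {dedup_target_size(100.0, 30):.0f} TB")
print(f"Snapshot reserve: {snapshot_reserve_size(100.0, 30):.0f} TB")
```

For this example, the results are about 130 TB for the deduplication target and 30 TB for the snapshot reserve. Shortening retention reduces both figures, which is one reason to base retention on recovery needs only.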
Action Item: Separate archive from backup, and choose backup retention based on recovery needs only, which are usually much less than 90 days.

8.0 Storage Growth Drives the Need for Increased Storage Administration Productivity
Many of the technologies and techniques described in this research can improve storage administration productivity. Gartner forecasts that hierarchical storage management (HSM) and archiving will be the fastest-growing storage software segment, with an 18.9% compound annual growth rate through 2013. Archiving solutions are beginning to incorporate file and application capacity reporting with their data movement functions.
Storage arrays are becoming more intelligent, and the addition of more-sophisticated management features can dramatically lower administrative requirements. Although some of these features are only available from a handful of vendors, expect all major providers to deliver these capabilities soon. On a thin-provisioned disk array, storage asset use is increased, because the disk array delays the consumption of disk capacity until an application actually writes data to that capacity. Thin provisioning eliminates the waste associated with all the allocated, but unwritten, storage capacity buffers assigned to individual applications and inaccessible by others.
Historically, storage has often been treated as a line-item expense. This worked well when there was only a single QoS available. Today, there are differentiated service levels and, therefore, different cost structures across primary and secondary storage solutions. Chargeback and billing storage as a service can help match the solution to the business value of the data.

9.0 Conclusion: Controlling the Cost of Storage Growth Requires Modernization
No single vendor provides SAN or network-attached storage (NAS) products that contain all the technologies discussed above. Moreover, most arrays installed in IT departments provide none or only a few of these technologies. The implications of this are obvious: if users want to control costs resulting from the growth of storage, modernization of the storage environment is required. However, it's important to "look before leaping," which often requires improvements in processes and management. Some suggestions:
- Understand your data and storage environments first, then buy technologies
- Improve monitoring, reporting and forecasting
- Implement and use a storage dashboard
- Understand application storage performance and utilization patterns, which are important for thin provisioning and solid-state disk (SSD) placement
- Formalize storage resource management (SRM), but keep it simple
- Be proactive, rather than reactive
 © 2010 Gartner, Inc. and/or its affiliates. All rights reserved. Gartner is a registered trademark of Gartner, Inc. or its affiliates. Reproduction and distribution of this publication in any form without prior written permission is forbidden. The information contained herein has been obtained from sources believed to be reliable. Gartner disclaims all warranties as to the accuracy, completeness or adequacy of such information. Although Gartner's research may discuss legal issues related to the information technology business, Gartner does not provide legal advice or services and its research should not be construed or used as such. Gartner shall have no liability for errors, omissions or inadequacies in the information contained herein or for interpretations thereof. The opinions expressed herein are subject to change without notice.

During the period 2009 through 2013, enterprises will purchase and install 20 times more terabytes of storage than they did in 2008.

Note 1: Exabyte (EB) = 1,000,000,000,000,000,000 bytes = 10^18 bytes, or 1 billion gigabytes