Research from Gartner
Organizations Will Need to Tackle Three Challenges to Curb Unstructured Data Glut and Neglect
Increased data growth propelled by the Nexus of Forces has created an unstructured data nightmare. To manage data growth and security effectively, information managers will need to deploy the right tools and educate employees organizationwide on how to overcome instinctual data hoarding.
Impacts
- Information managers are unsure which tools are available to assess the scope and risk of the enterprise data footprint, stifling most early-stage information management programs.
- An incomplete understanding of regulatory requirements, combined with hopes for analytics, prompts organizations to retain all information, creating challenges for information managers.
- Everyone is a "data hoarder" by nature, and new storage options are propelling corporate data glut and hindering data deletion efforts by information managers.
Recommendations
- Purchase a file analysis product to get a picture of the data demographics, emphasizing redundant, outdated and trivial data along with sensitive and personally identifiable information.
- Engage the CIO and CDO in creating retention policies; these individuals should then, in turn, engage their peers in legal, risk, compliance, security, business intelligence and lines of business across all regions.
- Use the data gathered through file analysis, including potential cost savings, to help each business unit dispose of unstructured data or mitigate its risk.
Strategic Planning Assumption
Through 2020, less than 10% of organizations will find value in "dark data."
Analysis
Most of us are guilty of "data hoarding." Without so much as a thought, we save every digital photo, email, document, presentation and spreadsheet, losing track of what we have saved along the way. Across the enterprise, employees are blindly building a bottomless lake of "dark data," and, in many cases, a corporate mantra of "save everything, just in case" is encouraging the behavior.
Additionally, the Nexus of Forces has opened up myriad ways to create, store and access user-controlled data. No one really knows the true scope of enterprise data glut. This is because the IT organization has, until recently, been in the dark. It has been restricted to storage resource management (SRM) and search tools, which are often not deployed and provide little to no functionality for determining whether data (in particular, unstructured data) has any real business value or is sheer waste.
So what do we do? And, perhaps more importantly: given dropping storage costs, does uncontrolled data growth even matter?
Uncontrolled data growth does matter. Client inquiries suggest that, for many organizations, around 30% of data is redundant, outdated or trivial (ROT). Inquiries also suggest that around 50% of data has an indeterminate value, while the remaining 20% is mission-critical. Assuming a midsize storage environment, with between 1PB and 4PB of raw capacity, and a storage total cost of ownership (TCO) of $2,325 per TB raw or $3,092 per TB usable (assuming 75% of raw capacity is usable), this equates to $927,600 to $3,710,400 in wasted spending on ROT.1 Moreover, if the 50% of data with indeterminate value proves to be waste, these numbers skyrocket, adding a further $1,546,000 to $6,184,000 in unnecessary storage costs.
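The ROT arithmetic above can be reproduced with a short Python sketch. One assumption is needed to match the published figures: they appear to apply the usable-capacity TCO ($3,092 per TB) to the full 1PB-to-4PB range, with 1PB taken as 1,000TB. The function name wasted_spend is illustrative only.

```python
# Sketch of the ROT cost arithmetic. Assumption: the published figures apply the
# usable-capacity TCO to the full 1PB-4PB range, with 1PB treated as 1,000TB.
TCO_PER_USABLE_TB = 3_092  # US dollars per usable TB, from the text


def wasted_spend(capacity_tb: int, waste_pct: int, tco_per_tb: int = TCO_PER_USABLE_TB) -> int:
    """Dollars spent storing the wasted share of capacity (integer math avoids float drift)."""
    return capacity_tb * waste_pct * tco_per_tb // 100


for capacity_tb in (1_000, 4_000):
    rot = wasted_spend(capacity_tb, 30)            # redundant, outdated or trivial
    indeterminate = wasted_spend(capacity_tb, 50)  # value not yet determined
    print(f"{capacity_tb:,} TB: ROT ${rot:,}; indeterminate ${indeterminate:,}")
```

Running it prints $927,600 and $3,710,400 for ROT, and $1,546,000 and $6,184,000 for the indeterminate share, matching the figures in the text.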
Making matters worse, storage teams typically throw more and more storage at the ballooning data problem. In fact, a recent Gartner survey found that 51% of survey respondents felt that, when it came to the general management of storage, "not managed besides purchasing more …" best described their strategy (see the Appendix).
The good news: storage hardware costs continue to drop. However, hardware accounts for only 48% of the storage TCO.1 And even if the TCO could drop to zero, another problem would remain: keeping everything can lead to extremely costly and damaging noncompliance issues, and it also creates a bigger pool of sensitive data and personally identifiable information (PII) that is vulnerable to improper access.
What do we do about all of this data? The answer has two parts. First, information managers must implement a data policy and management program, if they haven't already. Second, they must recognize that, for such a program to tackle data growth and deletion, it must include the tools needed to identify the problem while also addressing the people-related challenges of hoarding and misguided thinking. In covering these topics, this research focuses primarily on end-user-controlled data, such as files, images and objects.
Figure 1. Impacts and Top Recommendations for Managing Data Growth
Source: Gartner (June 2015)
Impacts and Recommendations
What You Need to Know
Information managers are unsure which tools are available to assess the scope and risk of the enterprise data footprint, stifling most early-stage information management programs
We can't deal with the data problem until we can see it and understand it. With as much as 80% of enterprise data now unstructured (according to Gartner estimates), file analysis products offer hope. Information managers can now scour unstructured (and in some cases structured) content and perform standard and customized metadata analysis. Using such a tool, information managers can, for instance, quickly identify duplicate files or files that belong to employees who are no longer with the company. In addition, many file analysis products have "content awareness" for identifying PII, payment card industry (PCI) data and personal health information (PHI).
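To make the duplicate and departed-employee checks concrete, here is a minimal Python sketch of the kind of analysis a file analysis tool performs, assuming a local directory tree. It finds duplicates by grouping files on size and then on SHA-256 content hash. For ownership, it assumes a hypothetical owners mapping (file path to user ID) and a set of active employee IDs, for example exported from an HR system. Commercial products work at far larger scale, and their internals are not described here.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files are not read into memory at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Return groups of files under root that have identical content."""
    by_size: dict[int, list[Path]] = defaultdict(list)
    for p in root.rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)

    groups: list[list[Path]] = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash: dict[str, list[Path]] = defaultdict(list)
        for p in same_size:
            by_hash[sha256_of(p)].append(p)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)


def find_orphans(owners: dict[Path, str], active_employees: set[str]) -> list[Path]:
    """Files whose recorded owner (hypothetical mapping) is not a current employee."""
    return sorted(p for p, owner in owners.items() if owner not in active_employees)
```

Grouping by size first means only files that could be duplicates are hashed, which matters when scanning millions of files.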
After analyzing files, information managers can then use the resulting reports to inform their strategic initiatives for getting end users to delete data (for example, by making a number-driven decision on a reasonable inbox size) or for building a business case to present to the CIO and other executives. Aside from disposing of unwanted data, these tools can also assist in identifying data of value, such as data that should be tagged as records or data that can be filtered into analytic programs.
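The PCI "content awareness" that many file analysis products offer typically combines pattern matching with validation. The Python sketch below shows one common technique, assuming plain-text input: find runs of 13 to 19 digits (optionally separated by spaces or hyphens) and keep only those that pass the Luhn (mod 10) checksum that payment card numbers use. It is not a description of how any specific product works, and it does not address PII or PHI detection.

```python
import re

# Runs of 13-19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Luhn (mod 10) checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return digit strings that look like payment card numbers (pattern plus Luhn)."""
    hits = []
    for m in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            hits.append(digits)
    return hits
```

The checksum step is what keeps order numbers and phone numbers of similar length from flooding the report with false positives.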
Recommendations:
- Shed light on the unstructured data problem with the help of a file analysis product. One of the issues we always encounter is that the subject of data glut is too abstract for most people to understand. Information managers should consider a file analysis product to get a picture of the enterprise unstructured data footprint, using the tool to home in on ROT, sensitive data and PII as a starting point.
- Begin to proactively classify data. Looking backward through a "file analysis" lens can lead organizations to unstructured data management best practices and open the possibility of data classification policies that tag data at the point of creation, with human oversight, based on the organization's business needs.
- Ensure buy-in and enforcement by the chief data officer (CDO), data scientists, information officers and others who have an interest in analyzing corporate data. Nothing happens until someone gets excited. Information managers will need to gain C-level sponsorship if they want to educate the organization and the individual. With sponsorship gained, they will then need to create a cross-functional working team that is able to make decisions regarding corporate data value, classification, tagging, migration, analysis and disposal.
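Point-of-creation classification with human oversight can be sketched as "suggest, then confirm." In this minimal Python example, the rule set, tag names and keywords are hypothetical placeholders for whatever the cross-functional team defines. The code only suggests tags; nothing is final until a reviewer confirms it.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical rule: if any keyword appears in the file name or text, suggest the tag."""
    tag: str
    keywords: tuple[str, ...]


@dataclass
class Suggestion:
    path: str
    tags: list[str] = field(default_factory=list)
    needs_review: bool = True  # a person confirms or changes tags before they apply


# Hypothetical rule set; a real one would come from the cross-functional team.
RULES = [
    Rule("record:finance", ("invoice", "purchase order", "ledger")),
    Rule("record:hr", ("offer letter", "performance review")),
    Rule("sensitive", ("ssn", "salary", "diagnosis")),
]


def suggest_tags(path: str, text: str, rules: list[Rule] = RULES) -> Suggestion:
    """Suggest tags at creation time; even an empty suggestion is routed to a reviewer."""
    haystack = f"{path}\n{text}".lower()
    tags = [r.tag for r in rules if any(k in haystack for k in r.keywords)]
    return Suggestion(path=path, tags=tags)


def confirm(s: Suggestion, reviewer_tags: list[str]) -> Suggestion:
    """Human oversight: the reviewer's decision replaces the automated suggestion."""
    return Suggestion(path=s.path, tags=sorted(set(reviewer_tags)), needs_review=False)
```

Keyword rules are deliberately crude here; the point is the workflow, in which automated suggestions never bypass the human reviewer.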
An incomplete understanding of regulatory requirements, combined with hopes for analytics, prompts organizations to retain all information, creating challenges for information managers
Organizations must adhere to regulations. All too often, a lack of full awareness of these regulatory requirements leads to a policy of "keep everything just in case." Ironically, this kind of behavior often violates actual regulations.
Keeping everything also presents a larger-than-necessary target for hackers. In the December 2014 Sony hack, for example, hackers accessed thousands of emails, including deleted items that never actually "went away." Sony noted that, posthack, it was changing its email retention policy from six years for emails with financial information to two years for all email, unless that email is on legal hold.2 Organizations need to understand and balance what has to be kept (for example, Barclays was fined $3.75 million in 2013 after failing to keep critical records3) against data that exposes the organization to risk while providing no value and carrying no retention requirement.
Undisciplined regulatory adherence presents storage managers with an uphill battle, requiring them to educate and persuade the ranks. In addition, the hype and hope surrounding big data analytics are only increasing the problem. Many leaders, including those in IT, see "big data analysis" of unstructured data lakes as the technology equivalent of dumpster diving, in which they mine trash data for gold. With this mindset, all data begins to look as if it could be useful; it's not.
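A retention rule like the one Sony described (keep email for a set period unless it is on legal hold) reduces to a simple eligibility check. This Python sketch assumes a two-year window approximated as 730 days; the Message type and the disposable function are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date

RETENTION_DAYS = 730  # two years, approximated as 730 days (assumption)


@dataclass
class Message:
    sent: date
    on_legal_hold: bool


def disposable(msg: Message, today: date, retention_days: int = RETENTION_DAYS) -> bool:
    """Eligible for deletion only if past retention AND not under legal hold."""
    if msg.on_legal_hold:
        return False  # a legal hold always overrides the retention schedule
    return (today - msg.sent).days > retention_days
```

Checking the legal hold first makes the override explicit: no retention setting, however short, can release a held item.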
Organizations need to walk a fine line between what value they want to generate from their datasets and what is actually possible. Tools, such as those for file analysis, can present a map of unstructured data. In addition, the work already done by master data management (MDM), business intelligence (BI) and data scientists can provide a decent representation of what is included in the structured dataset. By analyzing a combination of all the data and weighing corporate goals, regulatory requirements and viable usage of the data, information managers can help set realistic policies for data classification, storage, analysis and disposal.
Gartner inquiries suggest that less than 10% of organizations are even beginning to analyze dark data, and even within this small group, the value of that data has yet to be determined. This is not to say that analytics is a bad idea; rather, it's a matter of "garbage in, garbage out": analytics results are only as good as the underlying dataset being analyzed. Sixteen copies of the same spreadsheet are very unlikely to provide any more insight than one copy, but they are very likely to get in the way (creating more trash to wade through, so to speak), not to mention that the extra copies further tax the IT infrastructure and the information management and storage teams.
To tackle these challenges, information managers will need to become compliance champions and big data realists.
Recommendations:
- Engage the CIO and the CDO (if one exists), who should in turn engage their peers in legal, risk, compliance, business applications, business intelligence and the lines of business. An information governance program must be owned at a high level in the organization. Information managers should speak to their CIOs about the importance of such a program. If one is not already in place, they should also stress the need for a cross-functional team and the potential hiring of a chief data officer, so as to send the right message to the entire organization.
- Become a data realist. Information managers, and the organizations they belong to, will need to become big data realists, carefully weighing the cost of saving everything for data analytics versus the benefits. In doing so, they must recognize that "just in case" is not a valid business case. Working with the organization's business intelligence or analytics team is a good first step in identifying what can be analyzed effectively and what data might have higher corporate value.
- Manage and value information as an actual corporate asset. Although information is not generally recognized by the CFO or accountants as an actual asset, information managers should behave as if it is. This includes applying inventory management best practices to your information, and quantifying its economic value as if it were a balance sheet asset. Doing so will introduce information life cycle discipline, improve the often-casual attitude toward data, help information managers prioritize information-related initiatives, and enable the intelligent determination of data retention policies and practices.
- Determine information governance scope. Organizations that aim to govern all information will fail. Information managers, together with the cross-functional team, need to ask and answer, "What is the information that matters? How does our company and its business units create value, internally and externally, and what information can put a shape to that value?" Start with the 80/20 rule: focus on the 20% of governance effort, applied to the right datasets, that can return 80% of the desired results.
- Work with your compliance and legal departments to better understand their discovery needs. Given the sheer volume and sprawl of data, e-discovery costs can in many instances be much higher than the cost of simply settling the case. Information managers should consider going lean on data retention (making it easy to produce subpoenaed data), rather than "throwing in the towel" and ultimately being forced to settle litigation because it is cheaper than e-discovery.
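One way to put numbers behind the 80/20 rule is a greedy ranking. In this Python sketch, the value and effort estimates per dataset are hypothetical inputs that the cross-functional team would supply. Datasets are taken in order of value per unit of governance effort until 80% of total estimated value is covered, and the function also reports what share of total effort that requires. Effort estimates must be positive.

```python
def pareto_select(
    datasets: dict[str, tuple[float, float]], target: float = 0.8
) -> tuple[list[str], float]:
    """Pick datasets to govern first.

    datasets maps name -> (estimated value, estimated governance effort > 0).
    Returns the chosen names and the fraction of total effort they require.
    """
    total_value = sum(v for v, _ in datasets.values())
    total_effort = sum(e for _, e in datasets.values())
    # Best value per unit of effort first.
    ranked = sorted(datasets.items(), key=lambda kv: kv[1][0] / kv[1][1], reverse=True)

    chosen: list[str] = []
    value = effort = 0.0
    for name, (v, e) in ranked:
        if value >= target * total_value:
            break  # target share of value already covered
        chosen.append(name)
        value += v
        effort += e
    return chosen, effort / total_effort
```

A greedy ratio ranking is a simple heuristic, not an optimal solution; its value lies in forcing the team to write down explicit value and effort estimates.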
Everyone is a "data hoarder" by nature, and new storage options are propelling corporate data glut and hindering data deletion efforts by information managers
Data hoarding is endemic, spanning the enterprise ranks. In general, people hold on to files for the following reasons:
- You never know when you'll need it. People think they might need a file someday, so they keep everything "just in case."
- Proof of productivity. People believe their files represent their work and, consequently, demonstrate that they are working.
- Fear. Employees believe the deletion of data might get them fired, whereas asking for another terabyte of storage is no big deal.
- Technology enablement. The Nexus of Forces, new applications, file structures and unlimited cloud storage repositories all encourage employees to create more data, more versions and more copies, without encouraging the cleansing or organizing of that data.
- Deletion unwillingness and impossibility. Almost no one has the inclination or time to hand-sort his or her files, and the volume of those files may be so high that human intervention is futile, requiring automation.
- Out of sight, out of mind. Because most people can't see the data problem, they give it no thought.
Because many of the above reasons are deeply ingrained human tendencies, they will prove difficult to reverse, unless privacy or security is at risk. As a result, information managers will need to devise tactical solutions that will make good data management practices easy to implement.
Recommendations:
- Enforce accountability. Drawing on the results of file analysis, share the data waste of each business unit, along with the potential cost savings of deleting that data, with each line-of-business leader.
- Set a cloud storage bill threshold. Unlike on-premises or corporate-owned storage, which is acquired once, amortized over time and has fixed maintenance costs, cloud storage bills may vary and can rise quickly if too much data is moved to the cloud too fast, drawing attention from accounting all the way up to the CFO. Information managers should review the data that will be moved to cloud storage and ensure it is the right data being stored in the right way. They should also set a contractual agreement with their cloud storage providers, requiring alerts when storage usage and costs reach specific thresholds.
- Make better use of storage tiering. Many organizations have not used storage-tiering capabilities as effectively as they could. As a result, one or two tiers of expensive storage often make up the entire storage allotment for all applications, regardless of latency or access requirements. However, this is changing rapidly, thanks to enhanced storage-tiering technologies now available in the market.
- Implement strong data management policies for user data and enforce them. Implement an archive, disable PSTs (Outlook personal folder files) and encourage users to clean out their sent items and deleted items folders. Employ a strong data loss prevention process that protects information while also enabling users to work with data from a productivity platform of choice. Information managers should be aware, however, that employees may try to circumvent these controls by using other tools if quotas or retention periods are too severe.
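The accountability report in the first recommendation above can be generated from file analysis output. This Python sketch assumes hypothetical rows of (business unit, size in bytes, ROT flag) and reuses the $3,092 per usable TB TCO from the cost example earlier in this note, with 1TB taken as 10^12 bytes.

```python
from collections import defaultdict

TCO_PER_USABLE_TB = 3_092  # US dollars per usable TB, from the cost example in this note
BYTES_PER_TB = 10**12      # decimal terabyte (assumption)


def waste_report(findings: list[tuple[str, int, bool]]) -> dict[str, dict[str, float]]:
    """Summarize ROT per business unit.

    findings: hypothetical (business_unit, size_in_bytes, is_rot) rows,
    for example exported from a file analysis tool.
    """
    total: dict[str, int] = defaultdict(int)
    rot: dict[str, int] = defaultdict(int)
    for unit, size, is_rot in findings:
        total[unit] += size
        if is_rot:
            rot[unit] += size

    report = {}
    for unit in sorted(total):
        share = rot[unit] / total[unit] if total[unit] else 0.0
        savings = rot[unit] / BYTES_PER_TB * TCO_PER_USABLE_TB  # TCO avoided by deleting ROT
        report[unit] = {"rot_share": share, "savings_usd": savings}
    return report
```

Each line-of-business leader then sees two numbers for their own unit: the share of its storage that is ROT and the estimated storage TCO that deleting it would avoid.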
Appendix
Survey Overview and Storage Management Results
Conducted in December 2014, Gartner's Storage Management Study surveyed attendees of Gartner's annual Data Center events in North America and Europe. Respondents were from end-user organizations and had expertise in storage.
Figure 2, below, shows survey results for the question "When it comes to general management of your storage, which response best describes your organization?"
Study Objective
This research explores user organizations' approaches to storage management environments in order to identify future trends and to benchmark findings from previous studies.
Methodology
- Fielded via Gartner's Real-Time Research Center (RTRC) online survey kiosks at Gartner's annual Data Center Conference in North America and European Data Center Summit. The study was fielded during the month of December 2014.
- Sample of 177 respondents comprising IT individuals at end-user organizations with strong knowledge of their organizations' storage technologies and architecture.
- No regional, company-size or industry screening criteria were used.
- Greatest representation is from large organizations (1,000-plus employees worldwide) in North America. This distribution is similar to those fielded in the past.
- Year-over-year comparisons were included where applicable.
- The results of this study are representative of the respondent base and not necessarily the market as a whole.
Figure 2. How Businesses Are Dealing With Storage Management
Source: Gartner (June 2015)
Evidence
1 "IT Key Metrics Data 2015: Key Infrastructure Measures: Storage Analysis: Multiyear"
2 "A Breakdown and Analysis of the December, 2014 Sony Hack," Risk Based Security, 5 December 2014.
3 "Barclays Fined $3.75m After Record-Keeping Failure," BBC News, 27 December 2013.
Source: Gartner Research Note G00275931, Alan Dayley, Debra Logan, 17 June 2015