This research provides technical professionals with a guidance framework for the systematic design of a data lake. Many once believed that a data lake was a single amorphous blob of data, but consensus has emerged that it has a definable internal structure.
The Gartner Approach
The Guidance Framework
- Start With Analytics Requirements
- Identify the User Groups of the Data Lake
- Identify the Architect Who Is Responsible for the Data Lake
- Determine Relevant SLAs
- Define Success Criteria
- Determine Stakeholders
- Step 1: Macro-Level Architecture — Three Prototypical Patterns
- Inflow Data Lake
- Outflow Data Lake
- A Data Science Lab
- Comparison of the Data Lake Architecture Styles
- Internal Structure
- Step 2: Medium-Level Architecture — Zones
- The Definition and Separation of Zones
- Step 3: Micro-Level Architecture and Detailed Design Decisions
- Select Platform Technologies
- Data Modeling
- Start With the Single Architecture Style
- Optimize the Supporting Infrastructure
- Define Data Governance
- Bimodal Governance and Pace Layering
- Ensure You Have the Right Skills
- Implement the Data Lake for Its New Capabilities
- Carefully Plan How the Data Flows In and Out of the Lake
- Determine Data Consumption
- Plan Data Ingest
- Ensure There Is a Realistic Delivery Plan
- Risks and Pitfalls
- Myth: A Data Lake Is Hadoop
- Myth: Hadoop Is Big Data and Is Fast, So It Has Great Performance
- Myth: The Data Lake Doesn't Require Data Modeling
- Myth: Put Any and All Data You Can Into the Data Lake
- Myth: Data Lakes Contain Petabytes of Raw Data
- Myth: Data Lakes Are Inexpensive
- Myth: Keeping Data in One Place Equals a Single Source of the Truth
- Myth: Everyone Can Use the Data Lake
- Myth: A Data Lake Is the New Enterprise Data Warehouse
- Myth: A Data Lake Is Just a Data Integration Method
- Myth: A Data Lake Can Scale to Thousands of Users
- Myth: If We Build a Data Lake, Then People Will Use It
Gartner Recommended Reading
©2020 Gartner, Inc. and/or its affiliates.
All rights reserved.
Gartner is a registered trademark of Gartner, Inc. and its affiliates.