Data hubs, data lakes and data warehouses are all significant areas of investment for data and analytics leaders to support increasingly complex, diverse and distributed data workloads. Gartner research found that 57% of data and analytics leaders are investing in data warehouses, 46% are using data hubs and 39% are using data lakes.
Data hubs, data lakes and data warehouses are not interchangeable alternatives
While data and analytics leaders are familiar with these terms and hear about them from technology providers, many don’t understand the differences. “Data hubs, data lakes and data warehouses are not interchangeable alternatives,” says Ted Friedman, Distinguished VP Analyst, Gartner.
Friedman adds that data and analytics leaders must understand the purpose of these three types of data structures, and the role they can play together in a modern data management infrastructure to best support specific business requirements.
Data warehouses versus data lakes versus data hubs
Data warehouses store well-known and structured data. They support predefined and repeatable analytics needs that can be scaled across many users in the organization. Data warehouses are suited to complex queries, high levels of concurrent access and stringent performance requirements.
Data lakes collect unrefined data (that is, data in its native form, with limited transformation and quality assurance) and events captured from a diverse array of source systems. Data lakes usually support data preparation, exploratory analysis and data science activities.
Data hubs are conceptual, logical and physical "hubs" that mediate semantics between centrally managed data (that is, widely used data) and locally managed data (typically single-use data), in support of governance and data sharing. They enable the seamless flow and governance of data.
Recognize how they differ in focus
Data warehouses and data lakes have a common focus — supporting the analytic needs of the organization. In contrast, data hubs are not focused on analytical use of data. They do not store detailed data for extended periods.
Instead, data hubs enable data sharing and apply governance controls to the data flowing across the organization’s various applications and processes. For example, data and analytics leaders can use a data hub to improve delivery of data from business applications to a data warehouse or a data lake.
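The hub's role described above can be illustrated with a minimal Python sketch. It is not drawn from Gartner's research or from any product: the `DataHub` class, the `has_customer_id` rule and the in-memory `warehouse` and `lake` lists are all hypothetical names chosen for illustration. The sketch assumes a hub that checks each record against governance rules and forwards it to its targets without keeping the detailed data itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class DataHub:
    """Toy hub (hypothetical): applies governance rules and forwards records.

    It keeps only a rejection counter, never the detailed records themselves.
    """
    rules: List[Callable[[Record], bool]] = field(default_factory=list)
    targets: List[Callable[[Record], None]] = field(default_factory=list)
    rejected_count: int = 0

    def publish(self, record: Record) -> bool:
        # Governance control: every rule must pass before the record is shared.
        if not all(rule(record) for rule in self.rules):
            self.rejected_count += 1
            return False
        # Sharing: each target receives its own copy of the record.
        for deliver in self.targets:
            deliver(dict(record))
        return True


def has_customer_id(record: Record) -> bool:
    """Illustrative governance rule: a record must identify its customer."""
    return bool(record.get("customer_id"))


# Stand-ins for a data warehouse and a data lake (assumed, in-memory only).
warehouse: List[Record] = []
lake: List[Record] = []

hub = DataHub(rules=[has_customer_id], targets=[warehouse.append, lake.append])

accepted = hub.publish({"customer_id": "C-1", "amount": 25.0})
rejected = hub.publish({"amount": 10.0})  # fails the rule, so it is not forwarded
```

In this sketch the business application calls `publish`, and the warehouse and the lake each receive only records that passed the governance rule. A real data hub would add semantics mediation, auditing and persistence choices that this example leaves out.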