Gartner Research

What Matters When Comparing Hadoop Distributions

Published: 10 October 2014

ID: G00264263

Analyst(s): Svetlana Sicular

Summary

Hadoop is turning from an intriguing possibility into a fundamental technology. This research explains what a typical Hadoop stack looks like and how the leading Hadoop distributions approach it. Enterprises should select a Hadoop distribution based on their needs, not on vendor marketing messages.

Table Of Contents

Analysis

  • What Is Hadoop?
    • Apache Hadoop
    • Hadoop Community
    • Hadoop Ecosystem
    • How to Understand Hadoop Functional Components and Alternatives
    • Hadoop 2 Adds New Capabilities and New Complexity to Its Ecosystem
  • What Is a Hadoop Distribution?
  • How to Understand and Differentiate Hadoop Distributions
    • The Structure of a Hadoop Distribution
    • Distribution Vendor Partnerships
    • Open-Source and Proprietary Components
    • Implementing Hadoop on Your Own Is Lots of Work With Little Benefit
    • Futures of Hadoop Distributions
  • Strengths
  • Weaknesses

Guidance

  • Hadoop Distributions Will Mature in Two to Five Years
  • How to Select a Hadoop Distribution
    • Partner Relationships Are the Key Selection Criteria
    • Vendor Industry or Sector Experience Matters
    • Support Availability Matters
    • Functional Requirements Should Not Dominate Your Selection Criteria
  • Avoid Common Misconceptions
    • Vendor Lock-In Is an Ungrounded Concern
    • Hadoop Does Not Completely Replace an EDW
    • Do Not Rely on Standard Benchmarks; Do Your Own Testing
    • Data Lakes Are Not a Master Data Management or a Data Integration Solution

The Details

  • Seven Frameworks of the Hadoop-Based Stack
    • Data Management Frameworks
    • Processing Frameworks
    • Development Frameworks
    • Integration Frameworks
    • Modeling Frameworks
    • Operational Frameworks
    • Analytics Frameworks
  • The Leading Hadoop Distribution Vendors
    • Cloudera
    • Hortonworks
    • MapR
    • IBM
    • Pivotal
  • Other Hadoop Distributions
  • How to Start

Gartner Recommended Reading

©2020 Gartner, Inc. and/or its affiliates. All rights reserved. Gartner is a registered trademark of Gartner, Inc. and its affiliates. This publication may not be reproduced or distributed in any form without Gartner’s prior written permission. It consists of the opinions of Gartner’s research organization, which should not be construed as statements of fact. While the information contained in this publication has been obtained from sources believed to be reliable, Gartner disclaims all warranties as to the accuracy, completeness or adequacy of such information. Although Gartner research may address legal and financial issues, Gartner does not provide legal or investment advice and its research should not be construed or used as such. Your access and use of this publication are governed by Gartner’s Usage Policy. Gartner prides itself on its reputation for independence and objectivity. Its research is produced independently by its research organization without input or influence from any third party. For further information, see Guiding Principles on Independence and Objectivity.
