Gartner Research

Create a Data Strategy for Machine Learning in Advanced Analytics Initiatives

Published: 10 May 2019

ID: G00377818

Analyst(s): Carlton Sapp


Organizations struggle to use data effectively and efficiently to support machine learning (ML) in advanced analytics initiatives because their data projects are growing more diverse. This research guides data and analytics technical professionals in developing a data strategy that supports successful deployments.

Table of Contents

Problem Statement

The Gartner Approach

The Guidance Framework

  • Prework: Build a Business Motivation Framework for ML
    • Defining the End Objective
    • Defining the Means Objectives
    • Providing Assessment and Governance to Support the Data Strategy
    • Defining Influencers Critical to the Success of the Data Strategy
  • Step 1: Develop a Targeted Acquisition Strategy
    • 1.1 Determine Where to Get Data
    • 1.2 Select an Approach to Acquiring Internal and External Data
    • 1.3 Enable Data Engineering Pipelines
    • 1.4 Establish Data Science Pipeline
    • 1.5 Enable Data Science Workflows
    • 1.6 Enable Supervised Learning Workflows
    • 1.7 Enable Unsupervised Learning Workflows
    • 1.8 Secure Data Science Pipelines
  • Step 2: Define Data Preprocessing Architecture
    • 2.1 Refine the Architecture With Storage Options
  • Step 3: Connect ML Analytic Engines
    • 3.1 Feed Big Data Analytic Engines to Support ML Initiatives
    • 3.2 Complement With Automated Machine Learning Engines
  • Step 4: Deliver to ML Workloads
    • 4.1 Complement ML Workloads With Pretrained Networks and Packaged Datasets
    • 4.2 Work With Different Technology Approaches to Sourcing ML Workloads
  • Step 5: Perform a Business Process Review of ML Output
    • 5.1 Identify and Prioritize Processes to Review
    • 5.2 Gather and Analyze Current Process Data
    • 5.3 Conceptualize Future State
    • 5.4 Integrate ML Output Into Business Process
    • 5.5 Evaluate Outcomes
  • Follow-Up
    • Manage Data Pipelines and ML Workloads
    • Adopt Flexible Data Quality Strategies for Machine Learning

Risks and Pitfalls

  • Risk No. 1: Building Data Science Pipelines Can Be Especially Challenging When Dealing With Big Data Without the Right Tools
  • Risk No. 2: Poor Data Quality Will Significantly Impact Performance and Accuracy
  • Risk No. 3: Techniques for Securing Data Science Pipelines Are Still in Their Infancy
  • Pitfall: Bounded Rationality Exists Even Within ML Applications

Gartner Recommended Reading

