Gartner Research

How to Create a Data Strategy for Machine Learning-Powered Artificial Intelligence

Published: 31 May 2017

ID: G00324342

Analyst(s): Carlton Sapp


Machine learning-powered AI (MLpAI) can help deliver systems with more automation and less human intervention, but success requires a data strategy that deals with the complexity of real-world data. This research guides technical professionals working on MLpAI in developing a data strategy that supports successful deployments.

Table Of Contents

Problem Statement

  • Introducing MLpAI and Its Limitations

The Gartner Approach

The Guidance Framework

  • Data Strategy for ML Process Framework
  • Prework: Building a Rationalization Framework for MLpAI
    • Defining the End Objective
    • Defining the Means Objectives
    • Providing Assessment and Governance to Support the Data Strategy
    • Defining Influencers Critical to the Success of the Data Strategy
  • Step 1: Build Problem or Task Taxonomy
  • Step 2: Design Data Science Pipeline
    • 2.1 Constructing Batch Data Science Pipelines
    • 2.2 Constructing Online Data Science Pipelines
  • Step 3: Enable Data Science Workflows
    • 3.1 Enabling Supervised Learning Workflows
    • 3.2 Enabling Unsupervised Learning Workflows
  • Step 4: Create Data Science Stages
    • 4.1 Critical Stages of Preprocessing
    • 4.2 Supporting Computationally Intensive Training Stages
  • Step 5: Integration
  • Step 6: Refine With Storage
    • 6.1 Using Memory
    • 6.2 Using Distributed File Systems
    • 6.3 Using Distributed Data Stores (Persistent Data Store)
    • 6.4 Using Relational Databases
  • Step 7: Operationalization and Maintenance
    • 7.1 Compute-Intensive vs. Data-Intensive Components in Workflows
    • 7.2 Securing Data Science Pipelines
  • Follow-Up
    • Introducing DevOps to MLpAI and Vice Versa

Risks and Pitfalls

  • Risk No. 1: Building DS Pipelines Can Be Especially Challenging When Dealing With Big Data Without the Right Tools
  • Risk No. 2: Poor Data Quality Will Significantly Impact Performance and Accuracy
  • Risk No. 3: Techniques for Securing DS Pipelines Are Still in Their Infancy
  • Pitfall: Bounded Rationality Exists Even Within MLpAI Applications

Gartner Recommended Reading

