How to Create a Data Strategy for Machine Learning-Powered Artificial Intelligence

Archived | Published: 31 May 2017 | ID: G00324342


Machine learning-powered artificial intelligence (MLpAI) can help deliver systems with more automation and less human intervention, but success requires a data strategy that can handle the complexity of real-world data. This research guides technical professionals working on MLpAI in developing a data strategy that supports successful deployments.

Table of Contents

  • Problem Statement
    • Introducing MLpAI and Its Limitations
  • The Gartner Approach
  • The Guidance Framework
    • Data Strategy for ML Process Framework
    • Prework: Building a Rationalization Framework for MLpAI
      • Defining the End Objective
      • Defining the Means Objectives
      • Providing Assessment and Governance to Support the Data Strategy
      • Defining Influencers Critical to the Success of the Data Strategy
    • Step 1: Build Problem or Task Taxonomy
    • Step 2: Design Data Science Pipeline
      • 2.1 Constructing Batch Data Science Pipelines
      • 2.2 Constructing Online Data Science Pipelines
    • Step 3: Enable Data Science Workflows
      • 3.1 Enabling Supervised Learning Workflows
      • 3.2 Enabling Unsupervised Learning Workflows
    • Step 4: Create Data Science Stages
      • 4.1 Critical Stages of Preprocessing
      • 4.2 Supporting Computationally Intensive Training Stages
    • Step 5: Integration
    • Step 6: Refine With Storage
      • 6.1 Using Memory
      • 6.2 Using Distributed File Systems
      • 6.3 Using Distributed Data Stores (Persistent Data Store)
      • 6.4 Using Relational Databases
    • Step 7: Operationalization and Maintenance
      • 7.1 Compute-Intensive vs. Data-Intensive Components in Workflows
      • 7.2 Securing Data Science Pipelines
    • Follow-Up
      • Introducing DevOps to MLpAI and Vice Versa
  • Risks and Pitfalls
    • Risk No. 1: Building DS Pipelines Can Be Especially Challenging When Dealing With Big Data Without the Right Tools
    • Risk No. 2: Poor Data Quality Will Significantly Impact Performance and Accuracy
    • Risk No. 3: Techniques for Securing DS Pipelines Are Still in Their Infancy
    • Pitfall: Bounded Rationality Exists Even Within MLpAI Applications
  • Gartner Recommended Reading
© 2017 Gartner, Inc. and/or its Affiliates. All Rights Reserved. Reproduction and distribution of this publication in any form without prior written permission is forbidden. The information contained herein has been obtained from sources believed to be reliable. Gartner disclaims all warranties as to the accuracy, completeness or adequacy of such information. Although Gartner's research may discuss legal issues related to the information technology business, Gartner does not provide legal advice or services and its research should not be construed or used as such. Gartner shall have no liability for errors, omissions or inadequacies in the information contained herein or for interpretations thereof. The opinions expressed herein are subject to change without notice.
