Gartner Research

Boost Your Training Data for Better Machine Learning

Published: 26 July 2019

ID: G00389867

Analyst(s): Akif Khan, Alexander Linden, Anthony Mullen


Lack of access to enough quality training data is one of the biggest showstoppers for machine learning projects. Data and analytics leaders responsible for machine learning initiatives can overcome this obstacle by applying the nine techniques described here.

Table Of Contents
  • Key Challenges
  • Scrutinize Your Current Data Collection Strategy
    • Technique 1: Move Past Legacy Data Collection Requirements to Collect More Data
    • Technique 2: Spend More Time on Data Preprocessing
  • Acquire More Data
    • Technique 3: Incorporate External Datasets to Enrich Your Own Dataset
    • Technique 4: Exploit Crowdsourcing to Generate New Data and Labels
    • Technique 5: Obtain More Data From Peer Organizations With Data Pooling and Sharing
  • Synthesize Additional Data
    • Technique 6: Augment Data Using Domain-Specific Transformations
    • Technique 7: Simulations Can Also Generate New Labelled Event Data
  • Use Advanced Machine Learning Techniques
    • Technique 8: Minimize Expensive Real-World Sampling With Active Learning
    • Technique 9: Use Transfer Learning to Utilize Data That You Don’t Even Have Access To

Gartner Recommended Reading

©2021 Gartner, Inc. and/or its affiliates. All rights reserved. Gartner is a registered trademark of Gartner, Inc. and its affiliates. This publication may not be reproduced or distributed in any form without Gartner’s prior written permission. It consists of the opinions of Gartner’s research organization, which should not be construed as statements of fact. While the information contained in this publication has been obtained from sources believed to be reliable, Gartner disclaims all warranties as to the accuracy, completeness or adequacy of such information. Although Gartner research may address legal and financial issues, Gartner does not provide legal or investment advice and its research should not be construed or used as such. Your access and use of this publication are governed by Gartner’s Usage Policy. Gartner prides itself on its reputation for independence and objectivity. Its research is produced independently by its research organization without input or influence from any third party. For further information, see Guiding Principles on Independence and Objectivity.
