Gartner Research

Drive Data Scientists’ Productivity With Data Preprocessing Techniques

Published: 20 August 2019

ID: G00407378

Analyst(s): Sumit Agarwal

Summary

Data scientists struggle to meet projected timelines because of iterative data preprocessing and a lack of data. This research describes innovative ways that data and analytics technical professionals can improve their efficiency using specialized toolsets, synthetic data and crowdsourcing.

Table of Contents

Analysis

  • Improve the Efficiency of the Data-Wrangling Process
    • Determine an Appropriate Data Sample
    • Validate the Quality of Data
    • Evaluate Feature Relationships
    • Identify and Resolve Missing Values
    • Provision Faster Compute
  • Leverage Crowdsourcing and Active Learning to Label Data
  • Generate Synthetic Data When Little or No Data Is Available
    • Distribution of Existing Data
    • Gaming Engines
    • Generative Adversarial Networks
    • Transfer Learning

Conclusion

Gartner Recommended Reading

©2020 Gartner, Inc. and/or its affiliates. All rights reserved. Gartner is a registered trademark of Gartner, Inc. and its affiliates. This publication may not be reproduced or distributed in any form without Gartner’s prior written permission. It consists of the opinions of Gartner’s research organization, which should not be construed as statements of fact. While the information contained in this publication has been obtained from sources believed to be reliable, Gartner disclaims all warranties as to the accuracy, completeness or adequacy of such information. Although Gartner research may address legal and financial issues, Gartner does not provide legal or investment advice and its research should not be construed or used as such. Your access and use of this publication are governed by Gartner’s Usage Policy. Gartner prides itself on its reputation for independence and objectivity. Its research is produced independently by its research organization without input or influence from any third party. For further information, see Guiding Principles on Independence and Objectivity.
