Data preparation is an iterative, agile process for finding, combining, cleaning, transforming and sharing curated datasets for data and analytics use cases, including analytics/business intelligence (BI), data science/machine learning (ML) and self-service data integration. Data preparation tools promise faster delivery of integrated, curated data by allowing business users, including analysts, citizen integrators, data engineers and citizen data scientists, to integrate internal and external datasets for their use cases. They also allow users to identify anomalies and patterns, and to review and improve the data quality of their findings in a repeatable fashion. Some tools embed ML algorithms that augment and, in some cases, completely automate certain repeatable and mundane data preparation tasks. Reduced time to delivery of data and insight is at the heart of this market.
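To make the combine, clean and anomaly-detection steps concrete, here is a minimal sketch in plain Python. It is an illustration only, not how any particular data preparation tool works: the sample records, the field names, the helper functions (`clean`, `combine`, `flag_anomalies`, `prepare`) and the "ten times the median" anomaly rule are all assumptions made for the example.

```python
from statistics import median

# Hypothetical sample data: an "internal" extract and an "external" lookup.
internal = [
    {"customer_id": "C1", "region": " north ", "revenue": "1200"},
    {"customer_id": "C2", "region": "South", "revenue": "1350"},
    {"customer_id": "C3", "region": "south", "revenue": ""},       # missing value
    {"customer_id": "C4", "region": "North", "revenue": "98000"},  # suspicious value
    {"customer_id": "C5", "region": "north", "revenue": "1100"},
]
external = {"C1": "Retail", "C2": "Retail", "C4": "Energy", "C5": "Retail"}


def clean(row):
    """Normalize text fields and parse revenue; return None if revenue is missing."""
    revenue = row["revenue"].strip()
    if not revenue:
        return None
    return {
        "customer_id": row["customer_id"].strip(),
        "region": row["region"].strip().lower(),
        "revenue": float(revenue),
    }


def combine(rows, segments):
    """Attach the external segment to each row, defaulting to 'unknown'."""
    return [{**r, "segment": segments.get(r["customer_id"], "unknown")} for r in rows]


def flag_anomalies(rows, factor=10.0):
    """Mark rows whose revenue exceeds `factor` times the median (an assumed rule)."""
    mid = median(r["revenue"] for r in rows)
    return [{**r, "anomaly": r["revenue"] > factor * mid} for r in rows]


def prepare(rows, segments):
    """Repeatable pipeline: clean, drop unusable rows, combine, then flag anomalies."""
    cleaned = [c for c in map(clean, rows) if c is not None]
    return flag_anomalies(combine(cleaned, segments))


prepared = prepare(internal, external)
for row in prepared:
    print(row)
```

Because the steps are ordinary functions, the same `prepare` call can be rerun unchanged whenever new data arrives, which is the "repeatable fashion" the paragraph describes. A commercial tool would typically expose these steps through a visual interface and might suggest the cleaning rules automatically.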
This market covers vendors of data science and machine learning platforms: software products that data scientists use to develop and deploy their own data science and machine learning solutions. More precisely, Gartner defines a data science and machine learning platform as a cohesive software application that offers a mixture of basic building blocks essential both for creating many kinds of data science solutions and for incorporating such solutions into business processes, surrounding infrastructure and products. Machine learning is a popular subset of data science that warrants specific attention when evaluating these platforms.