Published: 06 December 2022
Summary
Ensuring the quality of semistructured and unstructured data for machine learning poses unique challenges for data and analytics technical professionals. This research investigates the challenges and risks related to data quality for AI and ML models, and how to overcome or mitigate them.
Included in Full Research
Overview
Key Findings
For semistructured and unstructured data, organizations have most thoroughly addressed one data quality dimension: timeliness. The other dimensions are poorly addressed, in terms of both the technology and the organizational practices required to support them.
Most unstructured data will remain out of reach of data consumers, and therefore unused or unusable, until accessibility issues have been addressed.
The accuracy and relevance of unstructured data for AI and ML are driven entirely by the use case. Therefore, independent assessment and validation of unstructured data is impossible before the use case has been defined.
Unstructured data management creates a paradox for technical