Any advice on how to implement data lineage automation? Have you found areas where introducing automation is easiest or most useful?

Director of Corporate Development, 2 years ago

To begin the initiative, I focused first on the most critical data elements for our enterprise: financial and regulatory metrics. Then I spoke with the analytics teams to understand which data flows mattered most for their executive dashboards and BI. That got us started; after that, it was a matter of working through the implementation list and adjusting priorities as needed. As Philip noted, I have also had success with Collibra when working with AWS.
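
For anyone who wants to try this kind of prioritization before buying a tool, here is a rough sketch in Python. It assumes lineage is already available as simple "dataset is built from these datasets" edges; every dataset name below is hypothetical and only there for illustration. It walks upstream from the critical outputs (regulatory reports, executive dashboards) and counts how many of them each source feeds, which gives a first cut of the implementation list.

```python
from collections import Counter

# Hypothetical lineage edges: each dataset maps to the datasets it is built from.
LINEAGE = {
    "regulatory_capital_report": ["risk_positions", "gl_balances"],
    "exec_revenue_dashboard": ["gl_balances", "crm_orders"],
    "risk_positions": ["trades_raw"],
    "gl_balances": ["ledger_raw"],
    "crm_orders": [],
    "trades_raw": [],
    "ledger_raw": [],
}


def upstream(dataset, lineage):
    """Return every dataset that feeds `dataset`, directly or indirectly."""
    seen = set()
    stack = list(lineage.get(dataset, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(lineage.get(current, []))
    return seen


def rank_by_criticality(critical_outputs, lineage):
    """Count how many critical outputs each upstream dataset feeds."""
    counts = Counter()
    for output in critical_outputs:
        counts.update(upstream(output, lineage))
    return counts.most_common()


ranking = rank_by_criticality(
    ["regulatory_capital_report", "exec_revenue_dashboard"], LINEAGE
)
for dataset, feeds in ranking:
    print(f"{dataset}: feeds {feeds} critical output(s)")
```

Datasets that feed more critical outputs float to the top, so they are reasonable candidates to document first. Priorities can then be adjusted by hand as the analytics teams weigh in.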

Director of Data, 2 years ago

I will answer the second question first. Data lineage automation is certainly useful: collecting lineage data manually is time-consuming and error-prone. Data lineage fulfills several data governance requirements, such as documenting data security rules, who used the data, version tracking, etc. Another advantage is the ability to collect more lineage data, and to collect it faster, than with a manual process. This will enhance reporting, analysis, and planning.
I don't know what you mean by easiest, but I believe that depends on the automation tool used, and there are a variety of tools available for data lineage automation. I googled "Lineage automation tools review" and got several helpful articles. Avoid sponsored links :)
There are open source and commercial tools for lineage gathering and management. To implement one in your environment, you can either have the vendor help you or work with your technical team.
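
To give a feel for what "automated collection" means in practice, here is a toy sketch in Python that pulls source-to-target edges out of a SQL statement with regular expressions. This is an illustration only, not how any particular vendor tool works: the table names are made up, and regex matching is fragile (it will misread CTE names as sources and miss many SQL dialect features), so real tools generally rely on a proper SQL parser or on metadata from the platform itself.

```python
import re

# Matches the target table of INSERT INTO / CREATE TABLE statements.
TARGET_RE = re.compile(
    r"\b(?:insert\s+into|create\s+(?:or\s+replace\s+)?table)\s+([\w.]+)",
    re.IGNORECASE,
)
# Matches table names that directly follow FROM or JOIN.
SOURCE_RE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def extract_lineage(sql):
    """Return sorted (source, target) edges found in one SQL statement."""
    target_match = TARGET_RE.search(sql)
    if target_match is None:
        return []  # No write target, so nothing to record.
    target = target_match.group(1)
    sources = sorted(set(SOURCE_RE.findall(sql)) - {target})
    return [(source, target) for source in sources]


# Hypothetical ETL statement used only to exercise the function.
job_sql = """
INSERT INTO mart.daily_revenue
SELECT o.order_date, SUM(o.amount)
FROM staging.orders o
JOIN staging.customers c ON c.id = o.customer_id
GROUP BY o.order_date
"""

edges = extract_lineage(job_sql)
for source, target in edges:
    print(f"{source} -> {target}")
```

Run over a folder of ETL scripts, something like this produces the edge list that would otherwise be typed into a spreadsheet by hand, which is where the time savings and the reduction in errors come from.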
 

VP of Data in Banking, 2 years ago

It depends on the technology you are using. Collibra has automated lineage tools that integrate with our AWS data lake.
