Can someone please share any insights into how you have implemented data masking and obfuscation of sensitive data before it lands in cloud storage such as S3 or Redshift? I have tried Glue DataBrew, but that solution does not scale when a large number of files arrive in the bucket concurrently.
no title · 5 months ago
Thanks for sharing.
Data Architect in Government · 8 months ago
Does it have to be masked in-flight? Have you considered landing it raw and then masking it?
no title · 5 months ago
Both masking in flight and masking after landing are in scope. We may already have sensitive fields in our historical data that we would like to clean up, and we would also like to prevent any new sensitive data from coming in.
For auditability and troubleshooting, we bring the file into an S3 landing bucket in raw form and then apply tokenization to the sensitive columns. The next step deletes the raw data immediately after use, or retains it for a couple of days to support troubleshooting. We also ensure that no one except the service account can read the raw file.
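In case a concrete starting point helps, here is a minimal sketch of that pattern: an S3-triggered Lambda that reads a CSV from the raw landing bucket, replaces sensitive column values with deterministic HMAC tokens, and writes the result to a clean bucket. The bucket name, column names, and secret source are assumptions, not anything prescribed above.

```python
import csv
import hashlib
import hmac
import io
import os
import urllib.parse

import boto3

CLEAN_BUCKET = "my-clean-bucket"              # assumption: destination for tokenized files
SENSITIVE_COLUMNS = {"ssn", "email"}          # assumption: columns to tokenize
SECRET = os.environ["TOKEN_SECRET"].encode()  # keyed secret, e.g. loaded from Secrets Manager

s3 = boto3.client("s3")


def tokenize(value: str) -> str:
    """Deterministic, non-reversible token via HMAC-SHA256."""
    return hmac.new(SECRET, value.encode(), hashlib.sha256).hexdigest()


def handler(event, context):
    """Triggered by s3:ObjectCreated on the raw landing bucket."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # S3 event keys are URL-encoded, so decode before use.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        reader = csv.DictReader(io.StringIO(body))

        out = io.StringIO()
        writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Tokenize only the sensitive columns that actually exist in this file.
            for col in SENSITIVE_COLUMNS & set(row):
                row[col] = tokenize(row[col])
            writer.writerow(row)

        # Write the tokenized copy; the raw object stays behind for the short
        # troubleshooting window and is then expired by a lifecycle rule.
        s3.put_object(Bucket=CLEAN_BUCKET, Key=key, Body=out.getvalue().encode("utf-8"))
```

The retention and access parts of the approach map to S3 features rather than code: a lifecycle rule with a short expiration (e.g. two days) on the raw prefix handles the cleanup window, and a bucket policy that denies GetObject to everyone except the service-account role enforces the read restriction. Because Lambda fans out one invocation per object, this pattern also tends to handle many concurrent files better than a single DataBrew job.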