Can someone please share any insights on how they have implemented data masking and obfuscation of sensitive data before it lands in cloud storage such as S3 or Redshift? I have tried Glue DataBrew, but the solution does not scale when a large number of files arrive in the bucket concurrently.

Data Manager in Banking · 8 months ago

For auditability and troubleshooting, we land the file in an S3 landing bucket in raw form and then apply tokenization to the sensitive columns. The next step is to delete the raw data immediately after use, or retain it for a couple of days to support future troubleshooting. Also ensure that no one other than the service account can read the raw file.
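A minimal sketch of that "land raw, then tokenize" flow, assuming a Lambda function triggered by object-created events on the raw landing bucket. The bucket names, sensitive column names, and the HMAC-based deterministic tokenization are illustrative assumptions, not the commenter's exact implementation:

```python
import csv
import hashlib
import hmac
import io
import os
import urllib.parse

import boto3

s3 = boto3.client("s3")

CURATED_BUCKET = os.environ.get("CURATED_BUCKET", "my-curated-bucket")  # assumption
SENSITIVE_COLUMNS = {"ssn", "email", "phone"}                           # assumption
TOKEN_KEY = os.environ["TOKEN_KEY"].encode()  # secret key, e.g. sourced from Secrets Manager


def tokenize(value: str) -> str:
    """Deterministic, non-reversible token so joins on the column still work."""
    return hmac.new(TOKEN_KEY, value.encode(), hashlib.sha256).hexdigest()


def handler(event, context):
    # One record per object-created event from the raw landing bucket.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        reader = csv.DictReader(io.StringIO(body))

        out = io.StringIO()
        writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            for col in SENSITIVE_COLUMNS & set(row):
                if row[col]:
                    row[col] = tokenize(row[col])
            writer.writerow(row)

        # Write the tokenized copy to the curated bucket (same key for traceability),
        # then delete the raw object -- or leave it for an S3 lifecycle rule to expire
        # after a couple of days, as described above.
        s3.put_object(Bucket=CURATED_BUCKET, Key=key, Body=out.getvalue().encode("utf-8"))
        s3.delete_object(Bucket=bucket, Key=key)
```

Because each incoming file is processed by its own invocation, this pattern scales with concurrent arrivals better than a single DataBrew job; restricting read access on the raw bucket to the service account role covers the last point.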

(no title) · 5 months ago

Thanks for sharing.

Data Architect in Government · 8 months ago

Does it have to be masked in-flight? Have you considered downloading to raw and then masking it?

(no title) · 5 months ago

Both masking in flight and masking after downloading are in scope. We might already have sensitive fields in our historical data, and we would like to clean that up. We would also like to prevent any new sensitive data from coming in.
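For the historical-cleanup part, a rough backfill sketch, assuming the event-driven tokenization pipeline above is already in place: copy the existing objects into the raw landing bucket so the same masking step reprocesses them. The bucket names and prefix are hypothetical:

```python
import boto3

s3 = boto3.client("s3")

HISTORY_BUCKET = "my-existing-data-bucket"    # assumption: bucket holding historical files
RAW_LANDING_BUCKET = "my-raw-landing-bucket"  # assumption: bucket the masking Lambda listens on
PREFIX = "landing/"                           # assumption


def backfill():
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=HISTORY_BUCKET, Prefix=PREFIX):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            # Each copy fires an object-created event, so the masking function picks
            # the file up and writes a tokenized version to the curated bucket.
            s3.copy_object(
                Bucket=RAW_LANDING_BUCKET,
                Key=key,
                CopySource={"Bucket": HISTORY_BUCKET, "Key": key},
            )


if __name__ == "__main__":
    backfill()
```

The same pipeline then covers both cases: new arrivals are masked on ingest, and the backfill pushes historical data through the identical code path.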
