What are some limitations of red teaming as a method for testing GenAI tools/LLMs?

3.8k viewscircle icon4 Comments
Sort by:
CISO/CPO & Adjunct Law Professor in Finance (non-banking)2 years ago

Red teaming against an AI tool is challenging because opposed to static defenses and known repeatable responses LLMs have variability in their outputs. 

Each output is derived from calculating the distance between embeddings (words converted to numbers) and applying a framework of rules to the level one output. For example, the word "analysis" is usually close to the word "of" followed by a noun, that is a first level output. The first level out should be put through additional processing to ensure it makes sense in the specific (or as lawyers say instant) context. The framework could evaluate the word in from of the word analysis and also ensure the sentence doesn't end with "of". If the word before analysis is "lengthy" then "analysis" can end the sentence.

The way AI tools are usually constructed there are certain word groupings that "break" the tool for graphic output and perhaps there are derivations that break text only models. Inserting improper text is possible due to the challenges with edit checks for AI prompts. The improper text may be assimilated to corrupt the system.  Imagine MS Excel learning from the data which is input and a person telling excel that 2+2 =5 repeatedly. Eventually it would be corrupted, but the corruption might not be obvious on every use.

Even if text systems aren’t corrupted multi-modal (audio, video, image, input/output) tools are the future therefore just breaking images can hobble the system. Like any other tool there are attackers and defenders, and it is likely that the Nightshade poison pill will be nullified in the future.

MIT paper reference below.  https://arxiv.org/pdf/2310.13828.pdf

The difficulty of red teaming doesn’t allow companies breathing room though since improper outputs can run afoul of upcoming AI laws or current laws.  Internal compliance teams and certainly lawyers will be anxious for IT teams to check potential AI results.

Lightbulb on1 circle icon1 Reply
no title2 years ago

Nvidia has released Nemo Guardrails which can address the point to some extent but largely I agree what you mentioned and its very challenging to to Red Team specifically against GenAI<br>

Chair and Professor, Startup CTO in Education2 years ago

What is red teaming?

1 Reply
no title2 years ago

Red team is the group pretending to be an attacker, blue team is defense and purple is a combination.<br>

Content you might like

Always-on service32%

On-demand service – traffic is scrubbed only when an attack is detected and mitigated66%

One time use – service is activated upon request1%

View Results

Yes52%

No32%

Not sure, we’re not measuring this16%

N/A, we’re not using AI coding tools

View Results