I am in the process of building a GenAI tool for Support Agents to interact and find answers quickly. To ensure the accuracy and quality is maintained over time, does anyone have any recommendations on tools or frameworks to evaluate quality and accuracy for LLMs/GenAI tools?

VP Support Readiness in IT Services · 10 months ago

I was the original poster (OP). We have built our own solution, which currently integrates with Slack and will soon be built into SFDC. We currently measure quality via human feedback, but I also want to validate this with some level of automation. I have started on a framework, but do not yet have specific tools in mind for combining human review with automation and presenting the results in a way that is easy to consume and surfaces where deviations occur. Like some of you, we are refining as we go.
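One minimal way to combine human and automated signals and surface deviations is to normalize both onto the same scale and flag answers where they disagree. This is only a sketch of that idea; the `Answer` fields and the 0.3 threshold are illustrative assumptions, not part of any particular framework:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Answer:
    question: str
    response: str
    human_rating: Optional[float]  # 1-5 star rating from agent feedback; None if unrated
    auto_score: float              # 0-1 score from an automated check (e.g., similarity to a KB article)

def find_deviations(answers: List[Answer], threshold: float = 0.3) -> List[Answer]:
    """Surface answers where human and automated quality scores disagree."""
    flagged = []
    for a in answers:
        if a.human_rating is None:
            continue  # no human signal yet; nothing to compare against
        human_norm = (a.human_rating - 1) / 4  # map the 1-5 scale onto 0-1
        if abs(human_norm - a.auto_score) > threshold:
            flagged.append(a)
    return flagged
```

Answers flagged this way (humans loved it but the automated check scored it low, or vice versa) are exactly where a reviewer's time is best spent, since one of the two signals is likely miscalibrated.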

VP of Customer Success in Software · 10 months ago

This is such an important question—thanks for bringing it up! We’ve been working on something similar and faced the same challenge of maintaining accuracy and quality with a GenAI tool for support agents. Here’s how we’ve approached it:

Start with Human Validation: We kicked things off with a phased approach, focusing on auditing certain types of content first—especially answers that could trigger legal concerns or have a higher impact. It helped us prioritize where accuracy mattered the most.

Use Feedback Loops: We made sure to set up mechanisms for agents to flag AI responses directly, creating a continuous feedback loop. This real-world input has been invaluable for improving the tool over time.
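The flagging mechanism described above can be as simple as appending each agent's flag to a review queue. A minimal sketch, assuming a JSONL file as the queue (the record fields are illustrative, not a prescribed schema):

```python
import json
import time

def flag_response(log_path: str, response_id: str, agent: str, reason: str) -> None:
    """Append an agent's flag on an AI answer to a JSONL review queue."""
    record = {
        "response_id": response_id,
        "agent": agent,
        "reason": reason,
        "flagged_at": time.time(),
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
```

A reviewer (or an automated job) can then periodically read the queue, cluster recurring reasons, and feed corrections back into the knowledge base.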

Leverage the Right Tools: We’ve been using Intercom and FIN AI, which not only make interactions seamless but also offer features that help us refine responses and evaluate quality. They’ve been a solid foundation for scaling this effort.

Track the Right Metrics: To stay on top of accuracy and quality, we look at things like how often agents override the AI’s suggestions, user feedback, and other key trends. It’s a great way to identify where the tool needs tweaking.
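The override metric mentioned here is straightforward to compute if you log each AI suggestion along with whether the agent edited it before sending. A hedged sketch; the event `type` and `agent_edited` field names are assumptions about your logging schema:

```python
from typing import List, Dict

def override_rate(events: List[Dict]) -> float:
    """Fraction of AI suggestions that agents rewrote before sending.

    A rising rate is a useful early-warning signal that answer
    quality is drifting and the tool needs tweaking.
    """
    suggestions = [e for e in events if e.get("type") == "ai_suggestion"]
    if not suggestions:
        return 0.0
    overridden = [e for e in suggestions if e.get("agent_edited")]
    return len(overridden) / len(suggestions)
```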

It’s definitely been a journey, and we’re still refining as we go. Happy to dive deeper if you’d like to compare notes or share ideas. What stage are you at with your implementation? Would love to learn from your experience too!

Director of Customer Success · 10 months ago

It really depends on the tech stack being used for case management work. Here at NYC DOF we use Dynamics CRM and are looking to enable Microsoft Co-Pilot to do exactly what is described in the use case. Aside from case summaries and assisting with outbound communications, Co-Pilot will suggest responses based on a library of approved knowledge base article content to help agents get quick answers to questions. Hope this helps.
