When building a RAG system from scratch, which of the following take the most time and resources?
Gathering the data/documents for the knowledge base to be loaded: 17%
Curating, annotating, cleaning and/or pre-processing the data (excl. chunking): 33%
Optimizing the splitter / chunking of the knowledge base documents: 0%
Defining the optimal RAG system architecture (vector/graph DB, critique models...): 33%
Integrating the RAG system with the rest of the GenAI application tech stack: 17%
Other (grateful if you could specify in the comments!): 0%
6 PARTICIPANTS
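For context on the chunking option above, here is a minimal sketch of what that step often looks like: a fixed-size splitter with overlap. The function name chunk_text and the chunk_size/overlap values are illustrative assumptions, not something from the poll; production splitters frequently cut on semantic boundaries (sentences, sections) instead.

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into overlapping fixed-size chunks (illustrative parameters)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap  # each chunk starts `step` characters after the last
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

# Example: a 2,000-character document yields chunks starting at 0, 700, 1400, ...
```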
1. Absolute clarity about the business use cases the RAG system is intended to resolve; that would decide which repos to load after curation. Laying down the governance model comes first.
2. If the query falls back to the OpenAI model, how much will it cost per prompt, and who will bear those costs? (See the cost sketch after this list.)
3. Data loss prevention despite curation and cleaning. Example: "What's the salary of X or Y in my department?" Such data should be protected by an AI DLP solution acting as a proxy in front of the RAG system. (See the filtering sketch after this list.)
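On comment 2, a rough cost sketch for the fallback case. The function estimate_prompt_cost is hypothetical, and the token counts and per-1K-token rates are placeholders for illustration; check the provider's current pricing before budgeting.

```python
def estimate_prompt_cost(prompt_tokens: int, completion_tokens: int,
                         input_rate_per_1k: float, output_rate_per_1k: float) -> float:
    """Estimated dollar cost for one prompt/completion pair."""
    return ((prompt_tokens / 1000) * input_rate_per_1k
            + (completion_tokens / 1000) * output_rate_per_1k)

# Placeholder example: 1,500 prompt tokens (question + retrieved context) and
# 300 completion tokens, at $0.01 / $0.03 per 1K tokens.
cost = estimate_prompt_cost(1500, 300, input_rate_per_1k=0.01, output_rate_per_1k=0.03)
print(f"~${cost:.4f} per prompt")  # ~$0.0240
```

Note that RAG inflates the prompt side: retrieved context is billed as input tokens on every fallback call, which is why per-prompt cost and cost ownership are worth settling up front.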
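And for comment 3, a minimal sketch of a DLP-style gate sitting as a proxy in front of the RAG pipeline. Everything here is hypothetical for illustration: the BLOCKED_PATTERNS list, dlp_gate, and the answer_with_rag stub standing in for the actual pipeline. A real AI DLP product would use classifiers and policy engines rather than a handful of regexes, and would typically inspect responses as well as queries.

```python
import re

# Hypothetical deny-list; a real DLP policy would be far richer than regexes.
BLOCKED_PATTERNS = [
    r"\bsalary\b",
    r"\bssn\b|\bsocial security\b",
]

def dlp_gate(query: str) -> bool:
    """Return True if the query may pass through to the RAG system."""
    return not any(re.search(p, query, re.IGNORECASE) for p in BLOCKED_PATTERNS)

def answer_with_rag(query: str) -> str:
    # Stub standing in for the actual retrieval + generation pipeline.
    return f"[RAG answer for: {query}]"

def handle_query(query: str) -> str:
    if not dlp_gate(query):
        return "This request was blocked by the data-loss-prevention policy."
    return answer_with_rag(query)

# handle_query("What's the salary of X in my department?")  -> blocked
# handle_query("Summarize our onboarding guide.")           -> passed through
```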