When building a RAG system from scratch, which of the following take the most time and resources?

Gathering the data/documents for the knowledge base to be loaded: 17%

Curating, annotating, cleaning and/or pre-processing the data (excl. chunking): 33%

Optimizing the splitter / chunking of the knowledge base documents

Defining the optimal RAG system architecture (vector/graph DB, critique models...): 33%

Integrating the RAG system with the rest of the GenAI application tech stack: 17%

Other (grateful if you could specify in the comments!)

6 PARTICIPANTS
623 views, 1 upvote, 1 comment
Group Director of Information Security in Banking, a year ago

1. Absolute clarity about the business use cases the RAG system is intended to resolve, since these decide which repositories are loaded after curation. Lay down the governance model first.

2. If a query falls back to an OpenAI model, how much will it cost per prompt, and who will bear those costs?

3. Data loss prevention, despite curation and cleaning. Example: "What's the salary of x or y in my department?" Such data should be protected by an AI DLP solution acting as a proxy behind the RAG system.
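To make point 3 concrete, here is a minimal sketch of the "DLP proxy behind RAG" idea: a filter sitting between the retriever and the LLM that refuses queries touching protected topics and redacts sensitive values from retrieved chunks before they are forwarded. Everything here is an assumption for illustration: the names `dlp_proxy`, `redact`, `ProxyDecision`, `SENSITIVE_QUERY_PATTERNS` and `REDACTION_RULES` are hypothetical, and the regex rules are a toy stand-in, not how any particular AI DLP product works.

```python
import re
from dataclasses import dataclass

# Hypothetical policy (illustrative only): query topics the proxy refuses outright.
SENSITIVE_QUERY_PATTERNS = [
    re.compile(r"\bsalar(y|ies)\b", re.IGNORECASE),
    re.compile(r"\b(compensation|bonus|payroll)\b", re.IGNORECASE),
]

# Hypothetical redaction rules applied to retrieved chunks before they reach the model.
REDACTION_RULES = [
    # Currency amounts such as "$85,000" or "USD 1200.50"
    (re.compile(r"(?:\$|USD\s?)\d[\d,]*(?:\.\d+)?"), "[REDACTED_AMOUNT]"),
    # Email addresses
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "[REDACTED_EMAIL]"),
]


@dataclass
class ProxyDecision:
    allowed: bool        # whether anything is forwarded to the LLM
    context: list[str]   # redacted chunks that may be sent as context
    reason: str = ""     # why the request was blocked, if it was


def redact(text: str) -> str:
    """Apply every redaction rule in order and return the sanitized text."""
    for pattern, replacement in REDACTION_RULES:
        text = pattern.sub(replacement, text)
    return text


def dlp_proxy(query: str, retrieved_chunks: list[str]) -> ProxyDecision:
    """Block queries on protected topics; otherwise forward redacted context."""
    for pattern in SENSITIVE_QUERY_PATTERNS:
        if pattern.search(query):
            return ProxyDecision(False, [], f"query matches protected topic: {pattern.pattern}")
    return ProxyDecision(True, [redact(chunk) for chunk in retrieved_chunks])


if __name__ == "__main__":
    blocked = dlp_proxy("What's the salary of x or y in my department?", ["Salary band: $85,000"])
    print(blocked.allowed, blocked.reason)
    passed = dlp_proxy("Summarise the leave policy", ["Contact jane@example.com; budget $85,000."])
    print(passed.allowed, passed.context)
```

Keyword and regex rules are easy to bypass with paraphrases, so a real deployment would more likely pair a trained classifier with document-level access control at retrieval time; the sketch only shows where such a check sits in the request path.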
