When building a RAG system from scratch, which of the following take the most time and resources?

Gathering the data/documents for the knowledge base to be loaded: 17%

Curating, annotating, cleaning and/or pre-processing the data (excl. chunking): 33%

Optimizing the splitter / chunking of the knowledge base documents

Defining the optimal RAG system architecture (vector/graph DB, critique models...): 33%

Integrating the RAG system with the rest of the GenAI application tech stack: 17%

Other (grateful if you could specify in the comments!)

6 PARTICIPANTS
623 views, 1 upvote, 1 comment
Group Director of Information Security in Banking, a year ago

1. Absolute clarity about the business use cases the RAG system is intended to address, since those decide which repositories get loaded after curation. That means laying down the governance model first.

2. If a query falls back to an OpenAI model, how much will it cost per prompt, and who will bear those costs?

3. Data loss prevention, which is still needed despite curation and cleaning. Example: "What's the salary of x or y in my department?" Such data should be protected by an AI DLP solution that acts as a proxy behind the RAG system.
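
To make point 3 more concrete, here is a minimal Python sketch of a DLP-style guard wrapped around a RAG pipeline. Everything in it is an assumption made for illustration: the function names (`dlp_check_query`, `dlp_redact`, `guarded_answer`), the keyword list and the currency pattern are hypothetical, and `rag_pipeline` stands in for whatever callable produces the answer. A real AI DLP product would rely on much richer detection and on access policies rather than two regular expressions.

```python
import re

# Hypothetical patterns for illustration only; not a real DLP rule set.
SENSITIVE_QUERY = re.compile(
    r"\b(salary|salaries|compensation|payroll)\b", re.IGNORECASE
)
SENSITIVE_VALUE = re.compile(r"(?:USD|EUR|GBP|\$|€|£)\s?\d[\d,]*(?:\.\d+)?")


def dlp_check_query(query: str) -> bool:
    """Return True if the query contains none of the sensitive keywords."""
    return SENSITIVE_QUERY.search(query) is None


def dlp_redact(text: str) -> str:
    """Mask currency amounts that slipped through curation and cleaning."""
    return SENSITIVE_VALUE.sub("[REDACTED]", text)


def guarded_answer(query: str, rag_pipeline) -> str:
    """Act as a proxy: block sensitive queries, redact the generated answer."""
    if not dlp_check_query(query):
        return "This request involves protected data and cannot be answered."
    return dlp_redact(rag_pipeline(query))
```

With a stub in place of the pipeline, `guarded_answer("What's the salary of x or y in my department?", stub)` returns the refusal message without calling the stub, while other queries pass through and have any currency amounts in the answer masked.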
