When building a RAG system from scratch, which of the following take the most time and resources?

Gathering the data/documents for the knowledge base to be loaded: 17%

Curating, annotating, cleaning and/or pre-processing the data (excl. chunking): 33%

Optimizing the splitter / chunking of the knowledge base documents

Defining the optimal RAG system architecture (vector/graph DB, critique models...): 33%

Integrating the RAG system with the rest of the GenAI application tech stack: 17%

Other (grateful if you could specify in the comments!)

6 PARTICIPANTS
623 views, 1 upvote, 1 comment
Group Director of Information Security in Banking, a year ago

1. Absolute clarity about the business use cases the RAG system is intended to address, since these decide which repositories are loaded after curation. That means laying down the governance model first.

2. If a query falls back to an OpenAI model, how much will it cost per prompt, and who will bear those costs?

3. Data loss prevention, which is still needed despite curation and cleaning. Example: "What's the salary of x or y in my department?" Such data should be protected by an AI DLP solution that acts as a proxy behind the RAG system.
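
To make point 3 more concrete, here is a minimal sketch of what a DLP proxy around a RAG pipeline could look like. Everything in it is an assumption for illustration: the names `SENSITIVE_PATTERNS`, `is_sensitive` and `dlp_proxy` are hypothetical, the `retrieve` and `generate` callables stand in for whatever retriever and model the stack actually uses, and the short regex list is only a placeholder for the classifiers and policy rules a real DLP product would apply.

```python
import re
from typing import Callable, List

# Hypothetical patterns for protected topics. A real DLP solution would rely
# on classifiers and policy engines rather than a short regex list.
SENSITIVE_PATTERNS = [
    re.compile(r"\bsalar(y|ies)\b", re.IGNORECASE),
    re.compile(r"\bcompensation\b", re.IGNORECASE),
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),  # US SSN-like number
]

REFUSAL = "This request touches protected data and was blocked by policy."


def is_sensitive(text: str) -> bool:
    """Return True if any assumed sensitive pattern occurs in the text."""
    return any(p.search(text) for p in SENSITIVE_PATTERNS)


def dlp_proxy(
    query: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
) -> str:
    """Screen the query, the retrieved chunks and the answer.

    `retrieve` and `generate` are placeholders for the real retriever and
    LLM call; this function only adds the screening steps around them.
    """
    # 1. Block queries that ask for protected data outright.
    if is_sensitive(query):
        return REFUSAL
    # 2. Drop retrieved chunks that slipped through curation.
    chunks = [c for c in retrieve(query) if not is_sensitive(c)]
    # 3. Check the generated answer before it leaves the system.
    answer = generate(query, chunks)
    if is_sensitive(answer):
        return REFUSAL
    return answer
```

Screening at all three stages reflects the "despite curation and cleaning" concern: even if a sensitive chunk reaches the knowledge base, it is filtered before the model sees it, and the answer is checked once more on the way out.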
