If a mid-size bank wants to build its own ChatGPT-style assistant with its own data (policies, procedures, etc.) loaded into it for fast retrieval and search, what advice do folks have on the reference architecture and the components/tools needed to accomplish that? Any thoughts on resources and cost are also welcome.

Chief Executive Officer, 7 days ago

You will need at least the following infrastructure components:

-> RAG (Retrieval-Augmented Generation) is the default: do not fine-tune initially.
-> Orchestration: LangChain, LlamaIndex, Semantic Kernel, or Guidance
--> Route queries (FAQ vs policy vs procedure).
--> Generate queries for multi-step retrieval (expansion, re-ranking, de-dup).
--> Insert citations and quoted snippets in answers.

-> Models (pick 1–2, keep pluggable):
--> Hosted API (enterprise controls): Azure OpenAI, OpenAI, Anthropic on AWS Bedrock, Google Vertex.
--> Self-hosted/private (compliance/data residency): Llama 3.1/3.2 variants, Mistral Large, Qwen, etc. served via NVIDIA NIM, vLLM, or TGI.

-> Embedding model: use a modern embedding model that supports multilingual text and long context. Keep an embedding pipeline you can re-run offline if you swap models.
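
To make the pieces above a bit more concrete, here is a minimal, library-free sketch of how routing, retrieval, citations, and a pluggable model/embedding layer could fit together. Everything in it is an assumption for illustration: `toy_embed` is a bag-of-words stand-in for a real embedding model, `route` is a naive keyword router, `stub_llm` replaces an actual LLM call, and the document ids (`POL-001`, etc.) are made up. In practice an orchestration framework and a vector database would take over most of this.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Chunk:
    doc_id: str    # hypothetical document identifier, used as the citation
    category: str  # "faq", "policy", or "procedure"
    text: str


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def route(query: str) -> str:
    """Naive keyword router (FAQ vs policy vs procedure)."""
    words = set(tokenize(query))
    if words & {"how", "steps", "procedure", "process"}:
        return "procedure"
    if words & {"policy", "allowed", "limit", "rule"}:
        return "policy"
    return "faq"


class Index:
    """In-memory index; a vector database would play this role in production."""

    def __init__(self, embed: Callable[[str], Counter]):
        self.embed = embed
        self.items: List[Tuple[Chunk, Counter]] = []

    def add(self, chunk: Chunk) -> None:
        self.items.append((chunk, self.embed(chunk.text)))

    def reindex(self, embed: Callable[[str], Counter]) -> None:
        """Re-run embeddings offline after swapping the embedding model."""
        self.embed = embed
        self.items = [(c, embed(c.text)) for c, _ in self.items]

    def search(self, query: str, category: str, k: int = 2) -> List[Chunk]:
        qv = self.embed(query)
        scored = [(cosine(qv, v), c) for c, v in self.items if c.category == category]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [c for score, c in scored[:k] if score > 0]


def answer(query: str, index: Index, generate: Callable[[str], str]):
    """Route, retrieve, build a cited prompt, and call a pluggable model."""
    hits = index.search(query, route(query))
    context = "\n".join(f"[{c.doc_id}] {c.text}" for c in hits)
    prompt = (
        "Answer using only the sources below and cite their ids.\n"
        f"{context}\n\nQuestion: {query}"
    )
    return generate(prompt), [c.doc_id for c in hits]


def stub_llm(prompt: str) -> str:
    """Placeholder for a hosted or self-hosted model call."""
    return f"(stub) received a prompt of {len(prompt)} characters"


index = Index(toy_embed)
index.add(Chunk("POL-001", "policy", "Wire transfer limit policy: daily limit is set by account tier."))
index.add(Chunk("PROC-007", "procedure", "Steps to reset a customer password in the branch system."))
index.add(Chunk("FAQ-003", "faq", "Branch opening hours are listed on the intranet."))

reply, sources = answer("What is the wire transfer limit policy?", index, stub_llm)
print(reply, sources)
```

The point of passing `embed` and `generate` in as plain callables is the "keep pluggable" advice: swapping a hosted API for a self-hosted model, or changing the embedding model and calling `reindex`, should not touch the routing or citation logic.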

On top of that, you have to add the application layers (depending on your use cases).

Cost-wise

**** NB: it really depends on the application type and on how many users will use it. ****

Based on my personal experience:

CAPEX (infrastructure setup, data ingestion and cleaning, vectorization and embeddings, app development, security and compliance setup): it really depends on in-house skills and the type of application, roughly $50k to $150k.

OPEX, recurring (LLM hosting, vector database and storage, data ingestion pipeline, monitoring infrastructure, human oversight): roughly $30k to $50k per month.
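
For budgeting, those two ranges can be combined into a rough first-year figure. The sketch below only does that arithmetic; it assumes the full monthly OPEX applies for all 12 months from day one (no ramp-up period), which is a simplification on my part and not something you should take as given.

```python
# Ranges quoted above, in USD.
CAPEX_RANGE = (50_000, 150_000)         # one-off setup cost
OPEX_MONTHLY_RANGE = (30_000, 50_000)   # recurring cost per month


def first_year_cost(capex: int, opex_monthly: int, months: int = 12) -> int:
    """One-off CAPEX plus recurring OPEX over the given number of months."""
    return capex + opex_monthly * months


low = first_year_cost(CAPEX_RANGE[0], OPEX_MONTHLY_RANGE[0])    # 50k + 12 * 30k
high = first_year_cost(CAPEX_RANGE[1], OPEX_MONTHLY_RANGE[1])   # 150k + 12 * 50k
print(f"First-year estimate: ${low:,} to ${high:,}")
```

That works out to roughly $410k to $750k for the first year, with the recurring part dominating, so usage volume and hosting choice matter more to the total than the initial build.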

no title, 3 days ago

Thank you.
