If a mid size bank wants to create its own CHATGPT + its own data (like policy, procedures etc.) loaded into it for fast retrieval and search, what advice folks will have for the reference architecture and components/tools that might be needed to accomplish that? Any thoughts on resources and cost will also be welcome.
Thank you.

You will need at least the following infrastructure:
-> RAG (Retrieval-Augmented Generation) is the default: do not fine-tune initially.
-> Orchestration: LangChain, LlamaIndex, Semantic Kernel, or Guidance. Use it to:
--> Route queries (FAQ vs policy vs procedure).
--> Generate queries for multi-step retrieval (expansion, re-ranking, de-dup).
--> Insert citations and quoted snippets in answers.
-> Models (pick 1–2, keep pluggable):
--> Hosted API (enterprise controls): Azure OpenAI, OpenAI, Anthropic on AWS Bedrock, Google Vertex.
--> Self/Private-host (compliance/data residency): Llama 3.1/3.2 variants, Mistral Large, Qwen, etc. via NVIDIA NIM, vLLM, or TGI.
-> Embedding model: use a modern embedding model that supports multilingual text and long context. Keep an embeddings layer you can re-run offline if you swap models.
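To make the retrieve-then-generate flow concrete, here is a minimal sketch in plain Python. Everything here is a stand-in: the `embed` function is a toy bag-of-words hash (in production you would call your chosen embedding model), `PolicyStore` is a naive in-memory vector store (you would use a real vector database), and the final LLM call is only indicated by a comment.

```python
import math

def embed(text: str, dim: int = 64) -> list[float]:
    # Toy deterministic "embedding": hash each word into a fixed-size
    # bag-of-words vector, then L2-normalize. Placeholder only --
    # swap in your real embedding model here.
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[sum(ord(c) for c in word) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

class PolicyStore:
    """Naive in-memory vector store: ingest once, search at query time."""

    def __init__(self):
        self.docs = []  # list of (doc_id, text, vector)

    def ingest(self, doc_id: str, text: str) -> None:
        self.docs.append((doc_id, text, embed(text)))

    def search(self, query: str, k: int = 2):
        qv = embed(query)
        ranked = sorted(self.docs, key=lambda d: cosine(qv, d[2]), reverse=True)
        return ranked[:k]

# Ingest two made-up internal documents, then retrieve for a question.
store = PolicyStore()
store.ingest("policy-42", "wire transfer limits require manager approval")
store.ingest("hr-7", "vacation policy accrual rates for employees")

top = store.search("wire transfer limits", k=1)[0]
# In production, the retrieved snippets (with their doc IDs for citations)
# would be packed into the prompt sent to your hosted or self-hosted LLM.
```

The same shape survives when you replace the toy pieces with real ones: the ingest path runs offline (so re-embedding after a model swap is a batch job), and the search path stays cheap at query time.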
Plus you have to add the application layers on top (depending on your use cases).
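As one example of an application-layer piece, the "route queries (FAQ vs policy vs procedure)" step from the orchestration list can start out very simple before you reach for an LLM-based router. A sketch, where the route names and keyword sets are made-up examples:

```python
# Hypothetical route categories and trigger keywords -- tune for your content.
ROUTES = {
    "faq": {"hours", "contact", "branch", "login"},
    "policy": {"policy", "limit", "compliance", "approval"},
    "procedure": {"how", "steps", "process", "submit"},
}

def route(query: str) -> str:
    """Score each route by keyword overlap; fall back to general RAG search."""
    words = set(query.lower().split())
    scores = {name: len(words & keywords) for name, keywords in ROUTES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "general"
```

A keyword router like this is crude, but it gives you a measurable baseline and a log of routing decisions before you decide whether an LLM classifier is worth the extra latency and cost.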
Cost-wise
**** NB: it really depends on the application type and on how many users will use it. ****
Based on my personal experience:
CAPEX (infrastructure setup, data ingestion and cleaning, vectorization and embeddings, app development, security and compliance setup) -- it really depends on in-house skills and the type of application, roughly $50k to $150k.
OPEX - recurring - (LLM hosting, vector database and storage, data ingestion pipeline, monitoring infrastructure, human oversight), roughly $30k-$50k/month.
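For the hosted-API part of that OPEX, a back-of-envelope token-cost model helps sanity-check the numbers against your own headcount. All figures below (user counts, token sizes, per-million-token prices) are placeholder assumptions for illustration, not vendor quotes:

```python
def monthly_llm_cost(users: int, queries_per_user_day: int,
                     tokens_in: int, tokens_out: int,
                     price_in_per_m: float, price_out_per_m: float,
                     workdays: int = 22) -> float:
    """Rough monthly API spend: queries x tokens x per-million-token price."""
    queries = users * queries_per_user_day * workdays
    per_query = (tokens_in * price_in_per_m + tokens_out * price_out_per_m) / 1e6
    return round(queries * per_query, 2)

# Assumed example: 500 staff, 10 queries/day each, 2k prompt tokens
# (question + retrieved snippets) and 500 completion tokens per query,
# at illustrative prices of $3 in / $15 out per million tokens.
cost = monthly_llm_cost(500, 10, 2000, 500, 3.0, 15.0)
```

Note that prompt tokens usually dominate in RAG (the retrieved context is large), so trimming how many chunks you stuff into each prompt moves the bill more than shortening the answers does.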