We are looking to implement a ChatGPT-like solution on-premises. The solution should provide a GUI and back-end components, plus the ability to connect to LLMs both on-prem and in the cloud (routing according to the sensitivity of the data). Feature examples: questions to a document/knowledge base, simple agent creation, RAG, and multimodal support.

The guy you should take seriously in Travel and Hospitality, 15 hours ago

If you are looking to roll out a ChatGPT-style solution on-prem with cloud smarts, I'd recommend the Microsoft solution stack, which IMHO meets all your needs and then some.

Architecture: 
Use a web-based GUI (React, Blazor, or Power Apps) with Azure Functions or AKS for orchestration. AKS now supports KAITO for deploying OSS LLMs like Phi-4 and Qwen locally—YAML your way to greatness.

LLMs:
On-prem: Use ONNX Runtime GenAI or vLLM for fast, efficient inference.
Cloud: Azure OpenAI Service gives you GPT-4, GPT-4o, and embeddings with enterprise-grade security.
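
A minimal sketch of the on-prem side, assuming a vLLM server exposing its OpenAI-compatible API on localhost (the model name and URL here are placeholders, not a prescription):

from openai import OpenAI

# Point the standard OpenAI client at the local vLLM server
# (vLLM serves an OpenAI-compatible API under /v1).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",  # whichever model the server loaded
    messages=[{"role": "user", "content": "Summarize our VPN setup guide."}],
)
print(resp.choices[0].message.content)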

Hybrid Routing: 
Route sensitive data to on-prem LLMs and general queries to Azure OpenAI using Azure Arc and AI Studio. 
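
To make the routing concrete, here is a hedged Python sketch. The keyword filter is a naive stand-in for whatever sensitivity classifier or DLP service you actually use, and the endpoints, keys, and deployment names are placeholders:

import re
from openai import OpenAI, AzureOpenAI

# Placeholder endpoints: an internal vLLM/ONNX server and an Azure OpenAI deployment.
onprem = OpenAI(base_url="http://llm.internal:8000/v1", api_key="unused")
cloud = AzureOpenAI(azure_endpoint="https://myorg.openai.azure.com",
                    api_key="<key>", api_version="2024-06-01")

# Naive stand-in for a real sensitivity classifier or DLP check.
SENSITIVE = re.compile(r"\b(ssn|iban|salary|patient|contract)\b", re.I)

def ask(prompt: str) -> str:
    msgs = [{"role": "user", "content": prompt}]
    if SENSITIVE.search(prompt):   # sensitive -> stays on-prem
        r = onprem.chat.completions.create(model="phi-4", messages=msgs)
    else:                          # general -> Azure OpenAI
        r = cloud.chat.completions.create(model="gpt-4o", messages=msgs)
    return r.choices[0].message.content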

RAG & Multimodal: 
The Azure Multimodal AI & LLM Processing Accelerator supports RAG, document Q&A, multimodal inputs, and even confidence scoring.
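
Stripped of the tooling, the RAG core is retrieve-then-generate. A bare-bones sketch of that pattern (the chunks, model names, and top-1 retrieval are illustrative only, not how the accelerator is built):

import numpy as np
from openai import OpenAI

client = OpenAI()  # or point base_url at the on-prem server above

def embed(texts):
    out = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in out.data])

chunks = ["Reset a VPN token via the self-service portal.",
          "Expense reports are due by the 5th of each month."]
index = embed(chunks)

def answer(question: str) -> str:
    q = embed([question])[0]
    scores = index @ q / (np.linalg.norm(index, axis=1) * np.linalg.norm(q))
    context = chunks[int(np.argmax(scores))]   # top-1 retrieval for brevity
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    r = client.chat.completions.create(
        model="gpt-4o-mini", messages=[{"role": "user", "content": prompt}])
    return r.choices[0].message.content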

Agent Creation: 
Copilot Studio and Azure AI Studio let you build and orchestrate agents with Prompt Flow and LLMOps best practices.

Governance:
Microsoft’s AI Governance framework ensures your solution is secure, compliant, and responsibly deployed.

So yes, you can have your AI cake on-prem, drizzle it with cloud, and serve it with a side of governance! Hit me up if you (or anyone reading this) would like links to the accelerators or GitHub repos.

VP of Engineering in Manufacturing, 19 hours ago

I assume that you are looking to create a domain-specific knowledge-base Q&A. We have found that just loading the documents into a RAG database and opening it up creates a very low-quality experience. Depending on the LLM used, we see a lot of hallucination; if we lower the temperature, we see unhelpful answers.

To fix this, we focused on a few things.
1) We really needed a Q&A test suite to verify we were getting accurate responses. We manually created about 200 question/answer pairs and ran the system through this test matrix to check its accuracy before going live in production (see the test-harness sketch after this list).
2) It turns out that just chopping up our documents resulted in poor retrieval relevance from the vector DB. We needed to look at both the structure and the content of the documents. Adding metadata about a paragraph or set of paragraphs, rather than blindly segmenting the input documents, helped.
3) Sometimes the content itself was not well structured for a RAG application; tables and charts, for example, are particularly difficult. Sometimes we added context text that explained the images, or we rewrote that part of the documentation to make it easier to parse. Going forward, I expect documentation teams and knowledge-base design to become constrained by requirements to enable LLMs, and that will change the style and structure of these repositories.
4) Look at agentic AI platforms (e.g., LangChain). Instead of just dumping everything into a single vector DB and summarizing it out of context, try keeping general documentation separate from break/fix or knowledge-base content. Then rewrite the user's prompt based on the general documentation so that, when you ultimately query the break/fix documentation, you ask the right question (see the prompt-rewrite sketch below). Or, alternatively, query the user to fill in missing information (e.g., product model number or other important context).
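
On point 1, the test harness can be very small. A sketch, assuming an ask() function wrapping your RAG pipeline and an LLM-as-judge scorer (both hypothetical):

import json

def run_suite(ask, judge, path="qa_suite.json", threshold=0.90):
    # qa_suite.json: [{"question": ..., "expected": ...}, ...]  (~200 entries)
    cases = json.load(open(path))
    passed = 0
    for case in cases:
        answer = ask(case["question"])
        # judge() compares the answer to the reference, e.g. via an LLM grader
        if judge(case["question"], case["expected"], answer):
            passed += 1
    accuracy = passed / len(cases)
    print(f"{passed}/{len(cases)} passed ({accuracy:.0%})")
    return accuracy >= threshold  # gate the go-live on this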
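
And on point 4, the two-stage prompt-rewrite looks roughly like this; general_store, breakfix_store, and llm are hypothetical interfaces standing in for your vector stores and model client:

def rewrite_then_query(user_prompt, general_store, breakfix_store, llm):
    # Stage 1: ground the question in the general documentation.
    background = general_store.search(user_prompt, k=3)
    rewritten = llm(
        "Rewrite this support question so it names the product, model, "
        f"and symptom explicitly.\nBackground:\n{background}\n\n"
        f"Question: {user_prompt}")
    # Stage 2: query the break/fix KB with the sharpened question.
    articles = breakfix_store.search(rewritten, k=5)
    return llm(f"Answer using only these articles:\n{articles}\n\n"
               f"Question: {rewritten}")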

We have found these systems are conceptually simple, but things get tricky at scale; you have to marry your organization's domain, documentation/KB, and technology together to get predictable answers.

Director of IT Governance in Finance (non-banking), 19 hours ago

We stood up some technology to leverage LLM capabilities with our contract data: to help determine risks, opportunities, consistency, etc. One of the first challenges was applying guardrails to the chat part of the solution and obfuscating/masking what is private to us. Our technology partner created an obfuscation engine to control what goes to the LLM and how the tool is used (a minimal sketch of the idea follows). It became a standalone solution, and thousands of people at our company are now using it; it sounds like what you're looking for. It also has a private LLM, etc. If you want to know more, connect with me; Gartner is very familiar with what we've done.
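
For anyone wanting the flavor of such an obfuscation engine, here is a toy sketch; a production version would use NER/DLP classifiers rather than a couple of regexes, and these patterns are illustrative only:

import re

# Illustrative patterns only; real engines use NER/DLP classifiers.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def mask(text):
    # Swap private values for placeholder tokens before the text leaves
    # the network; keep the mapping so the reply can be un-masked.
    mapping = {}
    for label, pat in PATTERNS.items():
        for i, value in enumerate(set(pat.findall(text))):
            token = f"<{label}_{i}>"
            mapping[token] = value
            text = text.replace(value, token)
    return text, mapping

def unmask(text, mapping):
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text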

VP of Product Management, 19 hours ago

Hi Cedric, our organization has faced similar challenges: deciding between hosting options and ensuring a new environment is secure, user-friendly, supports RAG, and integrates with existing systems. I'd be happy to have a conversation with you and one of our AI architects to discuss our path. I've sent you a connection request.
