Like many companies, we're starting to develop applications with generative AI, primarily using the RAG architecture because frequent data changes make fine-tuning less effective. We're considering GPT-3.5, GPT-4, and Gemini, but want options beyond these due to cloud dependency and cost. We're planning a multi-model approach, selecting LLMs based on context size and other factors. Can you suggest a decision-making framework for evaluating LLMs?
Product Associate in Software · a year ago
Hi!
I can't answer your question directly, as I've only used the GPT family from OpenAI myself. However, just for completeness, I'd add some good options to your list:
- GPT-family LLMs via Azure OpenAI Service: provide similar value to the direct OpenAI API, but with more control, e.g. over where the data is processed geographically;
- Claude API from Anthropic: I haven't used it yet but plan to check it out, as the quality seems to be on par with OpenAI's models.
Hugging Face (https://huggingface.co) is something like GitHub for LLMs, datasets, etc. It will give your team the opportunity to test a variety of LLMs, both open source and proprietary, including OpenAI's, Microsoft's, and Google's models. Through my research I've found that while some models may not currently be the best on the market, they show great potential to become very good in the near future, particularly in the healthtech and fintech markets.
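As for the decision-making framework itself: one common way to structure it is a weighted scoring matrix over the criteria you mentioned (context size, cost, cloud dependency). Here is a minimal sketch in Python; note that all model names, numbers, and weights below are hypothetical placeholders I made up for illustration, so you'd substitute your own measured values and priorities.

```python
# Illustrative weighted-scoring framework for comparing LLM candidates.
# Every model name, metric value, and weight here is a hypothetical
# placeholder, not real benchmark data.

CANDIDATES = {
    # context_k: context window in thousands of tokens (higher is better)
    # cost: relative cost per 1k tokens (lower is better)
    # self_host: 1.0 if the model can run on your own infrastructure
    "model-a": {"context_k": 8,  "cost": 0.03, "self_host": 0.0},
    "model-b": {"context_k": 32, "cost": 0.06, "self_host": 0.0},
    "model-c": {"context_k": 4,  "cost": 0.01, "self_host": 1.0},
}

# Weights express how much each criterion matters to your team; they sum to 1.
WEIGHTS = {"context_k": 0.4, "cost": 0.35, "self_host": 0.25}


def normalize(values, invert=False):
    """Scale a column of raw values to [0, 1]; invert when lower is better."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return [1.0 - s for s in scaled] if invert else scaled


def rank_models(candidates, weights):
    """Return (name, score) pairs sorted from best to worst."""
    names = list(candidates)
    scores = {name: 0.0 for name in names}
    for criterion, weight in weights.items():
        invert = criterion == "cost"  # cheaper is better
        column = normalize([candidates[n][criterion] for n in names], invert)
        for name, value in zip(names, column):
            scores[name] += weight * value
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank_models(CANDIDATES, WEIGHTS):
        print(f"{name}: {score:.2f}")
```

The nice part of this approach is that the debate moves from "which model is best" to "which criteria matter and by how much", which is usually easier for a team to agree on; you can also add rows for quality scores from your own RAG evaluation set as you gather them.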