Building an Enterprise-Grade RAG with Open Source LLMs - Overview
The first article in our series on how to build an enterprise-grade RAG with open-source LLMs.
This is the first article of the series "Building an Enterprise-Grade RAG with Open Source LLMs". In subsequent articles, we will detail each of the sections below.
Introduction
Imagine a powerful search and query system that sits entirely within your organization, leveraging your own data while preserving complete privacy and control. This is the promise of Retrieval-Augmented Generation (RAG) coupled with open-source Large Language Models (LLMs). By combining these technologies, you can build a highly effective information retrieval and response system tailored to your specific needs, privacy constraints, and security requirements.
But where do you even begin? This article explores the world of local RAG with open-source LLMs, highlighting key considerations and resources to get you started.
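To make the idea concrete, here is a minimal sketch of the query-time flow of a RAG system. The `embed`, `vector_search`, and `generate` helpers are hypothetical placeholders for your embedding model, vector store, and LLM; the orchestration logic is the point.

```python
# Minimal RAG query flow (sketch). The three helpers below are
# hypothetical placeholders for your own components.

def embed(text: str) -> list[float]:
    """Turn text into an embedding vector (e.g., via a sentence-transformer)."""
    raise NotImplementedError

def vector_search(query_vector: list[float], top_k: int = 3) -> list[str]:
    """Return the top_k most similar document chunks from the vector store."""
    raise NotImplementedError

def generate(prompt: str) -> str:
    """Call the local open-source LLM with the augmented prompt."""
    raise NotImplementedError

def rag_answer(question: str) -> str:
    # 1. Retrieve: find chunks semantically close to the question.
    chunks = vector_search(embed(question))
    # 2. Augment: ground the prompt in the retrieved context.
    context = "\n\n".join(chunks)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    # 3. Generate: the LLM answers from your data, not just its training set.
    return generate(prompt)
```

Everything in the rest of this article, from model choice to vector databases to access control, plugs into one of these three steps.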
Why a local RAG?
In today's data-driven world, concerns about privacy and control are paramount. Cloud-based LLMs often require uploading your data, potentially exposing sensitive information. But with a local RAG, you choose what data goes in and who has access to it. This empowers you to build applications that leverage the power of language models while remaining firmly in control.
Picking the Right LLM: LLM Leaderboards
Open-source LLMs are rapidly evolving, and choosing the right one for your RAG can be daunting. Fortunately, LLM leaderboards offer valuable insights. Consider platforms like Papers With Code and BigScience's leaderboard to compare performance metrics across various tasks and datasets. For RAG-specific tasks, resources like the BLING benchmark can help tailor your search. Here is a summary of the main leaderboards as of this writing (February 10th, 2024).
Hugging Face Open LLM Leaderboard: This Hugging Face Space tracks, ranks, and evaluates open LLMs and chatbots. Keep in mind that submissions are self-published, so the rankings can be gamed by model publishers.
The Big Benchmarks Collection on Hugging Face: This collection gathers benchmark spaces on Hugging Face beyond the Open LLM Leaderboard, including the MTEB Leaderboard, the LMSys Chatbot Arena Leaderboard, and more.
LMSYS Chatbot Arena: A crowdsourced platform for LLM evaluations that ranks models with the Elo rating system based on human preference votes (see the sketch after this list). In my opinion, this is the most reliable leaderboard for user-facing LLM applications.
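To make the Elo mechanism concrete, here is a minimal sketch of a single rating update after one pairwise human vote. The K-factor of 32 is a common textbook default, not necessarily the Arena's actual configuration.

```python
def elo_update(rating_a: float, rating_b: float, a_wins: bool, k: float = 32.0):
    """Update two models' ratings after one pairwise human preference vote."""
    # Expected score of A, given the current rating gap.
    expected_a = 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))
    score_a = 1.0 if a_wins else 0.0
    # Winner gains what the loser sheds; upset wins move ratings more.
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta

# Example: an upset win by the lower-rated model shifts ratings noticeably.
print(elo_update(1000.0, 1100.0, a_wins=True))  # -> (~1020.5, ~1079.5)
```

Aggregated over many thousands of votes, these small updates converge to a stable ranking of models by human preference.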
There are other leaderboards and LLM benchmarking frameworks, but we will dig deeper into those in a future article.
Remember, the most relevant leaderboard for your needs depends on your specific goals and desired evaluations. Consider factors like:
Task focus: Choose a leaderboard targeting the tasks your LLM will perform.
Evaluation type: Decide between automated benchmarks or human-based evaluations.
Open-source focus: If you value open-source models, choose a leaderboard specifically catered to them.
By exploring these various leaderboards, you can gain valuable insights into the strengths and weaknesses of different LLMs and make informed decisions for your projects.
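Once you have a shortlist, it is worth smoke-testing candidates on prompts that resemble your real queries before committing. Here is a minimal sketch using Hugging Face's transformers library; the model name is just one popular open model, so swap in whichever candidate your leaderboard research surfaced.

```python
from transformers import pipeline

# Any open model from your shortlist; Mistral-7B-Instruct is one example.
generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.2",
    device_map="auto",  # requires the accelerate package; uses a GPU if available
)

# Probe with prompts that resemble your real RAG queries.
prompt = "Summarize the key points of our Q3 security policy:"
output = generator(prompt, max_new_tokens=128, do_sample=False)
print(output[0]["generated_text"])
```

A few such spot checks on your own domain often reveal gaps that aggregate benchmark scores hide.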
Local Vector Databases
The heart of a RAG system lies in its ability to efficiently retrieve relevant information from your data. This is where vector databases come in. Weaviate and Milvus can run entirely within your own infrastructure, while Pinecone offers similar capabilities as a managed cloud service, so it only fits if your privacy requirements allow data to leave your network. All of them excel at storing and searching high-dimensional vectors, ensuring fast and accurate retrieval. A minimal sketch of the underlying similarity search follows the links below.
Weaviate: https://www.weaviate.io/
Pinecone: https://pinecone.io/
Milvus: https://milvus.io/
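Whatever the database, the core operation is the same: find the stored vectors closest to a query vector. Here is a minimal in-memory sketch with NumPy, using made-up 4-dimensional embeddings for brevity (real embedding models produce hundreds to thousands of dimensions).

```python
import numpy as np

# Toy corpus: each row is a (made-up) embedding of one document chunk.
corpus = np.array([
    [0.1, 0.9, 0.0, 0.2],   # chunk 0
    [0.8, 0.1, 0.3, 0.0],   # chunk 1
    [0.2, 0.8, 0.1, 0.1],   # chunk 2
])

def top_k(query: np.ndarray, k: int = 2) -> np.ndarray:
    # Cosine similarity: dot product of L2-normalized vectors.
    corpus_norm = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    query_norm = query / np.linalg.norm(query)
    scores = corpus_norm @ query_norm
    # Indices of the k highest-scoring chunks, best first.
    return np.argsort(scores)[::-1][:k]

print(top_k(np.array([0.15, 0.85, 0.05, 0.15])))  # -> [0 2]
```

Production vector databases layer persistence, approximate-nearest-neighbor indexes (such as HNSW), and metadata filtering on top of this core operation so it stays fast at millions of vectors.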
Building Enterprise-Grade Features
Taking your local RAG to the next level means incorporating enterprise-grade features. Role-Based Access Control (RBAC) ensures only authorized users access sensitive data, while Single Sign-On (SSO) streamlines user authentication. Open-source identity providers like Keycloak, or managed services like Auth0, can help you seamlessly integrate these features into your RAG system.
In a future post, we will dig deeper into the challenges around enterprise-grade features in RAG systems, but in a nutshell, these are the main considerations you must address:
The authentication server must never communicate directly with the LLM.
Team workspaces must be built on top of robust isolation and multitenancy; data leakage is a more prominent risk in RAG apps than in traditional applications (see the sketch below).
Please refer to our previous article for related security issues.
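To make the multitenancy concern concrete, here is a minimal sketch of tenant-aware retrieval: the retrieval layer applies a hard tenant filter before any similarity scoring, so one workspace's documents can never reach another workspace's prompt. The `Chunk` type and in-memory store are hypothetical; most vector databases, including Weaviate and Milvus, expose equivalent metadata filters.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    tenant_id: str   # which workspace owns this chunk
    text: str
    score: float = 0.0

def retrieve(query: str, tenant_id: str, store: list[Chunk]) -> list[Chunk]:
    # Hard filter FIRST: only the caller's tenant is ever searchable.
    candidates = [c for c in store if c.tenant_id == tenant_id]
    # Similarity scoring over candidates would happen here;
    # it is stubbed out in this sketch.
    return candidates[:3]

# A document from tenant "acme" can never leak into "globex" results,
# no matter how semantically similar it is to the query.
store = [Chunk("acme", "Acme salary bands"), Chunk("globex", "Globex roadmap")]
print([c.text for c in retrieve("compensation", "globex", store)])
```

The key design choice is that isolation is enforced in the retrieval layer itself, not left to the prompt or the LLM.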
Hardware Considerations: Local or Cloud?
Whether you host your RAG system locally or in the cloud depends on your needs and resources. Local deployment offers complete control and privacy but may require substantial hardware investment. Cloud solutions provide scalability and ease of setup but come with potential latency and security concerns. Carefully evaluate your data size, desired performance, and budget to make the optimal choice.
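For local sizing, a useful first-order estimate is parameter count times bytes per parameter. The sketch below covers weights only; it ignores the KV cache, activations, and runtime overhead, so treat the numbers as lower bounds.

```python
def min_weight_memory_gb(params_billions: float, bits_per_param: float) -> float:
    """Rough lower bound on the memory needed just to hold the weights."""
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

# A 7B-parameter model at different precisions (weights only, no overhead):
for bits in (16, 8, 4):
    print(f"7B @ {bits}-bit: ~{min_weight_memory_gb(7, bits):.1f} GB")
# 7B @ 16-bit: ~14.0 GB
# 7B @ 8-bit:  ~7.0 GB
# 7B @ 4-bit:  ~3.5 GB
```

This is why quantization matters so much for local deployment: a 4-bit 7B model fits comfortably on a single consumer GPU, while the same model at 16-bit may not.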
Conclusion
This is just a starting point. The world of local RAG with open-source LLMs is constantly evolving, offering exciting possibilities for businesses that prioritize privacy and control. By harnessing the power of these technologies, you can build innovative applications that unlock the true potential of your data, all while adhering to your specific security and compliance requirements.
In the next articles of this series, we will dive deeper into LLM leaderboards, LLM benchmarks, case studies of enterprises already using open-source LLMs, and more.