Agentic RAG Framework and Databricks Mosaic AI
As compound AI systems gain traction in the market, with intelligent agents able to reason over, exploit, and modify external knowledge and thereby interact with and understand information more effectively, the construct of Agentic RAG is growing popular. In this article, I'll explain the fundamentals of Agentic RAG: how it differs from traditional RAG, where it can help, its different flavors, and finally how Databricks, an established leader in the Gen AI space, has come up with its Mosaic AI framework to expedite this journey.
Introduction to the Agentic RAG Framework
While traditional RAG has revolutionized how we produce high-quality, accurate answers by allowing LLMs to access and process information from external sources, Agentic RAG takes us a step further by introducing a layer of intelligence in the form of AI agents. These agents inject intelligence and autonomy into the RAG framework through broader conversational context, intelligent retrieval strategies, multi-agent orchestration, agent reasoning, post-generation verification, and adaptive learning.
Where can Agentic RAG help?
Summarization — Traditional RAG retrieves only the top-K chunks, which can miss material in a long document; Agentic RAG is useful when the document is extensive and must be worked through iteratively.
Document comparison — Since traditional RAG fetches the top-K chunks from each document, effective side-by-side comparison may be limited.
Tackling multi-part questions — Handling multi-hop questions, especially those grounded in structured data or relationships, is a challenge for traditional RAG.
and so on…
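As a toy illustration of the multi-part case, an agentic layer might first decompose a compound question into subquestions before retrieving for each. The splitting heuristic below is a naive stand-in for what would really be an LLM-driven planner:

```python
def decompose(question: str) -> list[str]:
    # Naive stand-in for an LLM-driven planner: split a compound
    # question on " and " into independent subquestions.
    parts = [p.strip(" ?") for p in question.split(" and ")]
    return [p + "?" for p in parts if p]

print(decompose("Who founded Databricks and when was MLflow released?"))
# Each subquestion can then be answered by its own retrieval pass.
```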
How is it different from traditional RAG?
A traditional RAG pipeline begins by turning your structured or unstructured dataset into text documents and breaking the text down into small pieces (chunks).
Then a text embedding model steps in, turning each chunk into a vector representing its semantic meaning. These embeddings are stored in a vector database, serving as the foundation for retrieval.
Upon receiving a user query, the vector database helps retrieve the chunks most relevant to the request. With this context, an LLM synthesizes the retrieved pieces into a coherent and informative response.
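The flow above can be sketched end-to-end with a toy retriever. Here a bag-of-words count vector stands in for a real embedding model (e.g. BGE or E5), an in-memory list stands in for the vector database, and the final LLM synthesis step is omitted:

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1. Chunk the source text (here: one sentence per chunk).
document = ("Databricks provides a lakehouse platform. "
            "Vector search retrieves relevant chunks. "
            "LLMs synthesize answers from retrieved context.")
chunks = [s.strip() + "." for s in document.split(".") if s.strip()]

# 2. "Index" each chunk as a vector (stand-in for the vector database).
index = [(chunk, embed(chunk)) for chunk in chunks]

# 3. Retrieve the top-K chunks for a user query.
def retrieve(query: str, k: int = 2) -> list[str]:
    qv = embed(query)
    ranked = sorted(index, key=lambda c: cosine(qv, c[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

print(retrieve("which chunks does vector search retrieve?"))
```

In a real pipeline the retrieved chunks would be passed as context to an LLM prompt for the synthesis step.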
Based on their function and spectrum of capabilities, agents can take several forms: routing, one-shot query planning, tool use, the reason + act (ReAct) methodology, or dynamic planning and execution.
A Routing Agent relies on an LLM to select the appropriate downstream RAG pipeline.
A Query Planning Agent divides a complex query into parallelizable subqueries, each of which can be executed across various RAG pipelines; the amalgamated results are then synthesized into a comprehensive response.
A Tooling Agent fetches additional data required from external sources, such as an API, a SQL database, or another application exposing an API, to provide enhanced context for the input query.
A ReAct Agent can handle sequential multi-part queries while maintaining state (in memory); it is essentially a complex combination of routing, query planning, and tool use in a single entity.
Dynamic Planning Agents are the most complex, with enhanced reliability, observability, parallelization, control, and separation of concerns. Such an agent determines the steps necessary to fulfill an input query by creating a DAG, then determines the tools, if any, required for each step in the plan and executes them.
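A minimal sketch of the routing pattern: the LLM's pipeline choice is replaced with a keyword heuristic for illustration, and the two pipeline names are hypothetical stand-ins for full RAG chains over different corpora:

```python
# Hypothetical downstream pipelines; in practice each would be a
# full RAG chain over a different index or data source.
def docs_pipeline(query: str) -> str:
    return f"[docs] answer for: {query}"

def sql_pipeline(query: str) -> str:
    return f"[sql] answer for: {query}"

def route(query: str) -> str:
    # Stand-in for the LLM's routing decision: a keyword check.
    # A real routing agent would prompt an LLM to pick a pipeline.
    if any(w in query.lower() for w in ("table", "revenue", "count")):
        return sql_pipeline(query)
    return docs_pipeline(query)

print(route("What was total revenue last quarter?"))
print(route("Summarize the onboarding guide"))
```

The other agent types layer on top of this same idea: a query planner routes several subqueries at once, and a ReAct agent interleaves routing and tool calls while keeping state between steps.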
How does the Databricks Mosaic AI Agent Framework address Agentic RAG?
Mosaic AI Agent Framework comprises a set of tools on Databricks designed to help developers build, deploy, and evaluate production-quality agents, such as Retrieval-Augmented Generation (RAG) applications.
The following is the high-level component map for comprehensive Agentic RAG in Databricks:
Data pipelines involving Delta tables, Volumes, Workflows, DLT, etc.
Vector embeddings using open-source models like BGE and E5, proprietary models like those from OpenAI or Anthropic, or custom fine-tuned models.
Serverless Mosaic AI Vector Search to store the embeddings.
Building agents with LangChain or PyFunc under MLflow, with a pre-defined input/output schema.
Logging of the agent using the LangChain or PyFunc flavor of MLflow.
Registration of the RAG chain as a model within Unity Catalog (UC).
Deployment of the agents registered in Unity Catalog.
Enablement of Review Apps, where stakeholders can give feedback and all query requests/responses to agents are logged.
Tracing of agents using autologging for LangChain agents and the Fluent API for PyFunc-based agents.
Installation of the Mosaic AI Agent Evaluation framework to leverage LLM judges and agent metrics on parameters like chunk relevance, groundedness, query relevance, response safety, latency, total token count, etc.
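As a rough sketch of the agent-authoring step, a PyFunc-style agent is a class with a `predict` method over a pre-defined chat-style input/output schema. The `ToyRAGAgent` name and the exact schema shape below are illustrative assumptions, not the precise Mosaic AI contract; on Databricks the class would be logged with MLflow and registered in Unity Catalog:

```python
class ToyRAGAgent:
    """Illustrative PyFunc-style agent with a fixed input/output schema."""

    def predict(self, model_input: dict) -> dict:
        # Assumed input shape: {"messages": [{"role": ..., "content": ...}]}
        question = model_input["messages"][-1]["content"]
        # A real agent would retrieve context from Vector Search
        # and call an LLM here; this stub just echoes its inputs.
        context = "retrieved context for: " + question
        return {"content": f"Answer based on ({context})"}

agent = ToyRAGAgent()
out = agent.predict(
    {"messages": [{"role": "user", "content": "What is Agentic RAG?"}]}
)
print(out["content"])
```

Fixing the schema up front is what lets the Review App, tracing, and evaluation components treat every agent's requests and responses uniformly.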
Conclusion
The emergence of Agentic RAG has transcended conventional RAG systems. As demand for compound AI systems ticks up, the distinction between traditional RAG and Agentic RAG is becoming increasingly clear, underscoring the importance of adopting these innovative solutions. By incorporating custom agents that can interact with multiple systems, automate reasoning, and dynamically select the best tools for the task at hand, we are building more effective solutions for handling complex queries and a wider range of use cases. The Databricks Mosaic AI Agent Framework is certain to play a pivotal role in this journey: it lets teams evaluate the quality of an Agentic RAG application, iterate quickly with the ability to test hypotheses, redeploy the application easily, and at the same time maintain the appropriate governance and guardrails to ensure quality.