Inside the race to build agent-native databases

Mike's Notes

I was curious about this article by Ben Lorica because Pipi 9 uses database-driven agents. So what does Ben have to say?

Resources

References

  • Reference

Repository

  • Home > Ajabbi Research > Library >
  • Home > Handbook > 

Last Updated

11/11/2025

Inside the race to build agent-native databases

By: Ben Lorica
Gradient Flow: 29/10/2025

Ben Lorica edits the Gradient Flow newsletter and hosts the Data Exchange podcast. He helps organize the AI Conference, the AI Agent Conference, and the Applied AI Summit, while also serving as the Strategic Content Chair for AI at the Linux Foundation. You can follow him on LinkedIn, X, Mastodon, Reddit, Bluesky, YouTube, or TikTok. This newsletter is produced by Gradient Flow.

In a recent piece, I explored the growing mismatch between our existing data infrastructure and the demands of emerging AI agents. Since then, I have had the opportunity to speak with some founders and engineering leaders who are tackling this challenge directly. Their work confirms that the rise of agentic AI is not just an application-layer phenomenon; it is forcing a fundamental reconsideration of the database itself. This article examines four distinct initiatives that are reimagining what a database should be in an era where software, not just humans, will be its primary user.

AgentDB: The Database as a Disposable File

AgentDB reimagines the database by treating it not as persistent, heavy infrastructure but as a lightweight, disposable artifact, akin to a file. Its core premise is that creating a database should be as simple as generating a unique ID; doing so instantly provisions a new, isolated database. This serverless approach, which can utilize embedded engines like SQLite and DuckDB, is designed for the high-velocity, ephemeral needs of agentic workflows, where an agent might spin up a database for a single task and discard it upon completion.

The initiative assumes that a significant portion of agentic tasks do not require the complexity of a traditional relational database. Its target use cases include developers building simple AI applications, agents needing a temporary “scratchpad” to process information, or even non-technical users who want to turn a data file, like a CSV of personal expenses, into an interactive chat application. Its primary limitation is that it is not designed for complex, high-throughput transactional systems with thousands of interconnected tables, such as an enterprise resource planning (ERP) system. AgentDB is currently live and accessible, with a focus on empowering developers to quickly integrate data persistence into their AI applications with minimal friction.
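
To make the "database as a disposable file" idea concrete, here is a minimal TypeScript sketch. It does not use AgentDB's actual API; it only illustrates the pattern of provisioning an isolated SQLite database keyed by a generated ID, using it as a scratchpad for one task, and discarding it afterwards (the better-sqlite3 package is an assumption chosen for the example).

import Database from 'better-sqlite3';
import { randomUUID } from 'node:crypto';
import { rmSync } from 'node:fs';

// Provision an isolated scratchpad database keyed by a fresh ID.
const taskId = randomUUID();
const dbPath = `./scratch-${taskId}.db`;
const db = new Database(dbPath);

// Use it as a temporary working area for a single agent task.
db.exec('CREATE TABLE expenses (category TEXT, amount REAL)');
const insert = db.prepare('INSERT INTO expenses (category, amount) VALUES (?, ?)');
insert.run('travel', 120.5);
insert.run('meals', 42.0);

const totals = db
  .prepare('SELECT category, SUM(amount) AS total FROM expenses GROUP BY category')
  .all();
console.log(totals);

// Discard the database once the task is complete.
db.close();
rmSync(dbPath);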

Postgres for Agents: Evolving a Classic for AI

Tiger Data’s “Postgres for Agents” takes an evolutionary, rather than revolutionary, approach. Instead of building a new database from scratch, it enhances PostgreSQL, the popular open-source database, with capabilities tailored for agents. The cornerstone of this initiative is a new storage layer that enables “zero-copy forking.” This allows a developer or an agent to create an instantaneous, isolated branch of a production database. This fork can be used as a safe sandbox to test schema changes, run experiments, or validate new code without impacting the live system.

This approach is built on the assumption that the reliability, maturity, and rich ecosystem of Postgres are too valuable to discard. The target user is any developer building applications with AI, who can now instruct an AI coding assistant to safely test database migrations on a full-scale copy of production data. It also serves AI applications that require a robust and stateful backend. The platform is now available via Tiger Data’s cloud service, which includes a free tier. While the core forking technology is currently proprietary, the company is signaling a long-term commitment to the open Postgres ecosystem.

Databricks Lakebase: Unifying Transactions and Analytics

The Databricks Lakebase represents a broad architectural vision aimed at dissolving the long-standing wall between operational and analytical data systems. It proposes a new category of database — a “lakebase” — that embeds transactional capabilities directly within a data lakehouse architecture. Built on open standards like Postgres, it is designed to be serverless, separate storage from compute for elastic scaling, and support modern developer workflows like instantaneous branching.

The core assumption of the Lakebase is that intelligent agents require seamless access to both real-time operational data and historical analytical insights to perform complex tasks. For example, an inventory management agent needs to check current stock levels (a transactional query) while also considering predictive demand models (an analytical query). The Lakebase is targeted at organizations, particularly those already invested in a lakehouse architecture, that want to build AI-native applications without the cost and complexity of maintaining separate databases and data pipelines. This is currently a strategic roadmap for Databricks, accelerated by its recent acquisition of companies like Mooncake Labs, and represents a long-term effort to create a single, unified platform for all data workloads.

Bauplan Labs: A Safety-First Approach for Agents

Bauplan Labs approaches the problem from the perspective of safety and reliability, motivated by the principle that modern data engineering requires the same rigor as software engineering. Their work focuses on creating a “programmable lakehouse,” an environment where every data operation is managed through code-based abstractions. This provides a secure and auditable foundation for AI agents to perform sensitive tasks. The central concept is a rigorously defined “Git-for-data” model, which allows agents to work on isolated branches of production data. Crucially, it introduces a “verify-then-merge” workflow. Before an agent’s changes are integrated, they must pass a series of automated correctness checks.

This framework assumes that for agents to be trusted with mission-critical systems, their actions must be verifiable and their potential for error contained. The target use cases are high-stakes scenarios, such as an agent tasked with repairing a broken data pipeline or safely querying financial data through a controlled API, where a mistake could have significant consequences. Bauplan is building its platform on a formal blueprint for safe, agent-driven data systems, an approach already being validated by early customers. While the company offers open-source tooling on GitHub, its focus is on providing a commercial-grade framework for high-stakes, agent-driven applications that will influence the design of future platforms.
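
To illustrate the branch, verify, then merge-or-discard control flow described above, here is a generic TypeScript sketch. None of these interfaces or function names correspond to Bauplan's actual SDK; they are hypothetical stand-ins for the pattern.

// Hypothetical interfaces; they do not correspond to Bauplan's actual SDK.
interface DataBranch {
  name: string;
  runChecks(): Promise<{ passed: boolean; failures: string[] }>;
  merge(): Promise<void>;
  discard(): Promise<void>;
}

interface Lakehouse {
  branch(from: string, name: string): Promise<DataBranch>;
}

// The agent works on an isolated branch; its changes reach production data
// only after automated correctness checks pass.
async function applyAgentChange(
  lake: Lakehouse,
  change: (branch: DataBranch) => Promise<void>,
): Promise<void> {
  const branch = await lake.branch('main', `agent-fix-${Date.now()}`);
  await change(branch);                       // agent edits data on the branch
  const result = await branch.runChecks();    // verify before merge
  if (result.passed) {
    await branch.merge();                     // integrate into production data
  } else {
    await branch.discard();                   // contain the error
    throw new Error(`Checks failed: ${result.failures.join(', ')}`);
  }
}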

The Broader Infrastructure Shift

These four initiatives, from AgentDB’s file-like simplicity to the ambitious unification of the Databricks Lakebase, highlight a clear trend: databases are being reshaped to serve machines. Whether by evolving the trusted foundation of Postgres or by designing safety-first frameworks like Bauplan’s, the data community is moving toward systems that are more ephemeral, isolated, and context-aware. As outlined in my earlier thoughts, databases are becoming more than just repositories of information; they are the operational state stores and external memory that provide agents with the traceability, determinism, and auditable history needed to function reliably.

Of course, the database is just one piece of the puzzle. As agents become more integrated into our workflows, other components of the technology stack also require reimagination. Search APIs, traditionally designed to return ten blue links for a human, must be adapted to deliver comprehensive, structured information for a machine. Development environments and IDEs are already evolving to become collaborative spaces for humans and AI coding assistants. The entire infrastructure, from headless browsers that allow agents to interact with the web to the observability tools that monitor their behavior, is being rebuilt for an agent-native world.

Quick Takes

Workflow Engine design proposal, tell me your thoughts

Mike's Notes

A thoughtful article by Oskar Dudycz about Emmett has convinced me to use Event Sourcing. Unfortunately, Emmett can't be used inside Pipi 9. Modifying the Pipi 9 Workflow Engine to use Event Sourcing would be possible by modifying the existing Message Store and related components. The design outlined here is very suitable and easy to implement in CFML. Thanks, Oskar.

Workflows external to Pipi can use third-party tools.

Resources

References

  • Reference

Repository

  • Home > Ajabbi Research > Library > Subscriptions > Architecture Weekly
  • Home > Handbook > 

Last Updated

10/11/2025

Workflow Engine design proposal, tell me your thoughts

By: Oskar Dudycz
Architecture Weekly: 29/07/2025

Software Architect & Consultant / Helping fellow humans build Event-Driven systems, Blogger at https://event-driven.io/, OSS contributor https://github.com/oskardudycz, Proud owner of Amiga 500.

Welcome to the new week!

As you may know, I’m building Emmett - a library/framework aiming to make building business-focused apps smoother with an event-driven approach. It’s a community project, and with community projects, sharing is caring, so I’m sharing today with you an idea, hoping that you’ll also share something with me: your thoughts on it.

Summary

We're adding a workflow processing engine to Emmett for coordinating multi-step/asynchronous business processes. It uses an event store behind the scenes to provide durable execution with full observability, treating each workflow instance as an event stream that acts as both inbox and outbox for handled messages.

Problem

Multi-step processes are everywhere in business applications, not just the typical order-fulfilment examples:

  • Hotel group checkout coordinating multiple room settlements,
  • Code review workflow with multiple approvers and checks,
  • Customer onboarding with verification steps,
  • Incident response coordination across teams,
  • Data ingestion with validation and enrichment stages.

Why Durable Execution Matters

These processes need to survive more than we often realise. A group checkout may be a relatively quick batch operation, but a document review workflow can span days with human approvals. An incident response could run for hours, coordinating different teams.

When things fail, and they will, we need to know exactly where we stopped and why. A network blip shouldn't cause an in-progress checkout to be dropped in the middle. A service restart shouldn't forget which steps it passed. A bug fix shouldn't require manually tracking down stuck processes.

Of course, you could implement your own processing based on the existing abstractions in Emmett, like command handlers, consumers and async processors, but...

DIY solutions always have gaps. We handle the happy path, add some retry logic, then discover edge cases in production:

  • Messages lost during deployment
  • Processes stuck after partial failures
  • No way to resume after fixing bugs
  • Can't answer "what happened to order X?"

Current Tools Work But Bring Complexity

Temporal, Restate, AWS Step Functions, and similar tools solve these problems. They guarantee processes are complete despite failures. They provide visibility into process state. But they come with cognitive overhead.

To write a simple approval workflow in Temporal, you learn about activities, workflows, workers, task queues, and their deterministic replay requirements. Your "send email for approval" becomes an activity with retry policies, timeouts, and heartbeats. Testing requires their test server, time-skipping APIs, and mock activities.

These tools require learning their entire programming model, even for simple workflows. You get powerful features, but pay the complexity cost upfront, whether you need those features or not.

What We Actually Need

We need durable execution that feels natural. Something that allows just writing regular code. Testing should be just calling functions. The patterns should match how we already think about our domain. Simple workflows should be simple to write, with complexity available when needed. Therefore, we need a model that reuses and repeats patterns, allowing for the learning ladder. This can be achieved by providing safe defaults and extension points for advanced scenarios. In other words, an accessible model enabling both newbies and advanced users.

Background: Event Sourcing as Natural Solution

What Makes Execution Durable

At its core, durable execution means three things:

  1. Persist progress at each decision point
  2. Resume from the last known state after failures
  3. Know exactly what happened and why

Traditional approaches use state machines with database tables, workflow engines with proprietary storage, or complex checkpoint mechanisms. But there's a more straightforward way.

Event Sourcing: Built-in Durability

Event sourcing stores every business decision as an immutable event. For workflows, this means every input received and output produced becomes an event in a stream. This isn't just storage - it's the recovery mechanism. We can expand this model and use the event store as the message store, storing all messages. This allows us to store both commands (intention to run business logic) and events (facts about what has happened, which can be used as triggers for next steps in the workflow).

Consider the following scenario:

  1. A clerk initiates the group checkout for two guest stays, say with IDs 1 and 2.
  2. We need to record that the group checkout was initiated and start regular checkout processes for each guest stay.
  3. Then we need to gather information based on successful or failed checkouts and complete the group checkout.

With our workflow design, the message stream, in which each recorded message represents a step in the workflow, can look as follows:

1: InitiateGroupCheckout command handled.

2: GroupCheckoutInitiated event recorded for guest stays 1 and 2.

3: CheckOut command scheduled, initiating checkout for guest stay 1.

4: CheckOut command scheduled, initiating checkout for guest stay 2.

Then, the background process can poll those messages and sequentially process them, storing the checkpoints after successful processing.

Now, we could get the next step:

5: GuestCheckedOut event received, informing about a successful checkout for guest stay 1.

6: The process crashes, and the second checkout is not processed.

That's not an issue, as upon restart, we can:

  1. Read the last successful checkpoint (which happens to be 3).
  2. Continue processing, triggering CheckOut for guest stay 2.
  3. Do all of this with no special recovery code.

The workflow doesn't know it crashed. It just continues from where it left off.
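
A rough sketch of that checkpointed polling loop is shown below. The helper functions (readCheckpoint, readMessagesAfter, handle, storeCheckpoint) are assumptions for illustration, not Emmett's actual API.

// Hypothetical helpers, for illustration only.
type StoredMessage = { position: number; type: string; data: unknown };

declare function readCheckpoint(streamName: string): Promise<number>;
declare function readMessagesAfter(streamName: string, position: number): Promise<StoredMessage[]>;
declare function handle(message: StoredMessage): Promise<void>;
declare function storeCheckpoint(streamName: string, position: number): Promise<void>;

// Poll the workflow stream, process sequentially, checkpoint after each success.
// After a crash, the loop simply resumes from the last stored checkpoint.
async function pollWorkflowStream(streamName: string): Promise<void> {
  const checkpoint = await readCheckpoint(streamName);
  const pending = await readMessagesAfter(streamName, checkpoint);
  for (const message of pending) {
    await handle(message);                    // e.g. trigger CheckOut for guest stay 2
    await storeCheckpoint(streamName, message.position);
  }
}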

Why Event Sourcing Makes Workflows Simple

Beyond recovery, event sourcing brings natural benefits to workflows:

  • Time Travel Debugging: When a workflow fails, replay event-by-event to see the exact decision sequence. No more "I wonder what state it was in."
  • What-If Analysis: Fix a bug and replay to see if workflows would complete correctly. Test process changes against historical data.
  • Natural Audit Trail: Every decision is recorded with its trigger. Compliance and debugging come for free.
  • Error Analysis: We could also store errors as messages. See patterns across failed workflows. Understand systemic issues.

Emmett already provides these event sourcing primitives. We have event stores that handle both events and commands (message stores). We have the decide/evolve pattern for business logic. We have consumers for background processing. What's missing is the orchestration pattern to coordinate across multiple streams with proper correlation. That's what workflows add.

Solution Overview

Workflows in Emmett follow the same mental model as command handlers: they receive commands, make decisions based on state, and produce new messages. But unlike command handlers that only respond to commands, workflows can react to events from anywhere in your system.

The pattern has three functions:

  • decide: Business logic that takes input + state and returns outputs
  • evolve: Builds state based on events (both inputs and outputs)
  • initialState: Starting state for new workflow instances

Each workflow instance gets its own event stream. This stream serves as both an inbox (recording inputs) and an outbox (recording outputs). This self-contained design means everything about a workflow instance lives in one place.
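
As a simplified sketch of that three-function shape (this is an assumption for illustration; the actual Emmett code below uses WorkflowOutput and WorkflowEvent wrappers), a workflow definition can be thought of roughly like this:

// Simplified, assumed shape of a workflow definition, for illustration only.
type WorkflowMessage<T = unknown> = { type: string; data: T };

type Workflow<Input extends WorkflowMessage, State, Output extends WorkflowMessage> = {
  // Business logic: input + current state => new messages to record.
  decide: (input: Input, state: State) => Output[];
  // State building: fold the messages recorded in the stream into state.
  evolve: (state: State, message: Input | Output) => State;
  // Starting state for a brand-new workflow instance.
  initialState: () => State;
};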

This design draws huge inspiration from Yves Reynhout's Workflow Pattern, adapting it to use Emmett's abstractions while aiming to retain its power.

Check also:

  • Pull Request with sample Emmett's Workflow based on this RFC,
  • My earlier article on implementing Workflow Pattern in TypeScript,
  • My article on message bus and message handling that inspired some of this design

How it Works - Technical Design

Processing Flow

When a message triggers a workflow, we use a double-hop pattern:

  1. Input message arrives from source stream (event or command).
  2. Router function (getWorkflowId) determines which workflow instance.
  3. Store input in the workflow's stream (as inbox).
  4. Consumer delivers the stored message from the workflow stream
  5. Rebuild state by replaying all events through evolve.
  6. Call decide with the input and current state.
  7. Store outputs back in the workflow stream (as outbox).
  8. Output processor sends commands and publishes events.

This flow ensures durability at each step.
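
A sketch of what this double-hop might look like follows. The store helpers (appendToStream, readStream, dispatchOutputs) and their signatures are assumptions for illustration, not Emmett's actual API.

// Hypothetical store helpers, for illustration only.
type Message = { type: string; data: any };

declare function appendToStream(stream: string, message: Message): Promise<void>;
declare function readStream(stream: string): Promise<Message[]>;
declare function dispatchOutputs(outputs: Message[]): Promise<void>;

// Hop 1: route the incoming message into the workflow's own stream (inbox).
async function routeInput(
  input: Message,
  getWorkflowId: (input: Message) => string | null,
): Promise<void> {
  const workflowId = getWorkflowId(input);
  if (workflowId === null) return;            // not relevant for this workflow
  await appendToStream(`workflow-${workflowId}`, input);
}

// Hop 2: process the stored message against state rebuilt from the stream.
async function processStored(
  stream: string,
  input: Message,
  workflow: {
    decide: (input: Message, state: unknown) => Message[];
    evolve: (state: unknown, message: Message) => unknown;
    initialState: () => unknown;
  },
): Promise<void> {
  const history = await readStream(stream);
  const state = history.reduce(workflow.evolve, workflow.initialState());
  const outputs = workflow.decide(input, state);
  for (const output of outputs) {
    await appendToStream(stream, output);     // record outputs (outbox)
  }
  await dispatchOutputs(outputs);             // send commands and publish events
}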

Code Example - Group Checkout

Let's see how this works with a group checkout workflow that coordinates multiple individual checkouts:

export const GroupCheckoutWorkflow: Workflow<
  GroupCheckoutInput,
  GroupCheckout,
  GroupCheckoutOutput
> = {
  decide,
  evolve,
  initialState,
};

The state tracks which guests have checked out:

export type GroupCheckout =
  | { status: 'NotExisting' }
  | {
      status: 'Pending';
      guestStayAccountIds: Map<string, GuestStayStatus>;
    }
  | { status: 'Finished' };

The decide function contains our business logic:

export const decide = (
  input: GroupCheckoutInput,
  state: GroupCheckout,
): WorkflowOutput<GroupCheckoutOutput> => {
  const { type } = input;
  switch (type) {
    case 'InitiateGroupCheckout': {
      return initiateGroupCheckout(input, state);
    }
    case 'GuestCheckedOut':
    case 'GuestCheckoutFailed': {
      return onCheckoutFinished(input, state);
    }
    case 'TimeoutGroupCheckout': {
      return timedOut(input, state);
    }
  }
};

and state evolution:

export const evolve = (
  state: GroupCheckout,
  {
    type,
    data: event,
  }: WorkflowEvent<GroupCheckoutInput | GroupCheckoutOutput>,
): GroupCheckout => {
  switch (type) {
    case 'GroupCheckoutInitiated': {
      if (state.status !== 'NotExisting') return state;
      return {
        status: 'Pending',
        guestStayAccountIds: event.guestStayAccountIds.reduce(
          (map, id) => map.set(id, GuestStayStatus.Pending),
          new Map<string, GuestStayStatus>(),
        ),
      };
    }
    case 'GuestCheckedOut':
    case 'GuestCheckoutFailed': {
      if (state.status !== 'Pending') return state;
      return {
        ...state,
        guestStayAccountIds: state.guestStayAccountIds.set(
          event.guestStayAccountId,
          type === 'GuestCheckedOut'
            ? GuestStayStatus.Completed
            : GuestStayStatus.Failed,
        ),
      };
    }
    case 'GroupCheckoutCompleted':
    case 'GroupCheckoutFailed':
    case 'GroupCheckoutTimedOut': {
      if (state.status !== 'Pending') return state;
      return {
        status: 'Finished',
      };
    }
  }
};

We register it with routing and type information:

export const groupCheckoutWorkflowProcessor = workflowProcessor({
  processorId: 'GroupCheckoutWorkflow',
  workflow: GroupCheckoutWorkflow,
  getWorkflowId: (input) => input.data.groupCheckoutId ?? null,
  inputs: {
    commands: ['InitiateGroupCheckout', 'TimeoutGroupCheckout'],
    events: ['GuestCheckedOut', 'GuestCheckoutFailed'],
  },
  outputs: {
    commands: ['CheckOut'],
    events: [
      'GroupCheckoutCompleted',
      'GroupCheckoutFailed',
      'GroupCheckoutTimedOut',
    ],
  },
});

Inputs and outputs have to be registered explicitly. We cannot detect them from types, as they're erased at the JavaScript runtime. Still, you get proper IntelliSense and compiler help, which prevents registering more types than allowed.

We chose explicit registration for commands and events to keep the API explicit and to avoid additional wrapper functions in the decide method.

See the full usage sample:

Stream Structure

The workflow's stream name comes from the getWorkflowId function. Here's what the stream contains as the workflow executes:

Pos |  Kind   | Direction | Message
----|---------|-----------|------------------------------------------
1   | Command | Input     | InitiateGroupCheckout {groupId: '123'}
2   | Event   | Output    | GroupCheckoutInitiated
3   | Command | Output    | CheckOut {guestId: 'g1'}
4   | Command | Output    | CheckOut {guestId: 'g2'}
5   | Command | Output    | CheckOut {guestId: 'g3'}
6   | Event   | Input     | GuestCheckedOut {guestId: 'g1'}
7   | Event   | Input     | GuestCheckoutFailed {guestId: 'g2'}
8   | Event   | Input     | GuestCheckedOut {guestId: 'g3'}
9   | Event   | Output    | GroupCheckoutFailed

Notice how inputs and outputs interleave. The workflow receives inputs (positions 1, 6, 7, 8) and produces outputs (positions 2, 3, 4, 5, 9). Inputs can be either commands sent sequentially from the API, other message processing, or events recorded in regular streams.

All of these messages land in Emmett's event store, which is actually a message store that tracks both events and commands with their types.

This single stream contains the complete story, including what triggered each decision and what the workflow ultimately decided. No distributed tracing needed.

Why This Design

Actor Model: Self-Contained Instances

Each workflow instance acts like an actor with its own mailbox.

This isolation brings several benefits:

  • No shared state: Each workflow instance has its own stream
  • Horizontal scaling: Process many workflows in parallel
  • Clear boundaries: Easy to reason about each instance
  • Natural sharding: Distribute by workflow ID

Observable by Design

Everything about a workflow lives in one stream. Want to know why a checkout failed? Read the stream. Need to debug a stuck process? The entire history is there. This isn't observability bolted on - it's fundamental to how workflows work. When connected to OpenTelemetry, it should provide better visibility into business processes than conventional solutions. Knowledge of abstractions like commands and events provides durable context, not only for error debugging but also for classical telemetry.

Durable by Default

Recovery isn't a special case - it's just reading the stream and rebuilding state. No checkpointing logic. The same code path handles normal processing and crash recovery. If we want to replay messages, we can just replay processing from a specific stream position.

Simple Testing

Workflows are pure functions. Testing them is straightforward:

// Unit test
const state = { status: 'Pending', guestStayAccountIds: new Map(...) };
const input = { type: 'GuestCheckedOut', data: { guestId: 'g1' } };
const outputs = decide(input, state);
// Assert expected outputs

You can test it without workflow runtime using regular tests, or built-in Emmett Behaviour-Driven Design testing capabilities, e.g.:

it('handles partial failures', () =>
  given([
    { type: 'GroupCheckoutInitiated', data: { guestIds: ['g1', 'g2'] } },
    { type: 'GuestCheckedOut', data: { guestId: 'g1' } },
  ])
    .when({
      type: 'GuestCheckoutFailed',
      data: { guestId: 'g2', reason: 'BalanceNotSettled' },
    })
    .then({
      type: 'GroupCheckoutFailed',
      data: { completedCheckouts: ['g1'], failedCheckouts: ['g2'] },
    }));

Read more in: Unit testing docs.

A similar pattern can be used for a full integration test.

No special, external workflow runtime is needed to test the workflow code. Compare this to other tools where you need test servers, activity mocks, and time manipulation. Emmett workflows are just functions processing messages.

Why Emmett Over Established Tools

  • Familiar Patterns: If you know Emmett's command handlers, you know workflows. Same decide/evolve pattern, just triggered by events too.
  • Progressive Complexity: Start with a simple workflow. Add timeout handling when needed. Add compensation if required. You don't front-load complexity.
  • Lightweight Runtime: No separate workflow server. No special deployment model. Just your application with background processors.

Design Tradeoffs

Latency: The Double-Hop Cost

Storing inputs before processing adds latency. A message goes:

  1. Source → Workflow stream (store)
  2. Workflow stream → Processor (execute)

For background processes, this extra hop is negligible compared to the business operations. For latency-sensitive operations, use regular command handlers instead.

Future versions could make this configurable, allowing for direct processing for low-latency needs and a double-hop approach for full observability. We'd need community feedback on whether this flexibility matters.

Storage: Everything Twice

Messages exist in both source streams and workflow streams. A GuestCheckedOut event appears in:

  • The guest's stream (source of truth)
  • The workflow's stream (for processing)

This duplication is intentional. Different retention policies can help - keep workflow streams shorter since they're derived data. The observability benefit outweighs the storage cost for most applications.

When Not to Use Workflows

Workflows aren't for everything:

  • Simple request-response: Use command handlers.
  • Read models creation: Use projections.
  • Simple message handling: Use reactors; they're simple enough for basic choreography.
  • Synchronous operations: Workflows can handle them, but for most cases, command handlers are simpler.

Workflows shine for multi-step, long-running processes that coordinate across streams.

This design is based on Yves Reynhout's Workflow Pattern, which has proven itself in production systems. We've adapted it to fit naturally with Emmett's abstractions.

Future Possibilities

Other Workflow Patterns

The current design provides a simpler alternative to traditional process managers. We could add Saga as a stateless router with compensation logic and simple event handlers for choreography.

These would build on the same foundation.

Processing Mode Flexibility

Currently, we always store inputs first for full observability. Some workflows might prefer lower latency:

  • Full observability mode: Current double-hop design
  • Direct mode: Process immediately, only store outputs
  • Hybrid: Configure per message type

This needs careful design to maintain durability guarantees.
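
Purely as an illustration of what such configurability could look like (this is not an existing Emmett option), the configuration surface might resemble the following:

// Hypothetical configuration shape for per-workflow processing modes.
type ProcessingMode = 'store-first' | 'direct';

type ProcessingOptions = {
  // Default mode for the whole workflow.
  default: ProcessingMode;
  // Optional overrides per input message type, e.g. for latency-sensitive commands.
  perMessageType?: Record<string, ProcessingMode>;
};

const exampleOptions: ProcessingOptions = {
  default: 'store-first',                     // full observability by default
  perMessageType: {
    TimeoutGroupCheckout: 'direct',           // process immediately, store only outputs
  },
};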

Enrichment Patterns

Workflows may sometimes need external data. How should they fetch it deterministically?

  • Option 1 - Pre-enrichment: Enrich at system edges before the workflow sees the message.
  • Option 2 - Enrichment Step: Dedicated enrichment workflows that add data.
  • Option 3 - Request/Response: Workflow → FetchPrices query → Wait for Prices result.

The same approach can be used for external communication, and also for output enrichment/transformation.

Each has different tradeoffs for testing, performance, and complexity. We need user feedback.
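
For Option 3, a decide function might emit a query and continue when the result is routed back into its stream, roughly like the sketch below. FetchPrices, PricesFetched, and the simplified array return type are hypothetical, not part of this proposal's API.

// Hypothetical messages for the request/response enrichment option.
type PricingInput =
  | { type: 'QuoteRequested'; data: { quoteId: string; productIds: string[] } }
  | { type: 'PricesFetched'; data: { quoteId: string; prices: Record<string, number> } };

type PricingOutput =
  | { type: 'FetchPrices'; data: { quoteId: string; productIds: string[] } }
  | { type: 'QuotePrepared'; data: { quoteId: string; total: number } };

type PricingState =
  | { status: 'NotExisting' }
  | { status: 'WaitingForPrices'; productIds: string[] };

// The workflow records a FetchPrices query as output, then resumes when
// the PricesFetched result arrives as a later input.
const decidePricing = (input: PricingInput, state: PricingState): PricingOutput[] => {
  switch (input.type) {
    case 'QuoteRequested':
      return [{ type: 'FetchPrices', data: input.data }];
    case 'PricesFetched': {
      if (state.status !== 'WaitingForPrices') return [];
      const total = Object.values(input.data.prices).reduce((sum, p) => sum + p, 0);
      return [{ type: 'QuotePrepared', data: { quoteId: input.data.quoteId, total } }];
    }
  }
};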

Messaging System Support

Today, inputs come from event stores. Tomorrow they could also come from:

  • Kafka topics,
  • RabbitMQ queues,
  • SQS messages,
  • Supabase queues,
  • etc.

The workflow doesn't care about the source. Streams remain in the event store for durability.

Error Handling Evolution

Future versions could:

  • Store errors as events for analysis
  • Configure retry policies per workflow
  • Dead letter handling for poison messages
  • Timeout escalation strategies

Paid Tools (Making Emmett Sustainable)

I want to make Emmett sustainable through optional paid tooling:

  • Visual Workflow Debugger: Step through workflow execution with time travel. See state at each decision point. Understand complex flows visually.
  • Business Process Analytics: How long do checkouts take? Where do they fail? Which steps are bottlenecks? Dashboard for business users.
  • State Diagram Generation: Generate visual documentation from running workflows. See all possible paths. Spot missing transitions.
  • Performance Analysis: Which workflows consume most resources? Where are the hot paths? Optimisation recommendations.
  • Documentation Generation: Turn workflow definitions into readable docs. Keep business and code in sync.

Question for the community: Would these tools provide enough value to justify payment? This would fund continued Emmett development while keeping the core open source.

Conclusion

This workflow engine brings durable execution to Emmett using patterns developers already know. It allows you to just write TypeScript code. Event sourcing provides durability and observability naturally, not as add-ons, but as fundamental properties.

We avoid the complexity of existing tools while keeping power available when needed. Simple workflows stay simple. Complex workflows become possible. Testing remains straightforward. We want to stay composable and avoid vendor lock-in.

Together, we can build workflows that are both powerful and pleasant to use. We're really open to your questions, suggestions and constructive criticism.

We'd also like to know if the additional paid add-ons mentioned are something you would consider paying for.

You can comment either here or through the RFC published in the Emmett repository: https://github.com/event-driven-io/emmett/pull/257.

Feel also invited to join our Discord Channel if you prefer chat.

Thank you in advance!

Cheers!

Oskar

p.s. that’s also an example of what I meant in last e-mail that I plan to share more explicitly on stuff I do.

p.s.2. Ukraine is still under brutal Russian invasion. A lot of Ukrainian people are hurt, without shelter and need help. You can help in various ways, for instance, directly helping refugees, spreading awareness, and putting pressure on your local government or companies. You can also support Ukraine by donating, e.g. to the Ukraine humanitarian organisation, Ambulances for Ukraine or Red Cross.

Graph RAG vs SQL RAG

Mike's Notes

Great explanation. Pipi uses both.

Resources

References

  • Reference

Repository

  • Home > Ajabbi Research > Library > Subscription > Towards Data Science
  • Home > Handbook > 

Last Updated

09/11/2025

Graph RAG vs SQL RAG

By: Reinhard Sellmair
Towards Data Science: 01/11/2025


I stored a Formula 1 results dataset in both a graph database and a SQL database, then used various large language models (LLMs) to answer questions about the data through a retrieval-augmented generation (RAG) approach. By using the same dataset and questions across both systems, I evaluated which database paradigm delivers more accurate and insightful results.

Retrieval-Augmented Generation (RAG) is an AI framework that enhances large language models (LLMs) by letting them retrieve relevant external information before generating an answer. Instead of relying solely on what the model was trained on, RAG dynamically queries a knowledge source (in this article a SQL or graph database) and integrates those results into its response. An introduction to RAG can be found here.

SQL databases organize data into tables made up of rows and columns. Each row represents a record, and each column represents an attribute. Relationships between tables are defined using keys and joins, and all data follows a fixed schema. SQL databases are ideal for structured, transactional data where consistency and precision are important — for example, finance, inventory, or patient records.

Graph databases store data as nodes (entities) and edges (relationships) with optional properties attached to both. Instead of joining tables, they directly represent relationships, allowing for fast traversal across connected data. Graph databases are ideal for modelling networks and relationships — such as social graphs, knowledge graphs, or molecular interaction maps — where connections are as important as the entities themselves.

Data

The dataset I used to compare RAG performance contains Formula 1 results from 1950 to 2024. It includes detailed race results for drivers and constructors (teams), covering qualifying, sprint races, main races, and even lap times and pit-stop times. The standings of the drivers’ and constructors’ championships after every race are also included.

SQL Schema

This dataset is already structured in tables with keys so that a SQL database can be easily set up. The database’s schema is shown below:

SQL Database Design

Races is the central table, linked to all result tables as well as additional information such as seasons and circuits. The result tables are also linked to the Drivers and Constructors tables to record each driver’s and constructor’s result at every race. The championship standings after each race are stored in the Driver_standings and Constructor_standings tables.

Graph Schema

The schema of the graph database is shown below:

Graph Database Design

Because graph databases store information in nodes and relationships, this schema requires only six node types compared to the 14 tables of the SQL database. The Car node is an intermediate node used to model that a driver drove a car of a constructor at a particular race. Since driver-constructor pairings change over time, this relationship needs to be defined for each race. The race results are stored in relationships, e.g. :RACED between Car and Race, while the :STOOD_AFTER relationships contain the driver and constructor championship standings after each race.

Querying the Database

I used LangChain to build a RAG chain for both database types that generates a query based on a user question, runs the query, and converts the query result into an answer for the user. The code can be found in this repo. I defined a generic system prompt that could be used to generate queries for any SQL or graph database. The only data-specific information was the auto-generated database schema, which was inserted into the prompt. The system prompts can be found here.

Here is an example of how to initialize the model chain and ask the question: “What driver won the 92 Grand Prix in Belgium?”

from langchain_community.utilities import SQLDatabase
from langchain_openai import ChatOpenAI
from qa_chain import GraphQAChain
from config import DATABASE_PATH

# connect to database
connection_string = f"sqlite:///{DATABASE_PATH}"
db = SQLDatabase.from_uri(connection_string)

# initialize LLM
llm = ChatOpenAI(temperature=0, model="gpt-5")

# initialize qa chain
chain = GraphQAChain(llm, db, db_type='SQL', verbose=True)

# ask a question
chain.invoke("What driver won the 92 Grand Prix in Belgium?")

Which returns:

{'write_query': {'query': "SELECT d.forename, d.surname
FROM results r
JOIN races ra ON ra.raceId = r.raceId
JOIN drivers d ON d.driverId = r.driverId
WHERE ra.year = 1992
AND ra.name = 'Belgian Grand Prix'
AND r.positionOrder = 1
LIMIT 10;"}}
{'execute_query': {'result': "[('Michael', 'Schumacher')]"}}
{'generate_answer': {'answer': 'Michael Schumacher'}}

The SQL query joins the Results, Races, and Drivers tables, filters for the 1992 Belgian Grand Prix, and selects the driver who finished first. The LLM converted the year 92 to 1992 and the race name from “Grand Prix in Belgium” to “Belgian Grand Prix”. It derived these conversions from the database schema, which included three sample rows from each table. The query result is “Michael Schumacher”, which the LLM returned as the answer.

Evaluation

Now the question I want to answer is whether an LLM is better at querying the SQL or the graph database. I defined three difficulty levels (easy, medium, and hard): easy questions could be answered by querying data from only one table or node, medium questions required one or two links among tables or nodes, and hard questions required more links or subqueries. For each difficulty level I defined five questions. Additionally, I defined five questions that could not be answered with data from the database.

I answered each question with three LLM models (GPT-5, GPT-4, and GPT-3.5-turbo) to analyze whether the most advanced models are needed or whether older, cheaper models also produce satisfactory results. A model got 1 point for a correct answer, 0 points if it replied that it could not answer the question, and -1 point for a wrong answer. All questions and answers are listed here. Below are the scores of all models and database types:

Model         | Graph DB | SQL DB
--------------|----------|-------
GPT-3.5-turbo |       -2 |      4
GPT-4         |        7 |      9
GPT-5         |       18 |     18

Model – Database Evaluation Scores

It is remarkable how the more advanced models outperform the simpler ones: GPT-3.5-turbo got about half of the questions wrong, GPT-4 got 2 to 3 questions wrong but could not answer 6 to 7 questions, and GPT-5 got all except one question correct. Simpler models seem to perform better with the SQL database than with the graph database, while GPT-5 achieved the same score with either database.

The only question GPT-5 got wrong using the SQL database was “Which driver won the most world championships?”. The answer “Lewis Hamilton, with 7 world championships” is not correct because both Lewis Hamilton and Michael Schumacher won 7 world championships. The generated SQL query aggregated the number of championships by driver, sorted them in descending order, and selected only the first row, while the driver in the second row had the same number of championships.

Using the graph database, the only question GPT-5 got wrong was “Who won the Formula 2 championship in 2017?”, which it answered with “Lewis Hamilton” (Lewis Hamilton won the Formula 1 but not the Formula 2 championship that year). This is a tricky question because the database only contains Formula 1 results, not Formula 2. The expected answer would have been to reply that this question could not be answered based on the provided data. However, considering that the system prompt did not contain any specific information about the dataset, it is understandable that this question was not answered correctly.

Interestingly, using the SQL database, GPT-5 gave the correct answer, “Charles Leclerc”. The generated SQL query only searched the drivers table for the name “Charles Leclerc”. Here the LLM must have recognized that the database does not contain Formula 2 results and answered the question from its general knowledge. Although this led to the correct answer in this case, it can be dangerous when the LLM does not use the provided data to answer questions. One way to reduce this risk could be to state explicitly in the system prompt that the database must be the only source used to answer questions.

Conclusion

This comparison of RAG performance using a Formula 1 results dataset shows that the latest LLMs perform exceptionally well, producing highly accurate and contextually aware answers without any additional prompt engineering. While simpler models struggle, newer ones like GPT-5 handle complex queries with near-perfect precision. Importantly, there was no significant difference in performance between the graph and SQL database approaches – users can simply choose the database paradigm that best fits the structure of their data.

The dataset used here serves only as an illustrative example; results may differ when using other datasets, especially those that require specialized domain knowledge or access to non-public data sources. Overall, these findings highlight how far retrieval-augmented LLMs have advanced in integrating structured data with natural language reasoning.

Workspaces for Agriculture

Mike's Notes

This is where I will keep detailed working notes on creating Workspaces for Agriculture. Eventually, these will become permanent, better-written documentation stored elsewhere. Hopefully, someone will come up with a better name than this working title.

This replaces coverage in Industry Workspace written on 13/10/2025.

Testing

The current online mockup is version 1 and will be updated frequently. If you are helping with testing, please remember to delete your browser cache so you see the daily changes. Eventually, a live demo version will be available for field trials.

Learning

I was a member of the AgriTech NZ Geospatial Working Group in 2021-2022. It was a great learning opportunity to be part of an experienced team. There is an excellent GitHub repository storing the results of this work, and much of it is used here.

Why

Many of my friends are working farmers or rural workers, so anything that makes their lives easier must be a good thing.

Resources

References

  • Geospatial Interoperability Working Group
  • AgriTech NZ Data Specifications (GitHub)

Repository

  • Home > Ajabbi Research > Library >
  • Home > Handbook > 

Last Updated

8/11/2025

Workspaces for Agriculture

By: Mike Peters
On a Sandy Beach: 8/11/2025

Open-source

This open-source SaaS cloud system will be shared on GitHub and GitLab.

Dedication

This workspace is dedicated to the life and work of ...

"

 (To come)

" - Wikipedia

Existing products

  • AgriWebb
  • Figured
  • Halter

Features

This is a basic comparison of features found in agricultural software.

[TABLE]

Data Model

words

Database Entities

  • Facility
  • Party
  • etc

Entity Diagram

Spatial Model

Holding > Site > Plot

  • Holding (Operating farm)
  • Site (Land management unit)
  • Plot (Paddock, block, crop)
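
As a provisional sketch only (the names and fields here are assumptions, not the final data model), the hierarchy could be expressed as nested types:

// Provisional sketch of the Holding > Site > Plot hierarchy.
type Plot = {
  id: string;
  name: string;            // paddock, block, or crop
  areaHectares?: number;
};

type Site = {
  id: string;
  name: string;            // land management unit
  plots: Plot[];
};

type Holding = {
  id: string;
  name: string;            // operating farm
  sites: Site[];
};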

Spatial Diagram


Spatial Geodatabase

(To come)

Standards

The workspace needs to comply with all international standards and, at the same time, work with local standards in any part of the world.

  • (To come)

BMP+ Processes

(To come)

XSD

(To come)

API

(To come)

Support

There will be extensive free documentation sets tailored for users, developers, and data scientists.

Ajabbi will provide free support to developers with a paid DevOps Account who are supporting end users of Workspaces for Agriculture.

Workspace navigation menu

This default outline needs a lot of work. The outline can be easily customised by future users using drag-and-drop and tick boxes to turn features off and on.

  • Enterprise Account
    • Applications
      • Agriculture v.2
        • Aquaculture
          • (To come)
        • Farming (to sort)
          • Crop
          • Equipment
          • Farm
          • Fence
          • Field
          • Gate
          • Livestock
        • Forestry
          • Fire
          • Harvest
          • Plant
          • Silviculture
        • Horticulture
          • Crop
          • Fertiliser
          • Harvest
        • Customer (v2)
          • Bookmarks
            • (To come)
          • Support
            • Chats
            • Feedback
            • Requests
            • Tickets
          • Training
            • (To come)
        • Settings (v3)
          • Account
          • Billing
          • Deployments
            • Workspaces
              • Modules
              • Plugins
              • Templates
                • Beekeeping
                • Farm
                • Orchard
                • Vineyard
              • Users