Databricks LTAP: The 40-Year OLTP/OLAP Wall Just Got a Sledgehammer

Databricks claims LTAP collapses the four-decade divide between transactional and analytical databases into a single copy of data. No CDC, no ETL, no copies. Here’s why it matters and where it could fail.

For forty years, data teams have accepted the architectural tax of maintaining separate systems for transactions and analytics. Change data capture pipelines, ETL jobs, replicated storage: all of it exists just to keep two copies of the same data vaguely in sync. Databricks just threw a grenade into that arrangement with LTAP (Lake Transactional/Analytical Processing), an architecture that stores a single copy of data in open formats and serves both OLTP and OLAP workloads from it simultaneously. It’s a claim that could reshape the data infrastructure industry, or join the graveyard of well-intentioned unification attempts. Here’s what’s actually under the hood.


For forty years, the data industry has operated under an unspoken truce: transactional databases run the business, analytical databases understand the business, and never the twain shall meet without a CDC pipeline and a prayer.

That truce just got declared obsolete.

Figure: Databricks LTAP unifies transactional and analytical processing on a single copy of data (architecture diagram showing unified storage for OLTP and OLAP).

At its Data + AI Summit on June 16, Databricks CEO Ali Ghodsi unveiled LTAP, an architecture that promises to collapse the long-standing divide between OLTP and OLAP workloads into a single copy of data. The claim is audacious enough that it deserves scrutiny: “For the first time, we think we’ve cracked the unification code”, Ghodsi told Forbes.

But audacious claims require audacious evidence. Let’s dig into what LTAP actually is, how it works, and whether it’s the architectural revolution Databricks is pitching or just another attempt to sell you a bigger platform.

The Problem That Refuses to Die

The separation between transactional and analytical systems isn’t an accident; it’s a consequence of genuinely different workload profiles.

Transactional databases (PostgreSQL, MySQL, Oracle) are optimized for small, fast reads and writes. They excel at processing individual orders, updating inventory levels, and managing user sessions. Ask one to scan five years of historical sales data, and you risk bringing your production application to its knees.

Analytical systems (Snowflake, Redshift, the Databricks Lakehouse) are built for the opposite: massive scans, complex aggregations, and multi-table joins across billions of rows. They’d be terrible at handling a single payment transaction.

The traditional solution is expensive and brittle: run your application on PostgreSQL, replicate data through CDC pipelines to a warehouse, pay for storage in both systems, and accept that analytical results are always a bit stale. As one veteran engineer put it, “the pipeline layer becomes the ceiling almost immediately as an agent runs hundreds of times per task.”
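To put a number on "a bit stale", here is a back-of-the-envelope sketch in Python. It assumes a simplified batch-style pipeline in which writes arrive uniformly over time, each sync ships everything committed since the previous sync, and the shipped batch takes a fixed time to land in the warehouse. The function name and parameters are illustrative, not taken from any vendor tooling.

```python
def replica_staleness(sync_interval_s: float, apply_lag_s: float) -> tuple[float, float]:
    """Return (average, worst-case) staleness in seconds for a batch-synced copy.

    Assumption: a row committed just after a sync waits a full interval for the
    next one, a row committed just before a sync waits almost nothing, and every
    row then waits apply_lag_s for the batch to land in the analytical store.
    """
    if sync_interval_s < 0 or apply_lag_s < 0:
        raise ValueError("durations must be non-negative")
    average = sync_interval_s / 2 + apply_lag_s
    worst = sync_interval_s + apply_lag_s
    return average, worst


# A 5-minute sync with 30 seconds of apply lag.
average, worst = replica_staleness(sync_interval_s=300, apply_lag_s=30)
print(f"average staleness: {average:.0f}s, worst case: {worst:.0f}s")
```

Under these assumptions, shrinking the sync interval lowers staleness but never removes the apply lag, which is the part a pipeline-free design is trying to eliminate.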

What LTAP Actually Does

LTAP is not another HTAP attempt. That distinction matters.

Hybrid Transactional/Analytical Processing (HTAP) tried to force both workloads into a single query engine. The result was predictable: compromised performance for both workloads, expensive proprietary infrastructure, and limited adoption.

LTAP takes a fundamentally different approach: unify at the storage layer, not the compute layer. Lakebase, Databricks’ serverless PostgreSQL database built on its Neon acquisition, writes transactional data directly into Apache Iceberg and Delta formats on object storage. The same storage layer that powers the analytical Lakehouse now also hosts operational data.

The architecture rests on three properties:

  1. Unified governance: All operational, analytical, and streaming data live in Unity Catalog under a single identity and permissions model.
  2. No performance tradeoffs: Transactional workloads run in standard Postgres with full ACID semantics. Analytical workloads run across the full Lakehouse at any scale. Each scales independently because compute is fully separated.
  3. No ETL pipelines: There are no pipelines synchronizing operational and analytical stores. The data is written once and immediately available for both workloads.

The internal team reportedly called it “minus-one ETL.”
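The difference between "two copies plus a sync" and "written once, read by both" can be shown with a deliberately tiny Python model. Everything here is an assumption made for illustration: plain dictionaries stand in for storage, the class and method names are invented, and nothing reflects how Lakebase or the Lakehouse is actually implemented.

```python
class TwoCopyStack:
    """Toy model of the traditional setup: an OLTP store plus a synced warehouse copy."""

    def __init__(self) -> None:
        self.oltp: dict[str, int] = {}
        self.warehouse: dict[str, int] = {}

    def write(self, order_id: str, amount: int) -> None:
        self.oltp[order_id] = amount  # only the operational copy changes

    def sync(self) -> None:
        self.warehouse = dict(self.oltp)  # stand-in for a CDC/ETL run

    def analytical_total(self) -> int:
        return sum(self.warehouse.values())  # analytics sees the last sync only


class SingleCopyStack:
    """Toy model of the single-copy idea: both access paths read the same table."""

    def __init__(self) -> None:
        self.table: dict[str, int] = {}

    def write(self, order_id: str, amount: int) -> None:
        self.table[order_id] = amount

    def point_read(self, order_id: str) -> int:
        return self.table[order_id]  # transactional-style lookup

    def analytical_total(self) -> int:
        return sum(self.table.values())  # analytical-style scan, same data


two = TwoCopyStack()
two.write("order-1", 40)
print(two.analytical_total())  # the order is invisible to analytics until sync() runs

one = SingleCopyStack()
one.write("order-1", 40)
print(one.analytical_total())  # visible to the scan path straight after the write
```

The model ignores everything that makes the real problem hard (durability, concurrency, file layout), but it shows where the staleness in the two-copy design comes from: the sync step itself.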

The Lakebase Foundation

Lakebase isn’t theoretical. Launched just last year, it already serves thousands of customers including Block, Superhuman, and Zillow, and handles 12 million database launches per day.

The technical trick is making object storage, traditionally slow and unreliable for transactional workloads, fast enough for Postgres. Databricks accomplished this with a safekeeper architecture for reliability and page servers with caches for performance.
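A rough Python sketch of the two ideas in that paragraph follows. It assumes that a commit counts as durable once a majority of safekeepers acknowledge it, and that the page cache is a simple least-recently-used cache in front of slow storage. Both are illustrative assumptions, as are all the names; the article only says that safekeepers provide reliability and cached page servers provide performance.

```python
from collections import OrderedDict


def is_durable(acks: int, safekeepers: int) -> bool:
    """Assumed rule: a commit is durable once a strict majority has acknowledged it."""
    return acks >= safekeepers // 2 + 1


class PageCache:
    """Read-through LRU cache in front of a slow backing store (a dict stands in here)."""

    def __init__(self, backing: dict[int, bytes], capacity: int) -> None:
        self.backing = backing
        self.capacity = capacity
        self.pages: OrderedDict[int, bytes] = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, page_id: int) -> bytes:
        if page_id in self.pages:
            self.pages.move_to_end(page_id)  # mark as most recently used
            self.hits += 1
            return self.pages[page_id]
        self.misses += 1
        page = self.backing[page_id]  # the slow path: fetch from object storage
        self.pages[page_id] = page
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page
        return page


object_store = {1: b"page-1", 2: b"page-2", 3: b"page-3"}
cache = PageCache(object_store, capacity=2)
for page_id in (1, 2, 1, 3, 2):
    cache.get(page_id)
print(cache.hits, cache.misses)
print(is_durable(acks=2, safekeepers=3))
```

The point of the split is that durability and read latency are handled by different components, so neither has to wait on an object-storage round trip in the common case.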

New capabilities announced alongside LTAP include cross-cloud disaster recovery, git-style branching and snapshots, and autonomous database operations that let agents monitor health, detect slowdowns, and propose indexes.
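Git-style branching is easiest to picture as copy-on-write over an append-only log. The Python sketch below assumes a branch is defined by a parent plus the log position at which it forked, so creating one copies no data. This is an illustrative model with invented names, not a description of how Lakebase implements branching.

```python
from typing import Optional


class Branch:
    """Toy copy-on-write branch: own writes go to a local log, older reads fall through."""

    def __init__(self, parent: Optional["Branch"] = None, fork_point: int = 0) -> None:
        self.parent = parent
        self.fork_point = fork_point  # how many parent log entries this branch can see
        self.log: list[tuple[str, str]] = []  # append-only (key, value) entries

    def write(self, key: str, value: str) -> None:
        self.log.append((key, value))

    def fork(self) -> "Branch":
        # No data is copied; the child only remembers where it branched off.
        return Branch(parent=self, fork_point=len(self.log))

    def read(self, key: str, _limit: Optional[int] = None) -> str:
        entries = self.log if _limit is None else self.log[:_limit]
        for k, v in reversed(entries):  # the newest visible entry wins
            if k == key:
                return v
        if self.parent is not None:
            return self.parent.read(key, self.fork_point)
        raise KeyError(key)


main = Branch()
main.write("plan", "free")
dev = main.fork()
main.write("plan", "pro")       # happens after the fork, so dev must not see it
dev.write("feature", "beta")    # isolated to the dev branch
print(dev.read("plan"), main.read("plan"))
```

Because a branch is just a pointer and an offset in this model, creating one is cheap regardless of data size, which is what makes the pattern attractive for agents that spin up throwaway environments.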

Why the Agentic Era Changes Everything

The justification for LTAP isn’t just architectural purity. It’s about what happens when AI agents become the primary consumers of data infrastructure.

Databricks reports that roughly 80% of databases on its platform are now created by agents rather than humans. That’s a staggering number if true, and it changes the constraints entirely.

“Agents don’t behave like people, or even like the apps we built for people”, said Michael Leone, principal analyst at Moor Insights and Strategy. “They read for context, loop, try things, then write something back, thousands of times over in ways you can’t fully predict.”

An agent that needs to create a new application, test it, analyze results, and iterate needs instant access to fresh data. It can’t wait for a CDC pipeline to sync a read replica. It needs the operational state of the business, immediately, alongside the historical context to make decisions.

Ghodsi put it bluntly: “You simply can’t have eighteen incompatible technologies, endless copies of data moving between them, multiple governance systems, and thousands of agents operating on top of all that complexity. That model doesn’t scale.”

How LTAP Compares to Previous Approaches

Approach              | Storage Layer         | Compute Layer | Pipeline Requirement | Workload Isolation
Traditional OLTP/OLAP | Separate              | Separate      | CDC + ETL            | Full
HTAP (Single Engine)  | Shared                | Shared        | None                 | Compromised
Zero ETL              | Separate              | Separate      | Hidden CDC           | Full
LTAP                  | Shared (Open Formats) | Separate      | None                 | Full

The key insight is that LTAP doesn’t force a single engine to be good at everything, a trap that killed HTAP. Instead, it lets different engines (Postgres for transactions, Photon for analytics) operate on the same data simultaneously, each optimized for its workload.

The Skeptic’s Take: Where This Could Fall Apart

The technical community isn’t buying the hype without evidence. The most pointed criticism came from a Reddit thread where one user summarized the concern: “If everything is a priority, nothing is.”

The specific concerns worth watching:

Transactional guarantees on object storage: Object storage has higher latency and weaker consistency guarantees than local SSDs. Databricks has built safekeepers and page caches to mitigate this, but production benchmarks under real OLTP loads remain limited.

Mixed-workload isolation: LTAP promises that transactional and analytical workloads scale independently. But what happens when a massive analytical query competes for storage I/O with a latency-sensitive transaction? The architecture claims full isolation, but that claim still has to be backed by commit-to-query latency measurements taken under real mixed load.

Cross-region latency: The new disaster recovery capabilities span cloud regions, but Postgres-native transactions over geographic distances face fundamental physics constraints.

The lock-in question: While LTAP stores data in open formats like Iceberg, the compute engines and governance layer are deeply integrated with Databricks. An analyst noted that this approach “reduces some vendor lock-in compared to proprietary HTAP engines”, but moving 12 million daily database launches off the platform would be non-trivial.

Cost modeling: Running transactional workloads on object storage with caching layers has different cost dynamics than traditional Postgres on local SSDs. The total cost of ownership at scale remains a question mark.
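Several of these concerns, mixed-workload isolation in particular, reduce to one measurable number: how long after a transaction commits does an analytical query see it? A minimal Python harness for that measurement might look like the sketch below. The `commit` and `is_visible` callables are placeholders you would wire to your own transactional write and analytical query; the function name, defaults, and polling approach are all assumptions, not part of any Databricks tooling.

```python
import time
from typing import Callable


def commit_to_query_latency(
    commit: Callable[[], None],
    is_visible: Callable[[], bool],
    timeout_s: float = 5.0,
    poll_s: float = 0.001,
) -> float:
    """Seconds between commit() returning and is_visible() first returning True.

    The result is only accurate to roughly poll_s plus the cost of one
    is_visible() call, so use a cheap visibility query.
    """
    commit()
    committed_at = time.perf_counter()
    deadline = committed_at + timeout_s
    while True:
        if is_visible():
            return time.perf_counter() - committed_at
        if time.perf_counter() >= deadline:
            raise TimeoutError(f"write not visible after {timeout_s}s")
        time.sleep(poll_s)


# Stand-in system: a write becomes visible to "analytics" 20 ms after commit.
state = {"committed": 0.0}


def fake_commit() -> None:
    state["committed"] = time.perf_counter()


def fake_is_visible() -> bool:
    return time.perf_counter() - state["committed"] >= 0.02


latency = commit_to_query_latency(fake_commit, fake_is_visible)
print(f"commit-to-query latency: {latency * 1000:.1f} ms")
```

Run in a loop while a heavy analytical scan is in flight, and reported as percentiles rather than an average, a measurement like this is the kind of evidence that would confirm or undercut the isolation claim.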

What This Means for Data Teams

For practitioners, the implications ripple across several dimensions:

Pipeline design: If LTAP delivers on its promise, the CDC pipelines that consume 30-40% of many data engineering budgets become unnecessary. That’s not just a cost saving; it’s a shift in how teams allocate talent.

Governance fragmentation: When data lives in multiple systems, each with its own governance model, security gaps multiply as agent-driven workloads grow. A single copy under Unity Catalog collapses that surface area to one identity and permissions model.

Agent development: Developers building AI agents no longer need to stitch together data from transactional systems, data warehouses, and vector databases through custom integrations. The data is already there, fresh, and governed.

As Cloudflare’s CEO has noted about related architectural shifts, the bottleneck in modern data systems isn’t compute or storage; it’s the integration tax of moving data between systems.

The Competitive Landscape Responds

Databricks isn’t alone in pursuing unified architectures. Google has AlloyDB, Snowflake is pursuing its own PostgreSQL strategy, and Microsoft is building Azure HorizonDB. But Ghodsi argues that most competitors are still approaching unification as integration rather than elimination.

“Snowflake is pursuing its own PostgreSQL strategy”, he noted. “But what we believe others still underestimate is that the future isn’t about stitching together more categories of software. It’s about eliminating categories.”

The challenge for competitors is architectural. Snowflake’s architecture separates compute from storage but doesn’t natively support Postgres-compatible transactions on the same storage layer. Google’s AlloyDB is promising for basic OLTP systems that want to do a bit of analytics, but as one engineer noted, “If you’re doing clickstream, IoT, or anything like that though, no chance.”

For organizations evaluating their options, the architectural tradeoffs LTAP aims to resolve will determine whether this is a genuine breakthrough or a well-engineered compromise.

Paradigm Shift or Vendor Play?

LTAP is both a genuine technical achievement and a competitive move designed to deepen Databricks’ platform moat. Those aren’t mutually exclusive.

The key question for data teams to evaluate: does the unification of OLTP and OLAP on a single copy of data solve a real pain point in your organization? For teams running AI agents that need live operational data alongside historical context, the answer is likely yes. For teams with simple OLTP needs and batch analytics, the answer is probably not yet.

The InfoWorld analysis captured the nuance well: “CIOs will still need to choose their data architecture based on latency, reliability, ecosystem fit, cost, compliance, and developer experience. The architecture looks sound on paper. The proof will be in the commit-to-query latency numbers under real load.”

LTAP has not shipped yet; it is billed as coming soon as part of Lakebase. The architectural argument is compelling. The production evidence remains to be seen.



For a deeper dive into how this challenges traditional data engineering patterns, see how LTAP challenges the medallion architecture’s separation of concerns, and explore open-source alternatives that eliminate OLTP/OLAP separation without Databricks.
