Database 2025: PostgreSQL’s Hegemony and the Great Data Architecture Reshuffle
If you blinked in 2025, you might have missed PostgreSQL quietly completing its world domination tour. While everyone was distracted by AI agents and vector embeddings, the 30-year-old database crept into every cloud, every startup, and somehow became the default answer to “what database should we use?” The real story isn’t about innovation; it’s about consolidation, commoditization, and the slow realization that most “breakthrough” database technologies are just features waiting to be absorbed into Postgres.
PostgreSQL: The Database That Ate the World
PostgreSQL 18 dropped in September 2025 with features that would have been revolutionary… in 2005. The new asynchronous I/O subsystem finally reduces PostgreSQL’s dependence on the OS page cache, a problem other databases solved decades ago. Skip scans, another headline feature, let queries use multicolumn B+Tree indexes even when the predicate omits the leading key. Oracle has had this since 9i in 2002.
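To see why skip scans matter, here is a minimal Python sketch of the underlying idea (the column names and data are made up for illustration, not PostgreSQL internals): given a composite index sorted on `(tenant_id, created_day)`, a query filtering only on `created_day` can jump between distinct leading values and binary-search within each run instead of scanning everything.

```python
from bisect import bisect_left, bisect_right

# A composite index on (tenant_id, created_day): a sorted list of key tuples.
index = sorted((t, d) for t in range(5) for d in range(1000))

def skip_scan(index, day):
    """Find all entries with created_day == day without a leading tenant_id.

    Rather than scanning every entry, jump to each distinct leading value
    and binary-search within its run -- the essence of a skip scan.
    """
    results, i, n = [], 0, len(index)
    while i < n:
        tenant = index[i][0]
        # Binary search inside this tenant's run for the target day.
        lo = bisect_left(index, (tenant, day), i)
        hi = bisect_right(index, (tenant, day), lo)
        results.extend(index[lo:hi])
        # Skip past the rest of this tenant's run to the next leading value.
        i = bisect_right(index, (tenant, float("inf")), hi)
    return results

print(skip_scan(index, 42))  # one hit per tenant: [(0, 42), (1, 42), ..., (4, 42)]
```

With 5 tenants and 1,000 days each, the skip scan touches a handful of binary searches instead of 5,000 entries; the planner’s job is deciding when the leading column is selective enough for this to win.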
This isn’t a knock on PostgreSQL. It’s a testament to its momentum. The database world isn’t rewarding innovation; it’s rewarding stability and ecosystem. When Databricks dropped $1 billion on Neon, they weren’t buying a revolutionary storage engine; they were buying insurance that their lakehouse story includes PostgreSQL compatibility. Snowflake’s $250 million Crunchy Data acquisition followed the same logic: if your customers demand Postgres, you give them Postgres.
Even Microsoft, which has been playing database whack-a-mole with confusing product names for years, launched HorizonDB, a clean-slate PostgreSQL service that finally admits what everyone already knew: developers just want Postgres. The era of “we have a better proprietary database” is over. The era of “we have a better managed Postgres” has begun.
The Distributed PostgreSQL Delusion
The hottest trend in 2025 was trying to make PostgreSQL scale horizontally. Supabase hired the Vitess co-creator to build Multigres. PlanetScale announced Neki, their own Postgres sharding middleware. PgDog emerged as an open-source alternative.
Here’s the thing: we’ve been here before. Citus has been sharding PostgreSQL for analytics since 2010. Postgres-XC tried it for OLTP in 2010. Greenplum did it for data warehousing. They all hit the same wall: OLTP sharding is brutally hard, and middleware can only paper over the fundamental limitations of a single-master architecture.
The current crop of solutions is smarter about connection pooling and query routing, but they’re still fighting physics. When your query needs to join across shards, no amount of middleware magic eliminates that network round trip. The real solution, rewriting PostgreSQL’s storage and transaction layer to be truly distributed, requires forking the codebase, which is why YugabyteDB remains the most widely deployed sharded PostgreSQL system despite only supporting PostgreSQL 15 semantics.
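The round-trip problem is easy to sketch. The following Python toy (shard layout, table names, and data are all hypothetical) shows a middleware-style scatter-gather join: every shard must be contacted at least once per table, and no routing cleverness removes those hops.

```python
# Hypothetical two-shard setup: users and orders both sharded by user_id.
# Order 103 references a user that lives on the other shard.
shards = [
    {"users": {1: "alice"}, "orders": [(101, 1), (103, 2)]},
    {"users": {2: "bob"},   "orders": [(102, 2)]},
]

def join_orders_to_users(shards):
    """Middleware-style scatter-gather join.

    The middleware must query every shard for each side of the join,
    then stitch rows together itself -- each loop iteration below stands
    in for one network round trip a single-node database never makes.
    """
    round_trips = 0
    users = {}
    for shard in shards:                 # phase 1: gather the users table
        round_trips += 1
        users.update(shard["users"])
    joined = []
    for shard in shards:                 # phase 2: gather orders, join in middleware
        round_trips += 1
        for order_id, user_id in shard["orders"]:
            joined.append((order_id, users[user_id]))
    return joined, round_trips

rows, trips = join_orders_to_users(shards)
# 4 round trips for a join a single-master database resolves locally.
```

Real middleware pushes down co-located joins and batches requests, but the moment a join key crosses shard boundaries, the wire is in the query path.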
MCP: The Database Access Protocol Nobody Asked For But Everyone Implemented
If 2023 was the year every database added a vector index, 2025 was the year every database added an MCP server. Anthropic’s Model Context Protocol, announced in November 2024, became the must-have accessory by March 2025 when OpenAI pledged support. Suddenly, every database vendor, from ClickHouse to MongoDB to Snowflake, shipped an MCP server.
The pitch is compelling: a standardized JSON-RPC interface that lets LLMs discover and interact with your databases without custom glue code. The reality is messier. Most MCP servers are thin proxies that translate JSON requests into SQL with minimal validation. It’s ODBC reinvented with more buzzwords and less security.
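To make the “thin proxy” risk concrete, here is a deliberately minimal, entirely hypothetical tool handler in the shape of an MCP `tools/call` request (it is not any vendor’s actual server): JSON-RPC in, SQL out, no validation in between.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE users (id INTEGER, name TEXT);"
    "INSERT INTO users VALUES (1, 'alice');"
)

def handle_mcp_request(raw):
    """A 'thin proxy' handler: translate a JSON-RPC tool call into SQL.

    Whatever the model puts in params.arguments.sql runs verbatim --
    no allow-list, no read-only enforcement. That is the core risk.
    """
    req = json.loads(raw)
    if req.get("method") != "tools/call":
        return {"error": "unknown method"}
    sql = req["params"]["arguments"]["sql"]
    return {"result": conn.execute(sql).fetchall()}

# A benign agent request works...
print(handle_mcp_request(json.dumps({
    "jsonrpc": "2.0", "method": "tools/call",
    "params": {"name": "query", "arguments": {"sql": "SELECT name FROM users"}},
})))
# ...and a destructive one sails through just as easily.
handle_mcp_request(json.dumps({
    "jsonrpc": "2.0", "method": "tools/call",
    "params": {"name": "query", "arguments": {"sql": "DROP TABLE users"}},
}))
```

The fix is the same as it was for ODBC-era middleware: statement allow-lists, read-only connections, and per-agent credentials, not trust in the caller.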
The security implications are already keeping SREs awake. Neon reported in July that agents create 80% of their databases, which sounds impressive until you realize these are likely test branches that may or may not have proper access controls. Supabase’s MCP documentation offers best practices, but they rely on humans following them, a strategy with a historically poor success rate.
Enterprise databases like Oracle and IBM are better positioned here, with existing guardrails like Database Firewall and Guardium that can anomaly-detect rogue agent queries. But for most open-source databases, MCP is a security incident waiting to happen.
Vector Databases: The Feature That Became a Company (And Is Returning to Being a Feature)
Remember when vector databases were going to be the next big category? 2025 proved that wrong. Pinecone replaced its CEO in September to prepare for an acquisition that hasn’t materialized. Meanwhile, PostgreSQL’s pgvector extension became good enough that most teams stopped considering separate vector stores.
The pattern is clear: specialized databases only survive when the primary database can’t or won’t absorb their functionality. Full-text search survived as Elasticsearch because PostgreSQL’s built-in search was adequate but not excellent. Time-series survived as TimescaleDB because PostgreSQL’s partitioning wasn’t purpose-built. But vector search? It’s just another index type, and PostgreSQL is very good at adding new index types.
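The “just another index type” claim is easiest to see by stripping vector search to its core. This sketch (documents and vectors invented for illustration) is the exact-scan baseline; an index such as HNSW or IVF only makes the same query approximate and fast, much as a B-tree speeds up an `ORDER BY`.

```python
import heapq
import math

def cosine_distance(a, b):
    """1 - cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def top_k(vectors, query, k=2):
    """Exact nearest-neighbor search: rank every (name, vector) pair
    by distance to the query and keep the k closest."""
    return heapq.nsmallest(k, vectors, key=lambda item: cosine_distance(item[1], query))

docs = [("a", [1.0, 0.0]), ("b", [0.9, 0.1]), ("c", [0.0, 1.0])]
print(top_k(docs, [1.0, 0.05]))  # "a" and "b" point the same way as the query
```

Everything a standalone vector database adds on top of this, namely distance operators, an approximate index, and filtering, maps cleanly onto machinery a relational engine already has.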
The acquisitions tell the story: Databricks bought MosaicML for model training, not Pinecone for vector storage. Snowflake built vector search into their core engine. The standalone vector database is a temporary market inefficiency that’s being corrected.
File Format Wars: The Battle for Data Interoperability
While everyone obsessed over query engines, a quieter war raged over file formats. Parquet, dominant since 2013, faced five new challengers in 2025: F3, Vortex, FastLanes, AnyBlox, and Amudai. The problem they’re solving isn’t that Parquet is bad; it’s that the Parquet ecosystem is fragmented.
Our analysis found that 94% of Parquet files in the wild use only v1 features from 2013, even when created after 2020. The issue isn’t the format; it’s the dozen partially-compatible reader/writer libraries in different languages. Creating a file with v2 features is a gamble on whether downstream tools can read it.
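The gamble reduces to a set-containment check. This sketch uses invented feature names and reader profiles (real libraries differ), but the shape of the problem is accurate: a file is safe only if every feature it uses is supported by the least-capable reader downstream.

```python
# Hypothetical encoding/feature sets -- illustrative, not the Parquet spec.
V1_FEATURES = {"plain", "dictionary", "snappy"}
V2_FEATURES = V1_FEATURES | {"delta_binary_packed", "byte_stream_split", "data_page_v2"}

# Three hypothetical downstream readers with different capabilities.
readers = {
    "lib_a (current)":    V2_FEATURES,
    "lib_b (2019 build)": V1_FEATURES | {"data_page_v2"},
    "lib_c (embedded)":   V1_FEATURES,
}

def can_read(reader_features, file_features):
    """A reader can open the file only if the file's features are a subset
    of what the reader supports."""
    return file_features <= reader_features

file_features = {"plain", "delta_binary_packed"}  # a file using one v2 encoding
compatible = {name for name, feats in readers.items()
              if can_read(feats, file_features)}
# Only one of three readers can open it -- which is why writers
# default to the v1 subset and 94% of files never move past it.
```

This is also why F3’s embedded-WASM-decoder approach is appealing in principle: it collapses the capability matrix to “can you run WASM?”, trading performance for a single universal answer.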
The new formats take different approaches. F3, developed with input from Wes McKinney (Pandas creator), embeds WASM decoders directly in files, guaranteeing any tool can read any file at the cost of performance. AnyBlox won the VLDB Best Paper award by generating a single WASM program per file. Vortex, donated to the Linux Foundation by SpiralDB, focuses on GPU acceleration.
The irony? This is exactly the fragmentation problem Parquet solved in 2013. We’re reinventing the wheel because the wheel’s spec got too complicated. Expect most of these formats to die quietly, with Parquet slowly modernizing its tooling rather than being replaced.
The Commoditization Death Spiral
The most telling trend in 2025 was the commoditization of OLAP engines. Modern systems are so fast that performance differences for basic operations (scans, joins) are negligible. Differentiation has shifted to user experience and query planner quality, things that are hard to benchmark but easy to feel.
This is why we saw the OLAP acquisition spree: Quickwit to Datadog, HeavyDB to Nvidia, ClickHouse raising $350M. The engines themselves are becoming commodities; the value is in the integration, monitoring, and ecosystem.
For data architects, this means your technology choices matter less than your data modeling and query design. A well-tuned query on a mediocre database beats a poorly-tuned query on the best database. The skill set is shifting from database administration to query optimization and data modeling.
Legal Fights and Ecosystem Wars
MongoDB’s lawsuit against FerretDB reached federal court in 2025, and it’s a masterclass in why naming matters. MongoDB alleges FerretDB (originally named “MangoDB”, yes, just one letter different) infringes its patents, copyrights, and trademarks. At the core of the complaint are FerretDB’s unauthorized “drop-in replacement” marketing claims.
The historical irony is thick. MongoDB’s filing claims they “pioneered the development of ‘non-relational’ databases”, which is laughable to anyone who’s heard of IMS (1966) or Versant (1988). And the name “MangoDB” wasn’t even original: there’s already a parody DBMS called MangoDB that writes everything to /dev/null.
Meanwhile, Microsoft donated its MongoDB-compatible DocumentDB to the Linux Foundation, using language similar to what MongoDB is suing FerretDB over. The legal outcome will shape whether protocol compatibility is fair use or infringement, a question that also echoes Oracle’s failed lawsuit against Google over Java APIs.
What Actually Died in 2025
The database graveyard got seven new headstones:
- Fauna: Shut down its strongly-consistent, deterministically-controlled database in May. The technology was interesting, but the proprietary query language and GraphQL bet killed adoption.
- PostgresML: Couldn’t convince users to migrate to their hosted platform for ML operations. One co-founder joined Anthropic, the other built PgDog.
- Hydra: The DuckDB-inside-Postgres startup quietly folded as co-founders scattered.
- MyScaleDB: A ClickHouse fork for vector search that couldn’t compete with native ClickHouse features.
- Voltron Data: The $110M supergroup of GPU database engineers (RAPIDS, Pandas, BlazingSQL) that built Theseus but couldn’t ship. More evidence that GPU-accelerated databases remain a niche.
- Apache Derby: Entered read-only mode in October after 28 years. Nobody noticed.
- Couchbase: Got taken private by Haveli Investments after failing to compete with MongoDB.
The pattern is clear: if you’re building a specialized database, you need a moat deeper than “we’re faster on specific workloads.” The gravitational pull of PostgreSQL is too strong.
The Oracle Sideshow
Larry Ellison became the world’s richest person in September when Oracle stock jumped 40%, making him worth $393 billion, more than Rockefeller or Carnegie adjusted for inflation. Then he lost $130 billion in two months when the stock corrected. For data architects, the lesson isn’t about wealth fluctuation, it’s that databases are infrastructure, and infrastructure moves slowly. Oracle’s involvement in TikTok’s US acquisition and Paramount’s Warner Bros bid shows that database companies become holding companies for data assets.
But the real story is how Oracle’s database business is being propped up by AI data center deals while the core product receives incremental updates. The innovation isn’t in Redwood Shores anymore.
What’s Next: 2026 Predictions
If 2025 was about consolidation, 2026 will be about integration:
- PostgreSQL will absorb vector search, time-series, and full-text search into core, making most extensions obsolete. The question is whether the community can maintain quality or if we’ll get a bloated mess.
- MCP will have its first major security breach when an agent drops a production database. This will trigger a wave of MCP security proxies and the realization that standardizing access without standardizing authorization was a mistake.
- The distributed PostgreSQL projects will merge or die. Multigres, Neki, and PgDog can’t all survive. Expect either a unified project or a retreat to single-master architectures.
- File format innovation will stall as Parquet’s new governance model (kicked off by the 2025 competition) slowly addresses interoperability issues. F3 might survive as a research project.
- Query planners will become the new battlefield. As storage engines commoditize, the intelligence to rewrite queries and optimize across heterogeneous data sources will differentiate systems.
For data architects, the playbook is clear: bet on PostgreSQL compatibility, treat specialized databases as temporary, and focus your skills on data modeling and query optimization. The database wars aren’t over, but the victor is increasingly obvious.