
The File Storage Heresy: Why Your Database Might Be Your Best File System

The controversial practice of storing files directly in SQL databases isn’t the cardinal sin developers claim. New research reveals when database storage actually beats traditional file systems, and why the ‘best practice’ might be wrong.

by Andre Banandre

The developer forums have spoken: storing files in your database is a “cardinal sin.” But what if that dogma is built on outdated assumptions and ignores critical context? Recent systematic benchmarking of PostgreSQL versus Azure Blob Storage reveals a more nuanced reality, one where database storage not only holds its own but decisively wins in specific scenarios that define modern application development.

The “Sin” That Started a Holy War

The controversy erupted when a developer working on a small Postgres/.NET project questioned why storing 1-10MB files directly in the database was considered heresy. With only three concurrent users, the theoretical arguments against database storage (scalability nightmares, backup complexity, performance degradation) felt disconnected from reality.

“I get that you might cause some overhead from having to go through another layer (the DB) to stream the content, but I feel like unless your application has a huge number of concurrent users streaming giant files, any reasonable modern server should handle this with ease.” (faze_fazebook, Reddit)

The responses fell into two camps: the purists, who insisted that databases are for data and filesystems are for files, and the pragmatists, who asked the crucial question: “How big are your files?”

The File Size Threshold That Changes Everything

Our benchmarking reveals a clear performance crossover point that demolishes one-size-fits-all advice. For files under 100KB, database storage demonstrates 65-83% better throughput than object storage. A 1KB file uploads at 0.11 MB/s through PostgreSQL versus 0.06 MB/s through Azure Blob Storage. The gap narrows but persists: even at 5MB, database storage maintains a 29% advantage.

File Size    Database Storage (MB/s)    Object Storage (MB/s)    Winner
1 KB         0.11                       0.06                     DB (+83%)
100 KB       11.48                      6.97                     DB (+65%)
1 MB         76.92                      48.78                    DB (+58%)
5 MB         136.98                     106.38                   DB (+29%)

This flies in the face of conventional wisdom that databases “don’t scale” for file storage. The reality: modern databases scale just fine for the file sizes most applications actually use. Profile pictures, configuration files, thumbnails, small PDFs: these represent the majority of user-generated content in most systems.
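
Numbers like these are cheap to sanity-check on your own stack. The sketch below is not the benchmark harness behind the table; it is a minimal, hypothetical probe that assumes a local PostgreSQL instance with a files table containing id (uuid), name, and data (BYTEA) columns, an Azure Blob container named bench, and placeholder connection strings.

using System;
using System.Diagnostics;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Npgsql;

class UploadProbe
{
    // Placeholder connection strings; substitute your own.
    const string PgConn   = "Host=localhost;Username=bench;Password=bench;Database=bench";
    const string BlobConn = "<azure-storage-connection-string>";

    static async Task Main()
    {
        var payload = new byte[100 * 1024];          // one 100 KB test file
        new Random(42).NextBytes(payload);

        var pgMbps   = await PgUploadMbps(payload);
        var blobMbps = await BlobUploadMbps(payload);
        Console.WriteLine($"PostgreSQL upload: {pgMbps:F2} MB/s");
        Console.WriteLine($"Blob upload:       {blobMbps:F2} MB/s");
    }

    // Times a single INSERT into a BYTEA column. The first execution includes
    // connection setup; a real benchmark would warm up and take many samples.
    static async Task<double> PgUploadMbps(byte[] data)
    {
        await using var dataSource = NpgsqlDataSource.Create(PgConn);
        var sw = Stopwatch.StartNew();
        await using var cmd = dataSource.CreateCommand(
            "INSERT INTO files (id, name, data) VALUES (@id, @name, @data)");
        cmd.Parameters.AddWithValue("id", Guid.NewGuid());
        cmd.Parameters.AddWithValue("name", "probe-100kb");
        cmd.Parameters.AddWithValue("data", data);
        await cmd.ExecuteNonQueryAsync();
        sw.Stop();
        return data.Length / 1e6 / sw.Elapsed.TotalSeconds;
    }

    // Times a single block blob upload of the same payload.
    static async Task<double> BlobUploadMbps(byte[] data)
    {
        var container = new BlobContainerClient(BlobConn, "bench");
        var sw = Stopwatch.StartNew();
        await container.GetBlobClient("probe-100kb")
                       .UploadAsync(new BinaryData(data), overwrite: true);
        sw.Stop();
        return data.Length / 1e6 / sw.Elapsed.TotalSeconds;
    }
}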

The Hardware Reality Check

The performance difference isn’t accidental; it’s rooted in storage architecture. PostgreSQL’s page-based storage (8KB pages), combined with TOAST (The Oversized-Attribute Storage Technique), automatically optimizes how large values are stored. For small files, this means metadata and data live together in the same buffer pool, creating cache locality that object storage can’t match.
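
To make that concrete, here is a minimal, hypothetical schema of the kind this discussion implies: file bytes sit in a BYTEA column right next to their metadata, and PostgreSQL compresses and/or moves the value to a TOAST table only once the row would overflow the roughly 2KB threshold on an 8KB page. The table and column names are illustrative, not taken from the benchmark code.

using System.Threading.Tasks;
using Npgsql;

class SchemaSketch
{
    // Hypothetical connection string; replace with your own.
    const string PgConn = "Host=localhost;Username=app;Password=app;Database=app";

    static async Task Main()
    {
        await using var dataSource = NpgsqlDataSource.Create(PgConn);

        // Metadata and file bytes share a row, so one index lookup pulls both
        // into the shared buffer pool. Values larger than roughly 2KB are
        // compressed and/or moved to the TOAST table automatically; nothing
        // in application code changes.
        await using var cmd = dataSource.CreateCommand(@"
            CREATE TABLE IF NOT EXISTS files (
                id           uuid PRIMARY KEY,
                name         text NOT NULL,
                content_type text,
                data         bytea NOT NULL,
                created_at   timestamptz NOT NULL DEFAULT now()
            )");
        await cmd.ExecuteNonQueryAsync();

        // To see which rows were TOASTed, compare the raw payload size with
        // the on-disk size:
        //   SELECT name, octet_length(data), pg_column_size(data) FROM files;
    }
}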

But the real kicker is write amplification, the ratio of bytes physically written to storage versus bytes the application actually uploads. Our measurements show:

  • Database storage: 1.2x-2.0x write amplification (WAL logging + page updates)
  • Object storage: 1.1x-1.5x write amplification (filesystem overhead)

For large files (>1MB), object storage’s sequential write patterns align better with SSD characteristics, reducing amplification. Yet for small files, the difference is negligible, while the network overhead of a separate object storage service becomes significant.

When Object Storage Actually Wins

Let’s be clear: object storage dominates specific scenarios. For files exceeding 1MB, especially in high-throughput environments, Azure Blob Storage’s architecture shines. A 275TB database storing files directly becomes “quite unmanageable”, as one engineer discovered. The operational burden of backups, replication, and maintenance at that scale favors specialized storage.

Object storage also wins on:

  • Geographic distribution: CDN integration for global users
  • Write-heavy large file workloads: Sequential writes reduce SSD wear
  • Independent scaling: Metadata and data scale separately
  • Cost at scale: Pay-as-you-go models beat dedicated database infrastructure

The Backup Complexity Myth

One of the most cited arguments against database storage is backup complexity. The claim: “Full backups of the system become more complicated.” The reality? It’s complicated.

Database storage backups are monolithic but atomic: one pg_dump captures everything. Simple, but increasingly slow as size grows.

Object storage backups require coordinating metadata (database) and blobs (storage). More complex, but allows incremental backups and parallelization. For a 10TB dataset, this difference could mean hours versus days.

The controversy isn’t about which is simpler; it’s about which complexity tradeoff you prefer: operational simplicity versus backup flexibility.

The Real-World Context: Azerbaijan’s Infrastructure

In Azerbaijan, where data localization requirements mandate local storage and international cloud latency ranges from 50 to 200ms, these tradeoffs become critical. An e-government platform storing citizen documents can’t tolerate the latency of cross-border object storage calls for small files. Database storage provides transactional consistency and local performance that object storage can’t match without significant infrastructure investment.

For a media platform serving global users, the calculus reverses: CDN-backed object storage delivers better performance and lower costs, making the operational complexity worthwhile.

Decision Framework: Stop Treating It as Dogma

Here’s the controversial take: The “best practice” depends entirely on your context, not on absolute rules.

Choose Database Storage When:

  • Files are predominantly <100KB (avatars, thumbnails, configs)
  • Transactional consistency is non-negotiable (financial docs, medical records)
  • Operational simplicity outweighs scalability concerns (<100GB total)
  • Low latency for small files is critical
  • Your team has stronger database expertise than distributed systems skills

Choose Object Storage When:

  • Files are predominantly >1MB (videos, high-res images, archives)
  • Scale exceeds 1TB or millions of files
  • Geographic distribution requires CDN
  • Write-heavy large file workloads dominate
  • Your team can manage multi-system complexity

The Hybrid Heresy

The most pragmatic solution? Store small files in the database, large files in object storage. This “heresy” gives you the best of both worlds: transactional consistency for metadata and small files, scalable storage for large files, and a unified API that abstracts the complexity.

using System.IO;
using System.Threading.Tasks;

public interface IFileStorageService
{
    Task<FileMetadata> UploadAsync(Stream fileStream, string filename, string contentType);
    // Implementation routes based on file size:
    //   < 100 KB -> PostgreSQL BYTEA
    //  >= 100 KB -> Azure Blob Storage
}
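
A size-based router for that interface could look like the sketch below. Treat it as illustrative rather than a reference implementation: the FileMetadata record, the 100KB cutoff, the files table, and the Npgsql/Azure SDK wiring are all assumptions layered on top of the interface above.

using System;
using System.IO;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Npgsql;

public record FileMetadata(Guid Id, string Filename, string ContentType, long Size, string Backend);

public class HybridFileStorageService : IFileStorageService
{
    private const long SmallFileThreshold = 100 * 1024;   // assumed 100 KB cutoff
    private readonly NpgsqlDataSource _db;
    private readonly BlobContainerClient _blobs;

    public HybridFileStorageService(NpgsqlDataSource db, BlobContainerClient blobs)
    {
        _db = db;
        _blobs = blobs;
    }

    public async Task<FileMetadata> UploadAsync(Stream fileStream, string filename, string contentType)
    {
        // Buffer once so the size is known before choosing a backend.
        using var buffer = new MemoryStream();
        await fileStream.CopyToAsync(buffer);
        var bytes = buffer.ToArray();
        var id = Guid.NewGuid();

        if (bytes.Length < SmallFileThreshold)
        {
            // Small file: bytes live next to their metadata in a BYTEA column.
            await using var cmd = _db.CreateCommand(
                "INSERT INTO files (id, name, content_type, data) VALUES (@id, @name, @ct, @data)");
            cmd.Parameters.AddWithValue("id", id);
            cmd.Parameters.AddWithValue("name", filename);
            cmd.Parameters.AddWithValue("ct", contentType);
            cmd.Parameters.AddWithValue("data", bytes);
            await cmd.ExecuteNonQueryAsync();
            return new FileMetadata(id, filename, contentType, bytes.Length, "postgres");
        }

        // Large file: bytes go to object storage; a real implementation would
        // also record a metadata row (without the bytes) in the database.
        await _blobs.GetBlobClient(id.ToString())
                    .UploadAsync(new BinaryData(bytes), overwrite: true);
        return new FileMetadata(id, filename, contentType, bytes.Length, "blob");
    }
}

Keeping the cutoff in a single constant matters more than its exact value; it should be tuned after measuring your own workload, not copied from someone else’s benchmark.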

This approach is “controversial” because it violates the purity of both camps. But purity doesn’t ship products. Pragmatism does.

The Performance Data That Defies Dogma

Our systematic benchmarking with identical workloads reveals:

Upload Latency (P50):
– Database: 41ms
– Object Storage: 67ms

CPU Utilization:
– Database: 9.17% (includes WAL processing)
– Object Storage: 1.00% (network overhead only)

Memory Usage:
– Database: 207.5MB (buffer pool for data + metadata)
– Object Storage: 149.1MB (metadata only)

For small files, database storage is faster, more cache-efficient, and simpler. The overhead of WAL logging is offset by eliminating network round trips to object storage. For large files, the reverse is true: write amplification and memory pressure make object storage superior.

Why This Matters for AI Applications

Modern AI systems generate and consume massive numbers of small artifacts: model weights, embeddings, configuration files, and intermediate results. Storing these in object storage creates unnecessary latency for model loading and configuration access. Database storage provides atomic updates for model versions and transactional consistency for distributed training runs.

Yet the same AI systems also generate gigabyte-sized model checkpoints and datasets, perfect for object storage. The “sin” isn’t committing to one backend; it’s failing to architect a system that uses both appropriately.

The Bottom Line

The file storage debate reveals a deeper truth in software engineering: dogma is the enemy of good architecture. The “cardinal sin” of storing files in databases is only sinful if you ignore context.

The research is clear: for small files in low-to-medium scale applications, database storage is not just acceptable, it’s superior. For large files at scale, object storage is not just preferable, it’s necessary.

The real best practice? Measure your workload, understand your constraints, and choose the tool that fits, not the one that satisfies dogma.

Your database might be your best file system. And that’s okay.


Explore the full research: Database vs Object Storage: Performance, Reliability, and System Design

Reproduce the benchmarks: GitHub Repository

Learn about storage types: AWS Block vs Object vs File Storage
