A developer building an internal CRM was recently told by their manager to shove all activity logs into MongoDB. The reasoning? MySQL allegedly buckles under large datasets and crashes. So now they’re juggling a relational database for users and a document store for logs, doubling their operational surface area because of a myth that refuses to die.
This isn’t an isolated incident. The “MySQL can’t scale” narrative has become a lazy default in tech meetings, often thrown around by people who’ve never watched a properly tuned MySQL instance chew through billions of rows without breaking a sweat. The real problem isn’t the database engine, it’s that we’re blaming the tool for what is fundamentally an architecture and understanding gap.
The 34-Billion-Row Elephant in the Room
Let’s put this myth to bed immediately. Production MySQL deployments routinely handle tables with billions of rows. One engineering team reported tables sitting comfortably at 34 billion rows. That’s not a typo. Thirty-four billion. These aren’t edge cases or Silicon Valley unicorns with infinite DBA budgets, they’re regular organizations that bothered to understand their data patterns before reaching for a different database.
The difference between “crashes” and “scales” rarely comes down to MySQL versus MongoDB. It comes down to whether you’ve designed your schema for your actual query patterns, whether your indexes match your access paths, and whether you’re trying to use a transactional row store as a time-series analytics engine.
When MySQL Actually Struggles (And Why It’s Usually Your Fault)
MySQL has real limitations, but they’re specific and predictable. The engine can bog down when you hit certain architectural anti-patterns:
1. The Single-Table Dumpster Fire
Shoving unstructured logs into a single `activity_logs` table with a JSON blob column, no partitioning, and a primary key on `id` is a recipe for pain. Your queries scan millions of rows to find yesterday’s errors. MySQL isn’t crashing, it’s doing exactly what you asked it to do, which is a full table scan on a monolith. The problem is your schema, not the storage engine.
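Concretely, the dumpster fire looks something like this (a minimal sketch; the column names and the `$.level` path are hypothetical):

```sql
-- One monolithic table: opaque payload, nothing indexed but the id.
CREATE TABLE activity_logs (
  id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  payload    JSON NOT NULL,       -- everything shoved into a blob
  created_at DATETIME NOT NULL    -- no index, so time filters can't use one
) ENGINE=InnoDB;

-- "Yesterday's errors" = full table scan: there is no index on created_at,
-- and the error level is buried inside the JSON payload.
SELECT *
FROM activity_logs
WHERE created_at >= NOW() - INTERVAL 1 DAY
  AND payload->>'$.level' = 'error';
```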
2. The Index-Everything Panic
Throwing indexes at every column is like trying to speed up a highway by adding on-ramps every 10 feet. You end up with write throughput that grinds to a halt as MySQL updates 15 indexes per insert. Meanwhile, your storage balloons and your buffer pool can’t hold the working set. Again, not a crash, just physics.
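If this sounds familiar, MySQL’s sys schema (5.7+) can list indexes that have received no reads since the server started, a reasonable starting point for pruning (the schema name here is hypothetical):

```sql
-- Indexes with zero reads since server startup. Treat these as candidates,
-- not a drop list: a recent restart or infrequent jobs can mislead you.
SELECT object_schema, object_name, index_name
FROM sys.schema_unused_indexes
WHERE object_schema = 'crm';
```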
3. The Time-Series Blind Spot
Storing high-frequency metrics or logs in a normalized transactional schema is like using a screwdriver as a hammer. MySQL can do it, but you’re fighting the tool. The database spends more time in lock contention and I/O waits than serving queries. This is where the myth gains traction, people try to fit square pegs into round holes, then blame the hole when it doesn’t work.
The MongoDB Misdirection
MongoDB gets pitched as the automatic solution for “large data” because its document model feels more forgiving. You can toss in arbitrary JSON without defining a schema. For activity logs, this is superficially attractive, just dump the payload and worry about it later.
But here’s what the manager’s recommendation misses: operational complexity has a cost. Running two database systems means two sets of connection pools, two backup strategies, two monitoring dashboards, two failure modes, and two expertise silos on your team. That “simple” decision to avoid a schema discussion just shifted the complexity from design time to runtime, where it’s more expensive to fix.
MongoDB shines when your data access patterns are document-centric, when you need to fetch a complex object graph in one shot, or when your schema evolves so rapidly that migrations become a bottleneck. Activity logs rarely fit this pattern. They’re append-only, time-ordered streams that you typically query by timeframe and source. That’s exactly what a partitioned relational table or a proper time-series database is built for.
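Here’s a sketch of that same `activity_logs` table redesigned for the pattern (column names are hypothetical; note that MySQL requires the partitioning column in every unique key, hence the composite primary key):

```sql
-- Append-only logs, partitioned by day. Queries filtered on created_at
-- only touch the relevant partitions (partition pruning).
CREATE TABLE activity_logs (
  id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  user_id    BIGINT UNSIGNED NOT NULL,
  source     VARCHAR(64) NOT NULL,
  created_at DATETIME NOT NULL,
  payload    JSON NOT NULL,
  PRIMARY KEY (id, created_at),             -- partition column must be in the PK
  KEY idx_source_time (source, created_at)  -- "by timeframe and source"
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(created_at)) (
  PARTITION p20240101 VALUES LESS THAN (TO_DAYS('2024-01-02')),
  PARTITION p20240102 VALUES LESS THAN (TO_DAYS('2024-01-03')),
  PARTITION pmax VALUES LESS THAN MAXVALUE
);
```

In practice a scheduled job pre-creates future daily partitions; `pmax` is just a safety net.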
The Real Decision Framework
Before you let anyone declare “MySQL can’t handle it”, force these questions into the conversation:
What are the actual use cases?
Are these logs for regulatory compliance, internal debugging, or customer-facing analytics? Compliance data might need the transactional guarantees MySQL provides. Debugging logs might have a 30-day TTL that makes truncation trivial. Each use case demands different storage tradeoffs.
What are the query patterns?
If you’re always asking “what happened between time X and Y for user Z”, that’s a range scan on a composite index, bread and butter for MySQL. If you’re asking “show me all logs where `payload.user.preferences.theme` equals `'dark'`”, then maybe a document store makes sense. But be honest: you probably don’t need that flexibility.
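Assuming the `user_id` and `created_at` columns from the sketch above, that first question is a one-index problem:

```sql
-- One composite index turns "user Z between X and Y" into a range scan.
ALTER TABLE activity_logs ADD KEY idx_user_time (user_id, created_at);

EXPLAIN
SELECT *
FROM activity_logs
WHERE user_id = 42
  AND created_at BETWEEN '2024-01-01' AND '2024-01-02';
-- Expect type: range and key: idx_user_time in the plan.
```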
What’s the retention policy?
The 34-billion-row table didn’t stay that way forever. Smart teams implement rotation, archival to object storage, and downsampling. If you only need hot data for 7 days, MySQL’s performance characteristics are irrelevant beyond that window. A simple partition drop beats a database migration.
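With day-level partitions like the sketch above, a 7-day retention window is a metadata operation, not a mass DELETE:

```sql
-- Near-instant: drops the files backing the partition. A DELETE of the
-- same rows would churn the buffer pool, redo log, and replicas for hours.
ALTER TABLE activity_logs DROP PARTITION p20240101;
```

Teams that need the cold data typically export the partition to object storage first, then drop it.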
What’s the team’s operational expertise?
The Reddit discussion hit on a critical point: introducing MongoDB means someone has to understand MongoDB. In production. At 3 AM. When the OOM killer strikes. If your team has deep MySQL knowledge and shallow NoSQL experience, you’re not just adding complexity, you’re adding risk.
When NoSQL Actually Makes Sense
Let’s be balanced. There are legitimate reasons to reach for MongoDB or another NoSQL store:
- Rapid schema iteration: If your log structure changes weekly and migrations are blocking deployments, a schemaless approach can be pragmatic.
- Horizontal write scaling: When you need to ingest millions of events per second across geographic regions, MongoDB’s sharding model can outpace a single MySQL instance.
- Complex nested queries: If your primary use case is slicing and dicing deeply nested JSON without predefined access patterns, document stores index that structure more naturally.
But these are specific architectural requirements, not vague fears about size. The decision should be data-driven, not dogma-driven.
The Hybrid Trap
The original Reddit post mentions a “hybrid database” approach, MySQL for users, MongoDB for logs. This can work, but it’s a pattern you should resent, not celebrate. Every cross-database join becomes an application-level concern. Consistency guarantees go out the window. Your backup and recovery story turns into a distributed systems problem.
Before defaulting to hybrid, exhaust the options within your primary database. MySQL’s JSON columns can store flexible payloads while keeping the core structure relational. Partitioning can solve the size problem. Archive tables can handle cold data. Only when you’ve proven these patterns insufficient should you add a second system.
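As a sketch of that first option: since 5.7, MySQL can index a single field inside a JSON payload via a stored generated column, reusing the theme example from earlier (the path and column name are illustrative):

```sql
-- Keep the payload schemaless, but give the one field you actually
-- filter on a real index via a stored generated column.
ALTER TABLE activity_logs
  ADD COLUMN theme VARCHAR(32)
    GENERATED ALWAYS AS (payload->>'$.user.preferences.theme') STORED,
  ADD KEY idx_theme (theme);
```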
The Spicy Take
Here’s the uncomfortable truth: most “MySQL can’t scale” claims are cover for “I don’t want to think about data modeling.” MongoDB feels easier because you can punt on hard decisions about schema, indexes, and access patterns. But that deferred complexity accrues interest, and eventually you pay it back in production incidents, slow queries, and architectural gymnastics.
The 34-billion-row table isn’t a unicorn, it’s a consequence of deliberate design. Partitioning, pruning, indexing, and understanding your query patterns aren’t optional exercises. They’re the price of admission for any database at scale, relational or otherwise.
So next time someone says “MySQL crashes under large data”, ask them which specific limitation they’re hitting. Is it the maximum table size (64TB for an InnoDB tablespace at the default page size)? The row size limit (65,535 bytes)? The index cardinality? Or is it that they tried to `SELECT * FROM logs WHERE timestamp > NOW() - INTERVAL 1 DAY` on an unindexed 500GB table and called it a crash?
Final Word: Measure, Then Migrate
If you’re facing pressure to abandon MySQL for scale reasons, do the homework:
- Profile your queries: Use `EXPLAIN ANALYZE` and the slow query log. Find the actual bottlenecks (a starter sketch follows this list).
- Model your data: Sketch your access patterns. See if your schema matches your queries.
- Benchmark alternatives: Load test MongoDB or PostgreSQL with your actual workload. Compare apples to apples.
- Calculate the operational cost: Factor in monitoring, backups, and team training.
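A minimal first pass at that homework with standard MySQL tooling (`EXPLAIN ANALYZE` requires 8.0.18+; the query is illustrative):

```sql
-- Log anything slower than one second (applies to new connections).
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;

-- EXPLAIN ANALYZE executes the query and reports per-step row counts and
-- timings, so run it against a production-sized dataset.
EXPLAIN ANALYZE
SELECT *
FROM activity_logs
WHERE created_at >= NOW() - INTERVAL 1 DAY;
```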
Only when the data shows a fundamental mismatch, not when a manager repeats a myth, should you switch. The best database isn’t the one that scales infinitely, it’s the one your team can operate effectively while serving your specific use case.
MySQL isn’t crashing. Your architecture is just having a difficult conversation with you. Listen to it.