The Thunderbolt NVMe drive strapped to your laptop can push 5 GB/sec. Your 10 Gbps network link hums along at 1 GB/sec. Yet there you are, watching rsync plod through a 60 GiB project folder like it’s 1998 and you’re dialing into a BBS. Eight minutes. Eight full minutes to sync what should take sixty seconds. The bottleneck isn’t your hardware, it’s an architectural decision made decades ago that we’re still cargo-culting today.
Jeff Geerling’s recent benchmark crystallized what many of us suspected but never bothered to measure: rclone outperforms rsync by roughly 4x on modern networks, not through incremental tweaks but by treating file synchronization as a first-class distributed systems problem. The implications stretch far beyond which command you alias in your shell.
The Serial Transfer Tax
Rsync’s brilliance has always been its delta-transfer algorithm. Send only the changed blocks. Compute rolling checksums. Minimize bits over the wire. It’s elegant, it’s proven, and it’s fundamentally single-threaded. When Geerling ran his standard sync across 3,564 files, 122 of them needing transfer, rsync processed them sequentially, one file at a time. The result? 8 minutes, 17 seconds. His network share peaked at 350 MB/sec, a fraction of its 1 GB/sec capacity.
Here’s the kicker: even for large files, rsync’s single-stream architecture can’t saturate a modern pipe. It’s not just about concurrency across files, it’s about failing to exploit the parallelism inherent in high-bandwidth, high-latency networks. The command looks innocent enough:
rsync -au --progress --stats /Volumes/mercury/* /Volumes/Shuttle/Video_Projects
But behind the scenes, it’s a lone worker moving boxes one at a time while a fleet of empty trucks idles at the loading dock. The 9.155 seconds spent generating the file list is comparable to rclone’s overhead. The transfer itself? Glacial.
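The obvious workaround is to shard the file list and run several workers at once, for example with xargs -P. A minimal sketch of that pattern, using cp as a stand-in for a per-file rsync invocation so it runs anywhere (paths are hypothetical):

```shell
# Naive external parallelization: split the work across N workers.
# In a real hack each worker would be its own rsync; cp stands in here.
SRC=/tmp/demo_src
DST=/tmp/demo_dst
mkdir -p "$SRC" "$DST"
for i in 1 2 3 4 5 6 7 8; do echo "data $i" > "$SRC/file$i"; done

# -P 4: run up to four copies concurrently, one file per worker.
find "$SRC" -type f -print0 | xargs -0 -P 4 -I{} cp {} "$DST/"
```

It works, after a fashion, but concurrency bolted on from the outside loses unified progress reporting, retry handling, and any coordination between workers.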
This isn’t a bug. It’s a design philosophy rooted in rsync’s origin: optimizing for CPU and bandwidth-constrained environments where the delta algorithm’s efficiency mattered more than parallel I/O. That world is gone. Today’s constraints are different: object storage APIs, geographically distributed teams, and network pipes so wide that the primary challenge is filling them, not conserving them.
Concurrency as a Core Primitive
Rclone’s approach flips the script. Instead of “how do we minimize data transfer?” it asks “how do we maximize pipeline utilization?” The answer is brutally simple: transfer files in parallel. Geerling’s rclone configuration uses --multi-thread-streams=32, creating 32 concurrent transfer workers. The same 58.625 GiB sync completes in 2 minutes, 15 seconds, saturating his 10 Gbps link at 1 GB/sec.
rclone sync \
--exclude='**/._*' \
--exclude='.fcpcache/**' \
--multi-thread-streams=32 \
-P -L --metadata \
/Volumes/mercury/ /Volumes/Shuttle/Video_Projects
The architecture resembles a well-designed data pipeline more than a traditional file copy utility. Each worker operates independently. Failed transfers don’t block the queue. The system adapts to network conditions dynamically. This is the same thinking that powers S3’s multipart uploads, Kafka’s partition parallelism, and every modern distributed database worth its salt.
But rclone’s genius isn’t just concurrency, it’s protocol awareness. While rsync speaks SSH and SMB with a single connection per session, rclone natively understands cloud storage semantics: multipart uploads, resumable transfers, API rate limiting, and per-object metadata. It’s not wrapping legacy protocols, it’s speaking the language of distributed storage directly.
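For example, a sync to S3 can tune parallelism at both the file level and the chunk level. The remote and bucket names below are hypothetical; the flags are real rclone options:

```shell
rclone sync ./renders s3remote:my-bucket/renders \
  --transfers 16 \
  --s3-upload-concurrency 8 \
  --s3-chunk-size 64M \
  --retries 5
# --transfers: number of files copied in parallel
# --s3-upload-concurrency: multipart chunks uploaded in parallel per file
# --s3-chunk-size: size of each multipart part
# --retries: re-attempt failed transfers before giving up
```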
The Cache Backpressure Problem: When Distributed Systems Get Real
Parallelism solves one problem but introduces others. A recent rclone forum thread exposed a classic distributed systems challenge: backpressure under resource constraints. A user building a remote-backed NAS on a Raspberry Pi 3B+ with a 128GB USB cache discovered that rclone’s VFS cache has a soft limit (--vfs-cache-max-size) that can be exceeded during bulk uploads to protect in-flight transfers. When the cache hit a hard disk quota, writes failed with I/O errors instead of throttling gracefully.
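The flags in question are real rclone mount options; the remote name and mount point here are hypothetical:

```shell
rclone mount nas-remote:media /mnt/media \
  --vfs-cache-mode writes \
  --vfs-cache-max-size 20M \
  --vfs-cache-max-age 1h
# --vfs-cache-max-size is a soft limit: rclone will exceed it rather than
# fail files still open for writing, which is exactly how the cache can
# grow past it and into a hard disk quota.
```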
The test setup was telling: 444MB source data, 20MB soft cache limit, 50MB hard disk quota. During an rsync into the rclone mount, the cache grew beyond its soft limit as expected, but when it breached 50MB, the system crashed rather than applying backpressure. The user proposed a rate-limiting mechanism to slow writes when approaching capacity, a standard pattern in streaming systems like Akka Streams or reactive pipelines.
The rclone maintainer’s response was instructive: this is a known issue tracked on GitHub, and the solution requires implementing flow control at the FUSE layer. Another user actually submitted a pull request implementing retry logic with exponential backoff when disk quotas are exceeded. This is distributed systems engineering in the open: identify failure modes, design resilience patterns, iterate.
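The pattern in that pull request, retry with exponential backoff, is simple enough to sketch generically. This is an illustration of the pattern, not rclone’s actual implementation:

```shell
# Retry a command, doubling the wait between attempts: 1s, 2s, 4s, ...
retry() {
  max=$1; shift
  delay=1
  attempt=1
  while [ "$attempt" -le "$max" ]; do
    "$@" && return 0                       # success: stop retrying
    [ "$attempt" -lt "$max" ] && sleep "$delay"
    delay=$(( delay * 2 ))                 # exponential backoff
    attempt=$(( attempt + 1 ))
  done
  echo "giving up after $max attempts" >&2
  return 1
}
```

A write that hits a full cache would then be wrapped as retry 5 some_write_command (a hypothetical command) instead of surfacing the first I/O error to the caller.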
This matters because it reveals rclone’s architectural maturity. It’s not just a faster rsync, it’s a system designed to operate in environments where partial failure is normal. The cache backpressure problem is the same challenge that causes Kafka consumers to lag under overload or Cassandra compactions to stall under disk pressure. Rclone lives in that world. Rsync doesn’t.
Network-Awareness as a Design Pattern
The broader lesson is that file synchronization at scale is a network-aware distributed systems problem, not a shell script problem. The pattern rclone embodies looks like this:
- Decompose work into independent units (files or chunks)
- Parallelize across available bandwidth (multi-threaded streams)
- Handle failures as first-class events (retry logic, partial transfers)
- Adapt to protocol constraints (API rate limits, multipart requirements)
- Manage local resources with backpressure (cache limits, disk quotas)
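The last item, backpressure, is the one rsync-era tooling most often skips. A toy illustration of the idea in shell: check local resource usage and stall the producer instead of failing the write (the function name and the quota check are hypothetical; real flow control would live at the FUSE layer):

```shell
# Backpressure sketch: refuse to enqueue more data while the local
# cache directory is at or above its quota; wait for a drain instead.
throttled_write() {
  cache_dir=$1 quota_kb=$2 src=$3
  while [ "$(du -sk "$cache_dir" | cut -f1)" -ge "$quota_kb" ]; do
    sleep 1    # stall the producer until the uploader drains the cache
  done
  cp "$src" "$cache_dir/"
}
```

The essential move is that the writer slows down rather than erroring out, which is what turns a hard quota breach into graceful degradation.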
This is the same architecture that makes modern data pipelines reliable. Contrast this with rsync’s mental model: a single TCP stream, a single process, failure means start over. It works brilliantly for its original use case, incremental backups over slow links, but breaks down when the problem space shifts to hybrid cloud, edge caching, and multi-region replication.
The performance gap isn’t accidental. It reflects a fundamental shift in how we think about data movement. In the monolithic data center era, rsync’s assumptions held. In today’s distributed, API-driven, bandwidth-abundant world, they don’t.
When “Good Enough” Becomes a Liability
The seductive trap is thinking rsync is “good enough.” For many workflows, it is. Geerling notes that for small changes, rsync and rclone perform identically because metadata scanning dominates. The 18-second directory tree walk is the same for both tools. The difference emerges when you’re moving gigabytes of fresh data, exactly the scenario that defines modern media workflows, ML training data distribution, and cross-region backups.
This is where “avoiding over-engineering” becomes a false economy. Choosing rsync because it’s familiar and “works fine” is like choosing a single-threaded web server because it handles your current request volume. The moment you need to scale, you’re not just swapping a tool, you’re rearchitecting a process.
The real cost isn’t the 6-minute wait. It’s the cognitive overhead of maintaining two systems: rsync for small changes, manual intervention for large ones. It’s the silent constraint on your workflow design, the hesitation before ingesting a 4K video project because you know the sync will take half your morning. These micro-frictions compound into architectural drag.
Beyond File Sync: Architectural Parallels
Rclone’s design DNA shows up in unexpected places. Meta’s adoption of the Steam Deck’s CPU scheduler for their million-server fleet is another example of unconventional but highly effective infrastructure optimizations at scale. Both cases reject the “good enough” incumbent in favor of architectures that treat resource constraints as dynamic variables to optimize, not static assumptions to accept.
Similarly, the debate mirrors the perennial argument over perceived database limitations versus architectural decisions that scale. MySQL doesn’t crash at scale, single-threaded queries and lock contention do. The solution isn’t always a new database, it’s often redesigning how you interact with the existing one. Rsync isn’t slow, its single-threaded, serial transfer model is.
Even the cache backpressure discussion comes back to rigor in runtime configuration for reliable distributed systems. The difference between a soft limit and a hard limit isn’t a documentation footnote, it’s the difference between graceful degradation and catastrophic failure. Treating configuration as code means encoding these tradeoffs explicitly, not discovering them during a 2 AM page.
The Cloud-Native Imperative
Rclone’s advantage compounds in cloud-native environments. Its native support for S3, GCS, Azure Blob, and dozens of other providers isn’t just convenience, it’s architectural alignment. When you rclone sync to S3, it uses multipart uploads automatically, splitting large files into parallel chunks. It respects provider-specific rate limits. It handles eventual consistency gracefully.
Rsync over SSH to an EC2 instance, then aws s3 cp to bucket? That’s two architectural paradigms smashed together, each fighting the other. The SSH connection is a long-lived stateful pipe. The S3 upload is stateless and parallelized. The mismatch creates friction: temporary files, partial uploads on failure, manual cleanup scripts.
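Concretely, the two approaches look like this. Host, bucket, and remote names are hypothetical:

```shell
# Two hops, two paradigms: a stateful SSH pipe, then a separate S3 upload,
# plus the staging files and cleanup that a mid-transfer failure leaves behind.
rsync -az renders/ ec2-user@staging-host:/tmp/renders/
ssh ec2-user@staging-host \
  'aws s3 cp --recursive /tmp/renders/ s3://my-bucket/renders/ && rm -rf /tmp/renders'

# One hop, protocol-native: multipart, parallel, resumable.
rclone sync renders/ s3remote:my-bucket/renders/ --transfers 16
```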
This is why evaluating legacy data orchestration tools against modern needs is critical. Airflow wasn’t designed for Kubernetes-native, event-driven workflows. Rsync wasn’t designed for object storage. You can bolt them on, but the impedance mismatch shows in performance, reliability, and operational complexity.
Tradeoffs, Not Religion
None of this makes rsync obsolete. For incremental backups over VPNs, for syncing config files to 1,000 servers via Ansible, for any workflow where delta efficiency trumps bandwidth saturation, rsync remains unmatched. Its ubiquity is a feature. Every *nix box has it. Its delta algorithm is still brilliant for compressing changes.
The mistake is treating tool selection as tribal affiliation rather than architectural decision-making. Use rsync when the problem matches its assumptions: many small changes, bandwidth-constrained links, CPU-efficient delta calculation. Use rclone when the problem shifts: large files, abundant bandwidth, cloud APIs, parallel I/O.
The 4x performance gap isn’t a victory lap, it’s a case study in how automation in system design can undermine architectural integrity when we automate the wrong abstraction. Wrapping rsync in a for loop to parallelize it is a bash script, not a solution. Rclone’s concurrency is designed into its core, with proper error handling, progress reporting, and resource management.
Design for the Network You Have, Not the Network You Wish You Had
The real insight from Geerling’s benchmark isn’t that rclone is faster. It’s that network-aware design patterns beat algorithmic optimization when the network is the bottleneck. Rsync’s delta algorithm is mathematically elegant but network-naive. Rclone’s parallel transfers are algorithmically simple but network-native.
This pattern repeats across our stack. Database query planners that don’t account for network round trips. Microservices that make chatty API calls. ML training pipelines that move data serially. The common thread is designing for local constraints while ignoring network reality.
File sync is just the canary. The same thinking that makes rclone 4x faster makes S3 outperform your NAS, makes gRPC beat REST for high-throughput APIs, makes Kafka outrun message queues. It’s distributed systems design, and it’s eating the world one protocol at a time.
So the next time you’re waiting for a sync to complete, don’t just blame the network. Ask whether your tools understand it. Your 10 Gbps link is a distributed system. Start treating it like one.
