You’ve got a vendor dumping data into your Snowflake account every few hours. You own the database, you own the pipeline, and you need to move that data from a source schema into a landing schema before it eventually lands in a dimensional model for reporting. The question hitting every data engineering forum right now: what’s the right tool for this job?
The answer, predictably, is “it depends.” But the nuance matters more than the platitude. Snowflake offers three credible paths for building a Change Data Capture (CDC) pipeline in a batch-oriented context: Streams, Dynamic Tables, and good old-fashioned Stored Procedures. Each one comes with trade-offs that can make your life either boringly simple or unnecessarily complex.
Let’s break down each approach with the kind of specificity you’d get from a senior engineer who’s already made these mistakes.
The Scenario: Boring, Relatable, and Surprisingly Tricky
Before we pick a winner, let’s ground this in a real situation. You have:
- A source database in Snowflake where a vendor writes data every few hours
- A landing database (same Snowflake account) that acts as a staging area
- A final dimensional model for reporting
The source → landing pipeline runs as batch jobs daily. Crucially, the queries aren’t trivial table copies, they involve joins and aggregations. When you’re loading data from source to landing, it might look something like:
INSERT INTO landing_db.fact_orders (order_id, customer_name, total_amount, last_updated)
SELECT o.order_id, c.customer_name, o.total_amount, o.last_updated
FROM source_db.orders o
INNER JOIN source_db.customers c ON o.customer_id = c.customer_id
WHERE o.last_updated > '2026-06-02';
The timestamp filter comes from a job control table that tracks the last successful pipeline run. This is the classic incremental load pattern.
Now, about those table sizes: for a typical mid-market scenario, most source tables are small, under 100,000 records. Think dimension tables. But there are usually 2-3 fact-like tables pushing 5, 20 million records. That’s the difference between a pipeline that costs you pocket change and one that eats your warehouse budget alive.
Option 1: Streams, The Elegant Workhorse
Snowflake Streams are the most purpose-built tool for CDC in this environment. They track insert, update, and delete operations on a table or view since the last offset was consumed. Their biggest advantage? They handle exactly-once semantics without you writing offset-tracking logic.
The catch with your scenario is the joins. Streams operate on base tables or views. If your CDC query requires joining source_db.orders to source_db.customers, you can’t put a stream directly on a join.
Here’s the pattern the community has landed on:
- Create a view that encapsulates the join logic
- Create a stream on the view
- Consume the stream in a task to load the landing table
-- Step 1: Create the view
CREATE OR REPLACE VIEW source_db.v_orders_with_customers AS
SELECT o.order_id, c.customer_name, o.total_amount, o.last_updated
FROM source_db.orders o
INNER JOIN source_db.customers c ON o.customer_id = c.customer_id;
-- Step 2: Create the stream
CREATE OR REPLACE STREAM source_db.stream_orders ON VIEW source_db.v_orders_with_customers
SHOW_INITIAL = TRUE; -- Captures existing data on creation
-- Step 3: Merge into landing table
MERGE INTO landing_db.fact_orders tgt
USING (
SELECT order_id, customer_name, total_amount, last_updated
FROM source_db.stream_orders
WHERE METADATA$ACTION IN ('INSERT', 'UPDATE')
) src
ON tgt.order_id = src.order_id
WHEN MATCHED THEN UPDATE SET
tgt.customer_name = src.customer_name,
tgt.total_amount = src.total_amount,
tgt.last_updated = src.last_updated
WHEN NOT MATCHED THEN INSERT (order_id, customer_name, total_amount, last_updated)
VALUES (src.order_id, src.customer_name, src.total_amount, src.last_updated);

The SHOW_INITIAL = TRUE flag is crucial, it seeds the stream with existing data so you don’t need a separate full-load step. For tables under 100K records, this works beautifully. For those 5-20M fact tables, the initial load will cost you some credits, but you’ll only pay that once.
When Streams win: You need reliable, incremental CDC with minimal code, and you’re okay with managing one view + one stream per pipeline.
When Streams hurt: You now have N views, N streams, and N tasks to manage for 15 tables. The object count adds cognitive overhead. Also, streams have a 14-day retention window by default, if your pipeline fails for two weeks, you’re doing a full re-load.
Option 2: Dynamic Tables, The “Set It and Forget It” Gambit
Dynamic Tables (DTs) are Snowflake’s declarative approach to data transformation. You define what you want, and Snowflake figures out how to incrementally refresh it. Sounds magical. The reality is more nuanced.
For your scenario, a Dynamic Table could look like this:
CREATE OR REPLACE DYNAMIC TABLE landing_db.dt_fact_orders
TARGET_LAG = '1 day'
WAREHOUSE = my_wh
REFRESH_MODE = FULL
AS
SELECT o.order_id, c.customer_name, o.total_amount, o.last_updated
FROM source_db.orders o
INNER JOIN source_db.customers c ON o.customer_id = c.customer_id;
The REFRESH_MODE = FULL is a key detail. Because your source data changes (vendor inserts/updates), a full refresh mode recomputes the entire result set. That’s fine for dimension tables under 100K records. It’s expensive for your 20M-record fact table.
There’s a better approach: use a separate Dynamic Table for the landing layer, then handle the incremental logic in your MERGE to the final dimension model.
-- Landing DT with full refresh (simple, but costly for large tables)
CREATE OR REPLACE DYNAMIC TABLE landing_db.dt_fact_orders
TARGET_LAG = '1 day'
WAREHOUSE = my_wh
REFRESH_MODE = FULL
AS SELECT * FROM source_db.orders;
-- Then in the final layer, incremental MERGE using timestamps
MERGE INTO final_db.dim_orders tgt
USING (
SELECT * FROM landing_db.dt_fact_orders
WHERE last_updated > (SELECT max_last_run FROM job_control)
) src
ON tgt.order_id = src.order_id
WHEN MATCHED THEN UPDATE ...
WHEN NOT MATCHED THEN INSERT ...;
This two-step approach is actually the most common pattern in production. The Dynamic Table handles the source→landing heavy lifting, and your manual MERGE handles the landing→final incremental logic.
When Dynamic Tables win: You want minimal code and Snowflake to manage refresh scheduling. Great for small dimension tables and teams that don’t want to build a scheduler.
When Dynamic Tables hurt: FULL refresh mode on large tables will cost you. Incremental refresh mode is more efficient but has limitations, it can’t handle certain complex transformations or source table DDL changes gracefully. Also, you’re paying for compute on every refresh cycle, whether there’s new data or not.
Option 3: Stored Procedures, The “I’ll Do It Myself” Approach
The most flexible and arguably most dangerous option. Writing a stored procedure gives you complete control over the pipeline logic, error handling, and scheduling. It also gives you enough rope to hang yourself.
A basic pattern looks like this:
CREATE OR REPLACE PROCEDURE load_fact_orders()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
last_run TIMESTAMP;
affected_rows INTEGER;
BEGIN
-- Get last successful run
SELECT COALESCE(MAX(last_successful_run), '2026-01-01')
INTO last_run
FROM job_control
WHERE job_name = 'load_fact_orders';
-- Perform the incremental load
MERGE INTO landing_db.fact_orders tgt
USING (
SELECT o.order_id, c.customer_name, o.total_amount, o.last_updated
FROM source_db.orders o
INNER JOIN source_db.customers c ON o.customer_id = c.customer_id
WHERE o.last_updated > :last_run
) src
ON tgt.order_id = src.order_id
WHEN MATCHED THEN UPDATE SET
tgt.customer_name = src.customer_name,
tgt.total_amount = src.total_amount,
tgt.last_updated = src.last_updated
WHEN NOT MATCHED THEN INSERT
(order_id, customer_name, total_amount, last_updated)
VALUES (src.order_id, src.customer_name, src.total_amount, src.last_updated);
affected_rows := SQLROWCOUNT;
-- Update job control
INSERT INTO job_control (job_name, last_successful_run, rows_affected)
VALUES ('load_fact_orders', CURRENT_TIMESTAMP(), :affected_rows);
RETURN 'SUCCESS: ' || affected_rows || ' rows processed';
END;
$$;
-- Schedule with a task
CREATE OR REPLACE TASK daily_load_fact_orders
WAREHOUSE = my_wh
SCHEDULE = 'USING CRON 0 2 * * * UTC'
AS
CALL load_fact_orders();
The stored procedure approach gives you:
– Exact control over transaction boundaries and error handling
– Job control tables for observability and restartability
– No Snowflake feature dependencies—this is portable SQL
– Performance optimization—you can add query hints, clustering, or parallel loads
When Stored Procedures win: You have complex business logic, need precise error handling, or want to avoid Snowflake-specific features for portability reasons.
When Stored Procedures hurt: You’re now maintaining code. For 15 tables, that’s 15 stored procedures, 15 tasks, and a job control table. The abstraction tax is real. Also, you’re reinventing wheels that Streams and Dynamic Tables already provide.
The Decision Matrix
Let’s make this concrete. Here’s how these approaches stack up for the most common batch CDC patterns:
| Criteria | Streams + Views | Dynamic Tables | Stored Procedures |
|---|---|---|---|
| Setup complexity | Medium (view + stream + task per table) | Low (one DT per table) | High (proc + task + control table per table) |
| Incremental joins | Natural (stream on view) | Limited (FULL refresh on complex joins) | Full control |
| Cost for small tables (<100K rows) | Low | Low to Medium (depends on refresh mode) | Low |
| Cost for large tables (5M+ rows) | Low (incremental) | High (FULL refresh) | Medium (depends on query efficiency) |
| Failure recovery | Easy (re-consume stream) | Automatic (next refresh) | Manual (update job control timestamp) |
| Object count (15 tables) | ~45 objects (views + streams + tasks) | ~15 objects (DTs) | ~45 objects (procs + tasks + control tables) |
| Observability | Built-in stream metadata | Built-in refresh history | Custom (must build) |
| Portability | Low (Snowflake-specific) | Low (Snowflake-specific) | Medium (standard SQL) |
The DevOps Tax Nobody Talks About
One commenter on data engineering forums pointed out something crucial: object count management is a hidden cost. If you’re maintaining 15 tables, the difference between 15 Dynamic Tables and 45 objects (Streams + Views + Tasks) is not trivial. Each object is a potential failure point, a potential drift issue, and something that needs to be in your Infrastructure as Code (IaC) scripts.
But here’s the counterpoint that Dynamic Tables advocates don’t emphasize enough: Dynamic Tables have a hard time with source schema changes. If your vendor adds a column, Streams handle it gracefully (the new data just appears), while Dynamic Tables might fail on the next refresh depending on configuration. The alternative CDC tool using Snowflake Openflow promises to handle this better, but as of 2026, it’s still on training wheels.
What About CDC vs. Microbatching?
If you’re building this pipeline, you’re implicitly choosing batch CDC over streaming. That’s fine for daily loads, but be aware of the CDC vs microbatching trade-offs relevant to pipeline design. The distinction matters more as your latency requirements tighten. If your vendor ever moves to near-real-time ingestion, the pipeline architecture you choose today will either enable or block that transition.
The Practical Recommendations
After watching teams burn through Snowflake credits and engineering hours, here’s what I’d suggest:
For the 12-13 small dimension tables (<100K rows): Use Dynamic Tables with FULL refresh. The simplicity is worth the marginal compute cost. Set TARGET_LAG = '1 day' and forget about them.
CREATE DYNAMIC TABLE landing_db.dt_dim_customer
TARGET_LAG = '1 day'
WAREHOUSE = small_wh
REFRESH_MODE = FULL
AS SELECT * FROM source_db.customers;
For the 2-3 large fact tables (5M-20M rows): Use Streams on a join view. The incremental behavior will save you significant credits, and the setup cost is worth the long-term savings.
CREATE VIEW source_db.v_fact_orders AS
SELECT o.*, c.customer_name
FROM source_db.orders o
JOIN source_db.customers c ON o.customer_id = c.customer_id;
CREATE STREAM source_db.s_fact_orders ON VIEW source_db.v_fact_orders
SHOW_INITIAL = TRUE;
For anything with complex error handling or multi-step transformation: Use Stored Procedures with proper job control. But only go here if you have a reason, don’t default to it because “that’s how we’ve always done it.”
The Cost Trap You Must Avoid
Every implementation choice has a cost dimension that’s easy to ignore during design but painful during month-end billing. If you choose Dynamic Tables for everything, those 2-3 large tables will spike your compute costs with each full refresh. If you choose Streams for everything, you’ll drown in object management. If you choose stored procedures for everything, you’ve created a maintenance monster.
The cost implications of different CDC implementation strategies are real. That junior engineer who dropped the auto-suspend timeout? Cute. The team that chose FULL refresh on a 50M-row table? That’s how you get a surprise $10K overage.
The Final Verdict
There is no one-size-fits-all answer, but there is a clear pattern for success: mix your approaches based on table characteristics. Use Dynamic Tables for small, simple tables where simplicity outweighs cost. Use Streams for large tables where incremental processing is essential. Use Stored Procedures only when you need custom logic that neither solution handles.
And for the love of all that is holy, think about what happens when you need to move this pipeline out of Snowflake someday. The vendor lock-in risks when building CDC pipelines in Snowflake are not hypothetical. If you ever migrate to Iceberg or a lakehouse, all those Dynamic Tables become dead weight. The operational considerations for lakehouse architectures that complement CDC patterns suggest you should keep your transformation logic as portable as feasible.
The best pipeline is the one that runs quietly, costs predictably, and lets you sleep through the night. Pick the tool that gets you there with the least ceremony.
Now go build something that doesn’t page you at 3 AM.




