
Tabular AI Just Left Gradient Boosting in the Dust, Now What?

TabPFN’s Scaling Mode now handles 10+ million rows, taking the model from research novelty to enterprise staple. This rapid evolution signals a seismic shift in structured data modeling.

by Andre Banandre

Tabular AI has long been the underperforming cousin to its flashy siblings in computer vision and NLP. While tree-based models like XGBoost and LightGBM ruled the roost on spreadsheets and SQL tables, deep learning approaches struggled to keep up. That stalemate is officially over.

Prior Labs just announced that their TabPFN foundation model now scales to datasets with up to 10 million rows. Not tens of thousands, not hundreds of thousands: millions. In less than a year, they have expanded its operational scope 1,000-fold. This isn’t a minor iteration; it’s a fundamental rewrite of what a tabular model can be.

The co-founders behind the breakthrough: Sauraj Gambhir, Frank Hutter, and Noah Hollmann

One Forward Pass, No Training Required

Let’s be clear about what TabPFN is. It’s a transformer model pre-trained on over 100 million synthetic datasets to perform in-context learning. You give it a few labeled examples, and it outputs a predictive distribution for your test data in a single forward pass. No hyperparameter tuning, no training loops, and native support for missing values, categorical features, text, and numerical data. This is training-free, in-context prediction for structured data.
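To see what “no training loop” means in practice, here is a minimal sketch using the scikit-learn-style interface of the open-source `tabpfn` package. The synthetic data and the default constructor settings are illustrative assumptions; exact options vary between package versions.

```python
# Minimal sketch of TabPFN's scikit-learn-style interface. The dataset is
# synthetic and the default constructor is assumed; options differ by version.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = TabPFNClassifier()           # pre-trained weights, nothing to tune
clf.fit(X_train, y_train)          # "fit" mostly stores the context examples
proba = clf.predict_proba(X_test)  # predictions come from the pre-trained network's forward pass
print(proba[:3])
```

The point of the example is what’s missing: no parameter grid, no early stopping, no learning-rate schedule.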

For its entire life, TabPFN’s biggest limitation was dataset size. From its origins handling only 1,000 rows to the TabPFN v2 release in January 2025, which supported “up to 10K samples and 500 features”, the problem was always the ceiling.

The Scaling Mode pipeline introduced last week isn’t just an incremental bump. It’s a new architecture that removes the row-limit constraint altogether, bound only by your hardware. Prior Labs’ internal benchmarks show “strong performance on datasets up to 10 million data points.” Along the way, performance remained competitive with the gradient-boosting libraries CatBoost, XGBoost, and LightGBM, with no evidence of TabPFN’s edge shrinking as the datasets grew.

The Inflection Point: From Prototype to Production

The timeline of progress is staggering:
TabPFN v1: Up to 1,000 rows.
TabPFN v2: Up to 10,000 rows.
TabPFN-2.5: Up to 100,000 rows.
Scaling Mode: 10+ million rows and climbing.

From ~10K rows in January 2025 to 10M rows in December 2025 represents a 1,000x increase in dataset size in less than a year. When you can move from spreadsheet-scale to enterprise-database-scale in a single rollout, you cross a critical threshold.

For the first time, you’re not debating whether a foundation model can handle “real” data. Companies like Hitachi are already using TabPFN for predictive maintenance across rail networks. A major financial institution is deploying it across dozens of applications for liquidity management. In some of these deployments, TabPFN is replacing over 100 pre-existing specialized models with a single foundation model.

The official technical report states that Scaling Mode “removes the fixed row limit: the system is designed to work with arbitrarily large training sets, constrained only by your compute and memory.”

The Benchmark Shakeup and the New Competitive Reality

TabPFN-2.5 is currently ranked #1 on TabArena, a living benchmark for machine learning on tabular data. But benchmarks tend to run on the modest, publicly available datasets academics work with; the real test is production workloads that routinely exceed 100K rows. That’s where Scaling Mode changes the game.

This puts pressure on every other approach. Gradient boosting libraries don’t magically get faster or simpler as data grows. They require painstaking feature engineering, hyperparameter tuning (often via computationally expensive search), and real expertise to deploy at scale.

Imagine that a new hire on your data team can now point a pre-trained foundation model at a 5-million-row dataset of customer transactions and get predictions competitive with what a team of engineers produces in a month-long LightGBM tuning project. The economics of that shift are profound.

Skepticism Debunked: Does Scaling Actually Work?

Any engineer worth their salt will ask: “What’s the catch?” The primary one was scaling, and that has now been addressed. Another is explainability (good luck explaining a 100M-parameter transformer). But the stance from the research community is clear: “The development of tabular foundation models (TFMs) has accelerated in recent years, showing strong potential to outperform traditional ML methods for structured data.”

The recent paper “Robust Tabular Foundation Models” applies an adversarial training framework to TabPFN v2, showing up to a 6% increase in mean normalized AUC over the original model and traditional baselines while using fewer than 100k additional synthetic datasets. The models can be pre-trained entirely on synthetic data, which unlocks the ability to “design data generators that encourage desirable model properties.”

This synthetic-first approach is a paradigm shift. It means your foundation model’s performance isn’t tied to the availability or cleanliness of real-world labeled data. You can adversarially generate challenging synthetic datasets to stress-test and fortify the model, a technique detailed in the study for improving robustness.
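To make the synthetic-first idea concrete, here is a deliberately tiny, hypothetical generator: draw a random two-layer network, push random inputs through it, and threshold the output to get labels. Prior Labs’ actual priors (and the adversarial generators in the robustness paper) are far richer; this sketch only illustrates that a pre-training corpus can be manufactured rather than collected.

```python
import numpy as np

def sample_synthetic_dataset(n_rows=512, n_features=8, rng=None):
    """Toy illustrative prior: a random two-layer network defines the
    feature-to-label relationship for one synthetic classification task."""
    rng = rng or np.random.default_rng()
    X = rng.normal(size=(n_rows, n_features))
    W1 = rng.normal(size=(n_features, 16))
    W2 = rng.normal(size=(16, 1))
    logits = np.tanh(X @ W1) @ W2
    y = (logits.ravel() > np.median(logits)).astype(int)
    return X, y

# A pre-training corpus is just many such draws, each one a fresh "task".
tasks = [sample_synthetic_dataset(rng=np.random.default_rng(seed)) for seed in range(1000)]
```

Because the generator is code, you can skew it toward whatever property you want the model to absorb, which is exactly the lever the adversarial-training work pulls.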

What Happens to the Data Science Workflow Now?

The immediate consequence is the collapse of the feature engineering and model tuning pipeline.

The classic workflow, data prep → feature creation → model selection → hyperparameter tuning → validation, is reduced to: data prep → feed to TabPFN. An entire swath of data science consultancy, tooling, and internal process is suddenly optional.
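The contrast is easiest to see side by side. The sketch below is a hedged illustration: the LightGBM search space is an arbitrary assumption, and the TabPFN defaults are not a tuning recommendation.

```python
# Illustrative comparison of the two workflows on a synthetic table.
from lightgbm import LGBMClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from tabpfn import TabPFNClassifier

X, y = make_classification(n_samples=5000, n_features=30, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Classic pipeline: engineer features, then spend compute on a search.
search = RandomizedSearchCV(
    LGBMClassifier(),
    param_distributions={
        "num_leaves": [31, 63, 127],
        "learning_rate": [0.01, 0.05, 0.1],
        "n_estimators": [200, 500, 1000],
    },
    n_iter=10,
    cv=5,
    random_state=0,
)
search.fit(X_train, y_train)   # the tuning loop that becomes optional

# Foundation-model pipeline: hand over the prepared table and predict.
tabpfn = TabPFNClassifier()
tabpfn.fit(X_train, y_train)
preds = tabpfn.predict(X_test)
```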

It also changes the cost model. Training a complex, bespoke gradient-boosted model at scale isn’t cheap. Running a forward pass through a pre-trained transformer on a GPU might be.

Of course, this isn’t a panacea. The capability is still predominantly for classification tasks; it doesn’t yet handle massive-scale regression or time-series forecasting out of the box. The open-source version has been downloaded 2.3 million times, largely by engineers at Microsoft, Amazon, and Walmart exploring its limits, but Scaling Mode is currently a waitlist feature. The revolution is here, but it’s still rolling out.

The Real Question Isn’t “If”, But “When”

The trajectory is undeniable. Scaling is solved. Benchmark leadership is achieved. Enterprise adoption is underway. The question is no longer if transformer-based models will dominate tabular data, but how quickly the ecosystem will reorganize around them.

Will XGBoost become the “legacy system” for tabular data by 2026? Will every major cloud provider offer a TabPFN-like foundation model as a managed service? The writing is on the wall, written in 10 million rows of data.
