When Cosine Similarity Becomes a Weapon: The Solar-100B Open Provenance War

Upstage’s Solar-100B Open model faces accusations of being a repackaged GLM-4.5-Air, exposing the fragility of ‘from scratch’ claims and the desperate need for verifiable model provenance standards.

by Andre Banandre

The AI community’s trust infrastructure is cracking under its first major national scandal. When Korean AI startup Upstage unveiled Solar-100B Open last week, billed as a triumph of the country’s government-backed “independent AI foundation model” initiative, the applause barely lasted 48 hours. A statistical analysis dropped on GitHub claims the model isn’t just inspired by Chinese competitor GLM-4.5-Air; it is mathematically indistinguishable from it. The accusation has ignited a firestorm over what “from scratch” actually means in an era where billion-parameter models converge on identical architectures.

The Statistical Smoking Gun That Might Not Be

The controversy erupted on January 1st when Sonic AI CEO Ko Seok-hyun published a GitHub report comparing Solar-100B Open and GLM-4.5-Air layer by layer. Using cosine similarity, a metric that measures how parallel two vectors are in high-dimensional space, the analysis found similarity scores hovering around 0.99 across multiple layers. For context, identical models score 0.99999. The implication: Solar isn’t just architecturally similar; it’s a fine-tuned derivative.
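
The check itself is easy to reproduce in principle. Here is a minimal sketch of the kind of layer-by-layer comparison the report describes, assuming both models load through Hugging Face Transformers and share parameter names; the repository names are placeholders, not the actual checkpoints or the report’s code.

```python
# A minimal sketch of a layer-by-layer cosine similarity comparison,
# assuming both models share parameter names (i.e., identical
# architectures). Repo names are placeholders, not the real checkpoints.
import torch
from transformers import AutoModelForCausalLM

def layer_cosine_similarities(model_a, model_b, suffix="norm.weight"):
    """Cosine similarity between same-named weight tensors in two models."""
    params_b = dict(model_b.named_parameters())
    sims = {}
    for name, wa in model_a.named_parameters():
        if name.endswith(suffix) and name in params_b:
            sims[name] = torch.nn.functional.cosine_similarity(
                wa.flatten().float(), params_b[name].flatten().float(), dim=0
            ).item()
    return sims

model_a = AutoModelForCausalLM.from_pretrained("org-a/model-under-test")  # placeholder
model_b = AutoModelForCausalLM.from_pretrained("org-b/reference-model")   # placeholder
for name, sim in layer_cosine_similarities(model_a, model_b).items():
    print(f"{name}: {sim:.5f}")
```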

But here’s where the methodology gets messy. The report primarily analyzed normalization layer weights, not the attention mechanisms or MLP blocks that constitute a model’s core intellectual property. Reddit’s machine learning community immediately pounced on this weakness. Independent tests on other model families revealed that DeepSeek V3, V3.1, and V3.2 variants (unquestionably distinct models) also show 0.99+ similarity in norm layers. Even more damning for the methodology, the same metric comparing Kimi K2 and Mistral Large 3 produces nearly identical scores.

Why? As one technical analysis explained, RMSNorm weights in deeper layers tend toward constant values because they perform minimal adjustments to already-normalized token vectors. The scale parameter collapses to a narrow range, making cosine similarity a blunt instrument that can’t distinguish convergence in training dynamics from outright weight copying. It’s like accusing two cars of being the same model because their tire pressures are identical.
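
The effect is easy to demonstrate with two vectors that have never met:

```python
# Two independently sampled "norm-like" weight vectors, both hovering
# near 1.0 the way deep-layer RMSNorm scales tend to. No shared lineage,
# yet cosine similarity lands near 1.0 purely by construction.
import numpy as np

rng = np.random.default_rng(0)
w1 = 1.0 + 0.02 * rng.standard_normal(4096)  # "model A" norm scale
w2 = 1.0 + 0.02 * rng.standard_normal(4096)  # "model B" norm scale

cos = np.dot(w1, w2) / (np.linalg.norm(w1) * np.linalg.norm(w2))
print(f"{cos:.5f}")  # ~0.9996, despite zero shared training history
```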

The Accusation’s Political Payload

The statistical debate masks a more explosive subtext. This isn’t just another open-source spat; it’s a geopolitical minefield. Upstage’s development was funded by South Korea’s government as part of a strategic push for AI sovereignty. The project promised a truly independent foundation model to reduce reliance on American and Chinese tech giants. If Solar-100B Open is merely a repackaged Chinese model, it represents not just technical embarrassment but potential misuse of taxpayer funds and a national security oversight failure.

Ko Seok-hyun’s statement cut deep: “It’s deeply regrettable that a model estimated to be a copy of a Chinese model with minor adjustments was submitted for a national project funded by taxpayer money.” The phrase “estimated to be” carries weight: it acknowledges the analysis is circumstantial. But the political damage is already done.

Upstage’s Unprecedented Countermove

Facing what could be an existential threat, Upstage CEO Kim Seong-hoon didn’t retreat behind legal threats. Instead, he announced that the verification process itself would be carried out in public. On January 2nd at 3 PM KST, near Seoul’s Gangnam Station, Upstage will conduct a live audit of Solar-100B Open’s training lineage.

The promised transparency is radical: full checkpoint history and complete Weights & Biases (wandb) logs. For those unfamiliar, wandb tracks every hyperparameter adjustment, loss curve, and gradient update during training. If Solar was fine-tuned from GLM-Air, the logs would show a starting checkpoint that isn’t random initialization. If it was trained from scratch, the logs would reveal a clean lineage from random seeds through pre-training to the final model.
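
To make that concrete, here is a toy sketch of the shape of evidence a from-scratch run leaves behind in wandb. The project name, config keys, and synthetic loss curve are all hypothetical, not Upstage’s actual setup.

```python
# Illustrative only: the record a from-scratch run leaves in wandb.
# Project name, config keys, and the synthetic loss curve are all
# hypothetical, not Upstage's actual training setup.
import math
import wandb

run = wandb.init(
    project="pretrain-provenance-demo",
    config={
        "init": "random",  # a fine-tune would record a parent checkpoint here
        "seed": 42,
        "lr": 3e-4,
    },
)

for step in range(1000):
    # Random-init cross-entropy starts near ln(vocab_size), roughly 11
    # nats for a ~60k vocabulary, and decays smoothly. A warm-started
    # model would begin far lower, a telltale fine-tuning artifact.
    loss = 2.0 + 9.0 * math.exp(-step / 300)
    wandb.log({"train/loss": loss}, step=step)

run.finish()
```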

This move transforms the debate from statistical inference to empirical proof. It’s the AI equivalent of demanding DNA evidence instead of relying on eyewitness testimony.

The Norm Layer Red Herring

The technical community’s skepticism about the cosine similarity approach reveals a deeper truth about modern LLM development. When models share the same fundamental architecture (Transformer decoder stacks with similar depth and width), they’re bound to exhibit statistical parallels in auxiliary components. The real fingerprint lies in the attention patterns and learned representations, not the scaling factors that keep training stable.
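
A sharper test would aim the same metric at those components. A sketch under the same assumptions as before, with placeholder repository names and Llama-style parameter naming (actual names vary by architecture):

```python
# Same cosine check, aimed at attention projections instead of norm
# scales. Independently trained models should diverge here; ~0.99 on
# these matrices would be far harder to explain away.
import torch
from transformers import AutoModelForCausalLM

a = AutoModelForCausalLM.from_pretrained("org-a/model-under-test")  # placeholder
b = AutoModelForCausalLM.from_pretrained("org-b/reference-model")   # placeholder
params_b = dict(b.named_parameters())

for name, wa in a.named_parameters():
    if name.endswith("self_attn.q_proj.weight") and name in params_b:
        sim = torch.nn.functional.cosine_similarity(
            wa.flatten().float(), params_b[name].flatten().float(), dim=0
        ).item()
        print(f"{name}: {sim:.5f}")
```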

Critics on developer forums have pointed out that the accusation’s focus on norm weights is like proving plagiarism by comparing page margins. The repository mentioned in discussions also includes comparisons with Phi models, suggesting the accusers were casting a wide net for statistical matches rather than targeting specific architectural fingerprints.

The Evil Genius Marketing Theory

Some observers have floated a provocative alternative: what if this entire controversy is engineered? The timeline is suspiciously tight, with accusation and dramatic public response arriving within hours. In a crowded open-weight market where even excellent models fade into obscurity, a high-stakes public validation event guarantees headlines.

If Upstage’s logs prove clean, they don’t just clear their name; they establish a new gold standard for model transparency that competitors will be pressured to follow. The “accusation” becomes free quality assurance, and Upstage positions itself as the most trustworthy actor in a space plagued by openwashing. As one forum comment noted: “AI labs hate this simple trick to get them to release intermediate checkpoints!” The mock-outrage framing suggests admiration for what might be a brilliant manipulation of incentive structures.

The Coming Reckoning for “From Scratch” Claims

Regardless of the outcome, this incident has shattered the industry’s complacency around provenance claims. “From scratch” has become a marketing mantra without standardized verification. Companies release weights and call the result “open”, but hide training data, refuse to disclose compute budgets, and obscure architectural inspirations.

The Solar-100B Open case could force a new compact. If Upstage’s public verification succeeds, expect pressure campaigns demanding similar transparency from Meta (LLaMA), Mistral, and other open-weight players. If it fails, the entire Korean independent AI initiative faces collapse, and governments worldwide will scrutinize AI funding programs more intensely.

The stakes extend beyond one company. This is about whether the AI community can build a verification infrastructure that matches its ambition. Right now, we’re flying blind, trusting corporate blog posts and hoping for the best. Cosine similarity attacks are crude, but they’re the best weapons available when training logs remain locked behind corporate firewalls.

What to Watch at the Public Verification

The January 2nd event will be livestreamed and dissected in real-time by the global AI community. Key signals to monitor:

  1. Checkpoint Integrity: Do the uploaded checkpoints show continuous training progression, or are there suspicious gaps where external weights could have been merged? (A minimal version of this check is sketched after the list.)

  2. Wandb Consistency: Are the experiment logs internally consistent? Do loss curves behave as expected for a from-scratch model of this size? Any signs of fine-tuning artifacts like sudden distribution shifts?

  3. Data Pipeline Evidence: While the full dataset may be too large to share, Upstage promised to reveal data composition and preprocessing details. Does it align with known GLM-Air training data?

  4. Community Audit: Independent researchers will download and analyze the artifacts. Watch for reports from trusted labs that replicate the training run or find architectural fingerprints.

  5. Legal Language: If Upstage’s claims hold, expect aggressive legal action against Sonic AI for defamation. Silence would speak volumes.
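
On the first point, the audit logic is simple enough that anyone with the released checkpoints could run it. A minimal sketch, assuming safetensors-format checkpoints, with placeholder file paths and tensor names:

```python
# Sketch of the checkpoint-integrity check from item 1: consecutive
# checkpoints should drift smoothly, and none should snap to near-1.0
# similarity with an external model's weights. Paths and the tensor key
# are placeholders.
import torch
from safetensors.torch import load_file

def flat_sim(t1, t2):
    return torch.nn.functional.cosine_similarity(
        t1.flatten().float(), t2.flatten().float(), dim=0
    ).item()

KEY = "model.layers.0.self_attn.q_proj.weight"  # placeholder tensor name
ckpts = ["step_00000.safetensors", "step_01000.safetensors", "step_02000.safetensors"]

prev = None
for path in ckpts:
    w = load_file(path)[KEY]
    if prev is not None:
        # A discontinuity here, or a sudden jump toward foreign weights,
        # would flag a merge rather than continuous training.
        print(f"{path}: similarity to previous checkpoint = {flat_sim(prev, w):.4f}")
    prev = w
```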

The outcome will ripple through policy circles, funding agencies, and research labs. For now, the AI world holds its breath, waiting to see if cosine similarity was a smoke alarm or a smoke bomb.


The real controversy isn’t whether Solar-100B Open is a copy; it’s that we have no reliable way to know until companies volunteer for public autopsies. Until provenance verification becomes as standard as model cards, every open-weight release will carry the whiff of suspicion. Upstage’s gamble might give us the template for trust, or it might prove that in AI, truly independent creation is becoming impossible.