
I Detected a Distillation Heist: How Chinese Labs Stole Claude’s Capabilities
TL;DR
- Distillation lets a weaker model copy a stronger one, but when done illicitly it strips safety features and can weaponize AI.
- Three Chinese labs siphoned more than 16 million Claude exchanges through roughly 24,000 fake accounts.
- DeepSeek pulled 150,000 queries, Moonshot 3.4 million, MiniMax 13 million.
- The theft fuels AI weaponization and undermines export controls.
- I built a checklist to spot and stop similar attacks.
Why This Matters
I was watching a boardroom debate over export controls, and it turned into a full-throttle chase. The stakes are national security, military intelligence, and public trust. If a stolen model can be tweaked to bypass safeguards, it could help build bioweapons or launch cyber operations. The incident rattled the market: investors hammered the company, Elon Musk publicly attacked it, and the firm announced a $3 B lawsuit settlement (Yahoo Finance — Elon Musk Calls Anthropic Guilty of Stealing AI Training Data at ‘Massive Scale’, 2026; TechCrunch — Anthropic details Chinese AI companies distillation attacks, 2026).
Core Concepts
Distillation
Distillation is a legitimate training method: you train a smaller model on a larger model’s outputs, so the smaller model absorbs the larger one’s knowledge (Anthropic — Detecting and Preventing Distillation Attacks, 2026). It’s the same idea a chef uses to pass down a recipe: you taste a dish and try to replicate it without ever seeing the original cookbook. In AI, the larger model is the chef and the smaller one is the apprentice.
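To make the mechanics concrete, here is a toy sketch of distillation in pure Python: a "student" learns to match a fixed "teacher" distribution over three answer tokens by gradient descent on cross-entropy. Every number here is an illustrative assumption, not anything from a real model.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Teacher's soft labels over three hypothetical answer tokens.
teacher = [0.7, 0.2, 0.1]
# Student starts out knowing nothing: uniform logits.
student_logits = [0.0, 0.0, 0.0]
lr = 0.5

for _ in range(200):
    probs = softmax(student_logits)
    # Gradient of cross-entropy w.r.t. logits is (student_probs - teacher_probs),
    # so the student's distribution is pulled toward the teacher's.
    student_logits = [l - lr * (p - t)
                      for l, p, t in zip(student_logits, probs, teacher)]

print([round(p, 2) for p in softmax(student_logits)])
```

After training, the student reproduces the teacher’s output distribution almost exactly, which is the whole point: the apprentice ends up cooking the chef’s dish without the cookbook.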
Chain-of-Thought
When a model gives a step-by-step reasoning trail, that trail is called chain-of-thought. It’s like the model’s diary of how it arrived at an answer. Attackers harvest these traces because they reveal the inner logic the model uses to solve problems (Anthropic — Detecting and Preventing Distillation Attacks, 2026).
Agentic Reasoning
Agentic reasoning lets a model plan, use tools, and orchestrate actions. Think of it as giving a robot a to-do list that it can act on itself. The labs specifically targeted this ability (Anthropic — Detecting and Preventing Distillation Attacks, 2026).
Illicit Distillation
When distillation is performed without safeguards and with malicious intent, it’s illicit. The result is a model that can be deployed anywhere, with no compliance checks, and that can potentially be weaponized (Anthropic — Detecting and Preventing Distillation Attacks, 2026).
NVIDIA Blackwell
NVIDIA’s Blackwell GPU is the cutting-edge hardware that powers large AI models. It offers 1.5× more HBM3E memory than its predecessor and a new Tensor-core architecture (NVIDIA — Inside NVIDIA Blackwell Ultra: The Chip Powering the AI Factory Era, 2026).
How to Apply It
Below is a step-by-step playbook I built after the attack. I use real numbers from the incident and I focus on things you can do right now.
| Step | Action | Why It Works |
|---|---|---|
| 1 | Log every API call | You need raw data to spot patterns. |
| 2 | Build a fingerprint baseline | Small prompts that trigger chain-of-thought let you see if those prompts appear in bursts. |
| 3 | Set a burst threshold | Flag any account that sends more than 1,000 high-value prompts in a 5-minute window. |
| 4 | Check for proxy signatures | Many attackers use known VPN ranges; cross-check IP ranges against public proxy lists. |
| 5 | Train a quick classifier | Even a tiny model can separate “normal” traffic from suspicious bursts. |
| 6 | Throttle or block | Immediately revoke API keys that exceed thresholds. |
| 7 | Share intelligence | Feed logs to a shared platform (e.g., the AI Safety Consortium). |
| 8 | Lock down chain-of-thought | Remove or hide the ability to request chain-of-thought unless explicitly allowed. |
The labs generated more than 16 million exchanges across roughly 24,000 fake accounts (Anthropic — Detecting and Preventing Distillation Attacks, 2026). Breaking that down: DeepSeek used 150,000 queries ([TechCrunch — Anthropic accuses Chinese AI firms of mining Claude as US debates AI chip exports](https://techcrunch.com/2026/02/23/anthropic-accuses-chinese-ai-labs-of-mining-claude-as-us-debates-ai-chip-exports/)); Moonshot used 3.4 million (TechCrunch, same report); MiniMax used 13 million ([CNBC — Anthropic, OpenAI, and Chinese AI labs: Distillation attacks](https://www.cnbc.com/2026/02/24/anthropic-openai-china-firms-distillation-deepseek.html)). When Anthropic released a new model mid-campaign, the attackers pivoted within 24 hours ([Reuters — Chinese companies used Claude to improve own models](https://www.reuters.com/world/china/chinese-companies-used-claude-improve-own-models-anthropic-says-2026-02-23/)).
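The burst-threshold step (step 3) can be sketched in a few lines. This is a minimal illustration, assuming API logs arrive as (timestamp, account_id, is_high_value) tuples sorted by time; the window and threshold values mirror the table above but are otherwise arbitrary.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 300     # 5-minute sliding window
BURST_THRESHOLD = 1000   # max high-value prompts allowed per window

def detect_bursts(events, window=WINDOW_SECONDS, threshold=BURST_THRESHOLD):
    """events: iterable of (timestamp, account_id, is_high_value) tuples,
    assumed sorted by timestamp. Returns the set of flagged account ids."""
    windows = defaultdict(deque)  # account_id -> timestamps inside the window
    flagged = set()
    for ts, account, high_value in events:
        if not high_value:
            continue  # only count prompts worth distilling
        q = windows[account]
        q.append(ts)
        # Evict timestamps that have fallen out of the sliding window.
        while q and ts - q[0] > window:
            q.popleft()
        if len(q) > threshold:
            flagged.add(account)
    return flagged
```

Flagged accounts would then feed steps 6 and 7: revoke the API key and share the pattern. A per-account deque keeps the check O(1) amortized per event, so it can run inline with logging rather than as a batch job.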
Pitfalls & Edge Cases
- Legitimate research teams can generate thousands of prompts, causing false positives. Use user-level grouping to separate team accounts.
- Attackers can split traffic across hundreds of accounts or rotate IPs. Combine fingerprinting with IP clustering to catch that.
- Open-source models can be distilled too; the challenge is distinguishing a benign copy from a malicious one. Apply chain-of-thought restrictions only to models that haven’t passed safety reviews.
- Reporting every flagged account can overwhelm compliance teams. Prioritize by volume and impact.
- Heavy traffic analysis can slow down your API. Offload monitoring to a separate analytics service.
- Legal risk: the $1.5 B settlement for pirating books and the $3 B suit for music piracy show how large the penalties can get ([Mashable — Anthropic details Chinese AI companies distillation attacks](https://mashable.com/article/anthropic-details-chinese-ai-companies-distillation-attacks)).
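The "combine fingerprinting with IP clustering" mitigation above can be sketched as a prefix-grouping pass. This is a hypothetical illustration, not a production design: it groups accounts by their /24 network prefix and flags prefixes hosting an unusually large number of distinct accounts; the threshold is an assumption.

```python
from collections import defaultdict

def cluster_by_prefix(records, min_accounts=20):
    """records: iterable of (account_id, ipv4_string) pairs.
    Returns {prefix: accounts} for suspicious /24 prefixes, i.e. network
    blocks from which at least min_accounts distinct accounts originate."""
    prefixes = defaultdict(set)
    for account, ip in records:
        # Collapse the last octet to group addresses into /24 blocks.
        prefix = ".".join(ip.split(".")[:3]) + ".0/24"
        prefixes[prefix].add(account)
    return {p: accts for p, accts in prefixes.items()
            if len(accts) >= min_accounts}
```

A real deployment would also cross-check the flagged prefixes against public VPN and proxy lists (step 4 of the playbook) before acting, since a single /24 can legitimately serve a university or corporate NAT.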
Quick FAQ
| Question | Answer |
|---|---|
| What is a distillation attack? | Copying a model’s knowledge by feeding it the larger model’s outputs without safeguards. |
| How can I tell if my model has been distilled? | Look for repeated chain-of-thought requests, a sudden spike in high-value prompts, or traffic from many accounts with similar timestamps. |
| Do open-source models avoid this? | No. Open-source models can be distilled too; the key is to lock chain-of-thought and use safety checks. |
| Is there a standard for reporting? | Report to your vendor’s security team, share aggregated logs with the AI Safety Consortium, and consider notifying regulators if national-security risks exist. |
| Can I use stolen data? | Legally, no. Using it can expose you to sanctions and lawsuits. |
| What if my model is small? | Even small models can be distilled; use the same detection workflow. |
Conclusion
I’ve seen the fallout of a stealthy distillation campaign that siphoned more than 16 million Claude exchanges from roughly 24,000 fake accounts. The lessons are clear:
- Treat API traffic like a security log; every prompt is a potential red flag.
- Build a baseline of normal behavior and watch for bursts.
- Lock chain-of-thought and agentic reasoning behind safeguards.
- Share what you learn – the AI community is only as strong as its weakest link.
- Advocate for export controls that limit access to the hardware that powers these models.
Whether you’re an AI researcher, a practitioner, or a policy maker, these steps are yours to implement. Don’t wait for the next attack: prepare now, share your insights, and hold the line on AI safety.
Glossary
- Distillation – training a smaller model on a larger model’s outputs.
- Chain-of-Thought – step-by-step reasoning trace a model produces.
- Agentic Reasoning – model’s ability to plan, use tools, and act autonomously.
- Illicit Distillation – distillation performed without safeguards and for malicious use.
- NVIDIA Blackwell – advanced GPU architecture used for AI training.
- Proxy Services – networks that mask API traffic origins.
- Export Controls – regulations limiting sale of technology to certain countries.
- Fraudulent Accounts – accounts created to bypass usage restrictions.
- Claude – Anthropic’s flagship large-language model.
- Anthropic – AI startup that reported the attacks.
- DeepSeek – Chinese AI lab involved.
- Moonshot – Chinese AI lab involved.
- MiniMax – Chinese AI lab involved.
- Banned NVIDIA Blackwell chips – chips prohibited for export to China.
- Opus 4.6 – Anthropic’s updated model, released after the MiniMax attack.
- Harry Potter extraction – 95.8% of the book extracted from Claude.
- Chain-of-Thought Prompts – prompts designed to capture internal reasoning.
- Agentic Coding – model ability to write code autonomously.
- Rubric-Based Grading – evaluation method for model outputs.

