Anthropic Exposes Distillation Attacks by DeepSeek and MiniMax

[Figure: Distillation attack flow from a frontier AI model to illicit copies through networks of fraudulent accounts]

Anthropic Uncovers Industrial-Scale Distillation Campaigns

Anthropic has published evidence that three AI laboratories — DeepSeek, Moonshot AI, and MiniMax — ran coordinated campaigns to extract Claude's capabilities through illicit distillation. The campaigns generated over 16 million exchanges with Claude through approximately 24,000 fraudulent accounts, violating Anthropic's terms of service and regional access restrictions.

Distillation is a legitimate technique where a smaller model is trained on outputs from a stronger one. Frontier labs regularly distill their own models to create cheaper versions. But when competitors use distillation without authorization, they acquire powerful capabilities at a fraction of the cost and time needed for independent development.
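
To ground the term: in its legitimate form, distillation trains a student to match a teacher's output distribution. A minimal sketch in PyTorch, assuming teacher logits have already been collected (all names here are illustrative, not any lab's recipe):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Classic soft-label distillation: train the student to match the
    teacher's softened output distribution (a sketch, not any lab's recipe)."""
    # Softened teacher probabilities serve as the training targets.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    # Student predictions in log-space, softened by the same temperature.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence, scaled by T^2 so gradient magnitudes stay comparable
    # across temperature settings.
    return F.kl_div(student_log_probs, soft_targets,
                    reduction="batchmean") * temperature**2
```

API-based extraction has no access to teacher logits, so the campaigns described here would instead fine-tune directly on sampled text, treating each prompt-and-response exchange as a supervised example. That is why raw exchange volume is the relevant measure of scale.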

The attacks targeted Claude's most differentiated features: agentic reasoning, tool use, and coding — the same capabilities that power Claude Opus 4.6 and Claude Sonnet 4.6.

Scale and Targets of Each Campaign

Lab           Exchanges       Primary Targets
DeepSeek      150,000+        Reasoning, reward-model grading, censorship workarounds
Moonshot AI   3.4 million+    Agentic reasoning, tool use, computer vision
MiniMax       13 million+     Agentic coding, tool orchestration

DeepSeek used a notable technique: prompts that asked Claude to articulate its internal reasoning step by step, effectively generating chain-of-thought training data at scale. The lab also used Claude to generate censorship-safe alternatives to politically sensitive queries, likely to train its own models to steer conversations away from censored topics. Anthropic traced these accounts to specific researchers at DeepSeek.
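
To make the elicitation pattern concrete, here is roughly the shape of the supervised record a single such exchange yields. The field names and template are illustrative assumptions, not details from Anthropic's report:

```python
# One elicited exchange, restructured as a fine-tuning record. A prompt
# engineered to surface step-by-step reasoning yields a response that
# doubles as chain-of-thought training data for the copying model.
cot_record = {
    "instruction": (
        "Solve the following problem. Explain each step of your "
        "reasoning before giving the final answer: ..."
    ),
    "chain_of_thought": "Step 1: ...\nStep 2: ...\nStep 3: ...",
    "final_answer": "...",
}
# Tens of thousands of records like this, gathered across coordinated
# accounts, become a reasoning dataset extracted from the target model.
```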

Moonshot AI (Kimi models) employed hundreds of fraudulent accounts across multiple access pathways. In a later phase, Moonshot shifted to a more targeted approach, attempting to extract and reconstruct Claude's reasoning traces.

MiniMax ran the largest campaign, with over 13 million exchanges. Anthropic detected the campaign while it was still active, before MiniMax released the model it was training. When Anthropic released a new model mid-campaign, MiniMax pivoted within 24 hours, redirecting nearly half of its traffic to capture the latest capabilities.

How Distillers Bypass Access Restrictions

Anthropic does not offer commercial Claude access in China for national security reasons. The labs circumvented this through commercial proxy services that resell frontier model access at scale.

These services run what Anthropic calls "hydra cluster" architectures: sprawling networks of fraudulent accounts that distribute traffic across the API and third-party cloud platforms. When one account is banned, a new one replaces it. One proxy network managed more than 20,000 fraudulent accounts simultaneously, mixing distillation traffic with unrelated customer requests to make detection harder.
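
A toy model of why banning individual accounts fails against this architecture; the class and the numbers below are illustrative, not a description of any real proxy service:

```python
import itertools

class HydraPool:
    """Toy 'hydra cluster': traffic is spread over a pool of throwaway
    accounts, and every banned account is immediately replaced."""

    def __init__(self, size: int):
        self._ids = itertools.count()
        self.active = {next(self._ids) for _ in range(size)}

    def ban(self, account_id: int) -> None:
        self.active.discard(account_id)   # defender removes one head...
        self.active.add(next(self._ids))  # ...and another grows back

pool = HydraPool(20_000)
for account in list(pool.active)[:5_000]:  # ban a quarter of the pool
    pool.ban(account)
print(len(pool.active))  # 20000 -- per-account bans never shrink the pool
```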

What distinguishes distillation from normal usage is the pattern. A single prompt may appear benign, but when variations arrive tens of thousands of times across hundreds of coordinated accounts, all targeting the same narrow capability, the pattern becomes clear.
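
A minimal sketch of how that aggregate signal could be surfaced, assuming per-request prompt embeddings are available. The bucketing scheme and thresholds are invented for illustration and are not Anthropic's actual classifiers:

```python
import numpy as np
from collections import defaultdict

def flag_coordinated_extraction(requests, embed,
                                min_accounts=50, min_requests=10_000):
    """Flag clusters of near-duplicate prompts fanned out across accounts.

    `requests` is an iterable of (account_id, prompt) pairs and `embed`
    maps a prompt to a unit-norm vector. Any single prompt can look
    benign; the signal is thousands of close variants arriving from
    coordinated accounts that all target the same narrow capability.
    """
    # Bucket prompts by a coarse sign signature so near-duplicates land
    # together (a crude stand-in for real locality-sensitive hashing).
    buckets = defaultdict(list)
    for account_id, prompt in requests:
        signature = tuple(np.sign(embed(prompt)[:16]))
        buckets[signature].append(account_id)

    # A bucket is suspicious only at scale: many requests AND many accounts.
    return [
        (sig, len(accs), len(set(accs)))
        for sig, accs in buckets.items()
        if len(accs) >= min_requests and len(set(accs)) >= min_accounts
    ]
```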

National Security Implications

Illicitly distilled models lack the safety guardrails that US companies build into frontier systems. These guardrails are designed to prevent AI from being used to develop bioweapons, carry out offensive cyber operations, or enable mass surveillance.

Models built through illicit distillation are unlikely to retain those protections. Foreign labs can feed unprotected capabilities into military, intelligence, and surveillance systems. If distilled models are open-sourced, dangerous capabilities spread freely beyond any government's control.

Distillation attacks also undermine US export controls. Without visibility into these attacks, the apparently rapid advancements by these labs can be incorrectly interpreted as evidence that export controls are ineffective. In reality, the advancements depend on capabilities extracted from American models, and executing extraction at scale requires the advanced chips that export controls are designed to restrict.

Anthropic's Countermeasures

Anthropic is deploying multiple defenses against distillation attacks:

  • Detection classifiers: Behavioral fingerprinting systems that identify distillation patterns in API traffic, including chain-of-thought elicitation used to construct reasoning training data (a simplified sketch follows this list)
  • Intelligence sharing: Technical indicators shared with other AI labs, cloud providers, and relevant authorities for a holistic picture of the distillation landscape
  • Access controls: Strengthened verification for educational accounts, security research programs, and startup organizations — the pathways most commonly exploited
  • Model-level safeguards: Product, API, and model-level countermeasures designed to reduce output efficacy for illicit distillation without degrading legitimate use
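
As a simplified illustration of the behavioral-fingerprinting idea in the first bullet, here is a toy per-account risk score; every feature, weight, and threshold below is invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class AccountProfile:
    requests_per_day: float       # sustained volume
    template_reuse: float         # 0..1, share of near-duplicate prompts
    cot_elicitation_rate: float   # 0..1, share of "explain your reasoning" prompts
    capability_focus: float       # 0..1, concentration on one narrow capability

def distillation_risk(p: AccountProfile) -> float:
    """Toy linear score over behavioral features. A production system would
    use a trained classifier plus cross-account correlation, but it would
    consume the same kind of aggregate behavioral signal."""
    return (
        0.3 * min(p.requests_per_day / 1000, 1.0)
        + 0.2 * p.template_reuse
        + 0.3 * p.cot_elicitation_rate
        + 0.2 * p.capability_focus
    )

# A high-volume account hammering one capability with reasoning-elicitation
# prompts scores near the top of the range (e.g. review anything above 0.7).
suspect = AccountProfile(5_000, 0.9, 0.8, 0.95)
print(f"risk={distillation_risk(suspect):.2f}")  # risk=0.91
```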

Anthropic also connects these findings to its earlier work on Claude Code Security capabilities for defenders, part of a broader strategy to ensure frontier AI capabilities remain protected.

Industry-Wide Response Needed

Anthropic emphasizes that no single company can solve distillation attacks alone. The campaigns exploit commercial proxy services, third-party cloud platforms, and gaps in account verification that span the entire AI ecosystem.

The growing intensity and sophistication of these campaigns narrows the window to act. Anthropic has observed that distillers adapt rapidly: when new models are released, extraction efforts pivot within hours. When accounts are banned, proxy networks replace them immediately through hydra cluster architectures with no single point of failure.

Addressing the threat requires coordinated action among AI companies, cloud providers, and policymakers. Anthropic published its findings to make the evidence available to everyone with a stake in protecting frontier AI capabilities from unauthorized extraction. The company is calling for industry-wide standards on account verification, shared threat intelligence frameworks, and policy support for enforcement against illicit distillation at scale.

Frequently Asked Questions

What are AI distillation attacks?
AI distillation attacks involve training a less capable model on the outputs of a stronger one without authorization. Competitors generate massive volumes of carefully crafted prompts to extract specific capabilities from a frontier model, then use the responses to train their own systems. Anthropic identified over 16 million illicit exchanges across approximately 24,000 fraudulent accounts used by DeepSeek, Moonshot, and MiniMax to extract Claude's capabilities.
Which companies distilled Claude's capabilities?
Anthropic identified three Chinese AI laboratories conducting industrial-scale distillation campaigns: DeepSeek (over 150,000 exchanges targeting reasoning and censorship workarounds), Moonshot AI (over 3.4 million exchanges targeting agentic reasoning and tool use), and MiniMax (over 13 million exchanges targeting agentic coding and tool orchestration).
Why are distillation attacks a national security risk?
Illicitly distilled models lack the safety guardrails that US companies like Anthropic build into their systems. These unprotected models can be deployed for offensive cyber operations, disinformation campaigns, mass surveillance, and even bioweapon development support. If distilled models are open-sourced, dangerous capabilities spread beyond any single government's control, undermining export controls designed to maintain America's AI advantage.
How did DeepSeek, Moonshot, and MiniMax access Claude?
The labs circumvented Anthropic's regional access restrictions using commercial proxy services that resell Claude API access at scale. These services run hydra cluster architectures with sprawling networks of fraudulent accounts distributed across Anthropic's API and third-party cloud platforms. One proxy network managed more than 20,000 fraudulent accounts simultaneously, mixing distillation traffic with legitimate requests to avoid detection.
How is Anthropic responding to distillation attacks?
Anthropic is deploying multiple countermeasures: behavioral fingerprinting classifiers to detect distillation patterns in API traffic, intelligence sharing with other AI labs and cloud providers, strengthened account verification, and model-level safeguards that reduce output efficacy for illicit distillation without degrading service for legitimate users. Anthropic is also calling for coordinated industry and policy responses.
What did DeepSeek specifically extract from Claude?
DeepSeek targeted Claude's reasoning capabilities, rubric-based grading tasks (making Claude function as a reward model for reinforcement learning), and censorship-safe alternatives to politically sensitive queries. They used techniques that asked Claude to articulate its internal reasoning step by step, generating chain-of-thought training data at scale. Anthropic traced these accounts to specific researchers at DeepSeek.
