How a top Chinese AI model overcame US sanctions

The AI community is abuzz over DeepSeek R1, a new open-source reasoning model.

The model was developed by the Chinese AI startup DeepSeek, which claims that R1 matches or even surpasses OpenAI’s ChatGPT o1 on multiple key benchmarks but operates at a fraction of the cost.

“This could be a truly equalizing breakthrough that is great for researchers and developers with limited resources, especially those from the Global South,” says Hancheng Cao, an assistant professor in information systems at Emory University.

DeepSeek’s success is even more remarkable given the constraints facing Chinese AI companies in the form of increasing US export controls on cutting-edge chips. But early evidence shows that these measures are not working as intended. Rather than weakening China’s AI capabilities, the sanctions appear to be driving startups like DeepSeek to innovate in ways that prioritize efficiency, resource-pooling, and collaboration.

To create R1, DeepSeek had to rework its training process to reduce the strain on its GPUs, a variety released by Nvidia for the Chinese market whose performance is capped at half the speed of Nvidia’s top products, according to Zihan Wang, a former DeepSeek employee and current PhD student in computer science at Northwestern University.

DeepSeek R1 has been praised by researchers for its ability to tackle complex reasoning tasks, particularly in mathematics and coding. The model employs a “chain of thought” approach similar to that used by ChatGPT o1, which lets it solve problems by processing queries step by step.

Dimitris Papailiopoulos, principal researcher at Microsoft’s AI Frontiers research lab, says what surprised him the most about R1 is its engineering simplicity. “DeepSeek aimed for accurate answers rather than detailing every logical step, significantly reducing computing time while maintaining a high level of effectiveness,” he says.

DeepSeek has also released six smaller versions of R1 that are small enough to run locally on laptops. It claims that one of them even outperforms OpenAI’s o1-mini on certain benchmarks. “DeepSeek has largely replicated o1-mini and has open sourced it,” tweeted Perplexity CEO Aravind Srinivas. DeepSeek did not reply to MIT Technology Review’s request for comment.

Despite the buzz around R1, DeepSeek remains relatively unknown. Based in Hangzhou, China, it was founded in July 2023 by Liang Wenfeng, an alumnus of Zhejiang University with a background in information and electronic engineering. It was incubated by High-Flyer, a hedge fund that Liang founded in 2015. Like Sam Altman of OpenAI, Liang aims to build artificial general intelligence (AGI), a form of AI that can match or even beat humans on a range of tasks.

Training large language models (LLMs) requires a team of highly trained researchers and substantial computing power. In a recent interview with the Chinese media outlet LatePost, Kai-Fu Lee, a veteran entrepreneur and former head of Google China, said that only “front-row players” typically engage in building foundation models such as ChatGPT, as it’s so resource-intensive. The situation is further complicated by the US export controls on advanced semiconductors. High-Flyer’s decision to venture into AI is directly related to these constraints, however. Long before the anticipated sanctions, Liang acquired a substantial stockpile of Nvidia A100 chips, a type now banned from export to China. The Chinese media outlet 36Kr estimates that the company has over 10,000 units in stock, but Dylan Patel, founder of the AI research consultancy SemiAnalysis, estimates that it has at least 50,000. Recognizing the potential of this stockpile for AI training is what led Liang to establish DeepSeek, which was able to use them in combination with the lower-power chips to develop its models.

Tech giants like Alibaba and ByteDance, as well as a handful of startups with deep-pocketed investors, dominate the Chinese AI space, making it challenging for small or medium-sized enterprises to compete. A company like DeepSeek, which has no plans to raise funds, is rare.

Zihan Wang, the former DeepSeek employee, told MIT Technology Review that he had access to abundant computing resources and was given freedom to experiment when working at DeepSeek, “a luxury that few fresh graduates would get at any company.”

In an interview with the Chinese media outlet 36Kr in July 2024, Liang said that an additional challenge Chinese companies face, on top of chip sanctions, is that their AI engineering techniques tend to be less efficient. “We [most Chinese companies] have to consume twice the computing power to achieve the same results. Combined with data efficiency gaps, this could mean needing up to four times more computing power. Our goal is to continuously close these gaps,” he said.

But DeepSeek found ways to reduce memory usage and speed up calculation without significantly sacrificing accuracy. “The team loves turning a hardware challenge into an opportunity for innovation,” says Wang.

Liang himself remains deeply involved in DeepSeek’s research process, running experiments alongside his team. “The whole team shares a collaborative culture and dedication to hardcore research,” Wang says.

As well as prioritizing efficiency, Chinese companies are increasingly embracing open-source principles. Alibaba Cloud has released over 100 new open-source AI models, supporting 29 languages and catering to various applications, including coding and mathematics. Similarly, startups like Minimax and 01.AI have open-sourced their models.

According to a white paper released last year by the China Academy of Information and Communications Technology, a state-affiliated research institute, the number of AI large language models worldwide has reached 1,328, with 36% originating in China. This positions China as the second-largest contributor to AI, behind the United States.

“This generation of young Chinese researchers identify strongly with open-source culture because they benefit so much from it,” says Thomas Qitong Cao, an assistant professor of technology policy at Tufts University.

“The US export control has essentially backed Chinese companies into a corner where they have to be far more efficient with their limited computing resources,” says Matt Sheehan, an AI researcher at the Carnegie Endowment for International Peace. “We are probably going to see a lot of consolidation in the future related to the lack of compute.”

That might already have started to happen. Two weeks ago, Alibaba Cloud announced that it had partnered with the Beijing-based startup 01.AI, founded by Kai-Fu Lee, to merge research teams and establish an “industrial large model laboratory.”

“It is energy-efficient and natural for some kind of division of labor to emerge in the AI industry,” says Cao, the Tufts professor. “The rapid evolution of AI demands agility from Chinese firms to survive.”
