DeepSeek, a small Chinese AI lab, has caused a global sensation by revealing the technical recipe behind its advanced R1 model.
The company, founded by hedge fund manager Liang Wenfeng, detailed its methodology for building a large language model on a shoestring budget. R1 can learn and improve its reasoning largely without human supervision. Using just 2,048 Nvidia H800 processors and an investment of $5.6 million, DeepSeek built a 671-billion-parameter model at a fraction of the cost incurred by OpenAI and Google.
Liang began buying thousands of Nvidia GPUs in 2021 while still running his High-Flyer investment fund. His team’s experience optimizing processors for stock trading proved invaluable in AI development. The company operates like a university research center, attracting leading AI engineers from top Chinese universities.
DeepSeek, along with ByteDance, offers the highest salaries for AI engineers in China. The R1 revelation has sparked heated debate in Silicon Valley about the ability of American companies to maintain their technological lead. Ritwik Gupta, an AI policy researcher at the University of California, Berkeley, points out that “there are no watertight doors on AI capabilities.”
Analysts say China has a larger pool of systems engineers who understand how to optimize computing resources to train models at lower costs. But American companies are not standing still. OpenAI announced a $100 billion joint venture with SoftBank to invest in AI infrastructure, while Elon Musk’s xAI is expanding its Colossus supercomputer to more than 1 million GPUs.
The U.S. ban on exporting advanced Nvidia processors to China has forced Chinese companies to develop innovative ways to maximize the computing power of available processors—a problem that Liang’s team had already solved. DeepSeek’s strategy of sharing its technological discoveries rather than protecting them for commercial gain makes it a dangerous competitor.
The company has not sought outside funding or made any significant moves to commercialize its models. According to recent figures, DeepSeek now employs more than 200 researchers and engineers in its offices in Beijing and Hangzhou.
Chinese AI startup DeepSeek has caused a stir in global tech markets with the recent release of its new model, which has already ranked among the top apps on Apple’s App Store. The model, developed using lower-performance chips, raises questions about the United States’ dominance in AI and the inflated valuations of companies like Nvidia. Union Bancaire Privee’s Vey-Sern Ling said DeepSeek demonstrates that it is possible to develop powerful AI models at a lower cost, which could challenge the high-cost investment strategies of a few large companies.
DeepSeek was founded by Liang Wenfeng, who has invested in advanced technologies and developed a model seen as competitive with those of OpenAI and Meta Platforms. Investor Marc Andreessen has called DeepSeek’s R1 one of the most impressive and innovative developments in the field. The transparency of the model, which shows its reasoning process as it answers user questions, has drawn positive reviews from users. The release of the R1 model weighed on the Nasdaq, with shares falling as much as 1.9%. In contrast, the Hong Kong market rose, with the Hang Seng Tech Index gaining 2%.
AI-related companies in China, such as Merit Interactive Co., have seen their shares rally after linking their businesses to DeepSeek. The release of the DeepSeek model casts doubt on the assumption that Chinese AI technology is years behind that of the United States.
Despite the restrictions imposed by Washington, DeepSeek’s model was released as open source, making it widely accessible. Charu Chanana, a strategist at Saxo Markets, said DeepSeek’s emergence suggests that competition in the AI space is heating up, and while it is not an immediate threat, future competitors will evolve faster. The Nasdaq’s decline comes during a critical week for earnings reports from major tech companies such as Apple and Microsoft.
Analysts expect earnings growth to have slowed, while valuations remain high, raising concerns about the recent AI-fueled rally. The Nasdaq 100 is trading at 27 times estimated earnings, compared with a three-year average of 24 times.
DeepSeek has unveiled a new family of Janus-Pro AI models that the company says outperform OpenAI’s DALL-E 3.
Chinese AI company DeepSeek, which recently went viral, has announced the release of a new family of multimodal AI models called Janus-Pro. The company says the new models, which are available for download on the Hugging Face platform, outperform OpenAI’s DALL-E 3. Janus-Pro models range in size from 1 billion to 7 billion parameters.
DeepSeek says the parameter count roughly corresponds to a model’s problem-solving ability, with models that have more parameters generally performing better. Importantly, Janus-Pro is available under the MIT license, allowing unrestricted commercial use. DeepSeek describes Janus-Pro as a model that can both analyze and generate images.
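Because the weights are published on Hugging Face, they can be fetched programmatically. The minimal Python sketch below uses the huggingface_hub client to download the 7-billion-parameter variant; the repository ID deepseek-ai/Janus-Pro-7B reflects DeepSeek’s Hugging Face listing and should be verified before use, and running inference additionally requires DeepSeek’s accompanying Janus code.

```python
# Minimal sketch: download the Janus-Pro-7B weights from Hugging Face.
# Assumes the repository ID "deepseek-ai/Janus-Pro-7B" (check the listing first)
# and that huggingface_hub is installed (pip install huggingface_hub).
from huggingface_hub import snapshot_download

# Fetch the full model snapshot (configs, tokenizer files, and weight shards)
# into a local directory; expect several gigabytes for the 7B variant.
local_dir = snapshot_download(
    repo_id="deepseek-ai/Janus-Pro-7B",
    local_dir="janus-pro-7b",
)
print(f"Model files downloaded to: {local_dir}")
```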
According to the data published by the company, the larger Janus-Pro-7B model outperforms DALL-E 3 as well as other well-known models such as PixArt-alpha, Emu3-Gen, and Stability AI’s Stable Diffusion XL in two important AI benchmarks: GenEval and DPG-Bench. Although most Janus-Pro models can only analyze small images with resolutions up to 384 x 384, their performance is considered impressive given their compact size.
As DeepSeek specifically states in its Hugging Face post, “Janus-Pro outperforms previous unified models and matches or exceeds the performance of specialized models.” DeepSeek, a Chinese AI lab funded primarily by quantitative trading firm High-Flyer Capital Management, gained widespread recognition this week when its chatbot app soared to the top of the Apple App Store charts.
DeepSeek’s language models, which were trained using efficient computing techniques, have led many Wall Street analysts and technologists to question whether the U.S. can maintain its lead in the artificial intelligence race and whether demand for AI chips will hold up.