"This Week in AI" is now Superintelligence.

DeepSeek’s self-improving AI
The TLDR
DeepSeek, in collaboration with Tsinghua University, is developing self-improving AI models using a new method called self-principled critique tuning. This approach, part of their new open-source DeepSeek-GRM models, aims to reduce training costs while better aligning AI with human preferences. The innovation could mark a major step forward in generalist reinforcement learning.
DeepSeek is making waves in the AI world again! The Chinese rising star, which already caused a stir in January with its low-cost reasoning model, is now taking a decisive step further. In collaboration with the renowned Tsinghua University, DeepSeek is developing AI models that can improve themselves - a breakthrough that could dramatically reduce training costs.

The innovation is called “self-principled critique tuning” and promises to take reinforcement learning to a new level. Unlike previous approaches, which only work well in narrow domains, DeepSeek's method is designed for much broader applications. The result: AI models that better understand human preferences while requiring fewer computing resources.

The new models, called DeepSeek-GRM (Generalist Reward Modeling), will be made available as open source - a gift to the global AI community. This positions DeepSeek alongside heavyweights such as Alibaba and OpenAI, who are also working on self-improving AI capabilities. Will these self-optimizing models herald the next evolution of artificial intelligence? The AI world is eagerly looking to China, where a new star is rising in the AI firmament.
You’ve heard the hype. Now it’s time for results
After two years of siloed experiments, proofs of concept that fail to scale, and disappointing ROI, most enterprises are stuck. AI isn't transforming their organizations — it’s adding complexity, friction, and frustration.
But Writer customers are seeing a positive impact across their companies. Our end-to-end approach is delivering adoption and ROI at scale. Now, we’re applying that same platform and technology to bring agentic AI to the enterprise.
This isn’t just another hype train that doesn’t deliver. The AI you were promised is finally here — and it’s going to change the way enterprises operate.
See real agentic workflows in action, hear success stories from our beta testers, and learn how to align your IT and business teams.
Graph of the day

The number of AI papers has grown exponentially in recent years.

Inference-Time Scaling: Has DeepSeek found the key to self-learning AI?
Researchers have improved the evaluation of AI responses, especially for complex tasks without clear rules. They developed “DeepSeek-GRM”, a flexible reward model trained with a new method called “SPCT”, which teaches the model to generate its own evaluation principles and critiques. Crucially, the model improves significantly when given more computing power at inference time – in some cases more effectively than simply training larger models. This promises more intelligent, adaptable AI systems without constant, expensive retraining.
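To make that inference-time scaling pattern concrete, here is a minimal Python sketch: sample several independent principle-plus-critique judgments from a reward model and aggregate them into one score. The `judge_once` stub and the voting rule are illustrative assumptions, not DeepSeek's actual implementation.

```python
import random
from collections import Counter

# Hypothetical sketch of inference-time scaling for a generative reward model,
# in the spirit of DeepSeek-GRM / SPCT (not the authors' code).
# judge_once stands in for one sampled pass of the reward model: it would
# generate evaluation principles, critique the answer against them, and emit a score.

def judge_once(question: str, answer: str) -> int:
    # Placeholder: a real reward model would return a 1-10 score derived
    # from self-generated principles and critique text.
    return random.randint(1, 10)

def judge_with_scaling(question: str, answer: str, samples: int = 8) -> float:
    """Sample several independent judgments and vote/average over them.
    More samples = more inference-time compute = a more reliable reward signal."""
    scores = [judge_once(question, answer) for _ in range(samples)]
    most_common, count = Counter(scores).most_common(1)[0]
    # Majority vote over scores, falling back to the mean when there is no clear winner.
    return most_common if count > 1 else sum(scores) / len(scores)

print(judge_with_scaling("Explain photosynthesis.", "Plants convert light to sugar.", samples=16))
```

The point of the sketch: the quality of the reward signal can be dialed up at inference time simply by sampling more judgments, instead of training a bigger reward model.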
s1: Logical reasoning breakthrough
AI researchers demonstrate a simple way to make language models better at logical reasoning. Instead of complex training, they used only 1,000 high-quality examples and a technique called “budget forcing” to control the model's “thinking time” at inference. More computing time leads to better results – sometimes more effectively than larger models. This opens up an efficient, open path to more powerful AI without constant retraining, making top performance more accessible.
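Here is a rough sketch of what budget forcing could look like, with a hypothetical `generate_step` stub in place of a real reasoning model: if the model tries to stop thinking too early, a “Wait” nudge keeps it going; past the budget, it is forced to answer. This mirrors the idea described for s1 but is not the authors' code.

```python
# Minimal budget-forcing sketch (assumptions: a </think> stop marker and a
# hypothetical generate_step function standing in for one decoding step).

END_OF_THINKING = "</think>"

def generate_step(context: str) -> str:
    # Placeholder: a real implementation would call the language model here.
    return END_OF_THINKING if len(context) > 200 else "...reasoning token..."

def think_with_budget(prompt: str, min_tokens: int = 32, max_tokens: int = 256) -> str:
    context, n = prompt, 0
    while n < max_tokens:
        token = generate_step(context)
        if token == END_OF_THINKING:
            if n < min_tokens:
                # Too early: suppress the stop signal and nudge the model to keep thinking.
                context += " Wait,"
                n += 1
                continue
            break
        context += token
        n += 1
    # After a legitimate stop (or once the budget is exhausted) the model must answer.
    return context + END_OF_THINKING

print(think_with_budget("Q: Is 143 prime? Think step by step."))
```

The single knob here is the token budget, which is exactly what makes the approach cheap: more “thinking time” is bought at inference rather than through retraining.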
Inference-Time Scaling for Complex Tasks: how more compute improves reasoning
The paper investigates how so-called “inference-time scaling” techniques, i.e. spending more compute when running large language models (LLMs), can improve their ability to solve complex tasks step by step. What is new is the broad analysis of these techniques across a range of challenging tasks, including mathematics, calendar planning, spatial reasoning, and NP-hard problems. This is relevant because it shows that future technological advances and improved verification methods hold further potential for making models more effective at tackling complex challenges.
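As one deliberately generic example of this family of techniques (not the paper's exact setup), the sketch below shows best-of-N sampling with a verifier, using hypothetical `propose_solution` and `verify` stubs. The stronger the verifier, the more the extra inference-time compute pays off.

```python
import random

# Generic best-of-N illustration: sample N candidate solutions and keep the
# one a verifier scores highest. The stubs stand in for an LLM and a
# verification method (unit tests, a rule-based checker, or a learned verifier).

def propose_solution(task: str) -> str:
    # Placeholder for sampling one candidate answer from a model.
    return f"candidate-{random.randint(0, 999)}"

def verify(task: str, candidate: str) -> float:
    # Placeholder for a verifier score in [0, 1].
    return random.random()

def best_of_n(task: str, n: int = 16) -> str:
    candidates = [propose_solution(task) for _ in range(n)]
    return max(candidates, key=lambda c: verify(task, c))

print(best_of_n("Schedule 5 meetings without conflicts."))
```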
Poll of the Day
Do we need more research breakthroughs for AGI, or do we have everything we need?
In The News
DolphinGemma: DeepMind’s AI Takes on Dolphin Communication
Google DeepMind has introduced DolphinGemma, a groundbreaking AI model developed to decode dolphin communication. Built on insights from their open-source Gemma models and trained on acoustic data from wild Atlantic spotted dolphins, DolphinGemma can recognize sound patterns and predict sequences in dolphin vocalizations. This collaboration with Georgia Tech and Dolphin Project aims to bridge the gap between human and dolphin understanding and marks a major step toward interspecies communication powered by artificial intelligence.
MineWorld Revolutionizes AI World Modeling in Minecraft
MineWorld is a new real-time AI world model that generates future game scenes in Minecraft based on past visuals and player actions using a visual-action autoregressive Transformer. With a novel parallel decoding method, it produces 4–7 frames per second, enabling smooth and interactive gameplay. It significantly outperforms previous open-source models in both visual fidelity and action-following accuracy.
Sonic Transforms Portrait Animation with Audio-Driven Magic
Sonic introduces a groundbreaking method to animate static images using just one photo and any audio input, enabling lifelike speeches, singing, and more. By leveraging global audio context and decoupling motion elements, it achieves highly realistic lip-sync, diverse facial movements, and smooth frame transitions. Ideal for ads, vlogs, virtual influencers, and education, Sonic delivers industry-grade results with just a click.
Quote of the Day

Hi All,
As you probably noticed, we’ve rebranded to Superintelligence! We have brought in a new Editor-in-Chief to bring you even more in-depth analysis on all things AI & the future. We are also adding a Chart of the Day, Quote of the Day, and Question of the Day to make your reading experience more fun & interactive. Please feel free to email us with any feedback that you have!
Cheers,
Dan