Who's Cheating in AI Tests?
Your Daily Dose of AI Goodness
Featured Story
War of the benchmarks
The TLDR
The war of the benchmarks rages on after Grok-3's release, with companies trading fraud accusations on X. Noam Brown suggests judging models by cost-effectiveness instead. Sam Altman's plan to offer GPT-5 in the free tier suggests that distribution can matter more than raw performance, and it draws on the marketing experience he built at Y Combinator.
Just hours after the release of xAI's Grok-3, a heated battle for the title of best LLM erupted on X, with companies accusing one another of fraud and dismissing each other's results. The episode highlights a growing issue: traditional benchmarks are increasingly inadequate for evaluating models.
Shortly after, Noam Brown, one of the researchers behind OpenAI's reasoning model o1, offered a solution. Instead of constantly creating new benchmarks for various metrics, he suggested evaluating new models by the performance they deliver relative to their cost. This is a smart approach, as cost is often an overlooked but crucial factor.
Lower costs allow for wider distribution, which strengthens the company’s brand and encourages more people, including those with little or no prior exposure, to try out AI models. This is exactly why GPT-5 will be permanently available in the free tier, a brilliant strategic move by Sam Altman.
In the end, success isn’t just about having the best model but about achieving the widest distribution through effective PR. Despite fierce competition, OpenAI stands out with Sam Altman’s exceptional marketing skills. It’s no coincidence that he was President of Y Combinator before becoming CEO of OpenAI.
In the News
Flying Car Production Set for 2026
Alef Aeronautics successfully tests its Model A flying car that drives and takes off vertically. The company targets production by early 2026.
Claude's Extended Thinking Matches o3-mini
Claude 3.7 Sonnet with 16K thinking tokens achieves 28.6% performance, matching OpenAI's o3-mini. Longer thinking significantly improves results despite higher costs.
OpenAI Expands Deep Research Access
OpenAI extends Deep Research access to additional ChatGPT tiers, including Plus and Education. Users receive ten free monthly queries versus Pro's 120.