(Written after NeurIPS 2025)

This year in San Diego, NeurIPS drew over 20,000 attendees. Despite the massive turnout, there was surprising convergence around two buzzwords: RL and agents. This echoes Sam Altman's prediction from early this year: 2025 is the year of agents.


Convergence

Everyone I met at company socials talked about RL at some point. When people mentioned post-training, they usually meant RL too. According to OpenAI employees, the terms "RL" and "reasoning" are now pretty much interchangeable. The broader term "agent" is even more ubiquitous. Nearly every company at the job fairs pitched "agents" when recruiting research scientists. Even mechanistic interpretability researchers want agents to do interpretability studies for them!

Topics that excite current RL researchers

Topics that excite current agent researchers

Chaos

The chaotic part of this RL convergence is that it's hard to tell which takeaways are actually correct. Quoting Yejin from her keynote: "conclusions from effortless RL ≠ effortful RL". She suggested that many RL papers overclaim because of poor experimental setups.

Screenshot from Yejin’s talk: “The Art of (Artificial) Reasoning”

Despite the buzz around agents, a paper released during NeurIPS week found that production agents are still built on simple scaffolds: "68% execute at most 10 steps before requiring human intervention, 70% rely on prompting off-the-shelf models instead of weight tuning, and 74% depend primarily on human evaluation."

I also heard people disagreeing about whether and how we should invest more resources to scale up RL training. The field hasn't really agreed on how to resolve sample inefficiency and is exploring different directions:

Met Sara and learned more about her startup Adaption. Really excited for it to take off one day!

Expos are crazy. Many cool demos and lots and lots of hiring for post-training researchers.
