I have not only read the 'Let's Verify Step by Step' paper, released less than 24 hours ago; I have also combed through the release notes and appendix, read most of the linked papers, and run my own tests.
It's true: performance is massively boosted, and not just in mathematics but in science and other domains too.
I'll show you comparisons with GPT-3 and PaLM 2, and demonstrate that new records are coming soon. I will also cover the 'synthetic data event horizon' and what might have gone into GPT-4's training.
I'll show you how process-supervised reward models (PRMs) work compared with outcome-supervised reward models (ORMs), and why fine-tuning is still relevant. Plus, I'll cover reactions from Jan Leike, Ilya Sutskever, Sam Altman and more.
I will also feature the highly relevant paper 'Language Models Don’t Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting', and share a glimpse, from Rob Miles, of just how weirdly GPT-4 might think.