What Are the Biggest Mistakes People Make Reading Hallucination Leaderboards?
I’ve spent 12 years building QA programs for enterprise knowledge systems. In the "old" days, we built deterministic rule engines where a bug meant a developer missed a regex