Learn how the best teams, including Ramp, Notion, and OpenAI, ship quality AI products. In this course, you'll build a customer support chatbot from scratch and improve it with evals.
LLMs are probabilistic. The same prompt can return different answers on different runs, and one prompt tweak can improve one example but make others regress. Evals are how teams measure output quality and catch regressions before users do.
You'll learn to build evals the way top teams do to ship reliable products: writing deterministic and LLM-as-judge scorers, comparing prompt variants, and turning production traces into test cases.
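A deterministic scorer is just a function from a model output to a score, so it runs the same way every time. A minimal sketch (the function name, the order-ID rule, and the sample replies are illustrative, not taken from the course):

```python
import re

def echoes_order_id(response: str) -> float:
    """Deterministic scorer: 1.0 if the reply cites an order ID like #12345, else 0.0."""
    return 1.0 if re.search(r"#\d{4,}", response) else 0.0

# A tiny hypothetical suite: each reply paired with the score we expect.
cases = [
    ("Your order #88231 shipped yesterday.", 1.0),
    ("Your order shipped yesterday.", 0.0),
]

scores = [echoes_order_id(reply) for reply, _ in cases]
print(sum(scores) / len(scores))  # average score across the suite
```

Because the check is pure string matching, the same suite yields the same score on every run, which is what makes regressions visible when a prompt changes.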
14 modules across three sections (Learn, Build, Refine), taking about an hour. No prior evals experience needed.