Level up your career, level up your team. Hands-on practice, real exercises, and a growing community. Designed for product managers, UX professionals, designers, researchers and builders who are serious about figuring out this AI thing.
Hi there! I'm Peter.
I've been building AI products since 2023, and it's been a fascinating journey.
I've been using techniques like "evals", "observability", "synthetic data" and "context engineering": a bunch of techniques that the AI community has figured out over the past few years.
The teams building the best AI products today rely on these core techniques every day, and they can look pretty complicated.
But it's not rocket science. These methods aren't magic. You don't have to be an engineer to use them. And in fact, I believe that the best AI products are built when diverse teams bring different perspectives to the table. That sounds corny but it is, perhaps surprisingly, more true in AI than in any other technology I've worked with.
So I decided to build this course to demystify these techniques, giving you a practical, no-fluff guide to implementing the essential toolkit for creating great AI product experiences.
It's for product managers, UX people, researchers, strategists: everyone who needs to understand how AI systems are actually built and made useful, without necessarily being an engineer.
It's still day 1, AI continues to evolve incredibly fast, and there is a ton of stuff to be invented. Right now is the best time to get started. If you want to jump in and join me on this journey, sign up.
See you there!
"This is opening up all sorts of new neural pathways for me to see under the hood more of how the sausage is made! 🙏"
"Very timely at my enterprise software company as evaluation of AI features scales."
"Everything I know about evals is from Peter’s talk, which is why I’m back to find out more!"
A deep-dive introduction to model capabilities, context design and engineering, and experience evaluation.
Let's do some context design. The model's context window is the key to creating useful and helpful output.
So we did some context design - now how does that become context engineering?
And the final missing piece: evals! But wait, do we even need them here?
What does it mean for models to be stateless? Let's build some intuition around that.
And what does it mean for models to be stochastic? Why do they hallucinate? Can we ever get beyond that?
Some common misunderstandings about AI and Large Language Models can easily lead us astray.
What can LLMs do? How do we know what the capabilities of these models are? How are they trained? And how does that influence our product design decisions?
How are capabilities trained into models? How can we build intuition around these capabilities and best use them?
What is model character, how is it trained, and how can we learn to understand and use this beyond "Claude feels friendlier"?
Writing evals is a core skill for making AI products that actually work. Evals are our "definition of what good looks like". They are both harder than they seem to get right, and at the same time not rocket science at all - anyone can learn to write evals.
This is what we came for: some hands-on eval writing.
What are evals, why do we need them, and why isn't this just QA?
This is the fun part, hands-on writing evals together.
Evals can be tricky, and it's easy to make some very expensive (in terms of quality, end result and cost) mistakes.
One reason evals are tricky is that it can be hard to define what "good" looks like when working (as we are) in a team.
There are no evals without data sets. How do we create solid data sets? How many data points are enough? What about synthetic data?
Despite the "code" in its name, Claude Code is perhaps the most popular agentic AI system right now. Understanding and using it gives you a glimpse into what's coming in the next months and years in terms of agents. And it can be incredibly useful for non-coding tasks.
I'm not an engineer, I don't write code, what can I learn from playing around with Claude Code?
And learn a few tricks along the way.
Let's dive in and start creating.
We'll take some really interesting data on US Supreme Court hearings, and build a website to explore this data with Claude Code.
Let's redesign the search we built to be clean and minimalistic. Also, what are slash commands?
We'll move from the command line, taking the app we built with Claude Code, and try out Google's Antigravity editor.
Let's try something harder - can we build a chatbot on top of this data? And get familiar with Google's Antigravity editor along the way.
Let's wrap up and review some lessons learnt.
If AI is different, and AI projects are different, how do we plan projects for AI? What are the roles and tracks we should consider? What are some common gotchas?
Context Design and Evals are two cornerstone activities for building great AI products. How to plan for them up front so they don't get squeezed at the end.
How to budget an AI project realistically: what's predictable, what isn't, and the line items teams routinely forget.
AI projects need a different skill mix. The roles and skillsets to hire, borrow, or grow when you're planning AI work.
A hands-on walkthrough of Claude Design — Anthropic's tool that creates real, code-based designs. Set up a design system, generate and refine a landing page, and see where designing-by-code shines: interactive, animated, production-quality design with a design-to-engineering handoff measured in minutes.
An introduction to Claude Design and this short, hands-on series: what the tool is, how it designs by writing code, and what we'll build together — a design system and a refined landing page for a fictional company.
Bootstrap a design system in Claude Design. Borrow an overall feel from a reference site (Stripe, via Gemini), generate a concise system for a fictional card company, and watch Claude write real code for colors, type, and components — with practical tips on choosing Sonnet vs Opus and keeping token usage under control.
Generate a landing page from your design system. Kick off a high-fidelity hero, switch to Opus for stronger design sense, steer Claude away from generic 'AI' styling, and stay productive by queuing the next prompt while it works — including a first isometric, code-based illustration.
Turn a rough illustration into a polished, animated one. Use Claude Design's markup, direct-edit, and comment tools, keep context clean with fresh chats, and iterate on an animation of cards flowing into a central approval hub — the interactive work where designing-by-code really pays off.
Step back and weigh the trade-offs of designing by writing code: complex animations and shaders within reach, design systems that PMs and engineers can build against, handoff in minutes instead of days, and Claude as a creative collaborator. Plus why it's worth experimenting with — ideally alongside your engineers and teammates.
How do you build evaluations for agents? Model capabilities are evolving fast, user expectations are shifting, and both inputs and outputs are highly variable. This series walks through how to think about agent evals — from the kinds of agents you might be building, to identifying risk, defining quality, and combining qualitative research with metrics.
Why evaluating agents is different from evaluating prompts or models. Set the stage for the series: the shifting ground (models, users, inputs/outputs) and what makes a good agent eval.
Agents whose job is to assist a person — copilots, researchers, summarizers. What "good" looks like when the human stays in the loop, and how that shapes what you measure.
Agents that do things in the world — book, send, write, deploy. The eval bar is higher: correctness, reversibility, and trust become first-class concerns.
Where agents can go wrong and which failures actually matter. A practical way to map risks so your evals cover what's costly, not just what's easy to measure.
What does "good" even mean for an agent? Turning fuzzy expectations into concrete, testable criteria that hold up across variable inputs and outputs.
Why you can't eval your way out of not understanding users. How qualitative research surfaces the failure modes and quality dimensions that metrics alone will miss.
Where metrics genuinely help, where they mislead, and how to build a metric set that complements — rather than replaces — human judgment.
Pulling the threads together: a practical playbook for building agent evals that survive model upgrades, shifting user expectations, and the inherent variability of agent work.
Content strategy is changing now that LLMs are reading, writing, and rewriting most of what we publish. This series is a practical walkthrough for content folks: setting up the right tools, structuring content as markdown, defining tone of voice and microcopy in ways an LLM can actually follow, and evaluating what comes out the other end.
What changes about content strategy when LLMs are both the writers and the readers. The shape of the series and who it's for.
The minimum viable toolkit for working with an LLM on content: where to write, where to store, how to keep humans and the model looking at the same source of truth.
Why markdown is the right substrate for LLM-era content, and how to structure your first files so they're useful to both humans and models.
How to capture tone of voice in a way an LLM can actually apply consistently — beyond vague adjectives, into concrete patterns and examples.
Generating and refining the small, high-stakes bits of text — buttons, errors, empty states — without losing voice or precision.
How to tell if the content an LLM produces is actually good. Lightweight evals for tone, accuracy, and fit, without drowning in process.
Pulling the threads together: a small, repeatable workflow for doing content strategy with LLMs in the loop.
I'm designing this course for professionals and teams who are serious about investing in their teams and careers.
$79 /month
Flexible monthly subscription. Cancel anytime.
$299 /month
Perfect for small teams getting started with AI. Cancel anytime.
$29.90/person for teams of 10
Get Team Access
$799 /month
For larger teams who need more seats. Cancel anytime.
Less than $16/person for teams of 50
Get Team Access
At first, I was pretty sceptical of the early 2021-2022 AI hype, until I saw my kids adopt AI within weeks. And it stayed adopted. "This might actually be useful technology", I thought. I spent 2023, 2024 and 2025 building AI products for clients, and learning. The ins and outs of vector indexes. Why the hell did these models seem so smart?
1. There is some kind of weird, but nonetheless real "intelligence" embedded in these models. And it's getting smarter.
It took me a while to build some understanding and intuition around this. You can call it what you want. "Intelligence" is a strange word. I get why people feel uncomfortable with it. But the models do embed some strange kind of world model.
And more importantly, they are developing really different and weird, and at the same time very human-like capabilities. Can a model "reason"? Kind of no, but also kind of yes. Once I wrapped my head around this, I did become a bit more hype-y on the whole AI thing. Forget about the hype, but there is some kind of "there" there.
At the very least, it's interesting.
2. The World is slow to change, but Jobs aren't.
The idea here is that, yes, AI won't change the world overnight. Companies take time to adopt things. Societies take time. But Jobs can change rapidly. I'm already seeing how the way we've worked over the past 20 or so years, since the Internet, is changing. Team compositions are changing. Skillsets are changing. Roles naturally follow. So even though AI will take a long time to percolate through society, our jobs might change pretty fast.
3. Values get embedded in AI, so we need diversity.
The clearest example of how values get embedded in AI is Musk threatening to "rewrite history" to train Grok. Let's not go there. I've been in many AI product discussions where smart engineers were driving the decisions, because they understood the underlying material we are working with. But the moment you include researchers, product people, UX people, the amount of diversity in ideas and perspectives shoots up immediately.
And AI is such a strange and interesting technology in that it revolves a lot around language. The words you use in a prompt. The ideas around "what good looks like" that you embed in your evals. And so I've seen how much impact different perspectives have on these new kinds of products. So I started giving talks and teaching this stuff.
4. It's not rocket science.
The technology is very cool, but the actual product decisions being made are all about users, looking at data, language, and understanding this new material, and you don't need to be an engineer for that. That is my goal with this new platform. Involve everyone.
Level up your career, level up your team. Hands-on practice and real exercises. Custom designed for product managers, UX professionals, designers, researchers and builders who are serious about figuring out this AI thing.