Introduction to Agentic Coding
A hands-on workshop by and for computing faculty.
With help from Magda, Mark, Steve, Wayne, and Aaron, I led a half-day workshop introducing agentic coding to the rest of the Allen School faculty. In preparation for this workshop, I attended several other workshops and was surprised by how much time was spent lecturing, and how little opportunity participants were given to collaborate and share with each other. Agentic coding tools are not hard to use at first. While you can spend lots of time on agentic engineering patterns, faculty workflows are much more diverse, including workflows that don’t normally involve large amounts of software development. My goal was to provide ample time for faculty to create things, share what they created with each other, and envision how agentic coding tools might support their work in the future.
- Introduction and Demonstrations (60 min): Starting with a zero-shot prompt to build a Python sorting algorithm visualizer, what happens as we push our ideas to their limits?
- Coffee Break (30 min): “I Taught My Dog to Vibe Code Games” and agentic feedback loops.
- Guided Project (90 min): Create a k-d tree assignment to improve 2-d nearest-neighbor search. Include student concept checks, starter code, an interface for manual testing, automated test cases, and an autograder with partial credit. As a challenge: Design the assignment so that students still learn something useful even if they outsource all the work to agentic coding tools.
- Lunch Brainstorming (45 min): Brainstorm problems you want to tackle in the next session!
- Choose Your Own Adventure (90 min): Build software for your teaching or research needs.
By the end of this workshop, participants could use agentic coding to architect small-scale software prototypes with a focus on iteratively refining ideas and reducing the barriers to implementing them.
Introduction and Demonstrations
Our facilitation team briefly introduced ourselves with the goal of seeding diverse examples of how we’ve used agentic coding over the past year, such as interactive concept checks for students, reimplementing past research projects or course materials using modern frameworks, and socratically-teaching ourselves the canon of other subfields through exploratory research projects (that may even be publishable). By introducing ourselves this way, we hoped to inspire ideas for use-cases beyond traditional software development.
Then, I setup a split screen to vibe-code a single-file Python script for visualizing sorting algorithms to compare an agentic coding tool against a chatbot on the same prompt. The agentic coding tool exhibited a different development loop that not only produced code, but also saved the code to a file in the current directory, ran the program in a shell with xvfb, and evaluated the running program accordingly. This raised a number of questions about both the interface and the process. For instance, how do agentic coding tools know how to read a file or run a shell command, and how do they select the right tool at each step? What are tokens and why do we care to count them? What do the download/upload symbols signify? And why does it say “Flambéing”? A key goal of this session was to show, not tell, the defaults built into agentic coding tools today.
What if we wanted to add a web visualization instead? When I posed this task to the agentic coding tool, it generated a new, self-contained HTML file with embedded styling and JavaScript. But what I really wanted was a Python backend serving a web app frontend so that we have different frontends for running the app. For small tasks such as this, agentic coding tools often default to diving into the task straightaway, but it is often helpful to switch into /plan mode to align your ideas with your agentic coding tool. This iterative process can be driven by the /plan mode or carefully-steered by the user through the flow of the conversation. In both cases, the outcome will become part of the model’s context: the tokens the model draws on during next-token generation. We observed that the model’s context window now reached nearly 100,000 tokens, which some commentators flag as the boundary between the ‘smart zone’ and the ‘dumb zone’ since performance can sometimes degrade sharply after this point.
In practice, if we actually wanted to share this visualizer with students, we would need to ensure that it’s accessible. How did our agentic coding tool fare when asked to identify accessibility improvements in a new session? While the issues it highlighted were not necessarily wrong (the lack of screenreader navigation), the recommended techniques to address them often reflected poor practices, like preferring ARIA labels over native HTML elements. And beyond basic compliance, accessible visualizations are an open research question (even more so for animated visualizations)! Techniques such as sonification can provide a sketch of the shape of the array during sorting. But it might not work if we later add other features like interactive concept checks! The architecture we choose embeds design decisions that can be difficult for agentic coding tools to redesign later even when prompted directly.
On the topic of proceeding carefully, Aaron presented a slide with key reminders about safeguarding protected data:
- Do not include PII, FERPA, grades, or student submissions in prompts; and, when possible, use synthetic or anonymized data when experimenting.
- AI-generated apps still require review; static sites are generally low risk; full-stack apps require review, security, maintenance, and ownership.
- Ultimately, use AI to accelerate ideas, not to bypass to privacy, security, or operational responsibility.
While agentic coding tools are easy to start using, more effective use of them requires an understanding of how pre-training influences results, how model capacity relates to context limits, and how post-training creates preferences for implementing tasks rather than questioning them. Agentic coding tools place a large generative model within a harness that you can customize and a context that you steer turn-by-turn. Domain knowledge and expertise is still necessary to assess the quality of the resulting software holistically, especially for out-of-distribution tasks and criteria.
To kick-off the coffee break, I briefly introduced Caleb Leak’s project, I Taught My Dog to Vibe Code Games. By playing Momo’s live coding video in the background, we observed what Claude Code achieved over a longer, multi-turn interaction with feedback loops to generate and refine a game from the hidden meaning behind Momo’s (Caleb’s dog’s) keystrokes.
The technical pieces—keyboard routing, treat dispenser, prompt engineering, feedback tools—were all solvable engineering problems. What surprised me was how little of the final result depended on Momo typing anything meaningful. The magic isn’t in the input. It’s in the system around it. A well-crafted prompt, strong guardrails, automated verification, and good tools can turn genuine nonsense into a playable game.
If there’s a takeaway beyond the spectacle, it’s this: the bottleneck in AI-assisted development isn’t the quality of your ideas—it’s the quality of your feedback loops. The games got dramatically better not when I improved the prompt, but when I gave Claude the ability to screenshot its own work, play-test its own levels, and lint its own scene files. The same tools that let a dog’s keyboard mashing produce a working game will make your own intentional work with AI significantly better.
Guided Project
The remainder of the workshop turned away from Q&A and toward hands-on building, with time reserved at the end of each block for sharing what we built. For this guided project, I tasked faculty to:
Create a k-d tree assignment to improve 2-d nearest-neighbor search. Include student concept checks, starter code, an interface for manual testing, automated test cases, and an autograder with partial credit. As a challenge: Design the assignment so that students still learn something useful even if they outsource all the work to agentic coding tools.
The k-d tree is a relatively niche data structure, but one that faculty quickly picked-up through online resources or socratic dialogue with their AI agent. This exact feature is also an optimization that I teach in my undergraduate Data Structures and Algorithms course project, Husky Maps, so faculty could choose to either build the assignment from scratch or extend my project codebase. Given the open-ended nature of the task, we had sustained discussion in small groups for the better part of an hour, and several faculty presented their work.
To kick-off lunch, I proposed faculty brainstorm applications that would be personally meaningful:
- Teaching
- Stress-test your assignments: How easy is it for an AI to solve your assignments now? How might we better emphasize key learning objectives?
- Research
- What other kinds of software work might fit into your faculty workflows? How might you support researchers to strategically develop AI skills?
Choose Your Own Adventure
After lunch, we reworked a list of sample ideas I prepared in advance to include:
- Research analytics: Write a program to support your own navigation of a conference or subfield; or, obtain a list of all your publication and turn it into an updated, interactive academic web profile.
- Course content transformation: Iterate through a messy folder of syllabi, exams, and lecture notes to prepare new practice for students and convert inaccessible course materials into more accessible forms.
- Personal productivity: Automate assigning students to project teams and table groups, tedious manual editing of LaTeX code, or generating visualizations of technical writing to assess the quality of explanations.
- Natural language interfaces: Turn your agentic coding tool into a natural language interface for interacting with software-defined systems using model context protocol integrations.
Appendix: Technical Background
Steve Seitz’s videos introduces some useful large language model (LLM) foundations:
- Large Language Models from scratch and Large Language Models: Part 2
- Reinforcement Learning: ChatGPT and RLHF
- Reasoning Models and DeepSeek R1 from scratch
One concept not explained in these videos is how an LLM knows to stop its response. During an LLM’s pre-training, a special end-of-sequence (EOS) token is used to denote the end of a training sample. The LLM learns to predict the EOS token like any other word or punctuation mark. Through supervised fine tuning prior to RLHF, LLMs learn to place these EOS tokens more strategically for specific tasks, such as answering conversational questions.
However, LLMs are only one half of the puzzle. Agentic coding tools place LLMs in agent harnesses such as Claude Code. The harness provides tools such as file I/O and shell access that LLMs that can utilize—presuming they are sufficiently trained for tool-calling. These tools provide the mechanisms for LLMs to move beyond the information directly provided in a chat box toward engaging with information on your computer or accessible on the internet.
Ultimately, all this information is loaded into the LLM’s context window. The field of agentic engineering currently focuses on context engineering: strategies for effectively curating and managing context. Much of this has yet to be formalized, but many have shared practices that seem to help.
- Agentic Engineering Patterns by Simon Willison
- No Vibes Allowed: Solving Hard Problems in Complex Codebases by Dex Horthy
- Software Fundamentals Matter More Than Ever by Matt Pocock