The Hurdles of AI Engineering: 5 Mistakes to Avoid When Building with Foundation Models
Have you ever been blown away by a generative AI demo—like a chatbot that writes poetry or a tool that churns out marketing copy—only to realize that turning it into something reliable is a different challenge? If so, you're not alone. Many teams jump into building AI applications with foundation models (think GPT or Llama) with big dreams, only to trip over the same hurdles. Drawing from my experience and insights from Chip Huyen's book AI Engineering: Building AI Applications with Foundation Models, here are five common mistakes to dodge—and how to fix them.
1. Underestimating the Difficulty of Evaluation
Evaluating AI isn't like flipping a switch to see if it works. With generative models, failures can be subtle, like a slightly off-tone email or a summary that misses the point. These "silent failures" sneak by unnoticed at first, but they can chip away at user trust over time.
Fix it: Put evaluation front and center from the start. Think of it like taste-testing a recipe as you cook, not just at the end. Focus on metrics that tie to your goals—like whether your AI cuts customer support calls by 20%—rather than getting lost in geeky stats like "perplexity."
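One way to make that concrete is a tiny evaluation harness that checks outputs against a rubric tied to your goal. This is a minimal sketch: `generate_reply` is a placeholder for your actual model call, and the rubric here is a deliberately simple "must mention these terms" check.

```python
def generate_reply(ticket: str) -> str:
    # Stand-in for a foundation-model call.
    return "You can reset your password from the account settings page."

def passes_rubric(reply: str, must_mention: list[str]) -> bool:
    """Catch 'silent failures': the reply must cover every required point."""
    return all(term.lower() in reply.lower() for term in must_mention)

# Each case ties evaluation to the business goal (deflecting support
# tickets), not to abstract metrics like perplexity.
cases = [
    {"ticket": "How do I reset my password?",
     "must_mention": ["reset", "password"]},
    {"ticket": "Where do I change my email?",
     "must_mention": ["settings", "email"]},
]

results = [passes_rubric(generate_reply(c["ticket"]), c["must_mention"])
           for c in cases]
pass_rate = sum(results) / len(results)
print(f"pass rate: {pass_rate:.0%}")  # prints "pass rate: 50%"
```

Run this on every prompt or model change, like taste-testing as you cook: the number you track is a task pass rate, not a modeling statistic.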
2. Not Taking Prompt Engineering Seriously
Prompts are the magic words that steer your AI. A small tweak—like changing "explain this" to "break this down simply"—can mean the difference between a spot-on answer and a wild tangent.
Fix it: Treat prompts like a key ingredient in your product, not a throwaway line. Test them, tweak them, and keep a record of what works. Imagine prompts as the steering wheel of your AI—grip it with intention.
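"Keep a record of what works" can be as simple as storing prompts as versioned templates instead of inline strings. A hypothetical sketch (the registry and task names are illustrative, not a real library):

```python
# Prompts as versioned product assets: every variant is recorded,
# so you can test v1 against v2 and roll back if a change regresses.
PROMPTS = {
    ("summarize", "v1"): "Explain this: {text}",
    ("summarize", "v2"): "Break this down simply, in two sentences: {text}",
}

def render(task: str, version: str, **kwargs) -> str:
    """Fetch a recorded prompt and fill in its variables."""
    return PROMPTS[(task, version)].format(**kwargs)

prompt = render("summarize", "v2", text="Transformers use attention.")
print(prompt)
# prints "Break this down simply, in two sentences: Transformers use attention."
```

Because every version lives in one place, you can run your evaluation suite over `v1` and `v2` and keep whichever scores better.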
3. Clinging to the Old Data-First Workflow
In the old days of machine learning, you'd hoard data like a squirrel before winter, then build your model. Foundation models flip that. They're pre-trained and ready to go, so you can start with a rough prototype and gather data later.
Fix it: Start with the user in mind. Sketch out what your AI should do—like drafting emails or answering FAQs—before obsessing over datasets. It's like building a house: get the frame up before worrying about the wallpaper.
4. Overestimating Model Context Windows
You might think a model with a huge "memory" (say, 100,000 tokens) can juggle everything you throw at it. But pile on too much, and it's like asking a chef to cook with a cluttered counter—things get messy, and mistakes creep in.
Fix it: Use Retrieval-Augmented Generation (RAG) to keep things tidy. It's like handing your AI a cheat sheet of just the info it needs, instead of a whole textbook. Less clutter, better results.
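Here is a toy sketch of the RAG idea: score document chunks against the question, keep only the top few, and build a compact prompt. A real system would use embeddings and a vector store; this keyword-overlap version just shows the shape of the "cheat sheet" approach.

```python
def score(chunk: str, question: str) -> int:
    """Crude relevance: count shared words between chunk and question."""
    q_words = set(question.lower().split())
    return len(q_words & set(chunk.lower().split()))

def build_prompt(question: str, chunks: list[str], k: int = 2) -> str:
    """Keep only the k most relevant chunks, then ask the question."""
    top = sorted(chunks, key=lambda c: score(c, question), reverse=True)[:k]
    context = "\n".join(f"- {c}" for c in top)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

docs = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
    "Refunds require the original receipt.",
]
print(build_prompt("How long do refunds take to process?", docs))
```

The irrelevant chunk about office holidays never reaches the model, so the context stays small and on-topic no matter how large the document collection grows.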
5. Ignoring the Complexity of AI Agents
Dreaming of an AI that books flights or troubleshoots tech issues step-by-step? It's exciting, but the reality is tricky. In multi-step tasks, errors compound: a small mistake at step two can derail everything after it, like a GPS that takes you in circles.
Fix it: Start small and scale up. Test simple tasks first, like sending a reminder, before tackling a full workflow. Watch each step like a hawk and be ready to troubleshoot. Planning isn't an afterthought—it's the backbone.
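The "watch each step" advice can be sketched as an agent loop that logs every action and enforces a step budget so it can't wander in circles. The plan and the `send_reminder` tool below are stand-ins, not a real agent framework:

```python
def send_reminder(task: str) -> str:
    # Stand-in for a real tool call (email, calendar API, etc.).
    return f"reminder sent: {task}"

def run_agent(plan: list[str], max_steps: int = 5) -> list[str]:
    """Execute a plan one small step at a time, logging each result."""
    log = []
    for i, step in enumerate(plan):
        if i >= max_steps:
            # Budget guard: stop instead of looping forever.
            log.append("aborted: step budget exceeded")
            break
        result = send_reminder(step)
        log.append(f"step {i}: {result}")  # every step is observable
    return log

for line in run_agent(["pay invoice", "book meeting room"]):
    print(line)
```

Starting with a single well-logged tool like this makes it obvious where a longer workflow breaks, which is exactly the visibility you lose when you jump straight to a ten-step agent.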
Building AI with foundation models is full of promise, but it's also a minefield of pitfalls. Sidestep these five mistakes, and you'll be on firmer ground. Keep your focus on the user, stay disciplined in your process, and don't be afraid to experiment.