Agentic Development, Part 3: Catching Drift at the Source
This is the third post in a series on agentic development. Part 1 covers the foundation — GitHub, deploys, picking a harness. Part 2 covers conventions — CLAUDE.md, hooks, linters, formatters. This post is the deep end: the tooling I run when the basics aren't enough.
I've been letting AI write more and more of my code lately. Claude Code, Codex, agents running in parallel — the whole deal. And the output is genuinely good. Until it isn't.
The thing about AI-generated code is that it drifts. It starts clean, follows your conventions for the first few files, and then slowly introduces its own opinions. A console.log sneaks in here. An unused import there. A type assertion that should've been a proper narrowing. None of it is catastrophic on its own, but multiply it across a dozen agent sessions and suddenly your codebase has vibes you didn't ask for.
So I've been spending a lot of time on the opposite problem: not how to get AI to write code faster, but how to make sure the code it writes meets the same bar I'd hold myself to. Here's the stack I've landed on.
Lefthook: The Last Line of Defense
Lefthook is a git hooks manager written in Go. It's fast, polyglot, and configured with a single YAML file. I use it as the final checkpoint — nothing gets committed without passing formatting, linting, and type checks.
```yaml
pre-commit:
  parallel: true
  commands:
    biome:
      glob: "*.{js,ts,tsx,json,css}"
      # check = format + lint + import sorting in one pass
      run: npx biome check --write {staged_files}
      stage_fixed: true
    typecheck:
      run: npx tsgo --noEmit
    ruff:
      glob: "*.py"
      # fix lint issues first, then format the result
      run: ruff check --fix {staged_files} && ruff format {staged_files}
      stage_fixed: true
```

The stage_fixed: true bit is key: when a formatter auto-corrects something, lefthook re-stages the file so you don't end up with a dirty working tree after commit. Formatting and linting run as one command per language because both rewrite files, and two writers racing on the same file under parallel: true can clobber each other. The Biome, Ruff, and type-check commands still run in parallel, so the whole pre-commit check finishes in a couple of seconds even on larger projects.
Why lefthook over husky? Single binary, no Node.js dependency, works across any language. When your repo has TypeScript, Python, and config files, you don't want your git hooks tied to one ecosystem.
Biome: One Tool Instead of Two
For JavaScript and TypeScript, I've fully switched to Biome. It replaces both ESLint and Prettier with a single Rust binary.
The speed difference is absurd — formatting 10,000 files takes about 0.3 seconds versus Prettier's 12. Linting is similarly lopsided. But honestly, the speed isn't even the main selling point for me. It's that I configure one tool instead of two, there's no plugin ecosystem to manage, and the formatter and linter can't disagree with each other.
```bash
# The one command you need
biome check --write .
```

That runs the formatter, linter, and import sorting in one pass. For CI, use biome ci instead: it runs the same checks and fails on any violation without modifying files. I've published my biome.json config as a starting point if you want to adapt it for your own projects.
Ruff: The Same Idea, for Python
Ruff is to Python what Biome is to JavaScript — a single Rust-native tool that replaces an entire constellation of linters and formatters. Flake8, Black, isort, pyupgrade, autoflake — Ruff handles all of it with 800+ built-in rules.
The performance numbers are kind of silly. It lints 250,000 lines of Python in about 0.4 seconds. Pylint takes two and a half minutes on the same code. The formatter is 30x faster than Black.
```bash
ruff check --fix .   # lint + auto-fix
ruff format .        # format (Black-compatible)
```

When you're running agents that generate Python, having a linter that finishes before you can blink means you can run it on every single file write without it feeling like friction.
ts-go: Type Checking That Doesn't Make You Wait
Microsoft is rewriting the TypeScript compiler in Go. It will eventually ship as TypeScript 7, but the preview is usable now via @typescript/native-preview.
The numbers speak for themselves: type-checking the VS Code codebase goes from 78 seconds to 7.5 seconds, roughly a 10x improvement. For a pre-commit hook where you want --noEmit type checking, the difference between "fast enough to run on every commit" and "too slow so I'll skip it" is everything.
```bash
npm install -D @typescript/native-preview
npx tsgo --noEmit
```

It's still in preview (no .d.ts emit yet, no project references), but for type checking? It's ready. And it makes running type checks in pre-commit hooks practical in a way that tsc never was for larger projects.
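Because tsgo is a preview, it can be handy to fall back to `tsc` in checkouts where the preview package isn't installed. A small sketch, assuming a hypothetical helper named `typecheck_cmd` and the usual `node_modules/.bin` layout npm creates for package binaries:

```bash
#!/usr/bin/env bash
# typecheck_cmd is a hypothetical helper (not part of TypeScript or npm).
# It prints the type-check command for a project directory, preferring
# tsgo when the preview binary is installed there.
typecheck_cmd() {
  project_dir=${1:-.}
  if [ -x "$project_dir/node_modules/.bin/tsgo" ]; then
    echo "npx tsgo --noEmit"
  else
    echo "npx tsc --noEmit"
  fi
}

# Show which command would run in the current directory.
typecheck_cmd .
```

A hook or CI step could then run the printed command, so the same script works whether or not the preview is present.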
Composing hooks: feeding errors back to the agent
Part 2 covered the basic PostToolUse hook — one matcher, one command, format on every write. Now that you have Biome, Ruff, ts-go, and (in a moment) ast-grep installed, the same mechanism can do a lot more than format. A few patterns I rely on:
Per-language routing. The default Biome hook is fine in a pure TS repo. In a polyglot repo, route by extension instead of running every linter on every file:
```bash
#!/usr/bin/env bash
# Claude Code passes the tool call as JSON on stdin.
file=$(jq -r '.tool_input.file_path')
case "$file" in
  *.py)        ruff check --fix "$file" && ruff format "$file" ;;
  *.ts|*.tsx)  npx biome check --write "$file" ;;
  *.go)        gofmt -w "$file" ;;
esac
```

Drop that in .claude/hooks/format-file.sh, point PostToolUse at it, and the agent gets language-appropriate feedback without you maintaining a separate hook entry for each language.
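One caveat with routing on the raw jq output: `jq -r` prints the literal string `null` when a tool call carries no `file_path`, so it helps to treat that as "nothing to check". A minimal sketch of the routing decision pulled into a function that can be tested on its own; `checker_for` is a hypothetical name, not something Claude Code or the linters provide:

```bash
#!/usr/bin/env bash
# checker_for is a hypothetical helper: it prints which checker should
# handle a file, or prints nothing when the hook should skip it.
checker_for() {
  file=$1
  # jq -r yields the literal string "null" when the field is missing.
  if [ -z "$file" ] || [ "$file" = "null" ]; then
    return 0
  fi
  case "$file" in
    *.py)        echo ruff ;;
    *.ts|*.tsx)  echo biome ;;
    *.go)        echo gofmt ;;
  esac
}

checker_for "src/app.py"    # prints: ruff
checker_for "web/page.tsx"  # prints: biome
checker_for "null"          # prints nothing
```

In the hook itself, the result would decide which branch of the case statement to run.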
Chained checks that fail loudly. A hook isn't just for autofixing: it can also exit with code 2 and a message on stderr, and Claude will see the message and self-correct on the next turn. So you can format, then lint without --write, then run ast-grep, and if any step finds something it can't fix, the agent learns about it immediately instead of at commit time:

```bash
file=$(jq -r '.tool_input.file_path')
npx biome check --write "$file"
# On exit 2, Claude Code feeds stderr (not stdout) to the model, so send diagnostics there.
npx biome lint "$file" >&2 || exit 2
ast-grep scan "$file" >&2 || exit 2
```

The exit 2 is the magic number: it tells Claude Code to surface stderr to the model rather than just logging it. Pair this with the ast-grep message pattern below and the agent reads "Do not use console.log — use the logger service instead" and fixes it on the spot.
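One limitation of chaining with `|| exit 2` is that the hook stops at the first failing check, so the agent hears about one problem per turn. A sketch of an alternative that runs every check and reports all failures together. `run_check` and the `failed` flag are hypothetical names, and the stand-in commands below take the place of the real linters:

```bash
#!/usr/bin/env bash
# run_check is a hypothetical helper: it runs one check and, on failure,
# forwards the check's combined output to stderr and records the failure.
failed=0
run_check() {
  label=$1
  shift
  if ! output=$("$@" 2>&1); then
    printf '[%s] failed:\n%s\n' "$label" "$output" >&2
    failed=1
  fi
}

# Stand-ins for real commands such as: run_check lint npx biome lint "$file"
run_check lint true
run_check conventions sh -c 'echo "Do not use console.log"; exit 1'

# A real hook would end with: [ "$failed" -eq 0 ] || exit 2
echo "failed=$failed"   # prints: failed=1
```

The trade-off is a longer message per turn in exchange for fewer round trips when a single write breaks several rules.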
PreToolUse for the things you don't want to fix later. PostToolUse runs after the damage is done. PreToolUse can block a tool call entirely by exiting with code 2 (other non-zero codes are reported as errors but don't block the call). The single highest-value rule is "never run destructive Bash":
```bash
# .claude/hooks/guard-bash.sh
cmd=$(jq -r '.tool_input.command')
case "$cmd" in
  *"rm -rf"*|*"git push --force"*|*"git reset --hard"*)
    echo "Blocked: $cmd looks destructive. Ask before retrying." >&2
    exit 2 ;;
esac
```

That one hook has caught me (well, the agent) more times than I'd like to admit. The agent decides it needs to "clean up the worktree," reaches for git reset --hard, and the hook stops it cold.
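Substring matching like this is a heuristic. As written it misses `git push -f` and forms where the flag isn't adjacent to `push`, such as `git push origin main --force`. A sketch of a slightly broader matcher, factored into a function so it can be tested without a live session. `is_destructive` is a hypothetical name, and the pattern list is illustrative rather than exhaustive:

```bash
#!/usr/bin/env bash
# is_destructive is a hypothetical helper: it returns 0 (true) when a
# command string matches one of the blocked patterns. It deliberately
# errs toward over-blocking, since the agent can always ask first.
is_destructive() {
  cmd=$1
  case "$cmd" in
    *"rm -rf"*|*"rm -fr"*)                       return 0 ;;
    *"git reset --hard"*)                        return 0 ;;
    *"git push"*" --force"*|*"git push"*" -f"*)  return 0 ;;
  esac
  return 1
}

is_destructive "git push origin main -f" && echo "blocked"  # prints: blocked
is_destructive "git status" || echo "allowed"               # prints: allowed

# In the hook: cmd=$(jq -r '.tool_input.command'), then
# if is_destructive "$cmd"; then echo "Blocked: $cmd" >&2; exit 2; fi
```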
The shape across all of these is the same: hooks turn whatever tool you have on disk into a tight feedback loop the agent can actually learn from inside a single session. The linters from Part 2 are the input; the hooks are how that signal reaches the model.
Fallow: Catching What Linters Miss
I just started using Fallow and it fills a gap I didn't realize I had. It's a Rust-native codebase analyzer that finds dead code, unused exports, circular dependencies, and other structural issues that traditional linters don't catch.
Why does this matter for AI-generated code? Because agents are prolific creators of dead code. They'll refactor a function, create a new version, and forget to remove the old one. They'll add an export "just in case" that nothing ever imports. Fallow catches all of that:
```bash
fallow      # analyze the whole project
fallow fix  # auto-remove unused exports and dependencies
```

It has 90 built-in framework plugins (Next.js, Remix, NestJS, etc.) that understand convention-based entry points, so it won't flag your page.tsx as unused just because nothing explicitly imports it. It also ships an MCP server and agent skills, so AI agents can query codebase health directly rather than you having to manually run it.
ast-grep: Custom Rules for Custom Conventions
This one's a game-changer for teams with strong conventions. ast-grep is a Rust-based tool that searches and lints code using AST patterns instead of regex. You write rules in YAML using actual code patterns as the match syntax.
I got into this after reading Fiberplane's blog post about how they use it. Their core insight is that CLAUDE.md instructions drift — the agent follows your conventions at first but gradually stops. The fix is to encode those conventions as ast-grep rules that fail hard in CI and pre-commit. When the agent hits the error, it reads the message, understands the fix, and self-corrects.
Here's what a rule looks like:
```yaml
id: no-console-log
language: typescript
rule:
  pattern: console.$METHOD($$$ARGS)
message: |
  Do not use console.log — use the logger service instead.
  Import { logger } from '@/lib/logger' and call logger.$METHOD().
note: |
  // Instead of:
  console.log("something happened")
  // Use:
  logger.info("something happened")
severity: error
```

The key is writing the message as an instruction, not a description. "Do not use console.log — use the logger service instead" tells the agent exactly what to do. Pair it with a note showing the correct pattern and the agent fixes it on the first try.
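For `ast-grep scan` to pick the rule up, the project needs an `sgconfig.yml` that says where rules live. A minimal sketch of that layout, assuming the rule above is saved as `rules/no-console-log.yml`. The directory names `rules` and `rule-tests` are placeholders, and the test case file is an optional extra that lets `ast-grep test` check the rule against known-good and known-bad snippets:

```bash
#!/usr/bin/env bash
# Sketch of an ast-grep project layout. Directory names are placeholders.
mkdir -p rules rule-tests

# sgconfig.yml tells `ast-grep scan` and `ast-grep test` where to look.
cat > sgconfig.yml <<'EOF'
ruleDirs:
  - rules
testConfigs:
  - testDir: rule-tests
EOF

# Test cases for the rule: snippets that must not match, and ones that must.
cat > rule-tests/no-console-log-test.yml <<'EOF'
id: no-console-log
valid:
  - logger.info("something happened")
invalid:
  - console.log("something happened")
  - console.error("something broke")
EOF

echo "wrote sgconfig.yml and rule-tests/no-console-log-test.yml"
```

With that in place, `ast-grep test --skip-snapshot-tests` should check only the valid and invalid cases, which is a cheap way to confirm a new rule fires where you expect before an agent ever trips over it.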
Fiberplane uses this pattern for everything from enforcing Effect-TS conventions to keeping error types co-located. If your team has conventions that don't map to existing lint rules, ast-grep lets you codify them in minutes.
Conductor: Running It All in Parallel
If you're on a Mac and running multiple AI agents locally, you should really be using Conductor. It manages worktrees and lets you run Claude Code, Codex CLI, and other agents in parallel without them stepping on each other's files.
When you have this many guardrails in place — hooks, pre-commit checks, linting, type checking — running agents in parallel actually works because each agent operates in its own worktree with its own set of checks. The guardrails scale with the parallelism instead of fighting it.
The Stack, Summarized
Here's how it all fits together:
| Layer | Tool | When it runs |
|---|---|---|
| On every file write | Claude Code hooks → Biome / Ruff | During agent session |
| On commit | Lefthook → Biome, Ruff, ts-go, ast-grep | Pre-commit |
| Codebase health | Fallow | On demand / CI |
| In CI | All of the above | On push / PR |
The theme across all of these tools is the same: they're fast enough to run constantly without feeling like friction. Biome, Ruff, ts-go, ast-grep, and Fallow are all written in Rust or Go. They finish in milliseconds to low seconds. That speed is what makes the "run it on every file write" approach viable.
AI agents are going to write most of our code. That's happening. The question is whether the code they write meets your standards or just meets the bar of "it compiles." Guardrails are how you get the former without babysitting every session.
That's the series. Part 1 is the runway, Part 2 is the rules of the road, and this is what you reach for once you're flying enough sessions to need it. None of it is necessary on day one. All of it is worth it by month three.