Coding Agents in CI: What Breaks First
Teams often try to reuse the same workflow in two places: an agent fixes a bug locally, then they ask the same tool to open a pull request from a CI job. The second step is where reality shows up.
In the editor, humans provide implicit context: the open file, recent diffs, and quick clarifications in chat. In CI, the agent only sees what you put in the prompt, the repo snapshot, and the failing log snippet. If tests are flaky, secrets are missing, or the failure is environmental, the model tends to “fix” symptoms instead of the contract your pipeline enforces.
A practical pattern is to separate concerns. Let automation produce a minimal repro (command, exit code, first failing assertion) and stash it beside the job log. Give the agent that bundle plus explicit rules: no network except package mirrors, no broad refactors, one logical change per commit. Then require the same green checks a human would need.
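A minimal sketch of the repro-bundle step, assuming a POSIX shell CI job: the test command (`make test` here), the `repro/` output directory, and the `FAIL|AssertionError` patterns are all placeholders to adapt to your own pipeline and test runner.

```shell
#!/usr/bin/env sh
# Produce a minimal repro bundle next to the CI job log:
# the command that was run, its exit code, and the first failing assertion.
set -u

CMD="make test"   # whatever command CI actually runs (assumption)
OUT="repro"
mkdir -p "$OUT"

# Run the suite once, capturing combined output and the exit code.
$CMD >"$OUT/log.txt" 2>&1
CODE=$?

{
  echo "command: $CMD"
  echo "exit_code: $CODE"
  # First failing assertion; the patterns are a stand-in for your
  # runner's actual failure format.
  grep -m1 -E "FAIL|AssertionError" "$OUT/log.txt" || echo "first_failure: none matched"
} > "$OUT/repro.txt"
```

The bundle in `repro/` is what the agent gets, instead of the full log: a command it can rerun, the contract it must restore (exit code 0), and the narrowest visible symptom.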
None of this replaces code review. It reduces churn: fewer giant diffs, fewer surprise dependency edits, and clearer boundaries between “repair build” and “redesign module.”