Agentic Transformation.
Where teams started,
and what triggered the change.
Team size, release cadence, and how manual the pipeline was. Different for startups than for established companies. But three things were near-universal.
- Larger orgs: scaled-Agile / microservices, autonomous-ish teams.
- Roadmap-planning cadences "try to cut across all of them," and struggle.
- Startups: monolith. "Everybody's building everything." Workflow not as tight or rigorous.
- Larger orgs: long-lived branches, days to weeks.
- Smaller, more agile shops: held open "only a few days."
- Manual CI/CD, mostly. Releases tied to branches, not push-to-dev.
- Instacart: CEO-championed across ~1,000 engineers.
- Implementation: a dedicated AI-tooling subset of platform engineering.
- Owns purchasing, training, and adoption measurement.
Two true stories, sized very differently.
- 20–30 eng org, 5-person team carved off · 1 PM, 4 engineers, CTO active on the team.
- Stuck with familiar GitHub flow. Injected agents as the assignee on tickets. Copilot picks up the ticket, makes the feature branch, opens the PR.
- Most of their time writing tickets and reviewing agent output. Very little time on the code. Code review became structural; security scans run by a second agent.
- Engineers pushed into product/requirements. PM pushed into technical/architectural specs. Everybody stretched.
- Hierarchy collapsed flat. CTO's role: "acting as traffic cop." Sequencing async agent merges to minimize conflicts.
- Mood: excited, "maybe a little trepidatious." Greenfield + new team made it possible.
- Existing teams, existing process. AI enablement layered tooling, training, Slack channels, in-person sessions on top.
- Distribution of reactions: early adopters who loved it, real resistance from others, a long tail "not actively against, but really dragging their feet."
- "A whole mix" across a thousand-engineer org, and that's the point: scale guarantees a distribution.
- Enablement team's framing: "doing everything they could to drag people along." Literally flew to offices to sit with engineers in person.
- No project, no kickoff, no cutover point. The expectation was clear; the mechanism for arrival was diffuse.
Failure modes we (and the field) actually hit.
Early teams threw prompts at the model and watched output drift. Most are past this. Agents perform much better when fed context intentionally. Plan and execute on separate turns of the crank. Spec docs loaded upfront → much better results.
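A minimal sketch of that plan-then-execute split, assuming a hypothetical `run_agent()` wrapper around whichever coding agent you actually use. The structure is the point (spec docs loaded upfront, a human checkpoint on the plan before any code is written), not the API.

```python
from pathlib import Path

def run_agent(prompt: str) -> str:
    """Hypothetical wrapper around your coding agent's API or CLI."""
    raise NotImplementedError("wire this to the agent you actually use")

# Load the spec documents upfront so the agent never works from a bare prompt.
spec = "\n\n".join(p.read_text() for p in Path("specs").glob("*.md"))

# Turn 1: plan only. A human reviews (and edits) the plan before any code exists.
plan = run_agent(
    f"Context:\n{spec}\n\n"
    "Produce a step-by-step implementation plan for the feature described in the spec. "
    "Do not write code yet."
)
print(plan)  # review the plan here

# Turn 2: execute against the approved plan, with the same spec context.
result = run_agent(
    f"Context:\n{spec}\n\nApproved plan:\n{plan}\n\n"
    "Implement the plan. Keep changes scoped to the plan's steps."
)
```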
The fast person has the system in their head. As they ship, complexity grows; others can't keep up. "Context isn't fully self-contained, even if you check in a bunch of spec documents." The fix is counterintuitive: intentionally slow down to rebuild shared mental models.
| Stage | Hand-off | Why |
|---|---|---|
| Unit tests | mostly safe | Agents are good, except when they hallucinate stub-only tests that always pass. |
| Integration tests | human-led | Cross-component judgment; agents can't see the whole system. |
| Smoke tests | human-led | Live in your product daily; what the model can't simulate. |
| Manual / exploratory | human-led | "The major human gate between staging and prod." |
Two years ago: 30-second nudges. Now: 10–30-minute, sometimes hour-long tasks are routinely delegated. If your assumptions are based on the experience you had even six months ago, recalibrate.
Code review broke. Two new gates moved in.
You can no longer review code line by line. Volume + intent-vs-output mismatch. Faster teams (Stoa, Assignar) replaced it.
Share the spec, design, or intention with the team before the agent starts (or while it's working). Outcomes: shared mental model, alignment on what's changing, course-correct early when intent is wrong. Worst case, you scrap what the agent did and re-implement.
Doesn't gate. The agent can implement in parallel while feedback comes in.
Where humans "exert heavy influence." Live in the product daily in staging; decide what's release-worthy. Agents can write good unit tests when constrained, but cross-component, smoke, and exploratory tests stay human.
What agents take:
- ✓ Unit-test authoring (with humans on the spec)
- ✓ Structural code review
- ✓ Security scans (a second agent watches the first)
- ✓ Commit + PR mechanics, release-note writing
What humans keep:
- ✓ Intent & spec authorship
- ✓ Integration / smoke / manual test design + execution
- ✓ Architectural-impact review of agent output
- ✓ Release-go decision at the staging→prod boundary
Heavy adoption. Fuzzy gains. Bottlenecks moving.
- Watch lead time only? Yes, it improved.
- Code-gen collapsed → code review is the new pinch (sheer volume).
- Teams still figuring out where else to put gates.
- Meaningful for shipping speed.
- Invisible to most macro dashboards.
- Still being re-baselined as the bottleneck shifts.
DORA's clock starts at git commit. The expensive thing now happens before commit: meetings, decisions, PRDs, tickets. Intent Lead Time (ILT), measured from the decision to the first commit, is the companion to DORA Lead Time; together they cover the loop end to end. Stoa is tracking it; we'd argue every transformation should baseline it this quarter.
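A minimal way to baseline ILT for one feature, assuming you can pin t(decision) down by hand from the PRD, meeting note, or ticket. The branch name and decision timestamp below are placeholders.

```python
import subprocess
from datetime import datetime, timezone

# t(decision): when the team decided to build the feature -- pulled by hand
# from the PRD, meeting note, or ticket. Placeholder value.
decision = datetime(2024, 5, 2, 15, 0, tzinfo=timezone.utc)

# t(first commit): earliest commit on the feature branch (branch name is a placeholder).
out = subprocess.run(
    ["git", "log", "--reverse", "--format=%aI", "main..feature/checkout-v2"],
    capture_output=True, text=True, check=True,
).stdout
first_commit = datetime.fromisoformat(out.splitlines()[0])

# Intent Lead Time = t(first commit) - t(decision); DORA Lead Time picks up from here.
ilt = first_commit - decision
print(f"Intent Lead Time: {ilt}")
```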
Data exfiltration was the early fear.
Supply chain is the live one.
- DigitalOcean once blocked cloud AI models entirely.
- Resolved by cloud-provider partnerships + Bedrock + data terms.
- "Trusted computing for AI." Security teams got comfortable.
Subtle but trending. Agents touch more files, faster, with less ceremony.
Compromised packages slip in. Human review on third-party deps is dropping.
- Hard decision gate on third-party package inclusion (see the sketch after this list).
- Continuous supply-chain scanning on every push.
- Treat secrets-scanning as a first-class CI step.
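As a sketch of what the first two guardrails can look like in CI, here is a plain script rather than any specific vendor's action: it fails the build when the dependency manifest changes (`requirements.txt` as a stand-in for whatever manifest you use), forcing an explicit human decision, then runs a scanner on every push. `pip-audit` is used as an example scanner; the human override mechanism (label, CODEOWNERS, etc.) is left out.

```python
import subprocess, sys

MANIFEST = "requirements.txt"  # stand-in for whatever dependency manifest you use

def added_manifest_lines(base: str = "origin/main") -> list[str]:
    """Lines added to the manifest relative to the base branch."""
    diff = subprocess.run(
        ["git", "diff", base, "--", MANIFEST],
        capture_output=True, text=True, check=True,
    ).stdout
    return [l[1:].strip() for l in diff.splitlines()
            if l.startswith("+") and not l.startswith("+++") and l[1:].strip()]

added = added_manifest_lines()
if added:
    # Decision gate: surface new or changed third-party deps for explicit human sign-off.
    print("Third-party dependency changes need review:")
    for dep in added:
        print(f"  + {dep}")
    sys.exit(1)

# Supply-chain scan on every push (pip-audit shown as an example scanner).
sys.exit(subprocess.run(["pip-audit", "-r", MANIFEST]).returncode)
```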
The methodology mostly stayed.
The artifacts got promoted.
We aren't making the case for ripping that out. We are making the case that the artifacts inside it have changed weight.
Things move so fast that team members "who used to be intimately involved start to lose context." Stakeholders, customers, internal partners. It's hard for people to keep up.
The same agents producing the code can produce and maintain the secondary artifacts: change logs, release notes, documentation, internal playbooks, support books. The best teams already do this.
| Artifact | Before agents | With agents |
|---|---|---|
| Code | source of truth | one valid impl |
| Release notes | "write & toss" | agent-maintained |
| Changelog / playbook | tertiary, drifts | agent-maintained |
| Spec / PRD | PDF nobody reopens | first-class |
| AS-BUILT architecture | didn't exist | critical |
"Before, we'd write them and toss them and say the code is the source of truth." Now specs and AS-BUILT architecture are absolutely critical. Agents need them as context; humans need them as shared mental model.
Roll the tools out wide.
Roll the workflow out narrow.
"Even if nothing else about your process changes, there is a difference in productivity" for the portion of your team that adopts. Pair the rollout with training, Slack channels, in-person sessions. Night-and-day vs. not using.
Fully agentic workflow (agents writing large chunks of code, not just autocompleting) needs a team that's bought in, ideally on a new product surface. Let them establish the approach inside your culture, then expand.
Stoa: greenfield, small team. Not fair to extrapolate. Across the field: start with one team, ideal circumstances, new product. Establish the approach. Then expand.
Force top-down adoption across all engineers without a controlled team to shape the workflow → enablement burns cycles "dragging people along."
Two extremes. Both ends will burn you.
Capabilities have changed dramatically every six months. If your assumptions are based on the experience you had even six months ago, they're stale.
Countermove: a special team (or small set of teams) pushing agents past what you'd assume they can do. Give them latitude to break your old mental model.
Retain control over intent. Be explicit about what you want from agents. The intent is yours; the output isn't trustworthy without it.
Be rigorous and ruthless in assessing output quality. Not in terms of code style. In terms of functionality and architectural impact. That is the human's last and most important job.
The shift, in one sentence per layer.
| Layer | Before agents | With agents |
|---|---|---|
| Bottleneck | code-gen & deploy | intent (decision → commit) |
| Gate | line-by-line code review | intent review (soft) + integration / smoke / manual (hard) |
| Artifact | code is the source of truth; specs & arch tertiary | specs + AS-BUILT architecture become first-class |
| Measurement | DORA Lead Time (post-commit) | DORA + Intent Lead Time (pre-commit) |
| Org pattern | process is the lever | tools wide · workflow narrow · named AI-enablement function |
The point of building Stoa is to make this loop routine: capture intent live, version it next to the code, let agents pick the spec up directly, keep tests and integration sacred, and watch ILT fall.
Three artifacts. Three moves. Three open questions.
Agentic release-PR workflow
Drop-in GitHub Actions + gh-aw prompt. Release notes write themselves tonight; a minimal sketch of the idea follows below.
Intent-driven PRD template
The shape of a spec an agent can pick up. Four sections. No ticket required.
AI-PDLC journey map
Which tool, which phase, who owns what, at every step of the loop.
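For the release-PR workflow above, a minimal sketch of the underlying idea rather than the gh-aw workflow itself: pull merged PR titles with the `gh` CLI and emit a draft that an agent can then rewrite into prose and keep current as PRs merge.

```python
import json, subprocess

# Merged PRs since the last release (gh CLI; repo and tag resolution kept simple).
prs = json.loads(subprocess.run(
    ["gh", "pr", "list", "--state", "merged", "--limit", "50",
     "--json", "number,title,author"],
    capture_output=True, text=True, check=True,
).stdout)

# Draft release notes; in the full workflow an agent rewrites and maintains this.
lines = ["## Release notes (draft)"]
for pr in prs:
    lines.append(f"- {pr['title']} (#{pr['number']}, @{pr['author']['login']})")
print("\n".join(lines))
```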
Stand up an intent review.
Non-gating checkpoint upstream of any agent-driven implementation. The first artifact your AI-enablement team can ship.
Add a third-party-package gate.
Decision step + a software-supply-chain scanner on every push. The single highest-leverage AI-era guardrail.
Baseline ILT for one feature.
t(first commit) − t(decision). Even a rough number gives you a baseline.
- → Which Blackbaud surface is most ready for a fully-agentic-workflow pilot? (Greenfield > brownfield, by a lot.)
- → What's your current AS-BUILT-doc posture? If "we don't have one," that's the first artifact to commission.
- → Where would you rather pay the coordination cost: at the tool layer (wide), or the workflow layer (narrow)?