Agentic Transformation.
Where teams started,
and what triggered the change.
Team size, release cadence, and how manual the pipeline was. Different for startups than for established companies. But three things were near-universal.
- Larger orgs: scaled-Agile / microservices, autonomous-ish teams.
- Roadmap-planning cadences "try to cut across all of them," and struggle.
- Startups: monolith. "Everybody's building everything." Workflow not as tight or rigorous.
- Larger orgs: long-lived branches, days to weeks.
- Smaller, more agile shops: held open "only a few days."
- Manual CI/CD, mostly. Releases tied to branches, not push-to-dev.
- Instacart: CEO-championed across ~1,000 engineers.
- Implementation: a dedicated AI-tooling subset of platform engineering.
- Owns purchasing, training, and adoption measurement.
Two true stories, sized very differently.
- 20–30 eng org, 5-person team carved off · 1 PM, 4 engineers, CTO active on the team.
- Stuck with familiar GitHub flow. Injected agents as the assignee on tickets. Copilot picks up the ticket, makes the feature branch, opens the PR.
- Most of their time writing tickets and reviewing agent output. Very little time on the code. Code review became structural; security scans run by a second agent.
- Engineers pushed into product/requirements. PM pushed into technical/architectural specs. Everybody stretched.
- Hierarchy collapsed flat. CTO's role: "acting as traffic cop." Sequencing async agent merges to minimize conflicts.
- Mood: excited, "maybe a little trepidatious." Greenfield + new team made it possible.
- Existing teams, existing process. AI enablement layered tooling, training, Slack channels, in-person sessions on top.
- Distribution of reactions: early adopters who loved it, real resistance from others, a long tail "not actively against, but really dragging their feet."
- "A whole mix" across a thousand-engineer org, and that's the point: scale guarantees a distribution.
- Enablement team's framing: "doing everything they could to drag people along." Literally flew to offices to sit with engineers in person.
- No project, no kickoff, no cutover point. The expectation was clear; the mechanism for arrival was diffuse.
Failure modes we (and the field) actually hit.
Early teams threw prompts at the model and watched output drift. Most are past this. Agents perform much better when fed context intentionally. Plan and execute on separate turns of the crank. Spec docs loaded upfront → much better results.
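A minimal sketch of that plan-then-execute split, assuming a hypothetical `run_agent()` wrapper around whichever coding agent you actually use. The structure is the point (spec docs loaded upfront, a human checkpoint on the plan before any code is written), not the API.

```python
from pathlib import Path

def run_agent(prompt: str) -> str:
    """Hypothetical wrapper around your coding agent's API or CLI."""
    raise NotImplementedError("wire this to the agent you actually use")

# Load the spec documents upfront so the agent never works from a bare prompt.
spec = "\n\n".join(p.read_text() for p in Path("specs").glob("*.md"))

# Turn 1: plan only. A human reviews (and edits) the plan before any code exists.
plan = run_agent(
    f"Context:\n{spec}\n\n"
    "Produce a step-by-step implementation plan for the feature described in the spec. "
    "Do not write code yet."
)
print(plan)  # review the plan here

# Turn 2: execute against the approved plan, with the same spec context.
result = run_agent(
    f"Context:\n{spec}\n\nApproved plan:\n{plan}\n\n"
    "Implement the plan. Keep changes scoped to the plan's steps."
)
```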
The fast person has the system in their head. As they ship, complexity grows; others can't keep up. "Context isn't fully self-contained, even if you check in a bunch of spec documents." The fix is counterintuitive: intentionally slow down to rebuild shared mental models.
| Stage | Hand-off | Why |
|---|---|---|
| Unit tests | mostly safe | Agents are good, except when they hallucinate stub-only tests that always pass. |
| Integration tests | human-led | Cross-component judgment; agents can't see the whole system. |
| Smoke tests | human-led | Live in your product daily; what the model can't simulate. |
| Manual / exploratory | human-led | "The major human gate between staging and prod." |
Two years ago: 30-second nudges. Now: 10–30-minute, sometimes hour-long tasks are routinely delegated. If your assumptions are based on the experience you had even six months ago, recalibrate.
Code review broke. Two new gates moved in.
You can no longer review code line by line. Volume + intent-vs-output mismatch. Faster teams (Stoa, Assignar) replaced it.
Share the spec, design, or intention with the team before the agent starts (or while it's working). Outcomes: shared mental model, alignment on what's changing, course-correct early when intent is wrong. Worst case, you scrap what the agent did and re-implement.
Doesn't gate. The agent can implement in parallel while feedback comes in.
Where humans "exert heavy influence." Live in the product daily in staging; decide what's release-worthy. Agents can write good unit tests when constrained, but cross-component, smoke, and exploratory tests stay human.
What agents take:
- ✓ Unit-test authoring (with humans on the spec)
- ✓ Structural code review
- ✓ Security scans (a second agent watches the first)
- ✓ Commit + PR mechanics, release-note writing
What humans keep:
- ✓ Intent & spec authorship
- ✓ Integration / smoke / manual test design + execution
- ✓ Architectural-impact review of agent output
- ✓ Release-go decision at the staging→prod boundary
Heavy adoption. Fuzzy gains. Bottlenecks moving.
- Watch lead time only? Yes, it improved.
- Code-gen collapsed → code review is the new pinch (sheer volume).
- Teams still figuring out where else to put gates.
- Meaningful for shipping speed.
- Invisible to most macro dashboards.
- Still being re-baselined as the bottleneck shifts.
DORA's clock starts at git commit. The expensive thing now happens before commit: meetings, decisions, PRDs, tickets. Intent Lead Time (ILT), measured from the decision to the first commit, is the companion to DORA Lead Time; together they cover the loop end to end. Stoa is tracking it; we'd argue every transformation should baseline it this quarter.
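A minimal way to baseline ILT for one feature, assuming you can pin t(decision) down by hand from the PRD, meeting note, or ticket. The branch name and decision timestamp below are placeholders.

```python
import subprocess
from datetime import datetime, timezone

# t(decision): when the team decided to build the feature -- pulled by hand
# from the PRD, meeting note, or ticket. Placeholder value.
decision = datetime(2024, 5, 2, 15, 0, tzinfo=timezone.utc)

# t(first commit): earliest commit on the feature branch (branch name is a placeholder).
out = subprocess.run(
    ["git", "log", "--reverse", "--format=%aI", "main..feature/checkout-v2"],
    capture_output=True, text=True, check=True,
).stdout
first_commit = datetime.fromisoformat(out.splitlines()[0])

# Intent Lead Time = t(first commit) - t(decision); DORA Lead Time picks up from here.
ilt = first_commit - decision
print(f"Intent Lead Time: {ilt}")
```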
Data exfiltration was the early fear.
Supply chain is the live one.
- DigitalOcean once blocked cloud AI models entirely.
- Resolved by cloud-provider partnerships + Bedrock + data terms.
- "Trusted computing for AI." Security teams got comfortable.
Subtle but trending. Agents touch more files, faster, with less ceremony.
Compromised packages slip in. Human review on third-party deps is dropping.
- Hard decision gate on third-party package inclusion (see the sketch after this list).
- Continuous supply-chain scanning on every push.
- Treat secrets-scanning as a first-class CI step.
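As a sketch of what the first two guardrails can look like in CI, here is a plain script rather than any specific vendor's action: it fails the build when the dependency manifest changes (`requirements.txt` as a stand-in for whatever manifest you use), forcing an explicit human decision, then runs a scanner on every push. `pip-audit` is used as an example scanner; the human override mechanism (label, CODEOWNERS, etc.) is left out.

```python
import subprocess, sys

MANIFEST = "requirements.txt"  # stand-in for whatever dependency manifest you use

def added_manifest_lines(base: str = "origin/main") -> list[str]:
    """Lines added to the manifest relative to the base branch."""
    diff = subprocess.run(
        ["git", "diff", base, "--", MANIFEST],
        capture_output=True, text=True, check=True,
    ).stdout
    return [l[1:].strip() for l in diff.splitlines()
            if l.startswith("+") and not l.startswith("+++") and l[1:].strip()]

added = added_manifest_lines()
if added:
    # Decision gate: surface new or changed third-party deps for explicit human sign-off.
    print("Third-party dependency changes need review:")
    for dep in added:
        print(f"  + {dep}")
    sys.exit(1)

# Supply-chain scan on every push (pip-audit shown as an example scanner).
sys.exit(subprocess.run(["pip-audit", "-r", MANIFEST]).returncode)
```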
The methodology mostly stayed.
The artifacts got promoted.
We aren't making the case for ripping that out. We are making the case that the artifacts inside it have changed weight.
Things move so fast that team members "who used to be intimately involved start to lose context." Stakeholders, customers, internal partners. It's hard for people to keep up.
The same agents producing the code can produce and maintain the secondary artifacts: change logs, release notes, documentation, internal playbooks, support books. The best teams already do this.
| Artifact | Before agents | With agents |
|---|---|---|
| Code | source of truth | one valid impl |
| Release notes | "write & toss" | agent-maintained |
| Changelog / playbook | tertiary, drifts | agent-maintained |
| Spec / PRD | PDF nobody reopens | first-class |
| AS-BUILT architecture | didn't exist | critical |
"Before, we'd write them and toss them and say the code is the source of truth." Now specs and AS-BUILT architecture are absolutely critical. Agents need them as context; humans need them as shared mental model.
Roll the tools out wide.
Roll the workflow out narrow.
"Even if nothing else about your process changes, there is a difference in productivity" for the portion of your team that adopts. Pair the rollout with training, Slack channels, in-person sessions. Night-and-day vs. not using.
Fully agentic workflow (agents writing large chunks of code, not just autocompleting) needs a team that's bought in, ideally on a new product surface. Let them establish the approach inside your culture, then expand.
Stoa: greenfield, small team. Not fair to extrapolate. Across the field: start with one team, ideal circumstances, new product. Establish the approach. Then expand.
Force top-down adoption across all engineers without a controlled team to shape the workflow → enablement burns cycles "dragging people along."
Two extremes. Both ends will burn you.
Capabilities have changed dramatically every six months. If your assumptions are based on the experience you had even six months ago, they're stale.
Countermove: a special team (or small set of teams) pushing agents past what you'd assume they can do. Give them latitude to break your old mental model.
Retain control over intent. Be explicit about what you want from agents. The intent is yours; the output isn't trustworthy without it.
Be rigorous and ruthless in assessing output quality. Not in terms of code style. In terms of functionality and architectural impact. That is the human's last and most important job.
The shift, in one sentence per layer.
| Layer | Before agents | With agents |
|---|---|---|
| Bottleneck | code-gen & deploy | intent (decision → commit) |
| Gate | line-by-line code review | intent review (soft) + integration / smoke / manual (hard) |
| Artifact | code is the source of truth; specs & arch tertiary | specs + AS-BUILT architecture become first-class |
| Measurement | DORA Lead Time (post-commit) | DORA + Intent Lead Time (pre-commit) |
| Org pattern | process is the lever | tools wide · workflow narrow · named AI-enablement function |
The point of building Stoa is to make this loop routine: capture intent live, version it next to the code, let agents pick the spec up directly, keep tests and integration sacred, and watch ILT fall.
Three artifacts. Three moves. Three open questions.
Agentic release-PR workflow
Drop-in GitHub Actions + gh-aw prompt. Release notes write themselves tonight; a minimal sketch of the idea follows below.
Intent-driven PRD template
The shape of a spec an agent can pick up. Four sections. No ticket required.
AI-PDLC journey map
Which tool, which phase, who owns what, at every step of the loop.
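For the release-PR workflow above, a minimal sketch of the underlying idea rather than the gh-aw workflow itself: pull merged PR titles with the `gh` CLI and emit a draft that an agent can then rewrite into prose and keep current as PRs merge.

```python
import json, subprocess

# Merged PRs since the last release (gh CLI; repo and tag resolution kept simple).
prs = json.loads(subprocess.run(
    ["gh", "pr", "list", "--state", "merged", "--limit", "50",
     "--json", "number,title,author"],
    capture_output=True, text=True, check=True,
).stdout)

# Draft release notes; in the full workflow an agent rewrites and maintains this.
lines = ["## Release notes (draft)"]
for pr in prs:
    lines.append(f"- {pr['title']} (#{pr['number']}, @{pr['author']['login']})")
print("\n".join(lines))
```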
Stand up an intent review.
Non-gating checkpoint upstream of any agent-driven implementation. The first artifact your AI-enablement team can ship.
Add a third-party-package gate.
Decision step + a software-supply-chain scanner on every push. The single highest-leverage AI-era guardrail.
Baseline ILT for one feature.
t(first commit) − t(decision). Even a rough number gives you a baseline.
- → Which Blackbaud surface is most ready for a fully-agentic-workflow pilot? (Greenfield > brownfield, by a lot.)
- → What's your current AS-BUILT-doc posture? If "we don't have one," that's the first artifact to commission.
- → Where would you rather pay the coordination cost: at the tool layer (wide), or the workflow layer (narrow)?