Agentic Transformation · Q&A for Blackbaud · 2026
A Q&A briefing for Blackbaud product & engineering leadership

Agentic Transformation.

Sourced from conversations with
Abacus · Instacart · CarGurus · Fidelity · Docker · DigitalOcean · GitHub · Archimedes · Variata · Domino · Assignar
+ 7 months of Stoa shipped fully agentic.
Q1 · The "before" picture

Where teams started,
and what triggered the change.

Starting points varied: team size, release cadence, how manual the pipeline was. Different for startups than for established companies. But three things were near-universal.

01 · Org shape
Agile + SAFe at the top, monolith startups at the bottom.
  • Larger orgs: scaled-Agile / microservices, autonomous-ish teams.
  • Roadmap-planning cadences "try to cut across all of them," and struggle.
  • Startups: monolith. "Everybody's building everything." Workflow not as tight or rigorous.
02 · Git posture
Feature branches, a foregone conclusion.
  • Larger orgs: long-lived branches, days to weeks.
  • Smaller, more agile shops: held open "only a few days."
  • Manual CI/CD, mostly. Releases tied to branches, not push-to-dev.
03 · Trigger to transform
CEO-led, top-down, was the fastest pattern.
  • Instacart: CEO-championed across ~1,000 engineers.
  • Implementation: a dedicated AI-tooling subset of platform engineering.
  • Owns purchasing, training, and adoption measurement.
"In the cases where it happened the fastest, the trigger decision came from top down. And in a lot of cases from the CEO directly."
Q2 · How teams & people reacted

Two true stories, sized very differently.

Case A · the 5-person greenfield
Assignar · new construction-finance product
  • 20–30 eng org, 5-person team carved off · 1 PM, 4 engineers, CTO active on the team.
  • Stuck with the familiar GitHub flow. Made agents the assignee on tickets: Copilot picks up the ticket, makes the feature branch, opens the PR.
  • Most of their time writing tickets and reviewing agent output. Very little time on the code. Code review became structural; security scans run by a second agent.
  • Engineers pushed into product/requirements. PM pushed into technical/architectural specs. Everybody stretched.
  • Hierarchy collapsed flat. CTO's role: "acting as traffic cop." Sequencing async agent merges to minimize conflicts.
  • Mood: excited, "maybe a little trepidatious." Greenfield + new team made it possible.
Case B · the 1,000-engineer org
Instacart · top-down expectation, no kickoff
  • Existing teams, existing process. The AI-enablement team layered tooling, training, Slack channels, and in-person sessions on top.
  • Distribution of reactions: early adopters who loved it, real resistance from others, a long tail "not actively against, but really dragging their feet."
  • "A whole mix" across a thousand-engineer org, and that's the point: scale guarantees a distribution.
  • Enablement team's framing: "doing everything they could to drag people along." Literally flew to offices to sit with engineers in person.
  • No project, no kickoff, no cutover point. The expectation was clear; the mechanism for arrival was diffuse.
"Most of their time writing tickets and reviewing the output of agents. Very little time on the code itself."
Q3 · Where it failed & what surprised us

Failure modes we (and the field) actually hit.

Failure 01
Prompt-based, not spec-based.

Early teams threw prompts at the model and watched output drift. Most are past this. Agents perform much better when fed context intentionally. Plan and execute on separate turns of the crank. Spec docs loaded upfront → much better results.
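What "spec docs loaded upfront" can look like in practice, as a minimal sketch. This assumes the OpenAI Python client; the spec path, model name, and prompts are placeholders, and any agent runtime takes context the same way.

```python
from pathlib import Path
from openai import OpenAI  # assumes OPENAI_API_KEY is set in the environment

client = OpenAI()

# Load the spec intentionally, before any task prompt, instead of ad-hoc prompting.
spec = Path("specs/feature-spec.md").read_text()  # placeholder path

# First turn of the crank: plan against the spec. Execution happens on a later turn.
plan = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {"role": "system", "content": f"Implement against this spec. Spec:\n\n{spec}"},
        {"role": "user", "content": "Write an implementation plan. Do not write code yet."},
    ],
)
print(plan.choices[0].message.content)
```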

Failure 02
One teammate runs ahead. Shared mental model collapses.

The fast person has the system in their head. As they ship, complexity grows; others can't keep up. "Context isn't fully self-contained, even if you check in a bunch of spec documents." The fix is counterintuitive: intentionally slow down to rebuild shared mental models.

Hardest stages to hand to AI
Stage · Hand-off · Why
Unit tests · mostly safe · Agents are good, except when they hallucinate stub-only tests that always pass (sketch below).
Integration tests · human-led · Cross-component judgment; agents can't see the whole system.
Smoke tests · human-led · Live in your product daily; what the model can't simulate.
Manual / exploratory · human-led · "The major human gate between staging and prod."
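The unit-test caveat, made concrete. A hypothetical example of the failure mode: the first test reads as "done" in a fast review and always passes; the second actually exercises the (invented) `pricing.apply_discount` function.

```python
# Hallucinated stub: green forever, verifies nothing.
def test_apply_discount():
    assert True  # looks finished in the diff; tests nothing

# A real unit test for the same hypothetical function.
def test_apply_discount_takes_ten_percent_off():
    from pricing import apply_discount  # hypothetical module under test
    assert apply_discount(price=100.0, percent=10) == 90.0
```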
What surprised us, positively
The unit of hand-off keeps growing.

Two years ago: 30-second nudges. Now: 10–30-minute, sometimes hour-long tasks are routinely delegated. If your assumptions are based on the experience you had even six months ago, recalibrate.

"A need to intentionally slow down to increase shared understanding and shared mental models."
Q4 · Decision gates: what stays human

Code review broke. Two new gates moved in.

You can no longer review code line by line. Volume + intent-vs-output mismatch. Faster teams (Stoa, Assignar) replaced it.

Gate 01 · before implementation
Intent review
soft · non-gating

Share the spec, design, or intention with the team before the agent starts (or while it's working). Outcomes: shared mental model, alignment on what's changing, course-correct early when intent is wrong. Worst case, you scrap what the agent did and re-implement.

Doesn't gate. The agent can implement in parallel while feedback comes in.

Gate 02 · between staging & prod
Integration / smoke / manual testing
hard · gating

Where humans "exert heavy influence." Live in the product daily in staging; decide what's release-worthy. Agents can write good unit tests when constrained, but cross-component, smoke, and exploratory tests stay human.

Moved to agents
  • ✓ Unit-test authoring (with humans on the spec)
  • ✓ Structural code review
  • ✓ Security scans (a second agent watches the first)
  • ✓ Commit + PR mechanics, release-note writing
Stayed human
  • ✓ Intent & spec authorship
  • ✓ Integration / smoke / manual test design + execution
  • ✓ Architectural-impact review of agent output
  • ✓ Release-go decision at the staging→prod boundary
Q5 · Metrics & what actually changed

Heavy adoption. Fuzzy gains. Bottlenecks moving.

16.3% of 30,000 developers say AI agents significantly improve productivity · 2025 dev survey
75% of engineers use AI tools. Most orgs see no measurable performance gains · Faros studies
Why aggregate gains look fuzzy
Multiple bottlenecks. They move around.
  • Watch lead time only? Yes, it improved.
  • Code-gen collapsed → code review is the new pinch (sheer volume).
  • Teams still figuring out where else to put gates.
"What % improvement?" The honest answer
  • Meaningful for shipping speed.
  • Invisible to most macro dashboards.
  • Still being re-baselined as the bottleneck shifts.
The metric the field isn't measuring yet
Intent Lead Time
ILT = t(first commit) − t(decision captured)

DORA's clock starts at git commit. The expensive thing now happens before commit: meetings, decisions, PRDs, tickets. ILT is the companion to DORA Lead Time; together they cover the pipeline end to end. Stoa is tracking it; we'd argue every transformation should baseline it this quarter.
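A minimal sketch of baselining ILT in Python. The timestamps and feature record are illustrative, not a Stoa API; in practice t(decision captured) comes from your tracker or meeting notes and t(first commit) from `git log`.

```python
from datetime import datetime

def intent_lead_time_hours(decision_captured: str, first_commit: str) -> float:
    """ILT = t(first commit) - t(decision captured), here in hours."""
    t_decision = datetime.fromisoformat(decision_captured)
    t_commit = datetime.fromisoformat(first_commit)
    return (t_commit - t_decision).total_seconds() / 3600

# Hypothetical feature: decision captured in a ticket, first commit a week later.
ilt = intent_lead_time_hours("2026-01-05T10:00:00+00:00",   # ticket created after the decision meeting
                             "2026-01-12T16:30:00+00:00")   # first commit on the feature branch
print(f"ILT: {ilt:.1f} hours")  # -> ILT: 174.5 hours
```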

"Yes, there are improvements in lead time. But the bottlenecks move around."
Q6 · Security, compliance, IP

Data exfiltration was the early fear.
Supply chain is the live one.

Mostly solved
Cloud AI data risk.
  • DigitalOcean once blocked cloud AI models entirely.
  • Resolved by cloud-provider partnerships + Bedrock + data terms.
  • "Trusted computing for AI." Security teams got comfortable.
Live concerns — growing, not shrinking
01 · Secrets leak more easily

Subtle but trending. Agents touch more files, faster, with less ceremony.

02 · Supply chain hygiene degrading

Compromised packages slip in. Human review on third-party deps is dropping.

Guardrails to put in place (minimal hook sketch below)
  • Hard decision gate on third-party package inclusion.
  • Continuous supply-chain scanning on every push.
  • Treat secrets scanning as a first-class CI step.
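A minimal pre-push hook sketch of those guardrails in Python. gitleaks and pip-audit are assumptions (swap in whatever scanners your security team has blessed), and a real setup would run the same checks server-side in CI, not only on developer machines.

```python
#!/usr/bin/env python3
"""Pre-push guardrail: block the push if the secrets scan or the
supply-chain audit fails."""
import subprocess
import sys

CHECKS = [
    # Secrets scanning as a first-class step.
    ("secrets scan", ["gitleaks", "detect", "--source", ".", "--no-banner"]),
    # Supply-chain scan of declared/installed Python deps for known CVEs.
    ("supply-chain audit", ["pip-audit"]),
]

def main() -> int:
    for name, cmd in CHECKS:
        if subprocess.run(cmd).returncode != 0:
            print(f"push blocked: {name} failed", file=sys.stderr)
            return 1
    return 0

if __name__ == "__main__":
    sys.exit(main())
```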
"Human review on those things has been dropping down. Software-supply-chain security scanners become absolutely critical, if not more so than in the past."
Q7 · Idea → delivery → maintenance

The methodology mostly stayed.
The artifacts got promoted.

What's the same
Most teams: Agile / SAFe + a homegrown roadmap layer.

We aren't making the case for ripping that out. We are making the case that the artifacts inside it have changed weight.

The silent loser: stakeholder context

Things move so fast that team members "who used to be intimately involved start to lose context." Stakeholders, customers, internal partners. It's hard for people to keep up.

The hidden win

The same agents producing the code can produce and maintain the secondary artifacts: change logs, release notes, documentation, internal playbooks, support books. The best teams already do this.

Artifact promotion
Artifact · Before agents · With agents
Code · source of truth · one valid implementation
Release notes · "write & toss" · agent-maintained
Changelog / playbook · tertiary, drifts · agent-maintained
Spec / PRD · PDF nobody reopens · first-class
AS-BUILT architecture · didn't exist · critical

"Before, we'd write them and toss them and say the code is the source of truth." Now specs and AS-BUILT architecture are absolutely critical. Agents need them as context; humans need them as shared mental model.

"Secondary or non-code artifacts end up taking on a higher weight than they ever had before."
Q8 · Sequencing

Roll the tools out wide.
Roll the workflow out narrow.

Tool rollout · wide
Org-wide. Now.
force multiplier

"Even if nothing else about your process changes, there is a difference in productivity" for the portion of your team that adopts. Pair the rollout with training, Slack channels, in-person sessions. Night-and-day vs. not using.

Workflow rollout · narrow
One controlled team. Greenfield if you can.
controlled experiment

Fully agentic workflow (agents writing large chunks of code, not just autocompleting) needs a team that's bought in, ideally on a new product surface. Let them establish the approach inside your culture, then expand.

Pattern that worked

Stoa: greenfield, small team. Not fair to extrapolate. Across the field: start with one team, ideal circumstances, new product. Establish the approach. Then expand.

Pattern that didn't

Forcing top-down adoption across all engineers, without a controlled team to shape the workflow → enablement burns cycles "dragging people along."

Forcing AI adoption top-down across the whole engineering org cost more effort and more missteps than starting with one well-set-up team and letting them shape the workflow first.
Q9 · If we were starting over

Two extremes. Both ends will burn you.

Extreme 01 · the timid end
Underestimating what agents can do.

Capabilities have changed dramatically every six months. If your assumptions are based on the experience you had even six months ago, they're stale.

Countermove: a special team (or small set of teams) pushing agents past what you'd assume they can do. Give them latitude to break your old mental model.

Extreme 02 · the careless end
Letting go of things you shouldn't.

Retain control over intent. Be explicit about what you want from agents. The intent is yours; the output isn't trustworthy without it.

Be rigorous and ruthless in assessing output quality. Not in terms of code style. In terms of functionality and architectural impact. That is the human's last and most important job.

"Be rigorous and ruthless in assessing the quality of the output: not in terms of the code, but the functionality and architectural impact."
Synthesis · the thesis underneath all nine answers

The shift, in one sentence per layer.

Layer · Before agents · With agents
Bottleneck · code-gen & deploy · intent (decision → commit)
Gate · line-by-line code review · intent review (soft) + integration / smoke / manual (hard)
Artifact · code is the source of truth; specs & arch tertiary · specs + AS-BUILT architecture become first-class
Measurement · DORA Lead Time (post-commit) · DORA + Intent Lead Time (pre-commit)
Org pattern · process is the lever · tools wide, workflow narrow, named AI-enablement function

The point of building Stoa is to make this loop routine: capture intent live, version it next to the code, let agents pick the spec up directly, keep tests and integration sacred, and watch ILT fall.

Take it home + open questions

Three artifacts. Three moves. Three open questions.

Move 01
Stand up an intent review.

Non-gating checkpoint upstream of any agent-driven implementation. The first artifact your AI-enablement team can ship.

Move 02
Add a third-party-package gate.

Decision step + a software-supply-chain scanner on every push. The single highest-leverage AI-era guardrail.

Move 03
Baseline ILT for one feature.

t(first commit) − t(decision). Even a rough number is a baseline.

Open questions I'd love to push on with you
  • Which Blackbaud surface is most ready for a fully-agentic-workflow pilot? (Greenfield > brownfield, by a lot.)
  • What's your current AS-BUILT-doc posture? If "we don't have one," that's the first artifact to commission.
  • Where would you rather pay the coordination cost: at the tool layer (wide), or the workflow layer (narrow)?
Greg Ceccarelli Co-founder & CPO Jake Levirne Co-founder & CEO SpecStory · withstoa.com