How product development
works in an AI world.
Not a prediction. A practice.
You've been asking four things.
I'm focusing on one.
- Are PRDs the future?
- What does end-to-end AI product development actually look like?
- What bridges product-AI workflows and engineering-AI workflows?
- What happens after the prototype: security, handoff, release?
How product shipped, before agents.
A specialist at every step. The craft lived in each person's expertise and in the handoffs between them. Coding was the anchor because coding was the skill.
Customer research
Interviews, surveys, market reads. Researchers, PMs, and founders in the field.
Strategy meetings
Vision, OKRs, roadmap bets. Steering committees and quarterly planning.
PRD in Notion
PM drafts the spec after the meeting. Reviewed with stakeholders, revised, approved.
Figma & Jira
UX mockups, prototypes. Eng lead breaks the PRD into tickets.
Code by hand
Engineer + IDE, pair coding, careful reviews. Days to weeks per ticket.
QA validation
QA runs manual and automated tests against acceptance criteria.
Release train
Weekly or monthly ship. PM writes release notes, comms sends the email.
Dashboards & retros
Analytics reviewed weeks later. Retros surface what to do differently next cycle.
Each phase was its own craft. Coordinating across them was the job PMs and eng leads were hired for.
Weeks per feature was expected, because careful human coding was the expertise the whole team queued behind.
Where AI fits. Where it doesn't.
Humans anchor the two endpoints. Tests stay human-owned, always. Agents drive the middles. Automate the DESIGN → BUILD handoff first; that's where Intent Lead Time compresses.
Research with AI
Transcripts auto-summarized. Signals surfaced. Humans still frame the question.
Problem sensing
Vision and bets stay human. AI synthesizes inputs; people make the calls.
Live intent capture
Agents attend. Decisions timestamped, owner attributed, source-linked.
Markdown, in repo
PRDs + design docs drafted live, versioned with the code.
Agent in sandbox
Local CLI or cloud. Full session context. Humans intervene when needed.
Human-owned spec
"Never let AI specify your tests." Agents implement; humans declare correct behavior.
Agentic PR
Release body written by the agent from the diff + intent. Human reviews, one click.
Agentic observability
Dashboards auto-summarize. Agents surface anomalies. Humans decide what to do next.
Tests encode shared domain judgment about what "correct" means. Ceding that cedes control. Agents implement; humans specify.
That's where activation latency lives. If the spec is picked up by the agent without a human re-routing it, most of your ILT disappears.
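To make the "Live intent capture" card concrete (decisions timestamped, owner attributed, source-linked), here is a minimal sketch of what a captured decision record could carry. Only those three attributes come from the card; the field names are hypothetical.

```python
# Hypothetical shape of a captured decision record. Only the three
# attributes (timestamp, owner, source link) come from the deck;
# the field names are illustrative.
from dataclasses import dataclass

@dataclass
class Decision:
    made_at: str    # ISO timestamp; this is where the intent clock starts
    owner: str      # who made the call
    source: str     # link back to the transcript or meeting doc
    statement: str  # the decision itself, in one sentence
```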
Two collapses happened
on very different clocks.
The bottleneck moved. DORA can't see it by design. DORA's clock starts at git commit. The expensive thing now happens before commit. That gap has a name.
| Phase | 2015 | 2026 |
|---|---|---|
| Intent capture | weeks | weeks still |
| Implementation | days to weeks | hours (agents) |
| Deploy | hours to days | minutes (CI/CD) |
Intent Lead Time. Four sub-metrics. Three go to zero.
The clock: decision made → artifact exists → ticket created → first commit.
| Sub-metric | Clock | What slows it |
|---|---|---|
| Capture latency | decision made → recorded | "we'll document it later" |
| Sequencing latency | artifact → ticket | triage queue, sprint cadence |
| Pickup latency | ticket → assigned | backlog prioritization |
| Activation latency | assigned → first commit | spec ambiguity |
Only the first is load-bearing in an intent-driven workflow. Sequencing / pickup / activation are scaffolding from the assembly-line era. When the picker is an agent, the spec is the assignment.
First-principles estimates, not benchmark data. A metric lives or dies by what gets measured against it.
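To pin down what each clock measures, a minimal sketch with illustrative timestamps; the event names mirror the table, the numbers are made up.

```python
# The four ILT sub-metrics as timestamp deltas.
# Event names follow the table; timestamps are illustrative.
from datetime import datetime

ts = {k: datetime.fromisoformat(v) for k, v in {
    "decision_made":  "2026-03-02T10:15:00",
    "recorded":       "2026-03-04T09:00:00",
    "ticket_created": "2026-03-09T14:00:00",
    "assigned":       "2026-03-11T11:30:00",
    "first_commit":   "2026-03-12T16:45:00",
}.items()}

capture    = ts["recorded"] - ts["decision_made"]      # capture latency
sequencing = ts["ticket_created"] - ts["recorded"]     # sequencing latency
pickup     = ts["assigned"] - ts["ticket_created"]     # pickup latency
activation = ts["first_commit"] - ts["assigned"]       # activation latency
ilt        = ts["first_commit"] - ts["decision_made"]  # ILT = sum of all four
print(capture, sequencing, pickup, activation, ilt)
```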
PRDs aren't going away. The format is.
Static Google / Notion docs become a living markdown artifact in the same repo as the code, read by humans and agents on the same line.
PRDs were largely static, infrequently reopened. Implementation was the expensive thing, so implementation became the artifact.
Intent is a first-class artifact, versioned in the repo. The spec is the assignment, picked up directly by humans and agents.
Three practices die. One lives.
Your tickets-per-engineer counts drop while throughput climbs. That's the signal. The scaffolding around code was all sized for the old bottleneck. Remove it.
I don't just pitch this. I live it daily.
Every number on this slide was mined from the specstoryai/stoa monorepo at build time. Real commits, real doc counts, current as of the deck build.
Architecture is intentionally over-specified vs. over-built (52 design → 28 impl). Billing/auth flips it (3 → 26). That's "after the prototype" territory: security, handoff, release. Everyone has it; few document it.
March 2026 (38 design docs, 46 implementation docs) was our push into native Windows, the Intent→Stoa rebrand, and the meeting-document surface. The docs track the surges; they don't lag them.
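A minimal sketch of the kind of build-time mining described here, assuming the docs/design/ and docs/implementation/ trees named later in the deck; the actual build step may collect more than counts.

```python
# Sketch: count design vs. implementation docs at deck build time.
# Paths follow the repo's docs/design/ and docs/implementation/ trees.
from pathlib import Path

design = len(list(Path("docs/design").rglob("*.md")))
impl = len(list(Path("docs/implementation").rglob("*.md")))
print(f"{design} design docs, {impl} implementation docs")
```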
A dev → main release PR that writes itself.
Humans don't read release diffs. Our fix: the PR writes itself. Four files in .github/workflows/, running on every push to dev.
This is the concrete example you can copy tonight. It's in your takeaways.
On every push to dev, the workflow finds or creates the single standing "Release dev to main" PR. One PR per release cycle, not one per push. It reads the main…dev diff plus every touched docs/design/ and docs/implementation/ file, then synthesizes a structured release body: Highlights, Major Change Areas, Operational Notes, Commits.

| PR | Title | State | Lifespan |
|---|---|---|---|
Last five "Release dev to main" PRs from specstoryai/stoa. 4 of 5 merged; #16 currently open. Two merged in under 10 minutes, because the agent had already written the summary humans would have put off.
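A minimal sketch of the find-or-create step using stock gh commands; the shipped version is four workflow files plus a gh-aw prompt, so treat this as the shape, not the source.

```python
# Find-or-create the single standing "Release dev to main" PR (sketch;
# the real implementation lives in .github/workflows/ and uses gh-aw).
import json
import subprocess

def run(*args: str) -> str:
    return subprocess.run(args, capture_output=True, text=True, check=True).stdout

open_prs = json.loads(run("gh", "pr", "list", "--base", "main", "--head", "dev",
                          "--state", "open", "--json", "number"))
body = "agent-written release summary goes here"  # synthesized from the main...dev diff
if open_prs:
    # One PR per release cycle: refresh the existing body instead of opening a new PR.
    run("gh", "pr", "edit", str(open_prs[0]["number"]), "--body", body)
else:
    run("gh", "pr", "create", "--base", "main", "--head", "dev",
        "--title", "Release dev to main", "--body", body)
```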
- ✓ One standing PR from dev into main. Always fresh.
- ✓ main-from-dev-only.yml rejects any PR into main whose head ≠ dev (sketched below).
- ✓ Agent has bounded tool access. No shell. One safe output.
- ✓ The prompt is a markdown file humans edit. The YAML is compiled by gh aw compile.
- ✓ Slack gets the same canonical body. No duplicate writing.
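One way to implement the branch guard as a plain CI step; the repo's actual main-from-dev-only.yml is a workflow file and may differ in detail.

```python
# Fail the check unless a PR into main comes from dev.
# GITHUB_HEAD_REF / GITHUB_BASE_REF are set by Actions on pull_request events.
import os
import sys

head = os.environ.get("GITHUB_HEAD_REF", "")  # PR source branch
base = os.environ.get("GITHUB_BASE_REF", "")  # PR target branch
if base == "main" and head != "dev":
    sys.exit(f"rejected: PRs into main must come from dev, not {head!r}")
```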
Three assets to steal.
Three moves to make this week.
Agentic release-PR workflow
Drop-in GitHub Actions + gh-aw prompt. Your release notes write themselves tonight.
Intent-driven PRD template
The shape of a PRD an agent can actually pick up. Four sections. No ticket required.
AI-PDLC journey map
Poster version of slide 4. Which tool, which phase, who owns what, every step.
Put one canonical workflow under the agent.
Release notes, incident summaries, RFC intake. Let it write the body.
Version your intent.
One markdown doc, in the repo, with the decision timestamp in the commit.
Measure ILT for one feature.
Any number is a baseline: ILT = t(first commit) − t(decision).
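A minimal way to pull that baseline from a repo; the path and decision timestamp are illustrative.

```python
# ILT baseline for one feature: t(first commit) - t(decision).
# The decision timestamp comes from the intent doc; the path is illustrative.
import subprocess
from datetime import datetime

decision = datetime.fromisoformat("2026-03-02T10:15:00+00:00")
commits = subprocess.run(
    ["git", "log", "--reverse", "--format=%aI", "--", "src/feature/"],
    capture_output=True, text=True, check=True,
).stdout.splitlines()
print("ILT:", datetime.fromisoformat(commits[0]) - decision)
```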
A minority report on agent conventions.
Everyone is curating AGENTS.md, CLAUDE.md, and custom skills. I don't think most of it earns its keep. Three things do.
- AGENTS.md / CLAUDE.md churn. Different format per tool; bloats toward a wiki nobody audits. Diminishing return past one paragraph.
- Custom skill gardens. Too narrow. Age poorly. Maintenance cost exceeds per-use value for all but a handful.
- Slash-command metadata. A tax you pay forever for a speedup you forget about in a week.
- One living doc per product surface: CLI, web, desktop, backend. Kept current. Agents @-reference it as their first pass. Receipt: Stoa ships 5 of these, 10,672 lines, all last-updated within 8 days of each other.
- Commit messages that explain, not "fix bug." Paragraphs: the why, the constraint, the follow-up. Co-authored with the agent that did the work. Your git log becomes the search index nobody else's repo has.
- Two skills only: impeccable.style for design/style review; last30days for real-time research past the training cutoff. Both narrow enough to age well, general enough to use weekly.
Two harnesses, many terminals,
saved sessions.
How I actually run this day-to-day. Opinionated, not a review. These are the four practices that compound.
Not one or the other. Claude Code for orchestration and parallel agent teams; Codex with gpt-5.4-xhigh for complex implementations and second-opinion audits. Claude drafts → Codex checks. Or Codex implements the hard part while Claude reviews. The two compound.
Start with /plan. Let the agent enumerate before it edits. Fan out via agent teams (Claude) or sub-agents (Codex). The coordinator coordinates; it doesn't do the mechanical work. Plans are often the entire value. Edits are mechanical once the plan is right.
Six to ten terminals open at once, each one named. One per repo, one per long-running agent. Stop holding context in your head. The names do it. Modern compaction means each session's context is cheap to carry; stop worrying about priming.
Every session saves to .specstory/history/. When a new session needs context from an old one, pipe the relevant markdown into the prompt. Zero re-priming. This is why I stopped writing dedicated "context docs" for agents. The history is the context.
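A minimal sketch of the pipe, assuming the Claude Code CLI's non-interactive -p (print) mode; the history filename is hypothetical.

```python
# Feed a saved session into a fresh one. The filename under
# .specstory/history/ is hypothetical; -p is Claude Code's print mode.
import pathlib
import subprocess

history = pathlib.Path(".specstory/history/2026-03-10-billing.md").read_text()
subprocess.run(
    ["claude", "-p", f"Context from a prior session:\n\n{history}\n\nPick up where it left off."]
)
```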