How product development works in an AI world · Alpine SG · 2026
A talk for Alpine SG product leaders

How product development
works in an AI world.

Not a prediction. A practice.

Examples on these slides are drawn from one repo we shipped over the last 7 months.
02 · The question

You've been asking four things.
I'm focusing on one.

01
"How do I jolt my team to accelerate?"
02
"How do I go from individual AI to a team-wide system?"
03  → today
"How should product development work in an AI world?"
04
"How do I measure productivity beyond lines of code?"
The sub-questions inside today's theme
  • Are PRDs the future?
  • What does end-to-end AI product development actually look like?
  • What bridges product-AI workflows and engineering-AI workflows?
  • What happens after the prototype: security, handoff, release?
03 · Journey map · before agents

How product shipped, before agents.

A specialist at every step. The craft lived in each person's expertise and in the handoffs between them. Coding was the anchor because coding was the skill.

01 · Discover
Customer research

Interviews, surveys, market reads. Researchers, PMs, and founders in the field.

Human
02 · Ideation
Strategy meetings

Vision, OKRs, roadmap bets. Steering committees and quarterly planning.

Human
03 · Reqs
PRD in Notion

PM drafts the spec after the meeting. Reviewed with stakeholders, revised, approved.

Human
04 · Design
Figma & Jira

UX mockups, prototypes. Eng lead breaks the PRD into tickets.

Human
05 · Build
Code by hand

Engineer + IDE, pair coding, careful reviews. Days to weeks per ticket.

Human
06 · Test
QA validation

QA runs manual and automated tests against acceptance criteria.

Human
07 · Release
Release train

Weekly or monthly ship. PM writes release notes, comms sends the email.

Human
08 · Measure
Dashboards & retros

Analytics reviewed weeks later. Retros surface what to do differently next cycle.

Human
Specialists at every step
Researcher, PM, designer, engineer, QA, ops.

Each phase was its own craft. Coordinating across them was the job PMs and eng leads were hired for.

Coding was the anchor
Implementation took the most time.

Weeks per feature was expected, because careful human coding was the expertise the whole team queued behind.

04 · Journey map · with agents

Where AI fits. Where it doesn't.

Humans anchor the two endpoints. Tests stay human-owned, always. Agents drive the middles. Automate the DESIGN → BUILD handoff first; that's where Intent Lead Time compresses.

01 · Discover
Research with AI

Transcripts auto-summarized. Signals surfaced. Humans still frame the question.

Human · Agent
02 · Ideation
Problem sensing

Vision and bets stay human. AI synthesizes inputs; people make the calls.

Human
03 · Reqs
Live intent capture

Agents attend. Decisions timestamped, owner attributed, source-linked.

Human · Agent
04 · Design
Markdown, in repo

PRDs + design docs drafted live, versioned with the code.

Human · Agent
05 · Build
Agent in sandbox

Local CLI or cloud. Full session context. Humans intervene when needed.

Agent
06 · Test
Human-owned spec

"Never let AI specify your tests." Agents implement; humans declare correct behavior.

Human spec · Agent impl
07 · Release
Agentic PR

Release body written by the agent from the diff + intent. Human reviews, one click.

Human · Agent
08 · Measure
Agentic observability

Dashboards auto-summarize. Agents surface anomalies. Humans decide what to do next.

Human · Agent
Sacred rule
Never let AI specify your tests.

Tests encode shared domain judgment about what "correct" means. Ceding that cedes control. Agents implement; humans specify.
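A minimal sketch of what that split can look like: a human-owned acceptance test the agent may run but never edit. The script name, fixture, and contract here are hypothetical placeholders, not Stoa's.

```bash
# Human-owned spec: declares what "correct" means for a hypothetical
# ./summarize.sh. The agent's only job is to make this pass.
out=$(./summarize.sh fixtures/meeting.txt)
echo "$out" | grep -q '^Decisions:' || { echo "FAIL: missing Decisions section" >&2; exit 1; }
echo "$out" | grep -q 'Owner:'      || { echo "FAIL: missing owner attribution" >&2; exit 1; }
echo "PASS"
```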

Automate first
The SPEC → BUILD handoff.

That's where activation latency lives. If the spec is picked up by the agent without a human re-routing it, most of your ILT disappears.

05 · The bottleneck that didn't collapse

Two collapses happened
on very different clocks.

The bottleneck moved. DORA can't see it by design. DORA's clock starts at git commit. The expensive thing now happens before commit. That gap has a name.

Two clocks, one gap
Phase | 2015 | 2026
Intent capture | weeks | weeks still
Implementation | days to weeks | hours (agents)
Deploy | hours to days | minutes (CI/CD)
The new metric
ILT = t(first commit) − t(product decision captured)
Intent Lead Time · decision captured → first commit
DORA Lead Time for Changes · first commit → prod
ILT ends exactly where DORA's clock starts. No overlap. No gap. Two metrics, one pipeline, end to end.
06 · Inside the gap

Intent Lead Time. Four sub-metrics. Three go to zero.

decision made →(Capture)→ artifact exists →(Sequencing)→ ticket created →(Pickup)→ assigned →(Activation)→ first commit
Sub-components
Sub-metric | Clock | What slows it
Capture latency | decision made → recorded | "we'll document it later"
Sequencing latency | artifact → ticket | triage queue, sprint cadence
Pickup latency | ticket → assigned | backlog prioritization
Activation latency | assigned → first commit | spec ambiguity

Only the first is load-bearing in an intent-driven workflow. Sequencing / pickup / activation are scaffolding from the assembly-line era. When the picker is an agent, the spec is the assignment.

Bands
Elite · < 1 hour · Intent captured live. Agents in the decision session.
High · 1 hour → 1 day · Structured specs. Same-day ticket flow.
Median · 1 → 3 weeks · Meeting → doc → ticket → assign → commit.
Low · > 4 weeks · Decisions surface in Slack and never get structured.

First-principles estimates, not benchmark data. A metric lives or dies by what gets measured against it.

07 · The stack

PRDs aren't going away. The format is.

Static Google / Notion docs become a living markdown artifact in the same repo as the code, read by humans and agents alike.

Old stack
Code · imperative, written
Tests · verified at runtime

PRDs were largely static, infrequently reopened. Implementation was the expensive thing, so implementation became the artifact.

New stack
Intent · declared, versioned, live
Code · one valid implementation
Tests · executable spec of behavior

Intent is a first-class artifact, versioned in the repo. The spec is the assignment, picked up directly by humans and agents.

08 · The opinionated cut

Three practices die. One lives.

dies ×
Tickets
as the unit of work. Sequencing latency goes to zero when the picker is an agent.
dies ×
Long-lived branches
as the unit of collaboration. GitFlow assumed slow coders.
dies ×
PRs as review
scaffolding from the slow-implementation era. Now drag, not gate.
lives ✓
Trunk-based development
One source of truth. Specs + code + tests versioned together. Near-instant implementation eliminates the need to defer integration.
"In a truly intent-driven workflow, sequencing, pickup, and activation will collapse to zero." — Intent Lead Time guide, April 2026

Your tickets-per-engineer count drops while throughput climbs. That's the signal. The scaffolding around code was all sized for the old bottleneck. Remove it.

09 · Stoa as receipt

I don't just pitch this. I live it daily.

Every number on this slide was mined from the specstoryai/stoa monorepo at build time. Real commits, real doc counts, current as of the deck build.

168 · Design docs · Sep 2025 → Apr 2026
177 · Impl docs · Sep 2025 → Apr 2026
10,672 · Lines of AS-BUILT · across 5 components
Themes: design docs vs implementation docs

Architecture is intentionally over-specified vs. over-built (52 design → 28 impl). Billing/auth flips it (3 → 26). That's "after the prototype" territory: security, handoff, release. Everyone has it; few document it.

Docs written per month

March 2026 (38 design docs, 46 implementation docs) was our push into native Windows, the Intent→Stoa rebrand, and the meeting-document surface. The docs track the surges; they don't lag them.

10 · Live demo · the agentic release

A dev → main release PR that writes itself.

Humans don't read release diffs. Our fix: the PR writes itself. Four files in .github/workflows/, running on every push to dev.
This is the concrete example you can copy tonight. It's in your takeaways.

01
Orchestrator
release-pr-sync.yml
On every push to dev, find-or-create the single standing "Release dev to main" PR. One PR per release cycle, not one per push.
trigger  push → dev
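The find-or-create step needs nothing exotic; it can be plain gh CLI inside a run step. A minimal sketch under assumptions (the actual release-pr-sync.yml may differ; GH_TOKEN is assumed to come from the runner):

```bash
# Count open dev→main PRs; create the standing one only if absent.
count=$(gh pr list --base main --head dev --state open --json number --jq 'length')
if [ "$count" -eq 0 ]; then
  gh pr create --base main --head dev \
    --title "Release dev to main" \
    --body "Body pending agent summary."
fi
```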
02
Agent · Claude via gh-aw
release-pr-body.md
Reads the full main…dev diff plus every touched docs/design/ and docs/implementation/ file. Synthesizes a structured release: Highlights, Major Change Areas, Operational Notes, Commits.
bounded tools: repos, pull_requests · safe output: update-pull-request × 1
03
Outputs
PR body  +  #release-stream
The PR body is the canonical release narrative. Same content reshaped to Slack mrkdwn and posted to the release channel. Humans review the summary, click merge.
reviewer  merges in minutes
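Mirroring the canonical body to Slack is one fetch and one POST. A sketch, assuming an incoming-webhook URL in SLACK_WEBHOOK_URL and the PR number in PR_NUMBER; the mrkdwn reshaping is elided:

```bash
# Post the same canonical PR body to the release channel. No second draft.
body=$(gh pr view "$PR_NUMBER" --json body --jq .body)
payload=$(jq -n --arg text "$body" '{text: $text}')
curl -sS -X POST -H 'Content-type: application/json' \
  --data "$payload" "$SLACK_WEBHOOK_URL"
```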
Proof it runs
PR | Title | State | Lifespan

Last five "Release dev to main" PRs from specstoryai/stoa. 4 of 5 merged; #16 currently open. Two merged in under 10 minutes, because the agent had already written the summary humans would have put off.

Guardrails
  • ✓ One standing PR from dev into main. Always fresh.
  • ✓ main-from-dev-only.yml rejects any PR into main whose head ≠ dev. See the sketch after this list.
  • ✓ Agent has bounded tool access. No shell. One safe output.
  • ✓ The prompt is a markdown file humans edit. The YAML is compiled by gh aw compile.
  • ✓ Slack gets the same canonical body. No duplicate writing.
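The branch guard referenced above is small enough to read in full. A minimal sketch of the check, assuming it runs on pull_request events targeting main, where GITHUB_HEAD_REF is set automatically:

```bash
# Reject any PR into main whose head branch is not dev.
if [ "$GITHUB_HEAD_REF" != "dev" ]; then
  echo "PRs into main must come from dev; got: $GITHUB_HEAD_REF" >&2
  exit 1
fi
```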
11 · Take it home

Three assets to steal.
Three moves to make this week.

Move 01
Put one canonical workflow under the agent.

Release notes, incident summaries, RFC intake. Let it write the body.

Move 02
Version your intent.

One markdown doc, in the repo, with the decision timestamp in the commit.
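A sketch of the move, with a hypothetical path and decision; the commit timestamp doubles as the capture time:

```bash
# Version the decision itself. The commit is the capture record.
git add docs/design/2026-04-02-episode-merge.md
git commit -m "intent: episode switching must preserve provenance marks

Decision captured live in the 2026-04-02 planning session. Owner: PM.
The spec is the assignment: an agent can pick this commit up directly."
```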

Move 03
Measure ILT for one feature.

Anything is a baseline. t(first commit) − t(decision).
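A baseline needs nothing but git and a timestamp. A sketch, assuming GNU date and the hypothetical intent doc from Move 02 as the feature anchor:

```bash
# ILT = t(first commit) − t(decision captured), reported in hours.
decision_ts=$(date -d "2026-04-02T14:30:00Z" +%s)
first_commit_ts=$(git log --reverse --format=%ct \
  -- docs/design/2026-04-02-episode-merge.md | head -n 1)
echo "ILT: $(( (first_commit_ts - decision_ts) / 3600 )) hours"
```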

The clock starts the moment your team agrees.
APPENDIX · A · On agents.md, claude.md, and skills

A minority report on agent conventions.

Everyone is curating AGENTS.md, CLAUDE.md, and custom skills. I don't think most of it earns its keep. Three things do.

What I don't buy
  • AGENTS.md / CLAUDE.md churn
    Different format per tool, bloats toward a wiki nobody audits. Diminishing return past one paragraph.
  • Custom skill gardens
    Too narrow. Age poorly. Maintenance cost exceeds per-use value for all but a handful.
  • Slash-command metadata
    A tax you pay forever for a speedup you forget about in a week.
What actually compounds
01 · One doc per surface
AS-BUILT-ARCHITECTURE.md

One living doc per product surface: CLI, web, desktop, backend. Kept current. Agents @-reference it as their first pass. Receipt: Stoa ships 5 of these, 10,672 lines, all last-updated within 8 days of each other.

02 · Prose-thick commits
Commit messages as memory

Not "fix bug." Paragraphs. The why, the constraint, the follow-up. Co-authored with the agent that did the work. Your git log becomes the search index nobody else's repo has.
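A sketch of the shape; the scenario is hypothetical, the Co-authored-by trailer is the standard git convention:

```bash
git commit -m "watcher: debounce rename bursts from editors on network mounts

Why: editors on SMB mounts emit rename storms that flooded the pipeline.
Constraint: keep first-event latency under 300ms on local volumes.
Follow-up: make the debounce window configurable per watch root.

Co-authored-by: Claude <noreply@anthropic.com>"
```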

03 · Two skills that earn it
impeccable.style · last30days

impeccable.style for design/style review; last30days for real-time research past the training cutoff. Both narrow enough to age well, general enough to use weekly.

A new markdown-file convention is not architecture. Your AS-BUILT doc is.
APPENDIX · B · What an AS-BUILT.md looks like
stoa-cli / AS-BUILT-ARCHITECTURE.md · 3,250 lines · 48 code fences · 21 ASCII diagrams · updated 2026-04-13
1 · # Stoa: As-Built Architecture
3 · *Last Updated: 2026-04-13*
5 · ## Table of Contents
19 · ## Overview
43 · ### Design Philosophy
55 · ## System Architecture
57 · ### High-Level Architecture
102 · ### Component Interaction Map
145 · ## Package Structure
197 · ### Package Responsibilities
240 · ## Data Flow Diagrams
242 · ### 1. Real-Time Change Flow (Event Bus + Outbox)
720 · ### 2. Offline Sync Flow (Catchup)
791 · ### 3. Episode Switching Flow
864 · ### 4. Episode Integration Flow (LLM-Powered Merge)
998 · ### 5. Remote Sync Flow
1107 · ## Core Components
1109 · ### 1. Service Manager (pkg/service/manager.go)
1164 · ### 2. File Watcher (pkg/service/watcher.go)
1239 · ### 3. Correlation Engine (pkg/correlation/engine.go)
1293 · ### 4. CRDT Manager + Automerge Engine
1368 · ### 5. Outbox Store (pkg/outbox/store.go)
1505 · ### 6. Journal (pkg/journal/sqlite.go)
1592 · ### 7. Timeline Service
1635 · ### 8. Ask Service (pkg/service/ask_service.go)
1689 · ### 9. Interactive Shell (pkg/shell/)
1759 · ### 10. Git Integration (pkg/git/)
1846 · ### 11. Episode Manager (pkg/episode/)
2045 · ### 12. Explorer TUI (pkg/explorer/)
2091 · ### 13. Snapshot Service
2127 · ### 14. Digest Service
2165 · ### 15. Exchange Watcher
2182 · ### 15b. Git Watcher
2197 · ### 16. Git-Mode Provenance (blame, trace, intentdb)
2233 · ### 17. Git Status Filter
2248 · ### 18. Agent Observer (pkg/agentobserver/)
2264 · ### 19. Path Utilities
2277 · ### 20. Actor Identity
2290 · ### 20a. Space Daemon Coordination
2305 · ### 21. Provenance Marks in Automerge
2341 · ### 22. TUI (pkg/tui/app.go)
2411 · ### 23. Remote Sync (pkg/remote/)
2557 · ## Storage Architecture
2559 · ### File System Layout
2683 · ### Global Configuration
2694 · ### Data Persistence Strategy
2723 · ## Network & Collaboration
2725 · ### Remote Sync Architecture
2781 · ### Skip List (Circular Detection Prevention)
2806 · ### Cloud Integration Architecture
3045 · ## Technology Stack
3047 · ### Core Technologies
3071 · ### Go Dependencies
3093 · ### Rust FFI Crate (automerge-ffi/)
3114 · ## Performance & Scale
3116 · ### Performance Characteristics
3145 · ### Scalability
3160 · ### Optimization Techniques
3171 · ## Security & Privacy
3173 · ### Security Model
3191 · ### Privacy Considerations
3213 · ## Future Architecture
APPENDIX · C · Harness & workflow

Two harnesses, many terminals,
saved sessions.

How I actually run this day-to-day. Opinionated, not a review. These are the four practices that compound.

01 · Tandem harnesses
Claude Code + Codex

Not one or the other. Claude Code for orchestration and parallel agent teams; Codex with gpt-5.4-xhigh for complex implementations and second-opinion audits. Claude drafts → Codex checks. Or Codex implements the hard part while Claude reviews. The two compound.

02 · Plan before you touch
Violent plan mode + agent teams

Start with /plan. Let the agent enumerate before it edits. Fan out via agent teams (Claude) or sub-agents (Codex). The coordinator coordinates; it doesn't do the mechanical work. Plans are often the entire value. Edits are mechanical once the plan is right.

03 · Context lives in terminals
Many named terminals, in Cursor

Six to ten terminals open at once, each one named. One per repo, one per long-running agent. Stop holding context in your head. The names do it. Modern compaction means each session's context is cheap to carry; stop worrying about priming.

04 · History as context
SpecStory CLI for session memory

Every session saves to .specstory/history/. When a new session needs context from an old one, pipe the relevant markdown into the prompt. Zero re-priming. This is why I stopped writing dedicated "context docs" for agents. The history is the context.
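The mechanics are a single pipe. A sketch, assuming Claude Code's print mode (claude -p) and a hypothetical history filename:

```bash
# Feed a prior session's saved transcript into a new one-shot prompt.
cat .specstory/history/2026-04-12-episode-merge.md \
  | claude -p "Carry forward: list the decisions, constraints, and open questions."
```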

Plan more. Type less.
End of appendix · fin · gregce.github.io/ai-product-development