How product development
works in an AI world.
Not a prediction. A practice.
You've been asking four things.
I'm focusing on one.
- Are PRDs the future?
- What does end-to-end AI product development actually look like?
- What bridges product-AI workflows and engineering-AI workflows?
- What happens after the prototype: security, handoff, release?
How product shipped, before agents.
A specialist at every step. The craft lived in each person's expertise and in the handoffs between them. Coding was the anchor because coding was the skill.
Customer research
Interviews, surveys, market reads. Researchers, PMs, and founders in the field.
Strategy meetings
Vision, OKRs, roadmap bets. Steering committees and quarterly planning.
PRD in Notion
PM drafts the spec after the meeting. Reviewed with stakeholders, revised, approved.
Figma & Jira
UX mockups, prototypes. Eng lead breaks the PRD into tickets.
Code by hand
Engineer + IDE, pair coding, careful reviews. Days to weeks per ticket.
QA validation
QA runs manual and automated tests against acceptance criteria.
Release train
Weekly or monthly ship. PM writes release notes, comms sends the email.
Dashboards & retros
Analytics reviewed weeks later. Retros surface what to do differently next cycle.
Each phase was its own craft. Coordinating across them was the job PMs and eng leads were hired for.
Weeks per feature was expected, because careful human coding was the expertise the whole team queued behind.
Where AI fits. Where it doesn't.
Humans anchor the two endpoints. Tests stay human-owned, always. Agents drive the middles. Automate the DESIGN → BUILD handoff first; that's where Intent Lead Time compresses.
Research with AI
Transcripts auto-summarized. Signals surfaced. Humans still frame the question.
Problem sensing
Vision and bets stay human. AI synthesizes inputs; people make the calls.
Live intent capture
Agents attend. Decisions timestamped, owner attributed, source-linked.
Markdown, in repo
PRDs + design docs drafted live, versioned with the code.
Agent in sandbox
Local CLI or cloud. Full session context. Humans intervene when needed.
Human-owned spec
"Never let AI specify your tests." Agents implement; humans declare correct behavior.
Agentic PR
Release body written by the agent from the diff + intent. Human reviews, one click.
Agentic observability
Dashboards auto-summarize. Agents surface anomalies. Humans decide what to do next.
Tests encode shared domain judgment about what "correct" means. Ceding that cedes control. Agents implement; humans specify.
That's where activation latency lives. If the spec is picked up by the agent without a human re-routing it, most of your ILT disappears.
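To make the "Live intent capture" card concrete (decisions timestamped, owner attributed, source-linked), here is a minimal sketch of what a captured decision record could carry. Only those three attributes come from the card; the field names are hypothetical.

```python
# Hypothetical shape of a captured decision record. Only the three
# attributes (timestamp, owner, source link) come from the deck;
# the field names are illustrative.
from dataclasses import dataclass

@dataclass
class Decision:
    made_at: str    # ISO timestamp; this is where the intent clock starts
    owner: str      # who made the call
    source: str     # link back to the transcript or meeting doc
    statement: str  # the decision itself, in one sentence
```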
Two collapses happened
on very different clocks.
The bottleneck moved. DORA can't see it by design. DORA's clock starts at git commit. The expensive thing now happens before commit. That gap has a name.
| Phase | 2015 | 2026 |
|---|---|---|
| Intent capture | weeks | weeks still |
| Implementation | days to weeks | hours (agents) |
| Deploy | hours to days | minutes (CI/CD) |
Intent Lead Time. Four sub-metrics. Three go to zero.
The clock: decision made → artifact exists → ticket created → first commit.
| Sub-metric | Clock | What slows it |
|---|---|---|
| Capture latency | decision made → recorded | "we'll document it later" |
| Sequencing latency | artifact → ticket | triage queue, sprint cadence |
| Pickup latency | ticket → assigned | backlog prioritization |
| Activation latency | assigned → first commit | spec ambiguity |
Only the first is load-bearing in an intent-driven workflow. Sequencing / pickup / activation are scaffolding from the assembly-line era. When the picker is an agent, the spec is the assignment.
First-principles estimates, not benchmark data. A metric lives or dies by what gets measured against it.
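To pin down what each clock measures, a minimal sketch with illustrative timestamps; the event names mirror the table, the numbers are made up.

```python
# The four ILT sub-metrics as timestamp deltas.
# Event names follow the table; timestamps are illustrative.
from datetime import datetime

ts = {k: datetime.fromisoformat(v) for k, v in {
    "decision_made":  "2026-03-02T10:15:00",
    "recorded":       "2026-03-04T09:00:00",
    "ticket_created": "2026-03-09T14:00:00",
    "assigned":       "2026-03-11T11:30:00",
    "first_commit":   "2026-03-12T16:45:00",
}.items()}

capture    = ts["recorded"] - ts["decision_made"]      # capture latency
sequencing = ts["ticket_created"] - ts["recorded"]     # sequencing latency
pickup     = ts["assigned"] - ts["ticket_created"]     # pickup latency
activation = ts["first_commit"] - ts["assigned"]       # activation latency
ilt        = ts["first_commit"] - ts["decision_made"]  # ILT = sum of all four
print(capture, sequencing, pickup, activation, ilt)
```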
PRDs aren't going away. The format is.
Static Google / Notion docs become a living markdown artifact in the same repo as the code, read by humans and agents on the same line.
PRDs were largely static, infrequently reopened. Implementation was the expensive thing, so implementation became the artifact.
Intent is a first-class artifact, versioned in the repo. The spec is the assignment, picked up directly by humans and agents.
Three practices die. One lives.
Your tickets-per-engineer counts drop while throughput climbs. That's the signal. The scaffolding around code was all sized for the old bottleneck. Remove it.
I don't just pitch this. I live it daily.
Every number on this slide was mined from the specstoryai/stoa monorepo at build time. Real commits, real doc counts, current as of the deck build.
Architecture is intentionally over-specified vs. over-built (52 design → 28 impl). Billing/auth flips it (3 → 26). That's "after the prototype" territory: security, handoff, release. Everyone has it; few document it.
March 2026 (38 design docs, 46 implementation docs) was our push into native Windows, the Intent→Stoa rebrand, and the meeting-document surface. The docs track the surges; they don't lag them.
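A minimal sketch of the kind of build-time mining described here, assuming the docs/design/ and docs/implementation/ trees named later in the deck; the actual build step may collect more than counts.

```python
# Sketch: count design vs. implementation docs at deck build time.
# Paths follow the repo's docs/design/ and docs/implementation/ trees.
from pathlib import Path

design = len(list(Path("docs/design").rglob("*.md")))
impl = len(list(Path("docs/implementation").rglob("*.md")))
print(f"{design} design docs, {impl} implementation docs")
```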
A dev → main release PR that writes itself.
Humans don't read release diffs. Our fix: the PR writes itself. Four files in .github/workflows/, running on every push to dev.
This is the concrete example you can copy tonight. It's in your takeaways.
On every push to dev, the workflow finds or creates the single standing "Release dev to main" PR. One PR per release cycle, not one per push. It reads the main…dev diff plus every touched docs/design/ and docs/implementation/ file, then synthesizes a structured release body: Highlights, Major Change Areas, Operational Notes, Commits.

| PR | Title | State | Lifespan |
|---|---|---|---|
Last five "Release dev to main" PRs from specstoryai/stoa. 4 of 5 merged; #16 currently open. Two merged in under 10 minutes, because the agent had already written the summary humans would have put off.
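A minimal sketch of the find-or-create step using stock gh commands; the shipped version is four workflow files plus a gh-aw prompt, so treat this as the shape, not the source.

```python
# Find-or-create the single standing "Release dev to main" PR (sketch;
# the real implementation lives in .github/workflows/ and uses gh-aw).
import json
import subprocess

def run(*args: str) -> str:
    return subprocess.run(args, capture_output=True, text=True, check=True).stdout

open_prs = json.loads(run("gh", "pr", "list", "--base", "main", "--head", "dev",
                          "--state", "open", "--json", "number"))
body = "agent-written release summary goes here"  # synthesized from the main...dev diff
if open_prs:
    # One PR per release cycle: refresh the existing body instead of opening a new PR.
    run("gh", "pr", "edit", str(open_prs[0]["number"]), "--body", body)
else:
    run("gh", "pr", "create", "--base", "main", "--head", "dev",
        "--title", "Release dev to main", "--body", body)
```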
- ✓ One standing PR from dev into main. Always fresh.
- ✓ main-from-dev-only.yml rejects any PR into main whose head ≠ dev (sketched below).
- ✓ Agent has bounded tool access. No shell. One safe output.
- ✓ The prompt is a markdown file humans edit. The YAML is compiled by gh aw compile.
- ✓ Slack gets the same canonical body. No duplicate writing.
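One way to implement the branch guard as a plain CI step; the repo's actual main-from-dev-only.yml is a workflow file and may differ in detail.

```python
# Fail the check unless a PR into main comes from dev.
# GITHUB_HEAD_REF / GITHUB_BASE_REF are set by Actions on pull_request events.
import os
import sys

head = os.environ.get("GITHUB_HEAD_REF", "")  # PR source branch
base = os.environ.get("GITHUB_BASE_REF", "")  # PR target branch
if base == "main" and head != "dev":
    sys.exit(f"rejected: PRs into main must come from dev, not {head!r}")
```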
Three assets to steal.
Three moves to make this week.
Agentic release-PR workflow
Drop-in GitHub Actions + gh-aw prompt. Your release notes write themselves tonight.
Intent-driven PRD template
The shape of a PRD an agent can actually pick up. Four sections. No ticket required.
AI-PDLC journey map
Poster version of slide 4. Which tool, which phase, who owns what, every step.
Put one canonical workflow under the agent.
Release notes, incident summaries, RFC intake. Let it write the body.
Version your intent.
One markdown doc, in the repo, with the decision timestamp in the commit.
Measure ILT for one feature.
Any number is a baseline: ILT = t(first commit) − t(decision).
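A minimal way to pull that baseline from a repo; the path and decision timestamp are illustrative.

```python
# ILT baseline for one feature: t(first commit) - t(decision).
# The decision timestamp comes from the intent doc; the path is illustrative.
import subprocess
from datetime import datetime

decision = datetime.fromisoformat("2026-03-02T10:15:00+00:00")
commits = subprocess.run(
    ["git", "log", "--reverse", "--format=%aI", "--", "src/feature/"],
    capture_output=True, text=True, check=True,
).stdout.splitlines()
print("ILT:", datetime.fromisoformat(commits[0]) - decision)
```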
A minority report on agent conventions.
Everyone is curating AGENTS.md, CLAUDE.md, and custom skills. I don't think most of it earns its keep. Three things do.
- AGENTS.md / CLAUDE.md churn. Different format per tool; bloats toward a wiki nobody audits. Diminishing return past one paragraph.
- Custom skill gardens. Too narrow. Age poorly. Maintenance cost exceeds per-use value for all but a handful.
- Slash-command metadata. A tax you pay forever for a speedup you forget about in a week.
- One living doc per product surface: CLI, web, desktop, backend. Kept current. Agents @-reference it as their first pass. Receipt: Stoa ships 5 of these, 10,672 lines, all last-updated within 8 days of each other.
- Commit messages that explain, not "fix bug." Paragraphs: the why, the constraint, the follow-up. Co-authored with the agent that did the work. Your git log becomes the search index nobody else's repo has.
- Two skills only: impeccable.style for design/style review; last30days for real-time research past the training cutoff. Both narrow enough to age well, general enough to use weekly.
Two harnesses, many terminals,
saved sessions.
How I actually run this day-to-day. Opinionated, not a review. These are the four practices that compound.
Not one or the other. Claude Code for orchestration and parallel agent teams; Codex with gpt-5.4-xhigh for complex implementations and second-opinion audits. Claude drafts → Codex checks. Or Codex implements the hard part while Claude reviews. The two compound.
Start with /plan. Let the agent enumerate before it edits. Fan out via agent teams (Claude) or sub-agents (Codex). The coordinator coordinates; it doesn't do the mechanical work. Plans are often the entire value. Edits are mechanical once the plan is right.
Six to ten terminals open at once, each one named. One per repo, one per long-running agent. Stop holding context in your head. The names do it. Modern compaction means each session's context is cheap to carry; stop worrying about priming.
Every session saves to .specstory/history/. When a new session needs context from an old one, pipe the relevant markdown into the prompt. Zero re-priming. This is why I stopped writing dedicated "context docs" for agents. The history is the context.
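A minimal sketch of the pipe, assuming the Claude Code CLI's non-interactive -p (print) mode; the history filename is hypothetical.

```python
# Feed a saved session into a fresh one. The filename under
# .specstory/history/ is hypothetical; -p is Claude Code's print mode.
import pathlib
import subprocess

history = pathlib.Path(".specstory/history/2026-03-10-billing.md").read_text()
subprocess.run(
    ["claude", "-p", f"Context from a prior session:\n\n{history}\n\nPick up where it left off."]
)
```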