Voice-Note-to-Brief: Sub-30s Latency from ≤90s Voice Note to Structured Campaign Brief
Most campaign ideas are lost in the transition from the founder's head to a written brief template. This design dossier specifies a system that ingests a ≤90-second voice note and returns a structured campaign brief plus three candidate opening lines in <30 seconds end-to-end. Implementation is pending.
- Status: design dossier
- Clearance: δ-17
- Surface: FIELD · capture
- Read: 2 min read
- Whisper large-v3: verbatim transcription
- GPT-4o: brief + opening-line synthesis
- ElevenLabs: 10-second audio summary back
- Tavily API: audience + competitor context
- Twilio: voice-note capture from phone
H1: Founders generate higher-quality campaign ideas when speaking than when writing. H2: Reducing brief-production latency below the founder's locomotion window (the time the founder is still walking) increases campaigns shipped per founder and preserves authorial voice in shipped artefacts.
- Input: a phone-recorded voice note, duration ≤90 seconds
- Output: a single-page structured brief (problem · audience · offer · channels) plus three candidate opening lines
- Target end-to-end latency: <30 seconds
- Return-channel: ≤10-second audio summary, enabling in-motion approval
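The four constraints above can be written down as a small, checkable spec. The following is a minimal sketch, not the dossier's implementation: `LoopSpec` and `check_run` are hypothetical names, and only the three numeric limits (90 s, 30 s, 10 s) come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopSpec:
    """Numeric limits taken from the spec list; the class itself is illustrative."""
    max_note_seconds: float = 90.0     # input: voice note of at most 90 s
    max_latency_seconds: float = 30.0  # target: strictly under 30 s end-to-end
    max_summary_seconds: float = 10.0  # return channel: at most 10 s of audio


def check_run(spec: LoopSpec, note_seconds: float,
              latency_seconds: float, summary_seconds: float) -> list[str]:
    """Return human-readable violations; an empty list means the run is within spec."""
    violations = []
    if note_seconds > spec.max_note_seconds:
        violations.append(f"voice note too long: {note_seconds:.1f}s")
    if latency_seconds >= spec.max_latency_seconds:
        violations.append(f"latency budget missed: {latency_seconds:.1f}s")
    if summary_seconds > spec.max_summary_seconds:
        violations.append(f"audio summary too long: {summary_seconds:.1f}s")
    return violations
```

Logging the output of `check_run` for every run during a pilot would give a direct read on how often the latency target is actually met.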
The end-to-end recipe. Follow it top to bottom; each step assumes the previous one ran cleanly.
Preserve verbatim founder phrasing in transcription
The brief reproduces the founder verbatim at high-leverage points — offer wording and specific buyer pain. These phrases constitute the primary artefact; paraphrasing destroys signal and is suppressed by design.
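One way to enforce this is to check, after synthesis, that the high-leverage fields actually occur in the transcript. The sketch below is illustrative: `non_verbatim_fields` is a hypothetical helper, the brief is assumed to arrive as a plain dict, and the assumption that buyer pain lives in a `problem` field is ours, not the dossier's. Matching ignores case and punctuation but requires the same words in the same order.

```python
import re


def _normalize(text: str) -> str:
    """Lowercase and reduce text to words separated by single spaces."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def non_verbatim_fields(transcript: str, brief: dict,
                        fields: tuple = ("offer", "problem")) -> list[str]:
    """Return the names of fields whose wording does not occur in the transcript.

    The field names are assumptions for illustration; an empty or missing
    field is also reported, since it cannot be verbatim.
    """
    haystack = f" {_normalize(transcript)} "
    missing = []
    for name in fields:
        needle = _normalize(brief.get(name, ""))
        # Pad with spaces so matches respect word boundaries.
        if not needle or f" {needle} " not in haystack:
            missing.append(name)
    return missing
```

A non-empty result could trigger a retry of the synthesis step or a flag in the rendered brief. Which of the two is preferable is a design decision the dossier leaves open.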
Fig. The 30-second loop
- 01 90s voice note: phone, on a walk
- 02 Transcript: verbatim
- 03 Structured brief: problem · audience · offer
- 04 Audio summary back: approve while walking
Emit a structured brief, not free-form prose
The model is not prompted for 'a brief'; it returns a fixed set of named fields (problem, audience, offer, three opening lines) which are then rendered into the brief artefact. The constant schema enables cross-campaign comparison at the team level.
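A fixed schema of this kind might be enforced as follows. This is a sketch under assumptions: the model is assumed to return a JSON object, and `Brief`, `parse_brief`, and `render` are hypothetical names. The fields are the ones named in the paragraph above; a `channels` field, which the output spec also lists, could be added in the same way.

```python
import json
from dataclasses import dataclass

TEXT_FIELDS = ("problem", "audience", "offer")
REQUIRED_FIELDS = TEXT_FIELDS + ("opening_lines",)


@dataclass(frozen=True)
class Brief:
    problem: str
    audience: str
    offer: str
    opening_lines: tuple  # exactly three candidate opening lines


def parse_brief(raw_json: str) -> Brief:
    """Parse model output and reject anything that deviates from the schema."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("brief must be a JSON object")
    missing = [k for k in REQUIRED_FIELDS if k not in data]
    extra = [k for k in data if k not in REQUIRED_FIELDS]
    if missing or extra:
        raise ValueError(f"schema mismatch: missing={missing}, extra={extra}")
    for key in TEXT_FIELDS:
        if not isinstance(data[key], str) or not data[key].strip():
            raise ValueError(f"field {key!r} must be a non-empty string")
    lines = data["opening_lines"]
    if (not isinstance(lines, list) or len(lines) != 3
            or not all(isinstance(s, str) and s.strip() for s in lines)):
        raise ValueError("opening_lines must be a list of three non-empty strings")
    return Brief(data["problem"], data["audience"], data["offer"], tuple(lines))


def render(brief: Brief) -> str:
    """Render the named fields into a plain-text, single-page brief."""
    parts = [f"Problem: {brief.problem}",
             f"Audience: {brief.audience}",
             f"Offer: {brief.offer}",
             "Opening lines:"]
    parts += [f"  {i}. {line}" for i, line in enumerate(brief.opening_lines, 1)]
    return "\n".join(parts)
```

Rejecting extra keys as well as missing ones keeps every brief structurally identical, which is what makes comparison across campaigns possible.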
Return a ≤10-second audio summary for in-motion approval
Once rendered, a short audio summary is delivered to the founder for asynchronous approval without requiring desk return. Requiring desktop confirmation collapses the entire latency budget and invalidates the design.
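The 10-second limit can be approximated before any audio is generated by capping the word count of the summary script. The sketch below assumes a speaking rate of 2.5 words per second (roughly 150 words per minute). That figure is an assumption, not a value from the dossier, and the real duration depends on the chosen text-to-speech voice. `estimated_seconds` and `trim_to_budget` are hypothetical helpers.

```python
WORDS_PER_SECOND = 2.5  # assumption: about 150 words per minute


def estimated_seconds(text: str, words_per_second: float = WORDS_PER_SECOND) -> float:
    """Estimate spoken duration from word count alone."""
    return len(text.split()) / words_per_second


def trim_to_budget(text: str, max_seconds: float = 10.0,
                   words_per_second: float = WORDS_PER_SECOND) -> str:
    """Keep only as many leading words as fit in the time budget."""
    max_words = int(max_seconds * words_per_second)
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words])
```

Hard truncation is the bluntest option. Asking the synthesis step for a summary within the word budget and using `trim_to_budget` only as a safety net would likely produce more natural audio.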
- 01 Idea on a walk: today, it dies here
- 02 Open a template: 20 min of friction
- 03 Write a brief: or don't
- 04 Hand to AI agent: days later
The status quo is the reason most campaigns never ship.
- No implementation as of this revision — this is a design dossier, not an executed experiment.
- Nearest comparators (voice-to-Notion tooling) terminate at transcription and omit the synthesis step, which carries the decisive value.
- Proposed pilot: n=10 founders × 2 weeks, with the primary outcome being incremental briefs shipped vs. their pre-pilot baseline.
The dominant failure mode to monitor is over-synthesis — the agent produces a fluent brief decoupled from the founder's actual content. The audio return-channel is the lowest-cost mechanism for catching this prior to campaign launch. At the team level the upside is a continuous supply of founder-voiced briefs without scheduled extraction meetings.
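Alongside the audio check, a coarse automated signal for over-synthesis is the share of the brief's content words that also occur in the transcript. The sketch below is a heuristic under assumptions, not a method from the dossier: the stopword list is deliberately tiny, the 0.6 threshold is an arbitrary placeholder that would need tuning, and `grounding_ratio` and `needs_review` are hypothetical names.

```python
import re

# Small illustrative stopword list; a real deployment would use a fuller one.
STOPWORDS = frozenset({"a", "an", "the", "and", "or", "of", "to", "for",
                       "in", "on", "with", "is", "are", "that", "this", "it"})


def content_words(text: str) -> set[str]:
    """Lowercased words with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9']+", text.lower())
            if w not in STOPWORDS}


def grounding_ratio(transcript: str, brief_text: str) -> float:
    """Fraction of the brief's content words that appear in the transcript."""
    brief_words = content_words(brief_text)
    if not brief_words:
        return 0.0  # an empty brief is treated as ungrounded
    return len(brief_words & content_words(transcript)) / len(brief_words)


def needs_review(transcript: str, brief_text: str, threshold: float = 0.6) -> bool:
    """Flag briefs whose vocabulary has drifted away from the transcript."""
    return grounding_ratio(transcript, brief_text) < threshold
```

A low ratio does not prove the brief is wrong, and a high one does not prove it is faithful. The score is best read as a prompt for the founder to listen more carefully to the audio summary.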
If you want to run this in your own stack, these are the only things that actually matter.
Retain the verbatim transcript
Founder trust in the brief is higher when the underlying verbatim transcript is visible beneath it. The transcript functions as evidentiary substrate.
Fix the brief schema before prompting
Allowing the model to determine brief structure produces incompatible artefacts across runs and precludes cross-campaign comparison. The schema must be specified ex ante.
Ship the audio return-channel from v1
The product's value proposition is in-motion launch. Omitting the audio summary forces desktop return and collapses the system to the status quo.
- [1] Internal: FIELD capture surface notes
- [2] Field notes: why founder briefs die (enso interviews, 2026)







