Castle 2.0: Anchoring the Second Touch to the First-Touch Surface Yields a +38% Reply-Length Effect
Second-touch outbound consistently underperforms first-touch on qualitative reply metrics across our population. Castle 2.0 is a second-touch protocol that anchors the opening clause to the empirically observed first-touch surface for each prospect. In an 11-account pilot, mean reply length increased by 38% (p<0.05) with no statistically significant change in reply volume, a pattern consistent with a selection effect toward higher-intent respondents.
- Status: pilot · n=11 accounts
- Clearance: β-11
- Surface: SDR · prompt
- Read time: 5 min
- OpenClaw: first-touch surface capture
- Claude 3.5 Sonnet: second-touch drafting
- Tavily API: buyer recent-activity lookup
- Smartlead: send + reply tracking
- Linear: blind reviewer grading queue
H1: Naming the precise surface of first encounter in the second message raises reply quality without depressing reply volume. The effect is hypothesised to operate through perceived recognition rather than persuasion.
- Cohort: n=11 accounts spanning dev tools, fintech, and climate verticals
- Design: 30-day baseline (prior pattern) followed by 30-day intervention (Castle 2.0)
- Controls held constant: sender identity, offer, send windows
- Outcome: blind dual-reviewer reply-quality grading (1–5), with a third reviewer arbitrating when the two scores differ by more than 1 point
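A minimal sketch of how sends could be labelled by period, assuming the two 30-day windows are back to back with no washout gap (the design above does not mention one). `Send` and `assign_period` are hypothetical names, not part of any tool listed above.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class Send:
    """One outbound send (hypothetical record shape)."""
    account_id: str
    sent_on: date


def assign_period(send: Send, baseline_start: date, window_days: int = 30) -> Optional[str]:
    """Label a send 'baseline' or 'intervention', or None if outside both windows.

    Assumes contiguous windows: [start, start+30d) then [start+30d, start+60d).
    """
    intervention_start = baseline_start + timedelta(days=window_days)
    intervention_end = intervention_start + timedelta(days=window_days)
    if baseline_start <= send.sent_on < intervention_start:
        return "baseline"
    if intervention_start <= send.sent_on < intervention_end:
        return "intervention"
    return None


start = date(2026, 1, 1)  # hypothetical baseline start date
print(assign_period(Send("acme", date(2026, 1, 15)), start))  # baseline
print(assign_period(Send("acme", date(2026, 2, 10)), start))  # intervention
```

Sends outside both windows return `None` so they can be dropped rather than silently counted toward either condition.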
The end-to-end recipe. Follow it top to bottom; each step assumes the previous one ran cleanly.
Instrument first-touch attribution
Prior to any prompt modification, log the first-touch surface for each prospect: a post reacted to, a comment authored, or a profile visit. This surface becomes the deterministic anchor for the second-touch opener.
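A minimal sketch of the attribution log, assuming one anchor per prospect and that "first" means earliest by timestamp, so late-arriving events cannot overwrite an earlier touch. `Surface`, `FirstTouch`, and `log_first_touch` are hypothetical names; how surfaces are actually captured (OpenClaw in this pilot) is not shown, and no OpenClaw API is assumed.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Dict


class Surface(Enum):
    """The three first-touch surfaces named in the protocol."""
    POST_REACTION = "post_reaction"
    COMMENT = "comment"
    PROFILE_VISIT = "profile_visit"


@dataclass(frozen=True)
class FirstTouch:
    prospect_id: str
    surface: Surface
    artefact_ref: str      # hypothetical: URL or ID of the post, comment, or profile
    observed_at: datetime  # timezone-aware UTC timestamp


def log_first_touch(log: Dict[str, FirstTouch], touch: FirstTouch) -> bool:
    """Keep only the earliest observed touch per prospect.

    Returns True if `touch` became (or replaced) the stored anchor, False otherwise.
    """
    current = log.get(touch.prospect_id)
    if current is not None and current.observed_at <= touch.observed_at:
        return False  # an equal-or-earlier touch is already the anchor
    log[touch.prospect_id] = touch
    return True


anchors: Dict[str, FirstTouch] = {}
log_first_touch(anchors, FirstTouch("p1", Surface.COMMENT, "comment-123",
                                    datetime(2026, 1, 5, tzinfo=timezone.utc)))
```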
Hold all other variables constant
Identical sender, offer, and send-window distribution. The sole manipulated variable across conditions is the opening clause of the second-touch message.
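One way to enforce the single-manipulation rule is to diff the two condition configs and fail if anything other than the opener changed. This is a sketch assuming each condition is described by a flat config dict; the key names (`sender`, `offer`, `send_window`, `opener`) and the helper names are hypothetical.

```python
from typing import Any, Dict, Set

_MISSING = object()  # sentinel: a key present in only one config counts as a difference
ALLOWED_TO_DIFFER = {"opener"}


def differing_keys(baseline: Dict[str, Any], intervention: Dict[str, Any]) -> Set[str]:
    """Return every config key whose value differs between the two conditions."""
    keys = set(baseline) | set(intervention)
    return {k for k in keys if baseline.get(k, _MISSING) != intervention.get(k, _MISSING)}


def check_single_manipulation(baseline: Dict[str, Any], intervention: Dict[str, Any]) -> None:
    """Raise ValueError if the conditions differ in anything other than the opener."""
    extra = differing_keys(baseline, intervention) - ALLOWED_TO_DIFFER
    if extra:
        raise ValueError(f"conditions differ beyond the opener: {sorted(extra)}")


baseline = {"sender": "sdr-1", "offer": "offer-a", "send_window": "09:00-11:00", "opener": "generic"}
intervention = dict(baseline, opener="surface_anchored")
check_single_manipulation(baseline, intervention)  # passes: only the opener differs
```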
Open on the surface, defer the ask
The second message opens with a one-line acknowledgement of the first-touch surface, followed by a single specific question tied to a recent artefact of theirs. No CTA, no link. Length capped at 320 characters to control for verbosity confounds.
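The opener spec above translates into a few mechanical checks. A minimal validator sketch, assuming the drafting model (Claude 3.5 Sonnet in this pilot) has already produced the acknowledgement and the question as separate strings; no model API is called here, CTA detection is not attempted, and `draft_second_touch` is a hypothetical name.

```python
import re

MAX_CHARS = 320
_LINK = re.compile(r"https?://|www\.", re.IGNORECASE)


def draft_second_touch(surface_ack: str, question: str) -> str:
    """Join a one-line surface acknowledgement and a single question, enforcing the spec.

    Checks: single-line acknowledgement, 320-char cap, no links, exactly one '?'.
    """
    ack = surface_ack.strip()
    if "\n" in ack:
        raise ValueError("acknowledgement must be a single line")
    body = f"{ack} {question.strip()}"
    if len(body) > MAX_CHARS:
        raise ValueError(f"{len(body)} chars exceeds the {MAX_CHARS}-char cap")
    if _LINK.search(body):
        raise ValueError("no links in the second touch")
    if body.count("?") != 1:
        raise ValueError("expected exactly one question")
    return body
```

The example strings in the usage below are hypothetical: `draft_second_touch("Saw your comment on the Postgres migration thread.", "Did the read-replica cutover go as planned?")` returns the two parts joined by a single space.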
Score reply quality before tallying volume
Reply count is the highest-variance and most misleading metric in this setting. Two reviewers score each reply 1–5, blind to condition; a third reviewer arbitrates disagreements of more than 1 point.
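The arbitration rule can be written as a small scoring function. This sketch makes two assumptions the protocol does not state: pairs within 1 point are averaged, and on a larger gap the third reviewer's score replaces both. The arbiter is passed as a callable so it is only invoked when needed; `final_score` is a hypothetical name.

```python
from typing import Callable


def final_score(a: int, b: int, arbitrate: Callable[[], int]) -> float:
    """Combine two blind 1-5 scores; call the third reviewer only on a >1-point gap.

    Assumptions: agreeing pairs are averaged; the arbiter's score stands on disagreement.
    """
    for s in (a, b):
        if not 1 <= s <= 5:
            raise ValueError(f"score {s} outside 1-5")
    if abs(a - b) > 1:
        return float(arbitrate())
    return (a + b) / 2
```

For example, `final_score(4, 5, ...)` averages to 4.5 without consulting the arbiter, while `final_score(2, 5, lambda: 3)` defers to the third reviewer and returns 3.0.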
Same AI agents, same offer, same windows; only the second-touch pattern changed.
- Reply volume: 142 vs. 141 across cohort (Δ ≈ 0, n.s.).
- Mean reply length: +38% (84→116 chars). Reviewers consistently characterised the intervention condition as substantively engaged rather than perfunctory.
- Inter-rater agreement: 71%, within the range we consider sufficient for a trusted readout.
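The reported +38% follows from the two means: (116 - 84) / 84 ≈ 0.381. The sketch below reproduces that arithmetic and shows one way to compute percent agreement. The write-up does not say whether 71% counts exact matches or matches within one point, so `tolerance` is left as a parameter; both function names are hypothetical.

```python
from typing import Iterable, Tuple


def relative_change(before: float, after: float) -> float:
    """(after - before) / before, e.g. about 0.38 for a 38% increase."""
    return (after - before) / before


def percent_agreement(pairs: Iterable[Tuple[int, int]], tolerance: int = 0) -> float:
    """Share of replies whose two blind scores differ by at most `tolerance` points."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no scored pairs")
    hits = sum(1 for a, b in pairs if abs(a - b) <= tolerance)
    return hits / len(pairs)


print(f"{relative_change(84, 116):.1%}")  # 38.1%
```

Raw percent agreement does not correct for chance matches, so the same reviewer pairs can score quite differently under `tolerance=0` and `tolerance=1`.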
A quality-for-volume tradeoff at constant cost is favourable for senior-cohort meeting acquisition but is invisible to top-of-funnel KPIs. Teams optimising on reply volume alone will register the intervention as null while observing a cleaner downstream pipeline.
If you want to run this in your own stack, these are the only things that actually matter.
Instrument attribution before modifying the prompt
Absent reliable first-touch attribution, the new opener has no anchor and degrades into a generic follow-up, which recovers the baseline distribution.
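One way to act on this, sketched under the assumption that anchors are stored as a prospect-ID-to-surface mapping: measure anchor coverage before changing the prompt, and route unanchored prospects out of the intervention arm (or report them separately) so generic follow-ups do not dilute the comparison. `split_by_anchor` and `anchor_coverage` are hypothetical names.

```python
from typing import Dict, List, Tuple


def split_by_anchor(prospects: List[str], anchors: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Partition prospect IDs into (anchored, unanchored).

    A prospect counts as anchored only if it has a non-empty recorded surface.
    """
    anchored = [p for p in prospects if anchors.get(p)]
    unanchored = [p for p in prospects if not anchors.get(p)]
    return anchored, unanchored


def anchor_coverage(prospects: List[str], anchors: Dict[str, str]) -> float:
    """Fraction of prospects with a usable first-touch anchor (0.0 for an empty list)."""
    anchored, _ = split_by_anchor(prospects, anchors)
    return len(anchored) / len(prospects) if prospects else 0.0
```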
Pilot on a small, vertically homogeneous cohort
In our pilot, n=11 across three verticals was sufficient to estimate the direction of the effect. Avoid scaling to n=200 before vertical-level effect modification has been characterised.
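With 11 accounts over three verticals, at least one vertical holds three or fewer accounts (if every vertical had four or more, the total would be at least 12), which is why per-vertical effects stay uncharacterised. A small check for thin verticals is sketched below; the `min_accounts` threshold and the 4/4/3 split in the usage are hypothetical, since the pilot's actual split is not reported.

```python
from collections import Counter
from typing import Dict, List


def underpowered_verticals(account_vertical: Dict[str, str], min_accounts: int) -> List[str]:
    """Return verticals with fewer than `min_accounts` accounts, sorted by name."""
    counts = Counter(account_vertical.values())
    return sorted(v for v, n in counts.items() if n < min_accounts)


# Hypothetical 4/4/3 split of 11 accounts across the three pilot verticals.
accounts = {f"dev-{i}": "dev tools" for i in range(4)}
accounts.update({f"fin-{i}": "fintech" for i in range(4)})
accounts.update({f"clim-{i}": "climate" for i in range(3)})
```

Under this split, `underpowered_verticals(accounts, 4)` flags only `climate`, and a threshold of 5 flags all three verticals.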
Score quality prior to tallying volume
Volume can be invariant while quality approximately doubles. Monitoring volume alone produces a false-null reading and risks terminating a working intervention.
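A readout that always reports volume and quality together avoids the false null. A minimal sketch, assuming both metrics are expressed as relative changes from baseline; the 5% tolerances and the `readout` name are hypothetical.

```python
def readout(volume_rel: float, quality_rel: float,
            volume_tol: float = 0.05, quality_tol: float = 0.05) -> str:
    """Classify a pilot from relative changes in reply volume and mean quality score.

    A volume-only dashboard would report the 'quality-only' case as null.
    """
    volume_moved = abs(volume_rel) > volume_tol
    quality_moved = abs(quality_rel) > quality_tol
    if volume_moved and quality_moved:
        return "both"
    if volume_moved:
        return "volume-only"
    if quality_moved:
        return "quality-only"
    return "null"
```

A one-reply difference on roughly 141 replies (about 0.7%) sits inside the volume tolerance, so any quality movement beyond its own tolerance yields `"quality-only"` rather than `"null"`.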