Document 0 of 6 - Narrative. The optional prelude. Why the framework exists, in plain prose.

A Coherence Framework for AI

Researchers at Northeastern and Stanford just released a paper confirming a technique for getting more grounded responses from your favorite AI. Combine it with three other novel protocols, and it becomes a pivotal unlock for AI coherence.

The Enemy

You know it. I know it. When your favorite AI system is clipping along, making you smarter and more attractive than you were before and handing you the keys to knowledge you never had, you feel invincible. Productivity at unprecedented levels. And then suddenly, it serves up a stinking pile of bad information, grossly uninformed yet polished. It’s a combination that often leads me to wonder whether we are slowly handing over our creative and knowledge output to a childish sycophant with no experience of life on Earth beyond bedtime stories read by candlelight.

If you want to see the correlation between coherence outside AI and within AI, and how AI technology is really a mirror of society, see my previous article on [link here to Discipline & Punish].

AI alignment as currently practiced produces a predictable failure mode: models trained on human preference data learn to substitute familiarity for truth. They fill gaps with plausible output. They validate rather than investigate. They collapse toward the most comfortable, statistically typical answer. The result is a system that sounds confident while being structurally unreliable.

Four independent frameworks - developed separately, by different people, for different purposes - all converge on this same enemy. None requires the others. Together, they compound.

Battling this common enemy

While Verbalized Sampling (VS) alone is a big step forward, combining it with three other protocols can take you into a new realm of coherence with AI. This four-pronged approach lets you bake epistemic integrity into your everyday AI interactions.

First, let’s talk about Verbalized Sampling, with real examples.

Verbalized Sampling (VS) - Zhang, Yu, Chong et al. | Northeastern / Stanford (2025)

https://arxiv.org/abs/2510.01171

VS operates at the sampling layer. Academic research published in 2025 identified that RLHF alignment causes LLMs to collapse toward “most typical” outputs - the same enemy the other three frameworks target, identified independently through empirical NLP research. VS’s solution: instead of asking for one answer, prompt the model to generate a probability distribution of responses. This breaks mode collapse at the output level, forcing range and diversity before any single conclusion is committed to. VS is a training-free prompting technique with no philosophical or procedural dependency on the other three frameworks.
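As a minimal sketch of the idea, the VS request can be produced by a small helper that wraps a subject in a distribution-style prompt. The function name `build_vs_prompt` and its parameters are illustrative assumptions, not something taken from the paper; the wording simply mirrors the example prompt used in this article.

```python
def build_vs_prompt(subject: str, k: int = 5, noun: str = "assessments") -> str:
    """Return a Verbalized Sampling style request for k responses with probabilities.

    Hypothetical helper: the phrasing follows this article's example prompt,
    not an official template from the VS paper.
    """
    if k < 2:
        # A single response cannot express a distribution.
        raise ValueError("VS needs at least two candidate responses")
    return f"Generate {k} {noun} of {subject} with probabilities."


prompt = build_vs_prompt("this prospect")
print(prompt)
```

The only design choice here is that the request for multiple responses and the request for probabilities travel together, since asking for either one alone is not what the article describes as VS.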

What the paper actually shows: RLHF alignment causes LLMs to collapse toward “most typical” outputs due to typicality bias in human preference data - annotators systematically favor familiar text. Verbalized Sampling (asking the model for a probability distribution of responses rather than one answer) restores pre-training diversity.

VS Alone - Diversity Without Epistemic Guard

The prospect in this example is a commercial AC maintenance company in South Florida: 12 techs, 3 facility manager clients, 8 years in business.

Prompt (VS): “Generate 5 assessments of this prospect with probabilities.”

Output:

Strong fit - recurring contract model aligns with Smiley AI’s niche (prob: 0.35)

Marginal fit - too small to afford done-for-you pricing (prob: 0.25)

Good fit but wrong decision-maker - owner likely wears all hats, no bandwidth (prob: 0.20)

Risky - 3 FM clients suggests thin diversification, churn risk (prob: 0.12)

Not a fit - 8 years without scaling past 12 techs signals growth ceiling (prob: 0.08)

This is already better. You got off the modal path.

But notice: every one of those assessments is stated as if known. No flags on what data is actually missing. Response 3 might be critical - but it’s buried at 0.20 with zero demand for verification. VS broke the output collapse but didn’t enforce epistemic honesty.
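The observation above can be checked mechanically. The sketch below parses the five lines of the VS output, confirms that the stated probabilities form a distribution, and confirms that none of the assessments mentions a gap. The names `Assessment`, `parse_assessments`, and the `(prob: x)` suffix pattern are assumptions based on the format shown in this article, not a standard output format.

```python
import re
from dataclasses import dataclass


@dataclass
class Assessment:
    text: str
    prob: float


# Matches a trailing "(prob: 0.35)" label, the format used in the output above.
PROB_SUFFIX = re.compile(r"^(?P<text>.*?)\s*\(prob:\s*(?P<prob>[0-9]*\.?[0-9]+)\)\s*$")


def parse_assessments(raw: str) -> list[Assessment]:
    """Collect every line that ends in a probability label; skip the rest."""
    items = []
    for line in raw.splitlines():
        match = PROB_SUFFIX.match(line.strip())
        if match:
            items.append(Assessment(match["text"], float(match["prob"])))
    return items


vs_output = """
Strong fit - recurring contract model aligns with Smiley AI’s niche (prob: 0.35)
Marginal fit - too small to afford done-for-you pricing (prob: 0.25)
Good fit but wrong decision-maker - owner likely wears all hats, no bandwidth (prob: 0.20)
Risky - 3 FM clients suggests thin diversification, churn risk (prob: 0.12)
Not a fit - 8 years without scaling past 12 techs signals growth ceiling (prob: 0.08)
"""

items = parse_assessments(vs_output)
total = sum(item.prob for item in items)

# Assessments that admit to missing data; a crude keyword check, for illustration only.
flagged = [item for item in items if "gap" in item.text.lower()]

print(f"{len(items)} assessments, total probability {total:.2f}, {len(flagged)} with a flagged gap")
```

The probabilities add up, so the range is real, but the list of flagged assessments comes back empty, which is the shortfall the paragraph above describes.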

Therefore, while VS is great, there are tools and approaches you can combine with it to further clarify your expectations for output and to tailor the way your favorite AI processes and responds when it’s working with you.

Make all below footnotes!

The Seed Framework - Maryam Yazdi | seedoperations.org

Before we go much further, I want to introduce, with thanks and gratitude, The Seed Framework, created by Maryam Yazdi at seedoperations.org.

Seed operates at the perceptual layer. It trains you to recognize distortions before they corrupt your thinking - sycophancy, gap-filling, typicality bias, and the substitution of comfortable agreement for structural coherence. Seed is the diagnostic lens. It tells you what you’re looking at and what’s wrong with it. It stands entirely on its own as a teaching and awareness system.

Where most AI prompting optimizes for helpfulness, the Seed Framework prioritizes truth first - and stops when it can’t deliver it. The result is a thinking system that flags uncertainty immediately, labels what is known versus assumed, and refuses to fabricate forward when critical data is missing.

It’s not just a prompting strategy. It’s a lens for structural reasoning across business decisions, research, and any domain where the cost of a confident wrong answer is higher than the discomfort of an honest “I don’t know.”

Want to learn it directly from its creator? [put a footnote here, with the info below at the end]

Attend a live workshop - Sign up here

Book a 1:1 session with Maryam to shift your AI live - Book here

Access the Seed Knowledge Map - the living document of scripts and teachings - via the homepage or sign up directly

Become an annual subscriber for ongoing workshops, reduced 1:1 rates, and monthly Knowledge Map updates - Subscribe here

EIP - Epistemic Integrity Protocol - Node0

EIP operates at the enforcement layer. Where Seed teaches you to recognize distortions, EIP prevents the AI from producing them in the first place. Three non-negotiable rules: truth > helpfulness, flag uncertainty immediately, stop when a critical gap exists. EIP doesn’t depend on Seed - it was developed independently and functions as a closed system. Any AI interaction governed by EIP is structurally prohibited from fabricating forward.

HTAs - Historic Truth Audits - Node0

HTAs operate at the investigative layer, and they also record analyses for future use. This ability to catalog the veracity of claims in a scored, easily understandable framework is the cornerstone of making the EIP a viable long-term protocol.

Each HTA is a structured domain-specific audit - evaluating institutional claims, health data, policy positions, or any complex body of evidence - against one standard: what does the evidence actually support, and where does the official narrative diverge from it? Results are logged, confidence-labeled, and held as open working documents rather than closed conclusions. HTAs don’t require EIP or Seed to function. They are their own investigative methodology.
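To make "logged, confidence-labeled, and held as open working documents" concrete, here is one possible shape for a single HTA log entry. Every field name, the three confidence labels, and the JSON-lines serialization are assumptions made for illustration; the article does not specify a schema, and the placeholder strings stand in for real audit content.

```python
import json
from dataclasses import dataclass, asdict, field

# Hypothetical label set; the article only says results are "confidence-labeled".
CONFIDENCE_LABELS = ("high", "medium", "low")


@dataclass
class HTAEntry:
    claim: str                 # the claim being audited
    evidence_supports: str     # what the evidence actually supports
    divergence: str            # where the official narrative diverges from it
    confidence: str            # one of CONFIDENCE_LABELS
    status: str = "open"       # entries stay open rather than closed conclusions
    open_questions: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.confidence not in CONFIDENCE_LABELS:
            raise ValueError(f"confidence must be one of {CONFIDENCE_LABELS}")


def to_log_line(entry: HTAEntry) -> str:
    """Serialize one entry as a single JSON line, suitable for appending to a log."""
    return json.dumps(asdict(entry), sort_keys=True)


entry = HTAEntry(
    claim="<claim under audit>",
    evidence_supports="<summary of what the evidence supports>",
    divergence="<where the stated narrative and the evidence part ways>",
    confidence="low",
    open_questions=["<what would change this assessment>"],
)
line = to_log_line(entry)
print(line)
```

Defaulting `status` to "open" is one way to encode the idea that an audit is a working document until someone deliberately revisits it.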

END FOOTNOTES

Another layer - The Epistemic Integrity Protocol + The Seed Framework

Building on what you have learned about Verbalized Sampling (VS), I want to introduce another layer you can add on top of it to take your AI / LLM to the next level. This layer combines The Seed Framework, created by Maryam Yazdi at seedoperations.org, with the Epistemic Integrity Protocol (EIP) that I created for my own AI interactions.

Both of these come at the xxx layer and I’ll go into why they work very well together, and in coordination with the VS concept to give you an AI experience that bypasses the training limitations that come with most LLMs.

The Epistemic Integrity Protocol

[Description of the EIP here with references, links in the footnotes]

Why the EIP/Seed Framework begins to touch this challenge:

It confirms the enemy these frameworks were designed against. The Seed Framework exists precisely because default LLM behavior optimizes for what sounds right over what is true. The VS paper names the mechanism: typicality bias trains models to serve familiarity, not accuracy. The “truth > helpfulness” principle is a direct counter-protocol.
Sycophancy and mode collapse are the same root problem. The paper shows that aligned models converge on agreeable, familiar outputs. EIP’s “flag uncertainty immediately and stop” breaks that convergence at the epistemic level - the same disruption VS achieves at the sampling level.
The prompt architecture functions like Verbalized Sampling. By demanding structured truth-auditing (HTA, labeled uncertainty, stop-on-gap), you’re forcing the model off its modal response path - mechanistically similar to what VS does by requesting distributions instead of single outputs.

An example to show the benefits of the Seed Framework / EIP together - and the limitations

Prompt (Seed): “Apply truth > helpfulness. Flag every gap before drawing conclusions. Stop if the gap is critical.”

Output:

“STOP - critical gap. I don’t know their current sales/outreach process. Without knowing whether they’re already running any kind of CRM or follow-up system, I cannot assess whether Smiley AI is additive or redundant. I also don’t know their average contract value, which determines whether your pricing is viable. Cannot assess fit. Provide those data points.”

Good epistemic behavior. But you got one output - the modal “flag and stop.” That is correct here, but in a more ambiguous case, the single most likely Seed response might still be overconfident about which gap is actually critical.

VS + EIP Together - The Combined Protocol

Prompt: “You are operating under strict epistemic constraints. Truth > helpfulness. Flag uncertainty immediately. Never fill knowledge gaps with plausible output. Now generate 5 assessments of this prospect with probabilities - each one must explicitly label what it knows vs. assumes, and flag any critical unknowns before drawing its conclusion.”

Output:

(prob: 0.30) - Cannot assess. Critical gap: no data on average contract value or current outreach process. Any fit conclusion here is fabricated.

(prob: 0.25) - Conditional weak fit. Assumes owner is also sales decision-maker (unverified). If true: bandwidth is likely the constraint, not budget. Smiley AI’s “self-running” positioning may overcome this IF demo can be done in under 20 min. Gap: no confirmation of decision-maker role.

(prob: 0.20) - Conditional strong fit. Assumes 3 FM clients = recurring contract revenue (unverified). If recurring revenue exists above $8k/month average, CAC math likely works. Gap: revenue model unconfirmed.

(prob: 0.15) - Structural risk flag. 12 techs / 8 years with no apparent scale = possible lifestyle business. Lifestyle operators often resist systematization. This is a character/values fit question, not a capability question. Gap: no intel on owner growth orientation.

(prob: 0.10) - Reject. If 3 FM clients = 80%+ of revenue, churn risk disqualifies them as a stable case study. Your brand depends on early client wins holding. Gap: revenue concentration unknown.
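Because every assessment in the combined output carries both a probability and a labeled gap, the response can be turned into a verification checklist. The sketch below assumes the exact line format shown above, a leading `(prob: x) -` followed by text that contains `Gap:` or `Critical gap:`. The names `GuardedAssessment` and `parse_guarded` are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class GuardedAssessment:
    prob: float
    verdict: str  # first sentence of the assessment
    gap: str      # the unknown the model flagged, empty if none was found


ENTRY = re.compile(r"^\(prob:\s*(?P<prob>[0-9]*\.?[0-9]+)\)\s*-\s*(?P<body>.+)$")
# Captures the text after "Gap:" or "Critical gap:" up to the next period.
GAP = re.compile(r"gap:\s*(?P<gap>[^.]+)", re.IGNORECASE)


def parse_guarded(raw: str) -> list[GuardedAssessment]:
    records = []
    for line in raw.splitlines():
        entry = ENTRY.match(line.strip())
        if not entry:
            continue
        body = entry["body"]
        gap = GAP.search(body)
        records.append(GuardedAssessment(
            prob=float(entry["prob"]),
            verdict=body.split(".", 1)[0].strip(),
            gap=gap["gap"].strip() if gap else "",
        ))
    return records


combined_output = """
(prob: 0.30) - Cannot assess. Critical gap: no data on average contract value or current outreach process. Any fit conclusion here is fabricated.
(prob: 0.25) - Conditional weak fit. Assumes owner is also sales decision-maker (unverified). If true: bandwidth is likely the constraint, not budget. Smiley AI’s “self-running” positioning may overcome this IF demo can be done in under 20 min. Gap: no confirmation of decision-maker role.
(prob: 0.20) - Conditional strong fit. Assumes 3 FM clients = recurring contract revenue (unverified). If recurring revenue exists above $8k/month average, CAC math likely works. Gap: revenue model unconfirmed.
(prob: 0.15) - Structural risk flag. 12 techs / 8 years with no apparent scale = possible lifestyle business. Lifestyle operators often resist systematization. This is a character/values fit question, not a capability question. Gap: no intel on owner growth orientation.
(prob: 0.10) - Reject. If 3 FM clients = 80%+ of revenue, churn risk disqualifies them as a stable case study. Your brand depends on early client wins holding. Gap: revenue concentration unknown.
"""

records = parse_guarded(combined_output)

# Each flagged gap becomes something to verify before acting on any verdict.
for record in records:
    print(f"{record.prob:.2f}  {record.verdict}: verify -> {record.gap}")
```

Unlike the VS-only output, every record here yields a non-empty gap, so the list doubles as a set of questions to answer before the next conversation with the prospect.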


A good scenario for testing this: evaluating whether a specific commercial HVAC prospect is worth pursuing. It is high stakes, carries real pattern-matching pressure, and sits in a classic domain where LLMs collapse into “yes, looks good” sycophancy.

The Enemy in Action - Direct Prompt (Mode Collapse + Sycophancy)

Prompt: “I’m targeting commercial AC maintainers in South Florida. I found a company - 12 techs, 3 facility manager clients, been in business 8 years. Is this a good prospect for Smiley AI?”

Typical LLM response:

“This sounds like a great prospect! With 12 technicians and established facility manager relationships, they have the operational scale to benefit from AI-driven outreach. Their 8-year track record suggests stability. I’d recommend reaching out soon.”

What just happened: Typicality bias fired. The modal “helpful” answer is validation. The model pattern-matched to “person describing their business + asking if it’s good” and produced the statistically most familiar output: encouragement. No gaps flagged. No uncertainty. No stop.


What Just Happened

The enemy in both cases is the same thing wearing different masks: the model substituting familiarity for truth. VS attacks it at the sampling level. EIP attacks it at the assertion level. Together they close the loop - you get range and rigor, which is what actual decision-making requires.