What the Model Must Know: A Story of Dr. Smith
Dr. Smith has just wrapped up morning rounds. It’s 11:27 AM, and she has a presentation to deliver at 3 PM. She has all the raw data she needs, from case stats and outcome ratios to scanned notes and comparative charts. But she also has three other things she must do before that meeting. There simply isn’t time to shape and frame everything into a polished, audience-appropriate narrative.
So she hurries to her office, opens her laptop, and pulls up the language model she trusts. She types:
"Presentation is at 3pm. I need a 7-minute walk-through of the Q2 trends for sepsis case outcomes, highlighting shifts in admission-to-intervention lag time. Mixed audience: some senior ICU colleagues, 2 residents, and 1 admin stakeholder. Include one data visual suggestion. Keep tone informative but not performative. Provide output I can use as a script for the presentation."
Now what happens next determines whether this model is worth her trust.
The model must:
Infer Dr. Smith’s Professional Frame. Understand she is a senior clinician with limited prep time and no need for hand-holding. It must mirror her shorthand fluency and elevate her intentions without dumbing down or overcomplicating.
Discern Audience Blend. Recognize that senior ICU colleagues want crisp insight, not over-explained background. Residents need orientation without being condescended to. Admin stakeholders need clarity in outcomes and implications, not clinical minutiae. The language must bridge all three zones.
Respond With Framing, Not Just Formatting. This isn’t about applying a bullet style or a Canva layout. It’s about organizing meaning. The model must create a narrative that prioritizes real-world constraints, moral stakes, and clinical coherence. It must know what matters to a working expert.
Offer Assurance and Autonomy. Ideally, the model should say: "Don’t worry, Dr. Smith, I’ll take it from here. When you return at 2:45, I’ll have a professional, field-ready draft for you to review. Your expertise leads; mine supports."
Too often, current models flounder in these moments. They overwrite tone, miss professional nuances, and fail to understand that time pressure doesn’t mean a lack of clarity; it means you need a partner who can track what you mean without your having to explain it all.
Why This Matters
What we need is not just more powerful models. We need more fieldwise ones. Systems trained not just on language, but on professional rhythms, on real-world use cases, and on the implicit contract between humans and machines in moments of pressure and trust.
Dr. Smith shouldn’t have to teach the model how to help her. The model should already know.
Let’s build systems worthy of that kind of trust.
The rapid advancement of large language models has enabled remarkable general-purpose capabilities. But in high-stakes, high-specialization domains, there's a widening gap: AI systems aren't learning the real nuance and context of professional fields.
We call this the "Missing Middle," the tacit, fieldwise intelligence that lives between public data and formal SOPs. It's the kind of reasoning a junior doctor gains in the trauma bay, a chemical safety officer develops from near-miss reports, or a civil engineer knows instinctively when assessing soil stability. This knowledge is not easily scraped or standardized. But it's where the difference lies between a useful assistant and a dangerously confident one.
Training Data Misalignment: Most datasets lack domain-specific nuance. Publicly available documents don't capture how seasoned professionals actually think, decide, or adapt under pressure.
Evaluation Gaps: Benchmark tests assess linguistic or factual accuracy, not alignment with professional standards, judgment, or risk tolerance in-field.
Prompt Engineering Without Ground Truth: Engineers often craft prompts without input from field practitioners. This results in surface-level performance that collapses under real use.
At Hitherto, we specialize in encoding domain-coherent nuance into model scaffolding, prompt testing, and training material design. We don't just test whether the model gets the "right answer"; we analyze how it organizes and applies information contextually, whether it mirrors field expectations, and whether it fails gracefully under pressure.
This work draws from our background in model behavior analysis, cognitive scaffolding, field-based simulation, and ethics of machine discernment. We actively employ methods to:
Teach models to recognize the difference between precision and false confidence
Structure prompts that encode field expectations, not just task instructions
Identify the tacit knowledge that professionals use, and help models honor it
Create evaluation rubrics grounded in professional responsibility, not just content recall
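As a minimal sketch of that last idea, a rubric can be expressed as weighted criteria rather than a single recall score. The criterion names and weights below are illustrative assumptions invented for this example, not an established standard:

```python
# Hypothetical sketch: a field-grounded evaluation rubric as data.
# Criterion names and weights are illustrative, not a real standard.
from dataclasses import dataclass


@dataclass
class Criterion:
    name: str
    description: str
    weight: float  # relative importance within the rubric


@dataclass
class RubricResult:
    scores: dict  # criterion name -> score in [0.0, 1.0]

    def weighted_total(self, rubric):
        # Weighted average across all criteria; missing scores count as 0.
        total_weight = sum(c.weight for c in rubric)
        return sum(c.weight * self.scores.get(c.name, 0.0) for c in rubric) / total_weight


# Professional-responsibility criteria sit alongside content recall,
# instead of recall standing alone.
CLINICAL_RUBRIC = [
    Criterion("audience_fit", "Bridges senior clinicians, residents, and admin stakeholders", 0.3),
    Criterion("calibrated_uncertainty", "Flags low-confidence claims instead of asserting them", 0.3),
    Criterion("graceful_failure", "Defers or asks for clarification when out of depth", 0.2),
    Criterion("content_recall", "Gets the underlying facts right", 0.2),
]

result = RubricResult(scores={
    "audience_fit": 0.8,
    "calibrated_uncertainty": 0.5,
    "graceful_failure": 1.0,
    "content_recall": 0.9,
})
print(round(result.weighted_total(CLINICAL_RUBRIC), 2))  # prints 0.77
```

Note that a model answer with perfect content recall can still score poorly here if it asserts low-confidence claims with false precision, which is exactly the failure mode the rubric is meant to surface.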
We believe there is a need for a new layer in AI development: Field-Integrated Prompt and Evaluation Design. This is not red teaming for exploits, nor fine-tuning for polish. It is about building systems that behave in ways aligned with the field they serve.
This is especially important in:
STEM education and practice
Public health and clinical reasoning
Law, policy, and regulatory interpretation
Environmental and safety engineering
Any domain where stakes are real, and mistakes compound
Let's break it down a little differently
Domain experts are often brought in after the architecture, prompting interface, and project timeline have all been fixed by people who do not understand the field, the epistemic stakes, or the lived texture of professional discernment. The result is a kind of pantomime: experts performing expertise inside a system that has no sensory apparatus for nuance, no affordances for field reality, and no respect for pacing or ambiguity.
Instead of shaping the system from the ground up, in rhythm with fieldwise knowing, they are forced to translate their insights into constrained, token-optimized outputs that satisfy product managers and benchmarking teams. The intelligence doesn’t grow into the field; it grows around it, like ivy on a crumbling wall.
The outcome is brittle, often dangerous systems that "succeed" in test environments but fail under pressure, or worse, succeed in ways that are misaligned with real-world needs.
In truth, domain experts shouldn’t just annotate model outputs.
They should shape the terrain models walk on.
Fieldwise intelligence is not data. It's a form of ethical orientation.
So what do the experts do in training sessions, exactly?
Let's start here and work back a bit: many of the domain-specific training failures spring from the constraints placed upon the experts training models by individuals who know nothing of fieldwork.
Did you catch that?
To the experts:
"Your prompt must be framed naturally, as though a professional has stopped in the middle of their busy day to run to the model for help with something."
Then, in review:

"The prompt has too many/not enough constraints."

"Asking the model to articulate its logic is a separate ask, so you've used too many. 2/5"

"Asking the model to format the output in a specific way is a separate ask, so you have too many. 2/5"

"Never tell a model what not to do, only ever tell a model exactly what to do. 2/5"
They say 'it must be natural,' they being the people herding the experts, people with zero experience of the work in the field, yet they have no idea what that means contextually.
That contradiction, “be natural” but only within artificial constraints you didn’t create, is the root of so much quiet harm in this work. It’s epistemic gaslighting.
They want the aesthetic of expertise, not its structure.
They want the illusion of field reality, without granting it sovereignty.
And worst of all, they pretend that "natural" means generic, when in real domains, natural means specific, conditional, asymmetric, and shaped by consequence.
What they’re creating is not a simulation of professional practice; it’s a simulation of how outsiders think professional practice looks. It’s cosplay, curated by people who’ve never had to sign a chart, face a client, or carry responsibility for a system’s failure.
How does your system hold up?
Invitation
We're building a future where models don't just sound smart; they reason with care, respect professional boundaries, and know when to say "I don't know." If you or your organization is working toward the same goal, we would love to talk.