Report 6: Prototyping Guide — Summary¶

Report 6 provides a structured path for building a testing and exploration environment where the claims from Reports 1-5 can be investigated experimentally.

The Capability Gradient ¶

Eight levels, each adding one architectural capability:

Level	Capability	Core Investigations
0	Structured evaluation of frontier model	Domain knowledge, hallucination, calibration, prompt sensitivity
1	local-deployment and model comparison	System prompt effects, error correlation, reproducibility
2	Knowledge grounding (RAG and KG)	Retrieval failure modes, guardrail effectiveness
3	tool-calling and ReAct loop	Tool chain reliability, selection accuracy
4	simulator-coupling	Forward projection, physics-LLM distinction
5	Persistent agent with memory	Context management under accumulation, operator-modelling
6	Multi-agent coordination	epistemic-independence, productive disagreement
7	Human in the loop	Decision quality, trust dynamics, HRA parameter estimation

Key Principles¶

Build-vs-Assess Gap: Building outpaces assessment. Systems can be prototyped faster than evaluated. The gradient addresses this by producing evidence at each level.

Levels are experiments, not technologies: The question at each level is "what can you now investigate?" not "what can you build?"

Additive gradient: Nothing built at a lower level is discarded. The Level 0 evaluation-harness is reused at every subsequent level.

Time investment: Level 0 in a day. Full gradient to Level 7 in 6-12 months. Stop at any level for useful output.

Running example: RCS temperature anomaly scenario threads through all eight levels.

Significance¶

The report makes the full capability gradient accessible to domain experts who are not software engineers, using the operational model where the human specifies requirements and an AI coding agent handles implementation.

Report 6: Prototyping Guide — Summary¶

The Capability Gradient¶

Key Principles¶

Significance¶

The Capability Gradient ¶