# Report 6: Prototyping Guide — Summary
Report 6 provides a structured path for building a testing and exploration environment where the claims from Reports 1-5 can be investigated experimentally.
## The Capability Gradient
Eight levels, each adding one architectural capability:
| Level | Capability | Core Investigations |
|---|---|---|
| 0 | Structured evaluation of a frontier model | Domain knowledge, hallucination, calibration, prompt sensitivity |
| 1 | Local deployment and model comparison | System prompt effects, error correlation, reproducibility |
| 2 | Knowledge grounding (RAG and KG) | Retrieval failure modes, guardrail effectiveness |
| 3 | Tool calling and ReAct loop | Tool-chain reliability, selection accuracy |
| 4 | Simulator coupling | Forward projection, physics-LLM distinction |
| 5 | Persistent agent with memory | Context management under accumulation, operator modelling |
| 6 | Multi-agent coordination | Epistemic independence, productive disagreement |
| 7 | Human-in-the-loop integration | Decision quality, trust dynamics, HRA parameter estimation |
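A Level 0 harness need not be elaborate. As a minimal sketch (all names and the stub model below are illustrative assumptions, not taken from the report), structured evaluation can be a fixed item set scored against any question-answering callable:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalItem:
    prompt: str    # question posed to the system under test
    expected: str  # reference string used for simple substring scoring

def run_eval(model: Callable[[str], str], items: list[EvalItem]) -> dict:
    """Score any str -> str callable against a fixed item set."""
    correct = sum(
        item.expected.lower() in model(item.prompt).lower() for item in items
    )
    return {"n": len(items), "accuracy": correct / len(items)}

# Usage with a stub standing in for a frontier-model API call:
items = [EvalItem("What does RAG stand for?", "retrieval-augmented generation")]
stub = lambda prompt: "RAG stands for retrieval-augmented generation."
print(run_eval(stub, items))  # {'n': 1, 'accuracy': 1.0}
```

Substring scoring is deliberately crude; the point is that the harness fixes the item set and the scoring rule so that later levels can be compared on identical evidence.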
## Key Principles
Build-vs-assess gap: Building outpaces assessment; systems can be prototyped faster than they can be evaluated. The gradient addresses this by producing assessment evidence at each level.
Levels are experiments, not technologies: The question at each level is "what can you now investigate?" not "what can you build?"
Additive gradient: Nothing built at a lower level is discarded. The Level 0 evaluation harness is reused at every subsequent level.
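The reuse principle can be sketched concretely: if the Level 0 harness accepts any str-to-str callable, then a bare model (Level 1) and a tool-using agent (Level 3) are scored by identical code. Names and stub replies below are illustrative assumptions, not from the report:

```python
from typing import Callable

def evaluate(system: Callable[[str], str], cases: list[tuple[str, str]]) -> float:
    """Minimal Level 0 harness: fraction of cases containing the expected string."""
    hits = sum(expected.lower() in system(prompt).lower() for prompt, expected in cases)
    return hits / len(cases)

def bare_model(prompt: str) -> str:
    # Stub reply standing in for a locally deployed model (Level 1).
    return "The temperature trend is rising."

def tool_agent(prompt: str) -> str:
    # Stub tool call (Level 3): same interface, so the same harness applies.
    reading = 312.5  # pretend sensor-query tool result
    return f"Sensor query returned {reading} K: the temperature trend is rising."

cases = [("Is the RCS temperature rising?", "rising")]
print(evaluate(bare_model, cases), evaluate(tool_agent, cases))  # 1.0 1.0
```

Keeping every level behind the same callable interface is what makes the gradient additive: each new capability is wrapped, not rewritten, and the accumulated evidence remains comparable.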
Time investment: Level 0 can be completed in a day; the full gradient to Level 7 takes 6-12 months. Stopping at any level still yields useful output.
Running example: An RCS (reactor coolant system) temperature anomaly scenario threads through all eight levels.
## Significance
The report makes the full capability gradient accessible to domain experts who are not software engineers, using the operational model where the human specifies requirements and an AI coding agent handles implementation.