Methodology vocabulary

The terms we use to describe our own delivery practice.

Provenance-first delivery
A delivery practice in which every claim in a deliverable resolves to an identified source row — with tier (A–E), confidence rating, and timestamp — before it can enter the report.
Workstream
A scoped, parallel branch of an engagement, with its own evidence requirements and milestones. We decompose a brief into workstreams in the first session and present the decomposition back for alignment.
Source-tier framework (A–E)
The classification we apply to every source: A primary (direct measurement, signed test report); B registry (patent filings, official databases); C academic (peer-reviewed papers); D press (trade reporting, releases); E user-generated (forums, social, community).
Evidence row
The atomic unit of a Zhianrui deliverable. Contains: the claim, a delimiter, the tier, the confidence rating, the source identifier, and the snapshot timestamp. The dossier is assembled from accepted evidence rows by a deterministic step.
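As an illustration only, an evidence row and the deterministic assembly step could be sketched as follows. The field names, the ` | ` delimiter, the sort key, and the `assemble_dossier` helper are all assumptions made for this example, not the actual Zhianrui schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class EvidenceRow:
    # Hypothetical field names; the glossary only lists what a row contains.
    claim: str
    tier: str          # "A".."E" per the source-tier framework
    confidence: str    # "high" | "medium" | "low" | "unverified"
    source_id: str
    snapshot: datetime

    def serialise(self, delimiter: str = " | ") -> str:
        # The delimiter separates the claim from its provenance fields.
        return delimiter.join(
            [self.claim, self.tier, self.confidence, self.source_id,
             self.snapshot.isoformat()]
        )


def assemble_dossier(rows: list[EvidenceRow]) -> list[str]:
    # Deterministic: the same set of rows yields the same output whatever
    # the input order. Unverified rows are left out because they belong
    # in the open-questions register (an assumption about "accepted").
    accepted = [r for r in rows if r.confidence != "unverified"]
    ordered = sorted(accepted, key=lambda r: (r.tier, r.source_id, r.claim))
    return [r.serialise() for r in ordered]


snapshot = datetime(2024, 1, 1, tzinfo=timezone.utc)
example_rows = [
    EvidenceRow("Claim one", "B", "high", "src-002", snapshot),
    EvidenceRow("Claim two", "A", "medium", "src-001", snapshot),
    EvidenceRow("Claim three", "D", "unverified", "src-003", snapshot),
]
dossier = assemble_dossier(example_rows)
```

Sorting on a fixed key is one simple way to make the assembly repeatable; any stable, documented ordering would serve the same purpose.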
Confidence rubric
A four-step rating attached to every claim: high (multiple independent A/B-tier sources agree), medium (single strong source or several weaker), low (plausible but under-sourced, flagged as provisional), unverified (claimed elsewhere but not confirmed; lives only in the open-questions register).
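The rubric can be read as a small decision function. The sketch below is one possible encoding; the function name `rate_confidence`, the `confirmed` flag, and the threshold of three for "several weaker" sources are assumptions chosen for illustration.

```python
def rate_confidence(agreeing_tiers: list[str], confirmed: bool = True) -> str:
    """Map the tiers of independent, agreeing sources to a rubric rating."""
    if not confirmed or not agreeing_tiers:
        return "unverified"   # claimed elsewhere but not confirmed
    strong = [t for t in agreeing_tiers if t in ("A", "B")]
    if len(strong) >= 2:
        return "high"         # multiple independent A/B-tier sources agree
    if len(strong) == 1 or len(agreeing_tiers) >= 3:
        return "medium"       # single strong source, or several weaker ones
    return "low"              # plausible but under-sourced; provisional
```

For example, `rate_confidence(["A", "B"])` gives `"high"`, while a single press source, `rate_confidence(["D"])`, gives `"low"`.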
Open-questions register
A structured list of things we could not verify, contradictions we could not resolve, and assumptions we had to make. Every Zhianrui deliverable includes one.

AI / LLM engineering

Terms used in our case studies, insights, and lab notes.

LLM (large language model)
A class of AI model trained on text that takes a prompt and produces a continuation; the underlying technology behind systems like ChatGPT, Claude, and Gemini.
Agentic system / agent
An LLM-driven process that takes actions in sequence — calling tools, retrieving documents, evaluating its own output — to perform a task. An agentic pipeline is one in which several such agents coordinate.
RAG (retrieval-augmented generation)
An architecture in which a model is given relevant documents at inference time and generates an answer grounded in those documents — rather than relying solely on what the model memorised in training.
Eval harness
The automated test infrastructure that judges whether a new model version is good enough to ship. A harness is more than a benchmark: it is wired into the deployment pipeline as a gate.
Multi-judge scoring
Using more than one independent judge per evaluation case — typically two LLM judges from different model families plus a human-in-the-loop sample — to catch failures a single judge would miss.
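A minimal sketch of the idea, with judges modelled as plain callables. The `Judge` type, the `multi_judge` name, and the "every judge must pass" aggregation rule are assumptions; a real harness would call LLM judges and sample cases for human review.

```python
from typing import Callable

# Hypothetical judge signature: (question, answer) -> pass/fail.
Judge = Callable[[str, str], bool]


def multi_judge(question: str, answer: str, judges: list[Judge]) -> dict:
    votes = [judge(question, answer) for judge in judges]
    return {
        "passed": all(votes),                  # conservative: all must pass
        "disagreement": len(set(votes)) > 1,   # candidates for human review
        "votes": votes,
    }
```

Surfacing disagreement separately from the pass/fail result is the point of the exercise: cases where the judges split are the ones a single judge would have scored with false confidence.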
Bi-encoder retrieval
The standard fast method in RAG systems: queries and documents are independently embedded into the same vector space, and nearest neighbours are returned. Often adequate on its own for prose; often insufficient for technical corpora with many superficially similar chunks.
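The retrieval step can be sketched with toy vectors and cosine similarity. The three-dimensional embeddings and document ids below are made up for illustration; in practice the vectors come from an embedding model and the search usually runs on an approximate nearest-neighbour index.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vec, doc_vecs, k=2):
    # Documents were embedded ahead of time, independently of the query.
    scored = sorted(
        doc_vecs.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in scored[:k]]


# Toy embeddings (assumed values, not the output of a real model).
doc_vecs = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
top = retrieve([1.0, 0.0, 0.0], doc_vecs, k=2)  # ["doc_a", "doc_b"]
```

Note how close `doc_a` and `doc_b` score: near-duplicate chunks are hard to separate on embedding distance alone.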
Re-ranker
A second model (or scoring function) that re-orders the top results from initial retrieval, with more domain-specific signals than the first-pass embedding alone provides.
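A sketch of the scoring-function variant: the first-pass similarity is blended with a simple domain signal, here exact term overlap with the query. The `rerank` name, the overlap signal, and the 0.5 weight are assumptions for illustration; production re-rankers are often cross-encoder models.

```python
def rerank(query: str, candidates: list[tuple[str, float]],
           weight: float = 0.5) -> list[str]:
    """Re-order (text, first_pass_score) pairs using exact term overlap."""
    query_terms = set(query.lower().split())

    def score(item):
        text, first_pass = item
        terms = set(text.lower().split())
        overlap = len(query_terms & terms) / len(query_terms) if query_terms else 0.0
        return (1 - weight) * first_pass + weight * overlap

    return [text for text, _ in sorted(candidates, key=score, reverse=True)]


candidates = [
    ("reset register 0x3A timing", 0.92),  # higher embedding score
    ("reset register 0x4F timing", 0.90),  # contains the exact identifier
]
ranked = rerank("reset register 0x4F", candidates)
```

In this example the chunk containing the exact identifier `0x4F` moves to the top even though its first-pass score was slightly lower.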
Citation hallucination
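A failure mode in RAG systems in which the model produces an answer whose citations look valid but whose cited content does not actually support the claim. Standard relevance evals miss this because they check whether retrieved passages are on-topic, not whether they support the specific claim.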
Deploy gate
An eval harness wired into the deployment pipeline as a release blocker. A new model version cannot reach production unless it passes the configured thresholds (aggregate, per-subset, calibration).
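A gate of this kind can be sketched as a function that returns the list of failed checks. The metric and threshold key names below are hypothetical, as is the choice to express calibration as an error where lower is better.

```python
def deploy_gate(metrics: dict, thresholds: dict) -> list[str]:
    """Return failed checks; an empty list means the release may proceed."""
    failures = []
    if metrics["aggregate"] < thresholds["aggregate"]:
        failures.append("aggregate")
    for subset, score in sorted(metrics["per_subset"].items()):
        if score < thresholds["per_subset"]:
            failures.append(f"subset:{subset}")
    # Calibration error is lower-is-better, so the comparison flips.
    if metrics["calibration_error"] > thresholds["max_calibration_error"]:
        failures.append("calibration")
    return failures


thresholds = {"aggregate": 0.85, "per_subset": 0.80, "max_calibration_error": 0.05}
metrics = {
    "aggregate": 0.90,
    "per_subset": {"legal": 0.88, "tables": 0.72},
    "calibration_error": 0.03,
}
failed = deploy_gate(metrics, thresholds)
```

Here the aggregate score passes but the `tables` subset does not, so the release is blocked. Per-subset thresholds exist precisely to catch regressions that an aggregate score hides.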
Constrained generation
Forcing an LLM's output to conform to a predefined data structure — JSON schema, regular expression, or grammar. Implemented via provider-native function-calling, grammar-constrained decoding, or schema-guided sampling.
Function-calling
A model-API feature that returns structured JSON matching a schema you supply. In strict modes, the provider enforces conformance during decoding rather than relying on the prompt to request structured output.
F1 score
The harmonic mean of precision and recall: a single number that summarises both whether you found only the right items (precision) and whether you found all of the right items (recall). Used to compare retrieval strategies.
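For a retrieval run, the calculation can be sketched over sets of document ids. The function name and the example ids are illustrative.

```python
def f1_score(retrieved: set, relevant: set) -> float:
    true_pos = len(retrieved & relevant)
    if true_pos == 0:
        return 0.0  # also avoids division by zero on empty sets
    precision = true_pos / len(retrieved)  # share of retrieved that is relevant
    recall = true_pos / len(relevant)      # share of relevant that was retrieved
    return 2 * precision * recall / (precision + recall)


# 4 items retrieved, 3 relevant, 2 in common:
# precision = 2/4, recall = 2/3, F1 = 4/7 (about 0.571)
score = f1_score({"a", "b", "c", "d"}, {"a", "b", "e"})
```

Because the harmonic mean is pulled towards the smaller of its two inputs, a strategy cannot score well by maximising precision or recall alone.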

Cybersecurity & regulatory

Domain terms that appear in our cybersecurity case studies and regulatory references.

PUF (physically unclonable function)
A hardware fingerprint unique to each chip, derived from manufacturing variations too subtle to clone. Used to derive cryptographic keys without storing them — a root of trust embedded in silicon.
OT / ICS (operational technology / industrial control systems)
The computers that run factories, power plants, water treatment, transit systems, and other physical infrastructure. Distinct from "IT" in their long lifecycles, real-time constraints, and limited update windows.
IEC 62443
The international standard for cybersecurity in industrial automation and control systems. Defines security levels, zone-and-conduit architecture, and component certification requirements.
NIS2
The EU's second Network and Information Security Directive (Directive (EU) 2022/2555). The baseline cybersecurity legislation for European critical infrastructure operators and digital service providers.
CRA (EU Cyber Resilience Act)
The EU regulation (2024/2847) requiring cybersecurity by design for any product with digital components sold in the European market — including embedded systems, IoT devices, and software.
EU AI Act
The EU regulation (2024/1689) on artificial intelligence — the world's first comprehensive legal framework for AI systems. Tiers AI systems by risk and imposes obligations for high-risk and general-purpose AI.
EU DPP (Digital Product Passport)
An EU requirement, now being phased in, for a digital record of a product's components, history, and end-of-life handling. Driven by the Ecodesign for Sustainable Products Regulation; applies sector by sector starting with batteries, textiles, and electronics.
NIST 800-155 / 800-193
NIST Special Publications on hardware integrity verification: SP 800-155 covers BIOS integrity measurement, and SP 800-193 covers platform firmware resiliency. Increasingly referenced by EU procurement.
PQC (post-quantum cryptography)
Cryptographic algorithms, covering both key establishment and digital signatures, designed to resist attack from sufficiently large quantum computers, which would break today's standard public-key cryptography. NIST has standardised the first set; migration is underway across regulated infrastructure.
KEM (Key Encapsulation Mechanism)
A cryptographic primitive for securely establishing a shared secret key between two parties. ML-KEM (formerly Kyber) is a post-quantum KEM standardised by NIST as FIPS 203; HQC is a second post-quantum KEM that NIST has selected for standardisation.