01

The workstream model

Every engagement begins with decomposition. We do not accept a brief and disappear — we take the problem apart in the first working session and present it back as a set of parallel workstreams, each with its own scope, evidence requirements, and delivery milestones.

This decomposition serves two purposes: it aligns the client's expectations with what is feasible in the agreed timeline, and it gives both sides a shared vocabulary for tracking progress. Workstreams can be paused, re-scoped, or split without derailing the whole engagement.

Each workstream has a named lead on our side. The lead is a researcher, not a project manager — the person responsible for the technical work is the person who reports on it.

[Figure: Workstream Model]

02

Source-tier evidence

Every claim in a Zhianrui deliverable maps to a source row. Sources are classified into five tiers:

Source Tier Pyramid
| Tier | Description | Weight | Example | Confidence |
|------|-------------|--------|---------|------------|
| A | Primary: direct experiment, measurement, or interview | Highest | Lab test result | High |
| B | Registry: official filings, patent registries, databases | High | Patent filing | High |
| C | Academic: peer-reviewed papers, preprints, theses | Medium | IEEE paper | Medium |
| D | Press: reporting, trade publications, press releases | Lower | Trade article | Medium |
| E | User-generated: forums, social media, community reports | Lowest | Forum post | Low |

We do not smooth over mixed evidence. When a claim rests on a C-tier source contradicted by an A-tier finding, both are presented. The client sees the full picture.
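As an illustration of the rule above, here is a minimal Python sketch of how tiered findings could be ordered and flagged as mixed without dropping either side. The names `Tier`, `Finding`, `present`, and `is_mixed` are hypothetical and chosen for this example; they are not taken from any Zhianrui tooling.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    """Source tiers from the table above; a lower value means a higher weight."""
    A = 1  # primary
    B = 2  # registry
    C = 3  # academic
    D = 4  # press
    E = 5  # user-generated


@dataclass(frozen=True)
class Finding:
    claim: str
    tier: Tier
    supports: bool  # True if the source supports the claim, False if it contradicts it


def present(findings: list[Finding]) -> list[Finding]:
    """Return every finding, strongest tier first; nothing is filtered out."""
    return sorted(findings, key=lambda f: f.tier)


def is_mixed(findings: list[Finding]) -> bool:
    """True when at least one source supports the claim and at least one contradicts it."""
    return len({f.supports for f in findings}) == 2


findings = [
    Finding("example claim", Tier.C, supports=True),
    Finding("example claim", Tier.A, supports=False),
]
for f in present(findings):
    print(f.tier.name, "supports" if f.supports else "contradicts")
print("mixed evidence:", is_mixed(findings))
```

The ordering only decides what the reader sees first; the weaker, contradicted source still appears in the output.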

03

The evidence row

The anatomy of a single claim in a Zhianrui deliverable — this is what provenance-first looks like in practice.

[Figure: Evidence Row Anatomy]

In code, an evidence row looks like this:

claim:        "Hardware roots of trust scale to OT/ICS retrofit"
tier:         A             # A primary / B registry / C academic / D press / E user-gen
confidence:   HIGH          # HIGH / MED / LOW / UNVERIFIED
source_id:    "SRC.014.2"    # resolves to a verifiable artifact
snapshot_at:  2026-04-22T11:14Z
authored_by:  "researcher_03"  # or "agent_v3.2" — both governed by same rules

Rows missing any of these fields are rejected at the orchestration layer before they can enter the dossier. There is no representable state in which a claim exists without a source.
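The rejection rule can be sketched as a small validation step. This is an assumption-laden illustration in Python, not the actual orchestration layer: the function `admit` and the exception `RowRejected` are hypothetical names, and `snapshot_at` is kept as a plain string rather than parsed.

```python
from dataclasses import dataclass

# Field names and allowed values are taken from the example row above.
REQUIRED = ("claim", "tier", "confidence", "source_id", "snapshot_at", "authored_by")
TIERS = {"A", "B", "C", "D", "E"}
CONFIDENCE = {"HIGH", "MED", "LOW", "UNVERIFIED"}


class RowRejected(ValueError):
    """Raised when a candidate row cannot enter the dossier (hypothetical name)."""


@dataclass(frozen=True)
class EvidenceRow:
    claim: str
    tier: str
    confidence: str
    source_id: str
    snapshot_at: str
    authored_by: str


def admit(raw: dict) -> EvidenceRow:
    """Build an EvidenceRow, or raise RowRejected if any field is missing or invalid."""
    missing = [k for k in REQUIRED if raw.get(k) in (None, "")]
    if missing:
        raise RowRejected("missing fields: " + ", ".join(missing))
    if raw["tier"] not in TIERS:
        raise RowRejected(f"unknown tier: {raw['tier']!r}")
    if raw["confidence"] not in CONFIDENCE:
        raise RowRejected(f"unknown confidence: {raw['confidence']!r}")
    return EvidenceRow(**{k: raw[k] for k in REQUIRED})


row = {
    "claim": "Hardware roots of trust scale to OT/ICS retrofit",
    "tier": "A",
    "confidence": "HIGH",
    "source_id": "SRC.014.2",
    "snapshot_at": "2026-04-22T11:14Z",
    "authored_by": "researcher_03",
}
print(admit(row).source_id)
```

Because `EvidenceRow` has no optional fields and is only built through `admit`, a claim without a source cannot be constructed in this sketch.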

Claim: PUF-based device authentication reduces counterfeit ingress by 94% in the tested supply chain segment.
Source: Tier A, lab test (internal, 2025-Q3, n=1200 units)
Confidence: High (reproduced across two test batches)
Open question: Environmental drift at >85°C not tested

04

Confidence rubric

Every finding carries a confidence rating. We use four levels:

Confidence Spectrum
| Level | Meaning | How we use it |
|-------|---------|---------------|
| High | Multiple independent A/B-tier sources agree | Stated as finding; actionable without caveat |
| Medium | Single strong source or multiple weaker sources | Stated as finding with source context |
| Low | Plausible but under-sourced | Flagged as provisional; escalation recommended |
| Unverified | Claimed elsewhere but not independently confirmed | Listed in open-questions register only |
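One way to read the rubric is as a decision rule over the tiers of the sources that agree on a claim. The Python sketch below makes several assumptions that the rubric itself does not state: "strong" is taken to mean tier A or B, "weaker" means tier C to E, a single weaker source counts as under-sourced, and no independently confirmed source at all means Unverified. The function name `rate` is hypothetical.

```python
from enum import Enum


class Confidence(Enum):
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"
    UNVERIFIED = "Unverified"


STRONG_TIERS = {"A", "B"}  # assumption: "strong" means primary or registry sources


def rate(agreeing_tiers: list[str]) -> Confidence:
    """Rate a claim from the tiers of the independent sources that agree on it."""
    strong = sum(t in STRONG_TIERS for t in agreeing_tiers)
    weak = len(agreeing_tiers) - strong
    if strong >= 2:
        return Confidence.HIGH        # multiple independent A/B-tier sources agree
    if strong == 1 or weak >= 2:
        return Confidence.MEDIUM      # single strong source, or multiple weaker ones
    if weak == 1:
        return Confidence.LOW         # plausible but under-sourced
    return Confidence.UNVERIFIED      # nothing independently confirmed


for tiers in (["A", "B"], ["A"], ["C", "D"], ["E"], []):
    print(tiers, rate(tiers).value)
```

A real rating would also need a judgment about source independence, which this sketch takes as given.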

05

Open-questions register

Every Zhianrui deliverable includes an open-questions register: a structured list of things we could not verify, contradictions we could not resolve, and assumptions we had to make.

This is non-negotiable. We do not paper over inconsistencies. If two sources contradict each other and we cannot determine which is correct, both are logged, the contradiction is described, and the client decides how to proceed.

The register is not a sign of weakness — it is a sign of rigour. Any consultancy that delivers certainty on every point is either hiding uncertainty or has not looked hard enough.
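The three kinds of entry described above (unverified items, unresolved contradictions, and assumptions) could be represented as follows. This is a hypothetical data-structure sketch in Python; the class and method names are illustrative, and the source identifiers in the usage lines are made up for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    UNVERIFIED = "could not verify"
    CONTRADICTION = "could not resolve"
    ASSUMPTION = "assumption made"


@dataclass(frozen=True)
class OpenQuestion:
    kind: Kind
    description: str
    source_ids: tuple[str, ...] = ()


@dataclass
class Register:
    entries: list[OpenQuestion] = field(default_factory=list)

    def log(self, kind: Kind, description: str, *source_ids: str) -> OpenQuestion:
        """Append one entry; nothing is ever removed or merged silently."""
        entry = OpenQuestion(kind, description, tuple(source_ids))
        self.entries.append(entry)
        return entry

    def log_contradiction(self, description: str, source_a: str, source_b: str) -> OpenQuestion:
        """Record both sides of a contradiction so the client can decide."""
        if source_a == source_b:
            raise ValueError("a contradiction needs two distinct sources")
        return self.log(Kind.CONTRADICTION, description, source_a, source_b)


register = Register()
# "SRC.A" and "SRC.B" are placeholder identifiers for illustration only.
register.log_contradiction("Sources disagree on the reported figure", "SRC.A", "SRC.B")
register.log(Kind.ASSUMPTION, "Test conditions assumed representative of deployment")
print(len(register.entries))
```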

[Figure: Open Questions Register]

06

How we engineer AI systems

When we build large language model (LLM) systems or agentic pipelines — multi-step processes where one or more models take actions in sequence — the same provenance discipline applies, with three additions specific to AI engagements.

The five practices above were designed for human researchers. AI systems remove two of the safety nets those practices usually rely on: they produce output much faster than humans can review, and they fabricate plausible-looking citations when underconstrained. The three additions below close those gaps.

Eval-first delivery

An eval suite is the automated test infrastructure that judges whether a model is good enough to ship. The eval harness is delivered before the AI system it evaluates. Ground-truth datasets, multi-judge scoring (more than one independent judge per case, to catch the failures a single judge would miss), per-failure-mode subsets, and the deploy-gate thresholds are all in place before any model serves a request. No exceptions. A system without an eval harness is not a system we ship.
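To make the deploy gate concrete, the sketch below combines multi-judge scoring with per-failure-mode thresholds. It is a minimal Python illustration under stated assumptions: judges return a score between 0 and 1, a case passes on a strict majority of judges, and each subset has its own minimum pass rate. The subset names, the pass mark, and the threshold values are hypothetical.

```python
def case_passes(judge_scores: list[float], pass_mark: float = 0.5) -> bool:
    """A case passes only if a strict majority of judges score it at or above pass_mark."""
    if len(judge_scores) < 2:
        raise ValueError("multi-judge scoring needs more than one judge per case")
    votes = sum(score >= pass_mark for score in judge_scores)
    return votes * 2 > len(judge_scores)


def deploy_gate(
    results: dict[str, list[list[float]]],
    thresholds: dict[str, float],
) -> dict[str, bool]:
    """Check each failure-mode subset against its own minimum pass rate.

    results maps a subset name to a list of cases, each case being the list
    of judge scores for that case. A subset with no cases fails the gate.
    """
    verdict = {}
    for subset, minimum in thresholds.items():
        cases = results.get(subset, [])
        pass_rate = sum(case_passes(c) for c in cases) / len(cases) if cases else 0.0
        verdict[subset] = bool(cases) and pass_rate >= minimum
    return verdict


def may_ship(results: dict[str, list[list[float]]], thresholds: dict[str, float]) -> bool:
    """The model ships only if every subset clears its threshold."""
    return all(deploy_gate(results, thresholds).values())


# Hypothetical subsets and scores, three judges per case.
results = {
    "fabricated_citation": [[0.9, 0.8, 0.2], [0.9, 0.9, 0.9]],
    "refusal": [[0.1, 0.2, 0.9], [0.9, 0.9, 0.1]],
}
thresholds = {"fabricated_citation": 0.95, "refusal": 0.5}
print(deploy_gate(results, thresholds), may_ship(results, thresholds))
```

Keying the loop on `thresholds` rather than `results` means a subset that was never evaluated blocks the release instead of being skipped.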

Model-agnostic posture

Our delivered systems are not locked to one specific model or provider. The methodology lives in evals, prompts, retrieval architecture, and orchestration, not in any single model's weights. When a better model arrives, it is swapped in behind the eval gate; the rest of the system does not move.
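A model swap behind an eval gate might look like the following Python sketch. Everything here is hypothetical: the `Model` protocol, the `System` class, the stub models, and the toy gate stand in for whatever interface and eval suite a real engagement would define.

```python
from typing import Callable, Protocol


class Model(Protocol):
    """Any provider's model, reduced to the one call the system depends on."""
    def complete(self, prompt: str) -> str: ...


class System:
    def __init__(self, model: Model, passes_gate: Callable[[Model], bool]):
        if not passes_gate(model):
            raise ValueError("initial model does not clear the eval gate")
        self._model = model
        self._passes_gate = passes_gate

    def swap(self, candidate: Model) -> bool:
        """Replace the model only if the candidate clears the same eval gate."""
        if not self._passes_gate(candidate):
            return False
        self._model = candidate
        return True

    def answer(self, question: str) -> str:
        # Prompting and orchestration stay here, independent of the model behind them.
        return self._model.complete(f"Answer with sources: {question}")


class StubModel:
    """Toy stand-in for a real provider client."""
    def __init__(self, name: str, cites_sources: bool):
        self.name = name
        self.cites_sources = cites_sources

    def complete(self, prompt: str) -> str:
        suffix = " [SRC]" if self.cites_sources else ""
        return f"{self.name}: reply{suffix}"


def gate(model: Model) -> bool:
    """Toy eval gate: the model must attach a source marker to its reply."""
    return "[SRC]" in model.complete("probe")


system = System(StubModel("model_a", cites_sources=True), gate)
print(system.swap(StubModel("model_b", cites_sources=False)))  # rejected by the gate
print(system.swap(StubModel("model_c", cites_sources=True)))   # accepted
print(system.answer("example question"))
```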

The AI-augmented research workflow we use ourselves

We use AI agents inside our own research pipeline for sourcing candidate documents, classifying their tier, detecting contradictions, and scoring confidence. The same provenance-first discipline applies to the agents as to the humans: every claim resolves to a source row; agents propose, humans dispose; disagreements are logged and used to recalibrate. We describe this in detail in "The LLM-augmented research workflow we use internally".

The symmetry matters. A consultancy whose internal AI use is governed by different rules than the systems it ships for clients is signalling that the rules are marketing, not structure. Ours are structure.
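The "agents propose, humans dispose" loop can be sketched as a review log in which the human decision always wins and every disagreement is kept for recalibration. This Python example is illustrative only; the class names, the agreement-rate metric, and the source identifiers are assumptions, not a description of the internal pipeline.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Proposal:
    source_id: str
    proposed_tier: str   # the agent's suggested tier, "A" to "E"
    authored_by: str     # e.g. "agent_v3.2"


@dataclass
class ReviewLog:
    accepted: dict[str, str] = field(default_factory=dict)  # source_id -> final tier
    disagreements: list[tuple[Proposal, str]] = field(default_factory=list)
    reviewed: int = 0

    def dispose(self, proposal: Proposal, human_tier: str) -> str:
        """Record the human decision as final; log the case if it differs from the proposal."""
        self.reviewed += 1
        if human_tier != proposal.proposed_tier:
            self.disagreements.append((proposal, human_tier))
        self.accepted[proposal.source_id] = human_tier
        return human_tier

    def agreement_rate(self) -> float:
        """Share of reviewed proposals the human accepted unchanged."""
        if self.reviewed == 0:
            return 0.0
        return 1 - len(self.disagreements) / self.reviewed


log = ReviewLog()
# Placeholder source identifiers for illustration.
log.dispose(Proposal("SRC.X1", "B", "agent_v3.2"), human_tier="B")
log.dispose(Proposal("SRC.X2", "C", "agent_v3.2"), human_tier="D")
print(log.accepted, log.agreement_rate())
```

The logged disagreements are the raw material for recalibration: they show which tier boundaries the agent misjudges most often.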

See it in practice

Our case studies show this methodology applied to real engagements.
