How we work — Zhianrui

The workstream model

Every engagement begins with decomposition. We do not accept a brief and disappear — we take the problem apart in the first working session and present it back as a set of parallel workstreams, each with its own scope, evidence requirements, and delivery milestones.

This decomposition serves two purposes: it aligns the client's expectations with what is feasible in the agreed timeline, and it gives both sides a shared vocabulary for tracking progress. Workstreams can be paused, re-scoped, or split without derailing the whole engagement.

Each workstream has a named lead on our side. The lead is a researcher, not a project manager — the person responsible for the technical work is the person who reports on it.

Source-tier evidence

Every claim in a Zhianrui deliverable maps to a source row. Sources are classified into five tiers:

Tier	Description	Weight	Example	Confidence
A	Primary: direct experiment, measurement, or interview	Highest	Lab test result	High
B	Registry: official filings, patent registries, databases	High	Patent filing	High
C	Academic: peer-reviewed papers, preprints, theses	Medium	IEEE paper	Medium
D	Press: reporting, trade publications, press releases	Lower	Trade article	Medium
E	User-generated: forums, social media, community reports	Lowest	Forum post	Low

We do not smooth over mixed evidence. When a claim rests on a C-tier source contradicted by an A-tier finding, both are presented. The client sees the full picture.
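The tier ordering and the no-smoothing rule can be sketched in Python. This is a minimal illustration that assumes each source has already been classified and marked as supporting or contradicting the claim; `Tier`, `Finding`, `present_evidence`, and the source IDs are hypothetical names, not Zhianrui tooling.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    """Source tiers from the table; a lower value carries more weight (hypothetical encoding)."""
    A = 1  # Primary: experiment, measurement, interview
    B = 2  # Registry: filings, patent registries, databases
    C = 3  # Academic: papers, preprints, theses
    D = 4  # Press: reporting, trade publications, press releases
    E = 5  # User-generated: forums, social media, community reports


@dataclass(frozen=True)
class Finding:
    source_id: str        # hypothetical ID format, e.g. "SRC.007.4"
    tier: Tier
    supports_claim: bool  # False means the source contradicts the claim


def present_evidence(findings: list[Finding]) -> list[str]:
    """List every finding, strongest tier first, and flag mixed evidence.

    Nothing is dropped: a weaker source contradicted by a stronger one
    still appears next to it, so the reader sees both sides.
    """
    ordered = sorted(findings, key=lambda f: f.tier.value)
    lines = [
        f"[{f.tier.name}] {f.source_id} "
        f"{'supports' if f.supports_claim else 'contradicts'} the claim"
        for f in ordered
    ]
    if len({f.supports_claim for f in findings}) > 1:
        lines.append("Mixed evidence: supporting and contradicting sources shown")
    return lines


# Usage: a C-tier source contradicted by an A-tier finding.
report = present_evidence([
    Finding("SRC.021.1", Tier.C, supports_claim=True),
    Finding("SRC.007.4", Tier.A, supports_claim=False),
])
for line in report:
    print(line)
```

Sorting puts the A-tier contradiction first, but the C-tier source is still listed rather than silently discarded.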

The evidence row

This is the anatomy of a single claim in a Zhianrui deliverable, and it is what provenance-first looks like in practice.

In code, an evidence row looks like this:

claim:        "Hardware roots of trust scale to OT/ICS retrofit"
tier:         A                  # A primary / B registry / C academic / D press / E user-gen
confidence:   HIGH               # HIGH / MED / LOW / UNVERIFIED
source_id:    "SRC.014.2"        # resolves to a verifiable artifact
snapshot_at:  2026-04-22T11:14Z
authored_by:  "researcher_03"    # or "agent_v3.2"; both are governed by the same rules

Rows missing any of these fields are rejected at the orchestration layer before they can enter the dossier. There is no representable state in which a claim exists without a source.
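The rejection rule can be sketched as a validator that runs before anything is written to the dossier. This is a minimal Python sketch under stated assumptions: rows arrive as plain dictionaries, and `EvidenceRow`, `validate_row`, and the allowed-value sets are hypothetical names for illustration, not the actual orchestration layer.

```python
from dataclasses import dataclass, fields
from datetime import datetime

# Hypothetical allowed values, taken from the comments in the evidence row.
ALLOWED_TIERS = {"A", "B", "C", "D", "E"}
ALLOWED_CONFIDENCE = {"HIGH", "MED", "LOW", "UNVERIFIED"}


@dataclass(frozen=True)
class EvidenceRow:
    claim: str
    tier: str
    confidence: str
    source_id: str
    snapshot_at: datetime
    authored_by: str  # a researcher or an agent; the same checks apply to both


def validate_row(raw: dict) -> EvidenceRow:
    """Turn a raw row into an EvidenceRow, or raise ValueError to reject it."""
    missing = [f.name for f in fields(EvidenceRow) if not raw.get(f.name)]
    if missing:
        raise ValueError(f"rejected: missing fields {missing}")
    if raw["tier"] not in ALLOWED_TIERS:
        raise ValueError(f"rejected: unknown tier {raw['tier']!r}")
    if raw["confidence"] not in ALLOWED_CONFIDENCE:
        raise ValueError(f"rejected: unknown confidence {raw['confidence']!r}")
    # fromisoformat only accepts a trailing "Z" from Python 3.11, so normalise it.
    snapshot = datetime.fromisoformat(raw["snapshot_at"].replace("Z", "+00:00"))
    return EvidenceRow(
        claim=raw["claim"],
        tier=raw["tier"],
        confidence=raw["confidence"],
        source_id=raw["source_id"],
        snapshot_at=snapshot,
        authored_by=raw["authored_by"],
    )


row = validate_row({
    "claim": "Hardware roots of trust scale to OT/ICS retrofit",
    "tier": "A",
    "confidence": "HIGH",
    "source_id": "SRC.014.2",
    "snapshot_at": "2026-04-22T11:14Z",
    "authored_by": "researcher_03",
})

# A row without a source never becomes an EvidenceRow.
try:
    validate_row({"claim": "Unsourced claim", "tier": "D", "confidence": "LOW"})
    rejected = False
except ValueError:
    rejected = True
```

If the rest of the pipeline only accepts rows produced by `validate_row`, an unsourced claim has no representation downstream.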

Claim: PUF-based device authentication reduces counterfeit ingress by 94% in the tested supply chain segment.

Source
Tier A: lab test (internal, 2025-Q3, n=1200 units)

Confidence
High — reproduced across two test batches

Open question
Environmental drift at >85°C not tested

Confidence rubric

Every finding carries a confidence rating. We use four levels:

Level	Meaning	How we use it
High	Multiple independent A/B-tier sources agree	Stated as finding; actionable without caveat
Medium	Single strong source or multiple weaker sources	Stated as finding with source context
Low	Plausible but under-sourced	Flagged as provisional; escalation recommended
Unverified	Claimed elsewhere but not independently confirmed	Listed in open-questions register only
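One way to make the rubric mechanical is a small Python function. The mapping is a reading of the table, not a published rule: this sketch assumes "strong" means tier A or B, that the caller passes one tier per independent source that agrees with the claim, and that a single weaker source counts as under-sourced. `rate_confidence` and `TREATMENT` are hypothetical names.

```python
# Hypothetical encoding of the confidence rubric; tier letters follow the source-tier table.
STRONG_TIERS = {"A", "B"}

TREATMENT = {
    "High": "Stated as finding; actionable without caveat",
    "Medium": "Stated as finding with source context",
    "Low": "Flagged as provisional; escalation recommended",
    "Unverified": "Listed in open-questions register only",
}


def rate_confidence(agreeing_source_tiers: list[str]) -> str:
    """Map the tiers of independent sources that agree on a claim to a level.

    Deduplicating sources (for example, two articles quoting one press release)
    is assumed to happen before this function is called.
    """
    strong = sum(1 for tier in agreeing_source_tiers if tier in STRONG_TIERS)
    weak = len(agreeing_source_tiers) - strong
    if strong >= 2:
        return "High"        # multiple independent A/B-tier sources agree
    if strong == 1 or weak >= 2:
        return "Medium"      # single strong source, or several weaker ones
    if weak == 1:
        return "Low"         # plausible but under-sourced
    return "Unverified"      # nothing we could independently confirm


for tiers in (["A", "B"], ["C", "D"], ["E"], []):
    level = rate_confidence(tiers)
    print(tiers, "->", level, "|", TREATMENT[level])
```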

Open-questions register

Every Zhianrui deliverable includes an open-questions register: a structured list of things we could not verify, contradictions we could not resolve, and assumptions we had to make.

This is non-negotiable. We do not paper over inconsistencies. If two sources contradict each other and we cannot determine which is correct, both are logged, the contradiction is described, and the client decides how to proceed.

The register is not a sign of weakness — it is a sign of rigour. Any consultancy that delivers certainty on every point is either hiding uncertainty or has not looked hard enough.
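A minimal Python sketch of the register as a data structure, assuming three entry kinds that mirror the three categories named above (unverifiable items, unresolved contradictions, assumptions). `OpenQuestionsRegister`, `EntryKind`, and the contradiction source IDs are hypothetical; the drift entry reuses the open question from the example claim.

```python
from dataclasses import dataclass, field
from enum import Enum


class EntryKind(Enum):
    UNVERIFIED = "could not verify"
    CONTRADICTION = "unresolved contradiction"
    ASSUMPTION = "assumption we had to make"


@dataclass
class OpenQuestion:
    kind: EntryKind
    description: str
    source_ids: list[str] = field(default_factory=list)


@dataclass
class OpenQuestionsRegister:
    entries: list[OpenQuestion] = field(default_factory=list)

    def add(self, kind: EntryKind, description: str, *source_ids: str) -> None:
        self.entries.append(OpenQuestion(kind, description, list(source_ids)))

    def log_contradiction(self, description: str, source_a: str, source_b: str) -> None:
        """Keep both sides: neither source is discarded, and the client decides."""
        self.add(EntryKind.CONTRADICTION, description, source_a, source_b)


register = OpenQuestionsRegister()
register.add(EntryKind.UNVERIFIED, "Environmental drift at >85°C not tested")
register.log_contradiction(
    "Two sources report different failure rates for the same part",
    "SRC.031.1",  # hypothetical IDs
    "SRC.044.2",
)
for entry in register.entries:
    print(f"{entry.kind.value}: {entry.description} {entry.source_ids}")
```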

How we engineer AI systems

When we build large language model (LLM) systems or agentic pipelines — multi-step processes where one or more models take actions in sequence — the same provenance discipline applies, with three additions specific to AI engagements.

The five practices above were designed for human researchers. AI systems remove two safety nets those practices rely on: they produce output far faster than humans can review it, and, when underconstrained, they fabricate plausible-looking citations. The three additions below close those gaps.

Eval-first delivery

An eval harness is the automated test infrastructure that judges whether a model is good enough to ship, and we deliver it before the AI system it evaluates. Ground-truth datasets, multi-judge scoring (more than one independent judge per case, to catch the failures a single judge would miss), per-failure-mode test subsets, and deploy-gate thresholds are all in place before any model serves a request. No exceptions: a system without an eval harness is not a system we ship.
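A deploy gate of this kind can be sketched in a few lines of Python. The sketch makes assumptions for illustration only, not a description of a delivered harness: each judge returns a score between 0 and 1, a case's score is the minimum across judges (so a failure caught by any one judge is not averaged away), and each failure mode has its own threshold on the mean case score. The two judges are toy string checks standing in for model-based or rubric-based judges; all names and data are hypothetical.

```python
from statistics import mean
from typing import Callable

# A judge scores one model output against a ground-truth reference, in [0, 1].
Judge = Callable[[str, str], float]


def score_case(output: str, reference: str, judges: list[Judge]) -> float:
    """Score with every judge and keep the lowest, so one judge's miss cannot hide a failure."""
    return min(judge(output, reference) for judge in judges)


def deploy_gate(
    subsets: dict[str, list[tuple[str, str]]],  # failure mode -> [(output, reference)]
    judges: list[Judge],
    thresholds: dict[str, float],               # failure mode -> minimum mean score
) -> tuple[bool, dict[str, float]]:
    """Pass only if every thresholded failure mode was evaluated and meets its bar."""
    scores = {
        mode: mean(score_case(out, ref, judges) for out, ref in cases)
        for mode, cases in subsets.items()
    }
    passed = all(
        mode in scores and scores[mode] >= bar for mode, bar in thresholds.items()
    )
    return passed, scores


# Toy judges standing in for real ones (hypothetical).
def exact_match(output: str, reference: str) -> float:
    return 1.0 if output.strip() == reference.strip() else 0.0


def cites_reference(output: str, reference: str) -> float:
    return 1.0 if reference in output else 0.0


subsets = {
    "citation": [("SRC.014.2", "SRC.014.2"), ("See SRC.099.9", "SRC.014.2")],
    "refusal": [("Cannot verify.", "Cannot verify.")],
}
passed, scores = deploy_gate(
    subsets, [exact_match, cites_reference], {"citation": 0.9, "refusal": 0.9}
)
print(passed, scores)
```

Keeping thresholds per failure mode means a strong overall average cannot mask one subset that regressed; here the citation subset scores 0.5 and blocks the deploy.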

Model-agnostic posture

Our delivered systems are not locked to one specific model or provider. The methodology lives in evals, prompts, retrieval architecture, and orchestration, not in any single model's weights. When a better model arrives, it is swapped in behind the eval gate and the rest of the system does not move.
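In code, the model-agnostic posture amounts to putting the model behind a narrow interface and letting the eval gate decide whether a replacement is accepted. A minimal Python sketch, assuming a single text-in, text-out method; `Model`, `Pipeline`, `StubModel`, and the one-line `citation_gate` are hypothetical stand-ins for a real provider client and a full eval harness.

```python
from typing import Callable, Protocol


class Model(Protocol):
    """The only surface the rest of the system sees; provider details stay behind it."""
    name: str

    def complete(self, prompt: str) -> str: ...


class Pipeline:
    """Prompts, retrieval, and orchestration live here and never name a provider."""

    def __init__(self, model: Model, eval_gate: Callable[[Model], bool]) -> None:
        if not eval_gate(model):
            raise ValueError(f"{model.name} failed the eval gate")
        self.model = model
        self.eval_gate = eval_gate

    def swap_model(self, candidate: Model) -> bool:
        """Adopt the candidate only if it clears the same gate; otherwise keep the current model."""
        if not self.eval_gate(candidate):
            return False
        self.model = candidate
        return True

    def answer(self, question: str) -> str:
        return self.model.complete(f"Cite a source id.\nQuestion: {question}")


class StubModel:
    """Hypothetical stand-in for a provider-specific client."""

    def __init__(self, name: str, reply: str) -> None:
        self.name = name
        self.reply = reply

    def complete(self, prompt: str) -> str:
        return self.reply


def citation_gate(model: Model) -> bool:
    # Stand-in for the full eval harness: the answer must cite a source row.
    return "SRC." in model.complete("probe question")


pipeline = Pipeline(StubModel("model-a", "SRC.014.2"), citation_gate)
rejected_swap = pipeline.swap_model(StubModel("model-b", "No citation given"))
accepted_swap = pipeline.swap_model(StubModel("model-c", "SRC.014.2 (rev. 2)"))
print(rejected_swap, accepted_swap, pipeline.model.name)
```

Only `StubModel` would change when moving to a different provider; `Pipeline` and the gate stay as they are.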

The AI-augmented research workflow we use ourselves

We use AI agents inside our own research pipeline to source candidate documents, classify their tier, detect contradictions, and score confidence. The same provenance-first discipline applies to the agents as to the humans: every claim resolves to a source row; agents propose, humans dispose; disagreements are logged and used to recalibrate. We describe this in detail in our write-up of the LLM-augmented research workflow we use internally.
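The propose/dispose rule and the disagreement log can be sketched in Python. This assumes the agent's proposal is a tier classification and the reviewing researcher's decision is final; `Proposal`, `ReviewLog`, and the agreement-rate metric are hypothetical illustrations of how logged disagreements could feed recalibration, not a description of the internal tooling.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Proposal:
    source_id: str
    proposed_tier: str  # the agent's tier classification (A-E)


@dataclass
class ReviewLog:
    reviewed: int = 0
    # (source_id, agent_tier, human_tier) for every overruled proposal
    disagreements: list[tuple[str, str, str]] = field(default_factory=list)

    def review(self, proposal: Proposal, human_tier: str) -> str:
        """Agents propose, humans dispose: the reviewer's tier is always the one kept."""
        self.reviewed += 1
        if human_tier != proposal.proposed_tier:
            self.disagreements.append(
                (proposal.source_id, proposal.proposed_tier, human_tier)
            )
        return human_tier

    def agreement_rate(self) -> float:
        """Share of proposals accepted unchanged; a falling rate is a recalibration signal."""
        if self.reviewed == 0:
            return 1.0
        return 1 - len(self.disagreements) / self.reviewed


log = ReviewLog()
final_tiers = [
    log.review(Proposal("SRC.014.2", "A"), human_tier="A"),
    log.review(Proposal("SRC.020.1", "C"), human_tier="D"),  # agent overruled
]
print(final_tiers, log.disagreements, log.agreement_rate())
```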

The symmetry matters. A consultancy whose internal AI use is governed by different rules than the systems it ships for clients is signalling that the rules are marketing, not structure. Ours are structure.

See it in practice

Our case studies show this methodology applied to real engagements.
