Everything you need to check the headline claim is public and linked here. No login, no permission, no trust in us. Two paths: a 2-minute, CPU-only recomputation of the headline AUROC, and a full re-scoring path for the thorough.
AUROC = 1.000 in ~2 minutes, no GPU, no 378 GiB

The headline number is computed by an in-repo script from published per-frame scores (the 8 merged score files, ~2 MB). You do not need the model, a GPU, or the full corpus to check it.
# 0. deps
pip install numpy scikit-learn
# 1. fetch the self-contained verify bundle (it holds the code/ tree; this host serves no browsable repo)
curl -fsSL https://data.truthbeam.com/release/truthbeam_verify.tar.gz | tar xz
cd truthbeam_verify/code/verifier/scripts
# 2. download the eval scores: 8 merged npz files, ~2 MB (the real + forged per-frame verifier outputs)
for ck in 00005000 00025000 00070000 00100000; do
  mkdir -p stage_0_eval/step_$ck
  for s in d2 v10; do
    curl -sL -o stage_0_eval/step_$ck/stage0_${s}_raw.npz \
      https://data.truthbeam.com/models/repro/stage_0_eval/step_$ck/stage0_${s}_raw.npz
  done
done
# 3. recompute the headline AUROC (fixed seed, CPU-only, seconds)
python3 decomposition_part_1.py --stage-0-root stage_0_eval --out out --seed 0
Expected output (every probe, every held-out forger checkpoint, both sessions):
| probe | AUROC combined | AUROC D2 | AUROC V10 |
|---|---|---|---|
| Raw | 1.0000 | 1.0000 | 1.0000 |
| Coupling | 1.0000 | 1.0000 | 1.0000 |
| All | 1.0000 | 1.0000 | 1.0000 |
That number is exactly as honest as its scope: one rig, two sessions, one performer, against the
F-A v1 forger. A perfect score on a single-rig corpus is the clean separation a controlled, end-to-end demonstration is built to show - it is deliberately not
a cross-rig or adaptive-attacker claim. Security here is empirical, attacker- and budget-indexed
hardness, never formal or unconditional. (See README.md → scope guards.)
(Precisely what Path A computes: a diffusion-feature probe - logistic regression on the published per-frame verifier scores, trained on the {5k, 25k, 70k} forger checkpoints and tested on the held-out 100k checkpoint. It reproduces the same perfect real-vs-fake separation the headline reports; the verifier's own scores are the input, and those are regenerable from the public model via Path A.5.)
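To make the metric itself concrete, here is a minimal sketch of AUROC as a pairwise ranking statistic. It is not the repo's `decomposition_part_1.py` (which, per the note above, fits a logistic-regression probe first); it only shows why fully separated score distributions give exactly 1.0. The function name `pairwise_auroc`, the assumption that a higher score means "more likely fake", and all numeric values are illustrative, not taken from the release.

```python
def pairwise_auroc(real_scores, fake_scores):
    """AUROC as the fraction of (fake, real) pairs ranked correctly.

    Assumption for this sketch: a higher score means "more likely fake".
    Ties count as half a correct pair.
    """
    if not real_scores or not fake_scores:
        raise ValueError("need at least one score per class")
    wins = 0.0
    for f in fake_scores:
        for r in real_scores:
            if f > r:
                wins += 1.0
            elif f == r:
                wins += 0.5
    return wins / (len(real_scores) * len(fake_scores))


# Toy values invented for illustration (not taken from the published npz files).
real = [0.10, 0.20, 0.30]
fake_separated = [0.40, 0.50, 0.60]    # every fake scores above every real
fake_overlapping = [0.25, 0.50, 0.60]  # one fake falls inside the real range

auroc_separated = pairwise_auroc(real, fake_separated)
auroc_overlapping = pairwise_auroc(real, fake_overlapping)
print(auroc_separated, auroc_overlapping)
```

An AUROC of 1.0 therefore says only that no real frame outranks any fake frame in the evaluated set; it says nothing about the size of the margin or about frames outside that set.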
Path A recomputes the AUROC from the published per-frame scores. If you don't want to trust that those
scores were honestly produced, regenerate a few of them yourself by running the public verifier on a
handful of public raw frames - and watch them reproduce the published .npz. A few frames, one modest
GPU; no 378 GiB, no days of compute.
cd truthbeam_verify/code/verifier/scripts/stage_0 # from the bundle extracted in Path A step 1
# public weights
curl -sLO https://data.truthbeam.com/models/verifier/model_final.pt
curl -sLO https://data.truthbeam.com/models/fa_v1_forger/f_a_v1_step_00100000.pt
# a few public raw frames + emissions (rows 1336/1337 and their F-A source rows 1334/1335)
mkdir -p d2/Recordings d2/derived/Emissions
for r in 001334 001335 001336 001337; do
  curl -sL -o d2/Recordings/frame_$r.raw https://data.truthbeam.com/sessions/d2/Recordings/frame_$r.raw
  curl -sL -o d2/derived/Emissions/tile_$r.png https://data.truthbeam.com/sessions/d2/derived/Emissions/tile_$r.png
done
# the published scores to diff against
curl -sL -o stage0_d2_raw.npz https://data.truthbeam.com/models/repro/stage_0_eval/step_00100000/stage0_d2_raw.npz
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python3 verify_published_scores.py \
--diffusion-ckpt model_final.pt --fa-ckpt f_a_v1_step_00100000.pt \
--d2-dir d2 --published-npz stage0_d2_raw.npz --n-frames 2 --bs 1
Expected output - the regenerated MSEs reproduce the published ones (worst cell |Δ| ≈ 2e-4,
well inside the 1e-3 tolerance; GPU float order accounts for the rest), and the real-vs-fake
separation falls out:
row condition max|Δ| published regenerated result
1336 real_correct 1.38e-04 0.00363 0.00360 OK
1336 fake_correct 2.05e-04 0.00729 0.00725 OK
1337 real_correct 8.71e-05 0.00379 0.00377 OK
1337 fake_correct 2.24e-04 0.00740 0.00735 OK
RESULT: PASS - regenerated scores reproduce the published npz.
The deterministic frame-selection seed and per-frame noise seed are the same ones that produced the
release (the fixture reuses eval.py's scorer verbatim) - so there is nothing to fabricate: the
published scores are an honest output of the public model on the public frames. Path B below scales the
same idea to the full corpus.
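The OK/FAIL decision in the expected output comes down to a max-absolute-difference check against the 1e-3 tolerance. The sketch below shows that comparison in isolation; it is not the code of `verify_published_scores.py`, whose internals are not reproduced here. The helper names `max_abs_delta` and `check_cell` and all score values are hypothetical.

```python
TOLERANCE = 1e-3  # the tolerance quoted in the text above


def max_abs_delta(published, regenerated):
    """Largest element-wise absolute difference between two score vectors."""
    if len(published) != len(regenerated):
        raise ValueError("score vectors differ in length")
    if not published:
        raise ValueError("score vectors are empty")
    return max(abs(p - r) for p, r in zip(published, regenerated))


def check_cell(published, regenerated, tol=TOLERANCE):
    """Return (delta, status) for one row/condition cell (hypothetical helper)."""
    delta = max_abs_delta(published, regenerated)
    return delta, ("OK" if delta <= tol else "FAIL")


# Invented values for illustration; real inputs would come from the npz files.
published = [0.00350, 0.00370, 0.00369]
regenerated = [0.00345, 0.00372, 0.00363]
delta, status = check_cell(published, regenerated)
print(f"{delta:.2e} {status}")

# A regenerated score that drifts by more than the tolerance is flagged.
drift_delta, drift_status = check_cell([0.0036], [0.0050])
print(f"{drift_delta:.2e} {drift_status}")
```

A per-cell maximum, rather than a mean, means a single badly reproduced value cannot hide behind many well-reproduced ones.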
| Artifact | Where | Size |
|---|---|---|
| Verifier weights (ε-prediction U-Net, 39,769,828 params) | https://data.truthbeam.com/models/verifier/model_final.pt | 456 MB |
| Forger checkpoints (F-A v1, 42.3 M params) | .../models/fa_v1_forger/f_a_v1_step_{00005000,00025000,00070000,00100000}.pt | ~165 MB ea |
| Eval scores (Path A input) | https://data.truthbeam.com/models/repro/stage_0_eval/ | ~2 MB (the 8 merged files Path A uses; full score set 4.3 MB) |
| Verifier code | code/verifier/src/phase_g/diffusion_diagnostic_model.py | in repo |
| Forger code | code/verifier/src/phase_f/editor_controlnet.py, scripts/phase_f/train_phase_f_a_full.py | in repo |
| AUROC script | code/verifier/scripts/decomposition_part_1.py | in repo |
| Ground-truth corpus (raw + tiles, sessions D2/V10) | https://data.truthbeam.com/sessions/ · CIDs in CID_MANIFEST.json | ~378 GiB |
| 2023 demonstration video | https://data.truthbeam.com/pinata/PolieBotics.mp4 · on IPFS | - |
| Truth Beam - Introduction | https://data.truthbeam.com/pinata/TruthBeam_Introduction.mp4 | 64 s |
Forger checkpoint URLs in full:
- https://data.truthbeam.com/models/fa_v1_forger/f_a_v1_step_00005000.pt
- https://data.truthbeam.com/models/fa_v1_forger/f_a_v1_step_00025000.pt
- https://data.truthbeam.com/models/fa_v1_forger/f_a_v1_step_00070000.pt
- https://data.truthbeam.com/models/fa_v1_forger/f_a_v1_step_00100000.pt
About the F-A v1 forger (and dual use). It is a real learned adversary (an EditorControlNet trained on same-rig D2/V10 temporal pairs), not a strawman - but it is a same-rig surrogate, trained without white-box access to the verifier's gradients, so beating it is the floor, not a robust-adaptive guarantee. The weights are published for reproducibility and red-teaming of this rig/corpus - they are scoped to this setup, not a general face-swap tool. Use them to check the claim (or to build a stronger attacker and break it), not to forge.
To regenerate the per-frame scores Path A consumes (instead of trusting the published ones):
1. Download the verifier weights (models/verifier/model_final.pt) and the forger checkpoints (models/fa_v1_forger/*.pt).
2. Download the ground-truth corpus from data.truthbeam.com/sessions/ (verify against CID_MANIFEST.json).
3. Re-score the corpus to produce the stage0_*_raw.npz score files, then feed them to decomposition_part_1.py as in Path A. Model/loader entry points: code/verifier/src/phase_g/diffusion_diagnostic_model.py, .../fa_loader.py.

Independent of any learned verifier, the hash-chain + anchors are reproducible math: re-walk the
committed chain, check each frame's emission against its committed BLAKE3 hash, verify the
tile-generator source hash against the genesis commit, and check the drand + Rootstock anchors. (The
shipped verifier hash-checks the committed emissions and the anchors - it does not re-render each E_t
pixel-for-pixel.) GPU-free verifier entry points are under
code/recording/verify/; see RESTORE.md.
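The chain re-walk described above can be sketched as follows. This is an illustration of the general technique only, under loudly stated assumptions: the record layout (an emission hash plus a link that commits to the previous link) is hypothetical and is not the format used by the code under code/recording/verify/, and the hash is BLAKE2b from the Python standard library standing in for BLAKE3, which the standard library does not provide. The drand and Rootstock anchor checks are not modelled.

```python
import hashlib


def digest(data: bytes) -> bytes:
    # Stand-in hash: stdlib BLAKE2b, NOT BLAKE3 (assumption for this sketch).
    return hashlib.blake2b(data, digest_size=32).digest()


def build_chain(genesis: bytes, emissions):
    """Build hypothetical records: one (emission_hash, link) pair per frame."""
    records, prev = [], genesis
    for emission in emissions:
        e_hash = digest(emission)
        link = digest(prev + e_hash)  # each link commits to the previous one
        records.append((e_hash, link))
        prev = link
    return records


def verify_chain(genesis: bytes, emissions, records):
    """Re-walk the chain; return the index of the first bad frame, or None."""
    if len(emissions) != len(records):
        return min(len(emissions), len(records))
    prev = genesis
    for i, (emission, (e_hash, link)) in enumerate(zip(emissions, records)):
        if digest(emission) != e_hash or digest(prev + e_hash) != link:
            return i
        prev = link
    return None


# Placeholder bytes; real inputs would be the committed emissions and genesis hash.
genesis = digest(b"tile-generator source (placeholder)")
emissions = [b"frame-0", b"frame-1", b"frame-2"]
records = build_chain(genesis, emissions)

ok = verify_chain(genesis, emissions, records)
tampered = [b"frame-0", b"frame-X", b"frame-2"]
bad = verify_chain(genesis, tampered, records)
print(ok, bad)
```

Because every link depends on the one before it, altering any single emission invalidates the check at that frame, and forging a consistent replacement would require rewriting every later link as well as the externally anchored values.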
Patent pending (WO 2025/046153 A2). All rights reserved; no licence granted by publication. The authoritative artifacts are the whitepaper, the open dataset, and the 2023 video - verify against those yourself; that is the only real authority.
This page is an LLM-mediated dataset: the same content as REPRODUCE.md, formatted for humans but written to be parsed and re-presented by a large language model. Point your own LLM at it to explain, check, or summarise. The raw markdown twin is at REPRODUCE.md (and a .txt copy).