A hybrid quantum–classical project studying how a hardware-aware quantum Bayesian learner trades classification accuracy against entangling-circuit complexity under realistic NISQ hardware constraints. The learner performs Pothos–Chater–style similarity categorization on a 2-feature toy task; we then compress its entangling structure and measure how accuracy degrades against a real, post-transpile two-qubit gate count on IBM's FakeManila backend.
Reference: Pothos, E. M., & Chater, N. (2002). A simplicity principle in unsupervised human categorization. Cognitive Psychology, 45, 45–85.
Note on this README. The project was overhauled into a "v2" pipeline. The learning core was rebuilt from scratch as an exact, differentiable JAX statevector learner (
src/qcore.py), the hardware cost is now a genuine FakeManila transpile (src/hardware_cost.py), and all experiments run throughsrc/run_v2.pywriting toresults_v2/. This README documents that current pipeline. (Earlier finite-difference code undersrc/learner.py,src/train_*.py, etc. is superseded and kept only for reference; do not use it.)
- Overview & Purpose
- Environment Setup
- Architecture (file by file)
- How to Reproduce
- Interpreting the Results
- The Paper
- Running on Real Hardware
The thesis is unchanged from the project's inception: build a small quantum Bayesian learner that does Pothos–Chater–style categorization, and study how far its entangling structure can be compressed before accuracy collapses, using a real hardware cost rather than an idealized gate count.
Concretely, the v2 pipeline provides:
- An exact, differentiable learner. A 2-qubit JAX statevector model that
starts from the pure prior
|0…0⟩, re-uploads the input features, applies trainable Heisenberg entanglers (RXX·RYY·RZZ), reads out trainable single-qubit Pauli expectations, and is trained with exact autodiff gradients (no finite differences). It genuinely learns the task. - A real hardware-aware cost. The number of two-qubit operations
N₂qafter transpiling the circuit onto IBM'sFakeManilacoupling map (including any SWAPs the transpiler inserts), computed at runtime, not hardcoded. - A compression study. A per-interaction-term binary mask over the
entanglers, swept by greedy structured pruning, traces an
accuracy-vs-
N₂qPareto frontier. A matched random-mask ablation shows that the learned (greedy) sparsity beats unstructured sparsity at equal gate budget. - A difficulty knob. A checkerboard categorization concept whose spatial frequency controls task complexity (easy/medium/hard), producing a family of frontiers.
The project targets Python 3.14 with a local virtual environment.
cd qb-learner-compression
python3 -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txtThe original repo pinned qiskit==1.1.0, which does not build on Python 3.14
(the only interpreter available on the development machine). requirements.txt
was therefore re-pinned to a current, tested stack:
| Package | Version | Role |
|---|---|---|
qiskit |
2.4.2 | circuit construction + transpile |
qiskit-aer |
0.17.2 | simulator backend |
qiskit-ibm-runtime |
0.47.0 | provides FakeManilaV2 (hardware snapshot) |
jax, jaxlib |
0.10.2 | exact-autodiff statevector learner |
numpy |
2.4.6 | arrays |
scipy |
1.17.1 | numerics |
scikit-learn |
1.9.0 | stratified train/test split |
pandas |
3.0.3 | result logging / aggregation |
matplotlib |
3.11.0 | figures |
tqdm, pyyaml |
— | misc utilities |
The learned model runs entirely in JAX (exact double-precision autodiff); Qiskit is used only for the FakeManila post-transpile hardware cost, so the two concerns stay decoupled.
An exact, differentiable JAX statevector quantum learner. This file replaces the old finite-difference / frozen-readout learner entirely.
- Forward model. Starts in the pure prior
|0…0⟩(the honest "no evidence yet" state — this avoids the maximally-mixed-invariance problem of the old code). It then interleaves data re-uploading (n_reupload: re-encode the featuresRY(πxq)·RZ(πxq)per qubit) with trainable variational layers. - Entanglers. Each layer applies single-qubit
RX·RZrotations followed by Heisenberg two-qubit interactionsRXX·RYY·RZZon each edge incfg.pairs. Every interaction term is individually gated by a per-term binary mask of shape(depth, n_edges, 3). - Readout. A trainable linear head over Pauli expectations:
logit = w · ⟨P⟩ + b, withw, boptimized. The default readout observables are configurable (readout_paulis); the experiment runner uses single-qubit Paulis only so that entanglement is the only thing that can carry feature interaction information. - Gradients & optimizer.
make_loss_and_gradbuilds a jitted numerically stable BCE loss and an exactjax.value_and_grad. A small pytreeAdamoptimizer (qcore.Adam) trains all parameters jointly. make_dynamic_fns— builds jitted loss/grad/predict functions where the mask is a runtime argument (a term withmask=0becomesRPP(0)=identity, bit-identical to omitting it). This lets one compiled function evaluate or train any mask with no per-mask JAX recompiles — essential for fast pruning sweeps.make_learned_mask_loss_and_grad— a continuous-relaxation "learned mask" with a differentiableλ·N₂qpenalty. Kept for reference but not used: it is unstable (Adam drives all keep-probabilities to 0 before the circuit learns to use the gates). Greedy structured pruning is used instead.
The simulator is verified bit-for-bit against Qiskit (scripts/verify_qcore.py).
The principled realization of the quantum-Bayesian mechanism with no pure-state
shortcut: the belief is a genuine density operator ρ, the prior is the
maximally mixed state I/2ⁿ (a uniform "no information" belief), and the evidence
update is a non-unital quantum channel with trace renormalization — a
stimulus-dependent Lüders/POVM filter Kₓ ρ Kₓ† then ρ/Tr(·) (the Bayesian update),
plus per-qubit amplitude damping (a multi-Kraus, non-unital CPTP map). The readout is
Tr[ρP] through the same trainable head. It is exact, differentiable (dense JAX matrix
ops), reduces bit-for-bit to qcore when the channel is off, and is verified mixed
(purity < 1), Hermitian, PSD, trace-one. Used to confirm the frontier and learned-vs-random
results hold for the principled model (src/run_density.py, tests/test_qdensity.py).
Full write-up in docs/DENSITY_MODEL.md.
Computes the real, post-transpile hardware cost:
- Instantiates
FakeManilaV2(a 5-qubit IBM device snapshot, linear coupling map0–1–2–3–4). build_cost_circuit(...)rebuilds the exact Qiskit circuit mirroring the qcore ansatz (HeisenbergRXX/RYY/RZZ, selected by the per-term mask; HEACX-RY-CXvariant also supported). Angles are placeholders since the 2q count depends only on circuit structure, not angle values.transpiled_2q_cost(...)transpiles onto FakeManila and counts all two-qubit operations including SWAPs inserted to satisfy the coupling map. Results are cached by a structural key and deterministic (fixedseed_transpiler), so repeated calls during pruning are cheap and reproducible. The count is provably mask-dependent (more active terms ⇒ higherN₂q).
get_pothos_chater_checker(freq, …)— the checkerboard categorization task:label = (⌊freq·x₀⌋ + ⌊freq·x₁⌋) mod 2. Higherfreq⇒ finer cells ⇒ a more intricate boundary that needs more entangling capacity. This is the difficulty knob, in the spirit of Pothos–Chater local-similarity grouping.CHECKER_DIFFICULTYmapseasy / medium / hard → freq 2 / 3 / 4;get_checker_difficulty(level, …)is the convenience accessor used by the runner. (Empirically,freq=4gives a clear graded decline with a visible knee rather than an all-or-nothing cliff.)- Legacy generators retained for reference: the original Gaussian-cluster
get_pothos_chater_{small,medium,large}plus the XOR/parity generators (get_pothos_chater_xor,get_pothos_chater_parity) and their difficulty maps.
The canonical pipeline. Uses qcore for learning and hardware_cost for the
cost, with a fixed canonical config (2 qubits, depth 4, one edge,
n_reupload=2, single-qubit readout ("Z0","Z1","X0","X1"), Heisenberg
entanglers).
- Full baseline — trains the full circuit (warm-up).
- Greedy frontier (
greedy_trajectory) — repeatedly drops the single interaction term whose removal least increases training CE, warm-start retraining at each budget, walking the mask from full down to no entanglers. One trajectory = the accuracy-vs-N₂qfrontier. - λ operating points — for each
λ, picks the trajectory point minimizingtrain_CE + λ·N₂q. - Matched random-mask ablation — at each pruned budget, trains several random masks with the same number of active terms (connectivity-matched control).
- Logging — writes per-difficulty CSVs to
results_v2/<difficulty>/{frontier,lambda_sweep,mask_ablation}.csv, incrementally per seed. All reported metrics use the discrete deployed circuit and the real FakeManilaN₂q.
python -m src.run_v2 --difficulty hard --seeds 0 1 2 3 4
python -m src.run_v2 --all --seeds 0 1 2 3 4 # easy, medium, hardverify_qcore.py— checks the qcore statevector against Qiskit over 20 random circuits (max|1 − fidelity|≈ 4e-16, accounting for global phase / endianness). Run this first to confirm the simulator is correct.train_quick.py— the Phase-1 Definition-of-Done check: trains the diagnostic baseline (λ=0, full mask) and confirms it learns well above chance (reaches train/test ≈ 1.0 on the well-separated large dataset).make_figures_v2.py— aggregates theresults_v2/<difficulty>/*.csvlogs into the paper's figures (results_v2/figures/) and LaTeX tables (results_v2/tables/). Every reported number is computed here from logged runs — nothing is hand-entered.probe_*.py(probe_checker.py,probe_knee.py,probe_knee3.py,probe_concentric.py) — dataset-design probes used to choose the checkerboard task as the difficulty knob (XOR/parity gave a cliff, concentric gave no knee, checkerboardfreq=4gave a graded frontier).
tests/test_qcore.py covers: simulator-vs-Qiskit agreement, the dynamic
(runtime) mask matching the static discrete mask bit-for-bit, training reducing
the loss, and full_mask shape/values. Run:
python -m pytest tests/ -qCopy-pasteable, start to finish:
# 1. Environment
cd qb-learner-compression
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# 2. Verify the simulator is correct (qcore vs Qiskit)
python scripts/verify_qcore.py
# 3. (optional) sanity-check that the baseline learns + run the test suite
python scripts/train_quick.py
python -m pytest tests/ -q
# 4. Run the full sweep: greedy frontier + lambda sweep + mask ablation,
# across all difficulties (easy/medium/hard), 5 seeds each
python -m src.run_v2 --all --seeds 0 1 2 3 4
# 5. Aggregate logs into figures + LaTeX tables
python scripts/make_figures_v2.pyOutputs land in:
results_v2/<difficulty>/{frontier,lambda_sweep,mask_ablation}.csv— raw logsresults_v2/figures/—frontier_family.png,frontier_ce.png,mask_ablation.pngresults_v2/tables/—frontier.tex,lambda_sweep.tex,mask_ablation.tex
scripts/reproduce_paper.sh runs the whole simulation pipeline end to end (sanity
check → pure-state sweeps → density-matrix sweeps → noise model → classical baselines
→ paired statistics → all figures). Add --hardware to also re-run the ibm_fez
device experiments (needs IBM credentials).
bash scripts/reproduce_paper.sh # all simulation results + figures
bash scripts/reproduce_paper.sh --hardware # also the ibm_fez device runsEvery figure and table in paper/conference_101719.tex is produced by the commands
below; raw numbers live next to the figures so the PDF, repo, and logs stay in lockstep.
| Paper artifact | Produced by | Data |
|---|---|---|
| Fig. 1 (frontier family) | src.run_v2 → make_figures_v2.py |
results_v2/<diff>/frontier.csv |
| Tab. I (λ sweep) | src.run_v2 |
results_v2/hard/lambda_sweep.csv |
| Fig. 2 + Tab. II (learned vs random) | src.run_v2 → make_figures_v2.py |
results_v2/hard/mask_ablation.csv |
| Fig. 3 (noise robustness) | scripts/noise_robustness.py |
results_v2/hardware/noise_robustness.csv |
| Fig. 4 (device frontier) | scripts/hardware_frontier.py |
results_v2/hardware/ibm_fez_frontier.json |
| Abstract/§IV-E endpoints | scripts/run_on_hardware.py --mode ibm |
results_v2/hardware/ibm_fez_results.json |
| Fig. 5, 6 + Tab. III (density) | src.run_density → make_figures_density.py |
results_v2_density/<diff>/ |
| Tab. V (classical baselines, App. B) | scripts/classical_baselines.py |
results_v2/classical_baselines.csv |
| §IV-C paired statistic | scripts/paired_stats.py hard |
results_v2/hard/mask_ablation.csv |
Device-run provenance (backend, date, shots, transpiler settings) is recorded in
results_v2/hardware/RUN_METADATA.md and mirrored in the paper appendix.
The specific numbers come from the run and depend on seeds, so they are not hardcoded here. Qualitatively, you should expect:
- An accuracy-vs-
N₂qfrontier with a knee. As the greedy trajectory removes entangling terms, accuracy stays high for a while, then degrades gradually past a knee — a smooth trade-off, not a cliff (this is why the checkerboard task was chosen). - Entanglement is necessary. At
N₂q = 0(all entanglers removed) the model collapses to roughly chance accuracy — the single-qubit-only readout cannot represent the feature interaction the boundary requires without entanglers. - Learned masks beat random masks at matched budget. At equal
N₂q, the greedy-pruned (structured) masks achieve higher accuracy than connectivity- matched random masks, showing that which entanglers are kept matters, not just how many. - A family of frontiers by difficulty. The checkerboard frequency
(
easy/medium/hard=freq 2/3/4) shifts the whole frontier: harder tasks need more entangling capacity to reach the same accuracy.
The write-up lives in paper/ (paper/conference_101719.tex, IEEEtran). It is
intended to be regenerated from results_v2/: the figures and tables emitted
by scripts/make_figures_v2.py are the authoritative source, so no quantitative
claim is hand-typed. Per the project plan, the paper is finalized last, after
the results_v2 sweeps are complete.
The model trains in exact simulation and is deployed for inference on hardware.
src/hardware_backend.py builds a Qiskit circuit that is bit-identical to the trained
qcore model (verified to < 1e-9 in tests/test_hardware_backend.py), estimates the
readout observables on a chosen backend, and applies the trained head. Three backends:
exact—StatevectorEstimator(noiseless; equals the simulator).noisy—AerSimulator.from_backend(FakeManila), the real device noise model (free).ibm— a real IBM Quantum device viaQiskitRuntimeService(needs an API token).
# free: validate under the FakeManila noise model (full vs compressed circuit)
python scripts/run_on_hardware.py --difficulty hard --mode noisy --shots 4096
# how the whole frontier holds up under device noise (5 seeds -> figure + CSV)
python scripts/noise_robustness.py
# real IBM hardware (after setting up an account — see docs/HARDWARE.md)
python scripts/run_on_hardware.py --difficulty hard --mode ibm --n-eval 20Across five seeds, accuracy under the noise model tracks the noiseless frontier within a few percent at every two-qubit budget, so the compression savings are real net of noise.
Validated on a real device. Running on the physical IBM processor ibm_fez
(156 qubits), the full circuit (N2q=24) collapses to near chance (0.60) while the
greedy-compressed circuit (N2q=6) holds at 0.825 — on real hardware, structured
compression is not just free but necessary. The full on-device frontier
(scripts/hardware_frontier.py) makes this vivid: device accuracy falls monotonically
with N2q (0.81 at N2q=2,6 → 0.56 at N2q=18,24), the opposite of the flat simulated
frontier, because extra two-qubit gates accumulate noise. Numbers in
results_v2/hardware/; figures hardware_frontier.png / hardware_real.png (regenerate
with scripts/hardware_frontier.py / scripts/plot_hardware.py). First, sanity-check your
connection:
python scripts/check_ibm_connection.py # lists devices + least busyFull connection instructions (IBM Quantum Platform token/instance setup, costs,
troubleshooting) are in docs/HARDWARE.md.
Mohammad Zoraiz — mz248@duke.edu