# Robust QAOA Benchmark

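This benchmark is a reproducible scaffold for RA-QAOA paper experiments.
It compares robust-score selection of QAOA angles against three baselines that pick the grid candidate with the highest ideal expected cut, the highest noisy expected cut, or the highest success probability.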

## Setup

- graphs evaluated: 60
- candidates evaluated: 16320
- depths: p=1, p=2
- per-layer grid: 4 x 4
- noise sweep (4 settings): depolarizing(p=0.000, readout=0.020), depolarizing(p=0.040, readout=0.020), depolarizing(p=0.080, readout=0.020), depolarizing(p=0.160, readout=0.020)
- score weights: degradation=0.350, success=0.200, shot_variance=1.000, noise_sensitivity=0.350
- figure: `./robust_qaoa_benchmark.png`
- paired delta table: `./robust_qaoa_paired_deltas.csv`

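The candidate count follows from the grid: 4 x 4 points per layer give 16 angle pairs at p=1 and 16^2 = 256 at p=2, so 60 graphs x (16 + 256) = 16320 candidates. The sketch below enumerates such a grid. The angle ranges (gamma over [0, pi], beta over [0, pi/2], four evenly spaced points each, endpoints included) are an assumption, chosen because they contain the angles selected in the sample report (gamma = pi/3, beta = pi/6); the benchmark's actual ranges may differ, and `build_grid` is a hypothetical helper.

```python
import itertools
import math


def linspace(start: float, stop: float, num: int) -> list[float]:
    """Evenly spaced points from start to stop, endpoints included."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def build_grid(depth: int, points_per_axis: int = 4) -> list[tuple[tuple[float, ...], tuple[float, ...]]]:
    """Enumerate (gammas, betas) candidates for a depth-p QAOA circuit.

    Assumed ranges (hypothetical): gamma in [0, pi], beta in [0, pi/2].
    Every layer draws its own (gamma, beta) pair from a points x points grid,
    so the count is (points_per_axis ** 2) ** depth.
    """
    gamma_axis = linspace(0.0, math.pi, points_per_axis)
    beta_axis = linspace(0.0, math.pi / 2, points_per_axis)
    layer_pairs = list(itertools.product(gamma_axis, beta_axis))
    candidates = []
    for layers in itertools.product(layer_pairs, repeat=depth):
        gammas = tuple(g for g, _ in layers)
        betas = tuple(b for _, b in layers)
        candidates.append((gammas, betas))
    return candidates


per_graph = sum(len(build_grid(p)) for p in (1, 2))
print(per_graph, 60 * per_graph)  # 272 16320
```

Because the count grows as (points^2)^p, the p=2 grid accounts for almost all of the candidate budget; a finer grid or p=3 would grow it quickly.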
## Aggregate Metrics

| depth | selector | graphs | noisy E[C] | approx. ratio | success prob. | noise sensitivity | shot stderr | robust score |
| ---: | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| 1 | robust | 60 | 7.676503 +/- 0.873080 | 0.680875 | 0.096099 +/- 0.025233 | 0.870598 +/- 0.206181 | 0.022282 | 0.656309 |
| 1 | ideal_expected | 60 | 7.791356 +/- 0.882825 | 0.690740 | 0.088796 +/- 0.024684 | 1.197504 +/- 0.165874 | 0.031815 | 0.653856 |
| 1 | noisy_expected | 60 | 7.791356 +/- 0.882825 | 0.690740 | 0.088918 +/- 0.024664 | 1.197504 +/- 0.165874 | 0.031157 | 0.653881 |
| 1 | success_probability | 60 | 7.713439 +/- 0.873415 | 0.683770 | 0.101516 +/- 0.024815 | 1.072831 +/- 0.181179 | 0.030089 | 0.654176 |
| 2 | robust | 60 | 7.750276 +/- 0.859378 | 0.691092 | 0.127260 +/- 0.031279 | 1.079351 +/- 0.232858 | 0.022926 | 0.661512 |
| 2 | ideal_expected | 60 | 7.887172 +/- 0.870351 | 0.703716 | 0.111957 +/- 0.030873 | 1.473085 +/- 0.176823 | 0.036362 | 0.656991 |
| 2 | noisy_expected | 60 | 7.887172 +/- 0.870351 | 0.703716 | 0.111957 +/- 0.030873 | 1.473085 +/- 0.176823 | 0.036405 | 0.656991 |
| 2 | success_probability | 60 | 7.793746 +/- 0.857002 | 0.695677 | 0.129890 +/- 0.031284 | 1.243622 +/- 0.216431 | 0.034835 | 0.660615 |

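## Pairwise Deltas

Each delta is the robust selector's value minus the baseline's value on the same graph, averaged over the 60 graphs. `better` gives the direction in which the metric improves, and the robust win rate is the share of graphs on which the robust selector does better in that direction.

| depth | baseline | metric | better | mean delta | 95% CI | robust win rate |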
| ---: | --- | --- | --- | ---: | ---: | ---: |
| 1 | ideal_expected | noisy_expected_cut | higher | -0.114854 | [-0.158442, -0.071265] | 1.7% |
| 1 | ideal_expected | success_probability | higher | 0.007303 | [0.001278, 0.013328] | 28.3% |
| 1 | ideal_expected | noise_sensitivity | lower | -0.326906 | [-0.452858, -0.200954] | 50.0% |
| 1 | noisy_expected | noisy_expected_cut | higher | -0.114854 | [-0.158442, -0.071265] | 0.0% |
| 1 | noisy_expected | success_probability | higher | 0.007181 | [0.001194, 0.013169] | 28.3% |
| 1 | noisy_expected | noise_sensitivity | lower | -0.326906 | [-0.452858, -0.200954] | 50.0% |
| 1 | success_probability | noisy_expected_cut | higher | -0.036937 | [-0.084178, 0.010305] | 15.0% |
| 1 | success_probability | success_probability | higher | -0.005417 | [-0.008875, -0.001959] | 0.0% |
| 1 | success_probability | noise_sensitivity | lower | -0.202233 | [-0.326652, -0.077814] | 26.7% |
| 2 | ideal_expected | noisy_expected_cut | higher | -0.136896 | [-0.181157, -0.092636] | 0.0% |
| 2 | ideal_expected | success_probability | higher | 0.015303 | [0.007746, 0.022859] | 50.0% |
| 2 | ideal_expected | noise_sensitivity | lower | -0.393734 | [-0.521034, -0.266434] | 78.3% |
| 2 | noisy_expected | noisy_expected_cut | higher | -0.136896 | [-0.181157, -0.092636] | 0.0% |
| 2 | noisy_expected | success_probability | higher | 0.015303 | [0.007746, 0.022859] | 51.7% |
| 2 | noisy_expected | noise_sensitivity | lower | -0.393734 | [-0.521034, -0.266434] | 78.3% |
| 2 | success_probability | noisy_expected_cut | higher | -0.043470 | [-0.084304, -0.002636] | 8.3% |
| 2 | success_probability | success_probability | higher | -0.002630 | [-0.004222, -0.001038] | 0.0% |
| 2 | success_probability | noise_sensitivity | lower | -0.164271 | [-0.273996, -0.054547] | 51.7% |

## Interpretation

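Treat this report as evidence for the paper only after checking both mean performance (noisy E[C], approximation ratio) and stability (noise sensitivity, shot standard error).
A robust selector is useful when it keeps noisy E[C] close to the baselines while reducing noise sensitivity or shot standard error.
In this run it lowers noise sensitivity against every baseline at both depths (all 95% CIs exclude zero), but its noisy E[C] is also lower, by about 0.11 to 0.14 against the ideal_expected and noisy_expected baselines, so the E[C] cost of robustness should be reported alongside the gain.
If all selectors choose the same parameters, the graph suite or grid is not yet discriminative enough; the single-graph sample below is such a case.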

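Each row of the pairwise-delta table reduces per-graph differences to a mean, a 95% interval, and a win rate. The sketch below computes one such row. It assumes a normal-approximation interval (mean +/- 1.96 * s / sqrt(n)) and counts only strict improvements as wins; the benchmark may use a t-interval or a different tie rule, `paired_delta` is a hypothetical helper, and the per-graph numbers in the example are made up for illustration.

```python
import math
import statistics
from dataclasses import dataclass


@dataclass
class PairedDelta:
    mean: float
    ci_low: float
    ci_high: float
    win_rate: float


def paired_delta(robust: list[float], baseline: list[float], higher_is_better: bool, z: float = 1.96) -> PairedDelta:
    """Summarise per-graph deltas (robust - baseline) for one metric.

    Assumptions (hypothetical): normal-approximation CI, and ties are not wins.
    """
    if len(robust) != len(baseline) or len(robust) < 2:
        raise ValueError("need at least two paired observations")
    deltas = [r - b for r, b in zip(robust, baseline)]
    n = len(deltas)
    mean = statistics.fmean(deltas)
    half_width = z * statistics.stdev(deltas) / math.sqrt(n)
    # A win is a strict improvement in the metric's "better" direction.
    wins = sum(d > 0 for d in deltas) if higher_is_better else sum(d < 0 for d in deltas)
    return PairedDelta(mean, mean - half_width, mean + half_width, wins / n)


# Illustrative per-graph noise sensitivities (made-up numbers, four graphs).
robust_sens = [0.80, 1.10, 0.95, 1.20]
baseline_sens = [1.00, 1.10, 1.30, 1.25]
row = paired_delta(robust_sens, baseline_sens, higher_is_better=False)
print(f"mean={row.mean:.4f} CI=[{row.ci_low:.4f}, {row.ci_high:.4f}] win rate={row.win_rate:.1%}")
```

The mean and the win rate answer different questions: a clearly negative mean sensitivity delta with a win rate near 50%, as in the p=1 noise_sensitivity rows, suggests the improvement is concentrated on a subset of graphs, with ties or losses on the rest.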
## Sample Single-Graph Report

### Robust QAOA Report: g000_n6_s101

- graph: 6 nodes, 6 edges
- depth: p=1
- exact optimum: cut 6.000, attained by bitstrings 001011 and 110100
- noise settings: depolarizing(p=0.000, readout=0.020), depolarizing(p=0.040, readout=0.020), depolarizing(p=0.080, readout=0.020), depolarizing(p=0.160, readout=0.020)
- candidates evaluated: 16
- score weights: degradation=0.350, success=0.200, shot_variance=1.000, noise_sensitivity=0.350

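Each noise setting pairs a depolarizing strength p with a readout error of 0.02. To build intuition for how both knobs pull the expected cut toward that of a random cut, the sketch below uses a deliberately simple model that is an assumption, not the benchmark's simulator: one global depolarizing channel, which mixes the state with the maximally mixed state (expected cut = half the edge count), followed by independent symmetric bit flips with probability q at readout, which change a given edge's cut status with probability 2q(1 - q). Starting from this graph's ideal expected cut (4.078125 on 6 edges), it is not expected to reproduce the reported noisy expected cut; `toy_noisy_expected_cut` is a hypothetical helper.

```python
def toy_noisy_expected_cut(ideal_cut: float, num_edges: int, p: float, q: float) -> float:
    """Expected cut under a toy model: global depolarizing (p), then readout bit flips (q).

    Global depolarizing: rho -> (1 - p) * rho + p * I / 2**n. The maximally mixed
    state cuts each edge with probability 1/2, so E -> (1 - p) * E + p * m / 2.
    Readout: each bit flips independently with probability q, so an edge's cut
    status flips with probability r = 2q(1 - q): E -> (1 - 2r) * E + r * m.
    """
    depolarized = (1 - p) * ideal_cut + p * num_edges / 2
    r = 2 * q * (1 - q)
    return (1 - 2 * r) * depolarized + r * num_edges


# Sweep over the report's depolarizing strengths with readout error 0.02.
for p in (0.0, 0.04, 0.08, 0.16):
    print(p, round(toy_noisy_expected_cut(4.078125, 6, p, 0.02), 6))
```

In this toy model both channels shrink the gap between the expected cut and m/2 linearly, so any candidate whose ideal cut is above m/2 loses expected cut as p grows.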
#### RA-QAOA Selection

- gammas: 1.0471975512 (pi/3)
- betas: 0.523598775598 (pi/6)
- robust score: 0.515961
- noisy expected cut: 3.822966
- std of noisy expected cut across the noise sweep: 0.139525
- ideal expected cut: 4.078125
- degradation: 0.255159
- noise sensitivity: 2.366976
- approximation ratio: 0.637161
- success probability: 0.158824
- shot standard error: 0.015303
- dominant bitstring: 110100 (cut 6.000, probability 0.079412)

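Several of these numbers are linked by simple arithmetic: degradation is ideal minus noisy expected cut (4.078125 - 3.822966 = 0.255159), the approximation ratio is noisy expected cut over the exact optimum (3.822966 / 6 = 0.637161), and the success probability matches the combined probability of the two optimal bitstrings (2 x 0.079412 = 0.158824). The robust-score formula itself is not stated in this report. The sketch below uses one reconstruction that reproduces 0.515961 from the listed metrics and the setup's weights, `score = ratio + w_success*success - (w_degradation*degradation + w_noise_sensitivity*sensitivity)/C - w_shot_variance*(stderr/C)**2` with C the exact optimum. It is a hypothesis fitted to this single graph, to be checked against the benchmark source, and `robust_score` is a hypothetical name.

```python
def robust_score(
    noisy_cut: float,
    ideal_cut: float,
    optimum: float,
    success: float,
    sensitivity: float,
    shot_stderr: float,
    w_degradation: float = 0.35,
    w_success: float = 0.20,
    w_shot_variance: float = 1.00,
    w_noise_sensitivity: float = 0.35,
) -> float:
    """Hypothetical reconstruction of the RA-QAOA robust score (higher is better).

    Cut-valued penalties are divided by the exact optimum so every term is on
    the approximation-ratio scale; the shot term is the variance of the
    normalised estimate, (stderr / optimum) ** 2.
    """
    ratio = noisy_cut / optimum
    degradation = ideal_cut - noisy_cut
    return (
        ratio
        + w_success * success
        - w_degradation * degradation / optimum
        - w_noise_sensitivity * sensitivity / optimum
        - w_shot_variance * (shot_stderr / optimum) ** 2
    )


score = robust_score(
    noisy_cut=3.822966,
    ideal_cut=4.078125,
    optimum=6.0,
    success=0.158824,
    sensitivity=2.366976,
    shot_stderr=0.015303,
)
print(f"{score:.6f}")  # 0.515961
```

Under this reading, the noise-sensitivity penalty (about 0.138) dominates the other penalties for this graph, which would explain why the robust selector trades some noisy E[C] for lower sensitivity in the aggregate results.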
#### Baselines

| selector | noisy E[C] | success prob. | noise sensitivity | robust score | gammas | betas |
| --- | ---: | ---: | ---: | ---: | --- | --- |
| robust | 3.822966 | 0.158824 | 2.366976 | 0.515961 | 1.0471975512 | 0.523598775598 |
| ideal_expected | 3.822966 | 0.158824 | 2.366976 | 0.515961 | 1.0471975512 | 0.523598775598 |
| noisy_expected | 3.822966 | 0.158824 | 2.366976 | 0.515961 | 1.0471975512 | 0.523598775598 |
| success_probability | 3.822966 | 0.158824 | 2.366976 | 0.515961 | 1.0471975512 | 0.523598775598 |
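
All four selectors pick identical angles on this graph, so it cannot separate them. A sketch for flagging such graphs automatically, assuming each selector's choice is available as a `(gammas, betas)` pair of tuples (the `selections` mapping and `selectors_agree` are hypothetical):

```python
import math

Angles = tuple[tuple[float, ...], tuple[float, ...]]  # (gammas, betas)


def selectors_agree(selections: dict[str, Angles], tol: float = 1e-9) -> bool:
    """Return True if every selector chose the same (gammas, betas) within tol."""
    if not selections:
        raise ValueError("no selections given")
    choices = list(selections.values())
    ref_gammas, ref_betas = choices[0]

    def close(xs: tuple[float, ...], ys: tuple[float, ...]) -> bool:
        # Same depth and element-wise equal up to an absolute tolerance.
        return len(xs) == len(ys) and all(math.isclose(x, y, abs_tol=tol) for x, y in zip(xs, ys))

    return all(close(g, ref_gammas) and close(b, ref_betas) for g, b in choices[1:])


# Angles copied from the baselines table above (g000_n6_s101, p=1).
sample = {
    name: ((1.0471975512,), (0.523598775598,))
    for name in ("robust", "ideal_expected", "noisy_expected", "success_probability")
}
print(selectors_agree(sample))  # True
```

Counting how many of the 60 graphs return True at each depth gives a quick measure of how discriminative the graph suite and grid are.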