# RA-QAOA Paper Draft: Results and Limitations

## Results

We evaluated robustness-aware QAOA parameter selection on 60 random Max-Cut
instances with 6, 8, and 10 nodes. For each graph, we compared p=1 and p=2 QAOA
parameter candidates under a depolarizing noise sweep of 0%, 4%, 8%, and 16%,
with 2% readout error. The main RA-QAOA selector optimizes a weighted score that
combines noisy expected cut, ideal-to-noisy degradation, success probability,
shot variance, and noise sensitivity.
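As a rough illustration of how such a weighted score could be assembled, the sketch below combines the five terms named above into one scalar and picks the highest-scoring candidate. The field names, sign conventions, weight values, and candidate numbers are all assumptions made for the example; the draft does not specify them.

```python
from dataclasses import dataclass


@dataclass
class CandidateMetrics:
    """Per-candidate metrics. All field names are hypothetical."""
    noisy_cut: float      # expected cut under noise (higher is better)
    degradation: float    # ideal expected cut minus noisy expected cut
    success_prob: float   # probability of sampling an optimal cut
    shot_variance: float  # variance of the cut value across shots
    sensitivity: float    # change in expected cut across the noise sweep


# Illustrative weights only; the draft does not state the values used.
WEIGHTS = {
    "noisy_cut": 1.0,
    "degradation": 0.5,
    "success_prob": 1.0,
    "shot_variance": 0.1,
    "sensitivity": 0.5,
}


def robust_score(m, w=WEIGHTS):
    """Reward cut and success probability; penalize the three instability terms."""
    return (w["noisy_cut"] * m.noisy_cut
            + w["success_prob"] * m.success_prob
            - w["degradation"] * m.degradation
            - w["shot_variance"] * m.shot_variance
            - w["sensitivity"] * m.sensitivity)


def select(candidates, w=WEIGHTS):
    """Return the name of the candidate with the highest robust score."""
    return max(candidates, key=lambda name: robust_score(candidates[name], w))


# Made-up metrics for two parameter candidates.
candidates = {
    "high_cut": CandidateMetrics(5.0, 1.2, 0.30, 2.0, 1.5),
    "stable": CandidateMetrics(4.8, 0.6, 0.32, 1.8, 0.4),
}
chosen = select(candidates)
```

With these made-up numbers the selector prefers the candidate with the slightly lower noisy cut, because the penalties on degradation and sensitivity outweigh the cut difference.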

At p=2, RA-QAOA did not maximize raw noisy expected cut. Relative to the
ideal-expected selector, the robust-minus-baseline delta in noisy expected cut was
`-0.136896` with 95% CI `[-0.181157, -0.092636]`.
However, the same comparison lowered noise sensitivity, with a delta of
`-0.393734` and 95% CI
`[-0.521034, -0.266434]`, and improved success probability by
`0.015303` with 95% CI `[0.007746, 0.022859]`.
This supports a stability-oriented interpretation: RA-QAOA sacrifices some
expected cut in exchange for parameters whose performance varies less under the
noise sweep.
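The intervals reported above are symmetric around their point estimates, which is consistent with a mean-plus-or-minus-standard-error interval on per-graph paired differences, although the draft does not state the method. A minimal sketch under that assumption, using a normal approximation and made-up data:

```python
import math
from statistics import mean, stdev


def paired_delta_ci(robust, baseline, z=1.96):
    """Mean of per-graph (robust - baseline) deltas with a normal-approximation 95% CI."""
    if len(robust) != len(baseline):
        raise ValueError("paired samples must have equal length")
    deltas = [r - b for r, b in zip(robust, baseline)]
    m = mean(deltas)
    # Standard error of the mean from the sample standard deviation.
    half_width = z * stdev(deltas) / math.sqrt(len(deltas))
    return m, (m - half_width, m + half_width)


# Made-up sensitivity values for four graphs, for illustration only.
robust_sens = [1.0, 1.2, 0.9, 1.1]
baseline_sens = [1.5, 1.4, 1.3, 1.6]
delta, (lo, hi) = paired_delta_ci(robust_sens, baseline_sens)
```

A negative delta whose interval lies entirely below zero corresponds to the "lower sensitivity" reading used in this section. A bootstrap or t-based interval would be an equally reasonable choice.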

The weight ablation confirms that the stability term changes the selected
parameters. Removing the sensitivity weight weakens the p=2 sensitivity delta to
`-0.029690` with 95% CI
`[-0.046874, -0.012506]`, while doubling the sensitivity weight strengthens it to
`-1.440525` with 95% CI
`[-1.622549, -1.258501]`. The effect is therefore not only a reporting artifact
of the selected metric; the score term materially changes the parameter choice.
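The mechanism behind the ablation can be illustrated with a reduced two-term score (noisy cut minus a weighted sensitivity penalty). This is a simplification of the full score, and the candidate values and base weight are hypothetical:

```python
def pick(candidates, sens_weight):
    """Select the candidate maximizing noisy cut minus a weighted sensitivity penalty."""
    return max(
        candidates,
        key=lambda name: candidates[name][0] - sens_weight * candidates[name][1],
    )


# (noisy expected cut, noise sensitivity) per candidate; made-up values.
candidates = {
    "high_cut": (5.0, 1.5),
    "stable": (4.8, 0.4),
    "very_stable": (4.5, 0.0),
}

base_weight = 0.5  # hypothetical default sensitivity weight
selected = {
    label: pick(candidates, base_weight * scale)
    for label, scale in [("removed", 0.0), ("default", 1.0), ("doubled", 2.0)]
}
```

In this toy setting each change of the sensitivity weight moves the selection to a different candidate, which is the kind of behavior the ablation is meant to detect.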

The graph-size stratified analysis shows the same direction across the evaluated
sizes. The p=2 robust-minus-ideal sensitivity deltas were
`-0.253609` for 6-node graphs,
`-0.368436` for 8-node graphs, and
`-0.559158` for 10-node graphs. All three
deltas are negative and their magnitude grows with graph size, which suggests
that the aggregate stability result is not driven by a single graph size.
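A stratified summary of this kind can be sketched as a group-by over per-graph deltas. The record layout and the values below are hypothetical:

```python
from collections import defaultdict
from statistics import mean


def stratified_mean_delta(records):
    """Group per-graph deltas by node count and average within each group."""
    groups = defaultdict(list)
    for n_nodes, delta in records:
        groups[n_nodes].append(delta)
    return {n: mean(ds) for n, ds in sorted(groups.items())}


# (node count, robust-minus-ideal sensitivity delta); made-up values.
records = [(6, -0.2), (6, -0.3), (8, -0.4), (8, -0.3), (10, -0.6), (10, -0.5)]
by_size = stratified_mean_delta(records)
same_direction = all(d < 0 for d in by_size.values())
```

Checking the sign of every stratum separately guards against an aggregate that is dominated by one group.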

Additional baselines clarify the scope of the claim. Compared with a
single-start SciPy Nelder-Mead optimizer, RA-QAOA had lower p=2 noisy expected
cut (robust-minus-optimizer delta `-1.561254`) and lower
success probability (delta `-0.120209`),
but also much lower noise sensitivity, with a delta of
`-4.490405` and 95% CI
`[-4.999357, -3.981453]`. The multi-start optimizer check preserved the
same qualitative conclusion: the robust-minus-multi-start delta in noisy expected
cut was `-1.561331`, in success
probability `-0.120229`,
and in noise sensitivity `-4.490628`
with 95% CI `[-4.999651, -3.981605]`.

Together, these results support the following claim: RA-QAOA is not a
performance-maximizing optimizer, but a stability-oriented selector that trades
raw noisy expected cut and, relative to the optimizer baselines, success
probability for substantially lower noise sensitivity.

The 150-graph expanded suite preserves the same p=2 direction. Relative to the
ideal-expected selector, robust minus baseline noisy expected cut was
`-0.143569` with 95% CI
`[-0.171938, -0.115201]`, success probability was
`0.016922` with 95% CI
`[0.011660, 0.022183]`, and noise sensitivity was
`-0.412928` with 95% CI
`[-0.494519, -0.331337]`. This reduces the risk that the original 60-graph
result is a small-sample artifact.

Under the hardware-like proxy noise model, the stability result persists: the
robust-minus-ideal-expected delta in noise sensitivity was
`-0.946489` with 95% CI
`[-1.144908, -0.748070]`. However, the noisy expected cut delta was
`-0.279900`, and the success
probability delta was `-0.008033`
with 95% CI `[-0.019804, 0.003738]`, an interval that spans zero. Hardware-like
results should therefore be reported as evidence for stability, not as evidence
for success-probability improvement.
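The reporting rule above amounts to checking whether each interval excludes zero. A small sketch using the two hardware-like intervals quoted in this section (the helper name is hypothetical):

```python
def excludes_zero(ci):
    """Return True when a confidence interval lies entirely on one side of zero."""
    lo, hi = ci
    return lo > 0 or hi < 0


# Intervals copied from the hardware-like proxy results in this section.
hardware_like = {
    "noise_sensitivity": (-1.144908, -0.748070),
    "success_probability": (-0.019804, 0.003738),
}
supported = {name: excludes_zero(ci) for name, ci in hardware_like.items()}
```

Only the sensitivity interval excludes zero, so only the stability claim is supported under this noise model.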

## Limitations

First, all results use simulated final measurement noise rather than real QPU
runs. The hardware-like proxy scales noise by QAOA gate load, but it still does
not capture calibration drift, routing, crosstalk maps, layout constraints,
pulse schedules, or queue-time backend changes.
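The draft does not specify how the proxy scales noise with gate load. One plausible form, shown here only as an assumption, counts the gates in a standard Max-Cut QAOA circuit and compounds an independent per-gate error rate:

```python
def qaoa_gate_load(n_nodes, n_edges, p):
    """Gate count for a textbook Max-Cut QAOA circuit.

    Assumes one ZZ rotation per edge and one RX rotation per node in each
    of the p layers; the initial Hadamard layer is ignored.
    """
    return p * (n_edges + n_nodes)


def effective_error(per_gate_error, gate_count):
    """Probability that at least one gate errs, assuming independent gate errors."""
    return 1.0 - (1.0 - per_gate_error) ** gate_count


# Hypothetical example: 6 nodes, 9 edges, p=2, per-gate error of 0.1%.
load = qaoa_gate_load(n_nodes=6, n_edges=9, p=2)
proxy_rate = effective_error(0.001, load)
```

A model of this shape depends only on circuit size, which is why it cannot reflect the device-specific effects listed above.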

Second, the QAOA search space is intentionally small. The current suite uses a
4 x 4 per-layer grid and only p=1 and p=2. Larger grids, p=3, and optimizer-based
noise-aware objectives may change the trade-off frontier.
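To make the size of this search space concrete, the sketch below enumerates a per-layer grid, reading "4 x 4" as four gamma values times four beta values per layer. That reading, and the parameter ranges, are assumptions.

```python
import itertools
import math


def linspace(start, stop, num):
    """Evenly spaced values from start to stop, endpoints included."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def parameter_grid(p, points=4, gamma_range=(0.0, math.pi),
                   beta_range=(0.0, math.pi / 2)):
    """Enumerate all (gamma_1, beta_1, ..., gamma_p, beta_p) tuples on the grid."""
    layer = list(itertools.product(linspace(*gamma_range, points),
                                   linspace(*beta_range, points)))
    return [tuple(itertools.chain.from_iterable(combo))
            for combo in itertools.product(layer, repeat=p)]


grid_p1 = parameter_grid(1)  # 16 candidates
grid_p2 = parameter_grid(2)  # 16 ** 2 = 256 candidates
```

Under this reading the candidate count grows as 16^p, so p=3 would already require 4096 evaluations per graph and noise level.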

Third, the evaluated graph suite is still modest: 60 random graphs over 6, 8,
and 10 nodes in the main suite, and 150 graphs in the expanded suite. This is
enough to find a repeatable trend, but not enough for a broad claim about
Max-Cut instances in general.

Fourth, the optimizer baseline is performance-oriented and uses a fixed
Nelder-Mead configuration. The multi-start check reduces the risk that the
single-start result is accidental, but it is not a complete study of optimizer
choice, initialization strategy, or convergence budget.

Fifth, lower noise sensitivity alone is not sufficient. Some weak or flat
parameter choices can also be insensitive to noise. Therefore RA-QAOA should be
reported as a performance-stability trade-off using both noisy expected cut and
sensitivity, not as a sensitivity-only method.
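One way to report both axes together is a Pareto front over noisy expected cut (higher is better) and sensitivity (lower is better). The sketch below uses made-up candidates and is not part of the current pipeline:

```python
def pareto_front(candidates):
    """Keep candidates not dominated on (higher noisy cut, lower sensitivity)."""
    front = []
    for name, (cut, sens) in candidates.items():
        dominated = any(
            o_cut >= cut and o_sens <= sens and (o_cut > cut or o_sens < sens)
            for other, (o_cut, o_sens) in candidates.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


# (noisy expected cut, noise sensitivity); made-up values.
candidates = {
    "high_cut": (5.0, 1.5),
    "stable": (4.8, 0.4),
    "flat": (2.0, 0.0),
    "weak_noisy": (3.0, 1.0),
}
front = pareto_front(candidates)
```

The "flat" candidate stays on the front purely because of its zero sensitivity despite a poor cut, which is the failure mode described above and the reason the cut axis has to be reported alongside sensitivity.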

## Next Experiments

1. Add a limited p=3 pilot where runtime allows.
2. Compare against a noise-aware continuous optimizer objective.
3. Run a small real-backend pilot on 4-6 node graphs.
4. Replace the generic proxy profile with calibration-derived backend snapshots.
5. Convert this draft into the paper's Results section after the p=3 or real-backend pilot.
