Hydroclimatic Regime Conditioning of Seasonal Streamflow Forecasts in the Western US

Seasonal water supply forecasts in snow-dominated basins of the western United States rely on statistical regression models that treat all years as drawn from a single climatological population. We evaluate whether k-means clustering of cold-season hydroclimatic predictors can improve April–July streamflow forecasts by conditioning separate regressions on data-driven regimes, extending the stratified-training approach of Modi et al. (2022) to 29 CAMELS basins across five regions.

Under standard frozen-cluster leave-one-out cross-validation (FC-LOOCV), the regime model improves on a SWE-plus-precipitation baseline in 27 of 29 basins. Nested re-clustering within each fold (NR-LOOCV), which prevents information leakage from withheld years into cluster assignment, reduces genuine improvement to 2 of 29 basins, with median leakage of ΔR² = 0.21 and extreme cases exceeding 2.0. Leakage magnitude is predicted by a minimum-cluster residual degrees-of-freedom (DOF) criterion: basins where the smallest cluster supports fewer than five residual DOF cannot sustain stable regime-specific regressions, and standard cross-validation cannot detect this failure. In failing basins, skill loss concentrates in years that fall far from both cluster centroids; a distance-based fallback to the baseline forecast recovers skill in these cases. The k-means clusters recover physically interpretable warm/cold hydroclimatic regimes, and the method produces genuine skill gains where cluster balance and record length provide sufficient DOF. Honest evaluation of regime-conditioned forecasts requires nested re-clustering; frozen-cluster cross-validation is structurally biased and should not be used to claim skill improvement.
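The difference between the two cross-validation schemes can be illustrated with a minimal Python sketch. It is not the study's implementation: the two-cluster default (inferred from "both cluster centroids"), the ordinary least-squares regressions, the definition of residual DOF as cluster size minus the number of fitted parameters, the measurement of leakage as frozen-minus-nested R², and the `max_dist` fallback threshold are all assumptions made for illustration, as are all function names.

```python
import numpy as np

def kmeans(X, k=2, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
        # Keep the old centroid if a cluster happens to become empty.
        updated = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centroids[j] for j in range(k)])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    labels = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
    return centroids, labels

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict_ols(beta, x):
    return beta[0] + x @ beta[1:]

def min_cluster_residual_dof(labels, n_params, k):
    """Assumed definition: smallest cluster size minus fitted parameters."""
    return int(np.bincount(labels, minlength=k).min()) - n_params

def r2(y, yhat):
    return 1.0 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

def loocv_regime(X, y, k=2, nested=True, max_dist=None, seed=0):
    """Leave-one-out predictions from regime-conditioned regressions.

    nested=False: clusters are fit once on ALL years (frozen), so the
                  withheld year influences its own cluster assignment.
    nested=True:  clusters are re-fit inside each fold on training years only.
    max_dist:     hypothetical threshold; a withheld year farther than this
                  from every centroid uses the pooled baseline regression.
    """
    n, p = X.shape
    n_params = p + 1
    preds = np.empty(n)
    if not nested:
        frozen_centroids, frozen_labels = kmeans(X, k, seed=seed)
    for i in range(n):
        train = np.arange(n) != i
        Xtr, ytr = X[train], y[train]
        if nested:
            centroids, labels = kmeans(Xtr, k, seed=seed)
        else:
            centroids, labels = frozen_centroids, frozen_labels[train]
        dist = np.linalg.norm(centroids - X[i], axis=1)
        members = labels == dist.argmin()
        too_far = max_dist is not None and dist.min() > max_dist
        # Sketch-only guard: also use the pooled fit when the assigned
        # cluster has no residual degrees of freedom.
        if too_far or members.sum() <= n_params:
            beta = fit_ols(Xtr, ytr)
        else:
            beta = fit_ols(Xtr[members], ytr[members])
        preds[i] = predict_ols(beta, X[i])
    return preds

# Synthetic demonstration only; these are not CAMELS data.
demo_rng = np.random.default_rng(1)
X_demo = demo_rng.normal(size=(30, 2))  # stand-ins for SWE and precipitation
y_demo = X_demo @ np.array([1.0, 0.5]) + 0.1 * demo_rng.normal(size=30)
r2_frozen = r2(y_demo, loocv_regime(X_demo, y_demo, nested=False))
r2_nested = r2(y_demo, loocv_regime(X_demo, y_demo, nested=True))
print("assumed leakage (frozen minus nested R^2):", r2_frozen - r2_nested)
```

The only structural difference between the two schemes in this sketch is where `kmeans` is called: outside the fold loop on all years, or inside it on the training years alone. In practice the predictors would presumably be standardized before clustering so that centroid distances are comparable across variables.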