Rhein-Pegel Β· Citizen-Science

Scientific Methodology

Last updated 2026-05-07

This page documents in detail the methods used to derive physically interpretable variables from raw gauge data, and to build and validate forecast models. All formulas, training/test splits, NSE/RMSE values and known caveats are documented openly.

Phase 2.1 β€” NHN heights & longitudinal water-surface slope

Theory

Each gauge has a fixed gauge zero (PNP β€” Pegelnullpunkt) given as height above NHN (German height datum). Adding the measured water level $W(t)$ to the PNP yields the absolute water-surface height $h_\text{NHN}(t)$:

$$h_\text{NHN}^\text{station}(t) = \text{PNP}_\text{station} + W^\text{station}(t)$$

Differences between consecutive stations divided by the river-km distance give the local longitudinal water-surface slope (dimensionless, typically a few cm/km):

$$S(t) = \frac{h_\text{NHN}^\text{station-A}(t) - h_\text{NHN}^\text{station-B}(t)}{\text{km}\text{B} - \text{km}\text{A}}$$

Implementation

phase2_nhn_v2.py (year-by-year, InfluxDB-friendly): writes nhn and gefaelle measurements to rhein_derived. Run for 2017-2026: 844 k NHN points + 764 k slope points. Mainz NHN starts only on 2019-11-01 (PNP-validity start).

Results

The Upper Rhine plain has a typical slope of 0.15–0.25 m/km at mean flow, increasing to 0.30–0.40 m/km at flood. The slope is not constant along the reach: backwater from the Neckar mouth at Mannheim and the Main mouth at Mainz produces local minima.

Phase 2.2 β€” Global cross-correlation (CCF)

Theory

The cross-correlation function (CCF) between the first differences $dW/dt$ at two stations identifies the typical lag at which a wave passes from one to the other:

$$\rho_{AB}(\tau) = \text{Corr}!\left(dW_A/dt, \; dW_B/dt(\cdot + \tau)\right)$$

The argmax of $\rho_{AB}$ is the wave celerity time for that reach.

Results 2020–2026 (5 segments)

Segment Distance (km) Best $\tau$ (h) $\rho_\text{max}$ Implied celerity (m/s)
Worms β†’ Nierstein 37 6 0.097 1.7
Nierstein β†’ Mainz 18 4 0.115 1.3
Worms β†’ Mainz 55 10 0.077 1.5
Speyer β†’ Worms 43 7 0.078 1.7
Speyer β†’ Mainz 98 17 0.067 1.6

Assessment

The global CCF is noisy because of the dominance of low-frequency variability (long memory, see Phase 2.3) and contamination by reservoir-release transients. Peak $\rho \approx 0.07$–0.12 is too low for routing. The event-based variant (Phase 2.4) gives much cleaner numbers.

Phase 2.3 β€” Autocorrelation + Welch spectrum

Theory

The autocorrelation function $\rho(\tau)$ of $W(t)$ measures the system memory. The e-folding time $\tau_e$ is defined by $\rho(\tau_e) = 1/e$. The Welch spectrum gives the power-spectral density (PSD) of $W$ and reveals periodic components.

Results (year 2024, reference)

Station $\tau_e$ (h) $\rho(24\,\text{h})$ $\rho(72\,\text{h})$ PSD@24h SNR
Worms > 168 0.972 0.844 2.41
Nierstein > 168 0.973 0.846 3.18
Mainz > 168 0.971 0.842 2.62
Mannheim Neckar > 168 0.969 0.838 2.84

Assessment

The Mittelrhein is a very slow system: the e-folding time exceeds 7 days, $\rho(24\,\text{h}) \approx 0.97$. A pure persistence forecast at +24 h is therefore already a strong baseline (NSE > 0.94). The 1/24-h periodic peak is detectable in all stations and is most pronounced at Nierstein and Mannheim-Neckar. Likely causes: shipping traffic, evaporation, or release schedules of Swiss reservoirs upstream β€” open question for further investigation.

Phase 2.4 β€” Event-based CCF

Motivation

Restrict the cross-correlation analysis to identified flood events (Worms-W > 450 cm, β‰₯ 14 days apart). This eliminates the low-frequency background and yields physically meaningful celerity values.

Results β€” highlights

phase2_eventccf.py finds 14 events in 2018-2026, including the major HW January 2018 (peak 621 cm). Event-CCF results, segment Worms β†’ Mainz:

Assessment

The event-based CCF is the physically reliable celerity estimator for this reach. The values agree with hydraulic theory and provide the basis for Phase 6 wave propagation.

Phase 3.1 β€” Muskingum routing

Theory

The classical Muskingum scheme routes a discharge wave through a reach with two parameters $K$ (lag) and $x$ (weighting):

$$Q_\text{out}(t+\Delta t) = C_0 Q_\text{in}(t+\Delta t) + C_1 Q_\text{in}(t) + C_2 Q_\text{out}(t)$$

with $C_0, C_1, C_2$ derived from $K, x, \Delta t$.

v2: Dual-input with $Q_\text{lat}$

We extend to two upstream inflows (Worms and Raunheim/Main) plus a constant lateral inflow:

$$Q_\text{Mainz}(t+\Delta t) = f_\text{Musk}(Q_\text{Worms}, K_1, x_1) + f_\text{Musk}(Q_\text{Raunheim}, K_2, x_2) + Q_\text{lat}$$

Optimisation by differential evolution on 6 flood events 2018-2024.

Results

Globally fitted parameters: $K_1 = 14.3$ h, $x_1 = 0.00$, $K_2 = 0.4$ h, $x_2 = 0.012$, $Q_\text{lat} = -20$ mΒ³/s.

Cross-validated NSE > 0.97 on all 6 flood events both train and test. Peak-flow error typically < 3 %.

Caveats

Phase 3.2 β€” ARX baseline forecast for H_Mainz

Model

SARIMAX(2,0,0) with exogenous variables: WORMS_lag7, NIERSTEIN_lag1, RAUNHEIM_lag1, hourly resolution. Train 2020-2023, test 2024.

Coefficients (fitted)

NIERSTEIN_lag1   1.286
WORMS_lag7       0.082
RAUNHEIM_lag1    0.044
AR(1)            0.951
AR(2)           βˆ’0.041

Results

One-step-ahead NSE 0.95, RMSE 16.5 cm on 2024.

Known bug

The intended multi-lead-time output (1 h, 4 h, 8 h, 12 h) gives identical predictions for all leads β€” the ARX is genuinely 1-step-ahead and the lead-time differentiation is not implemented correctly. Real direct-method multi-lead-time forecasts are addressed in Phase 6b.

Model comparison (Phase 3)

Model Skill (vs. persistence) Strengths Weaknesses
Muskingum-v2 NSE > 0.97 on Q Physically interpretable, robust Q-only, not water-level
ARX SARIMAX NSE 0.95 (1-step) Fast, exogenous lag awareness Single lead time, not extendable

Conclusion: Muskingum is the best routing baseline; ARX needed redesign for genuine multi-lead-time forecasts (β†’ Phase 6b).

Phase 6 — Wave propagation Worms→Nierstein (40 segments)

Idea

Sample 40 points along the OSM Rhine line Worms β†’ Nierstein. For each point, display the Worms water level shifted in time by the segment-specific lag $\Delta t = \text{km} \cdot 1000 / c$ with constant celerity $c = 2.5$ m/s. The wave visibly travels along the reach as time progresses.

Geometry

geometry.py (local) samples 40 segments along the OSM Rhine polyline, writes rhein_segments.geojson and rhein_segments_meta.csv. write_segments_meta.py writes the 40 segment records (lat/lon/km) as station_meta to rhein_meta (kind=segment).

Choice of wave celerity

Constant $c = 2.5$ m/s based on the median event-CCF result (Phase 2.4), Lighthill-Whitham consistency. Values 2.5-4 m/s observed across events; 2.5 chosen as the conservative low end (slightly slower visualisation).

Pipeline (live, every 15 min)

  1. propagate.py reads current Worms-W from rhein_raw
  2. For each segment point, computes the Worms value at $t - \Delta t$
  3. Writes per-segment record to rhein_derived.rhein_segment
  4. Writes a styled GeoJSON /srv/web/maps/rhein_segments_live.geojson with per-feature stroke colour (SimpleStyle spec) for Leaflet rendering

Phase 6b β€” Worms forecast (direct-method ARX, 4 lead times)

Architecture

Four independent Ridge-regression models, one per lead time {1, 4, 8, 12} h. Direct method: $\hat{W}_\text{Worms}(t+L) = f_L(\text{features}(t))$ β€” no recursion, each lead has its own dedicated model.

Features (26 per model)

Lagged values of upstream gauges (Speyer, Mannheim, Worms-persistence) at lags up to 24 h, plus seasonality (sin/cos of hour-of-day and day-of-year).

Training

Train 2018-2024, test 2024-2025. Standard scaling + Ridge (Ξ±=1.0).

Results (hold-out 2024-2025)

Lead NSE RMSE (cm) Skill vs persistence
1 h 1.0000 0.4 62 %
4 h 0.9998 1.4 67 %
8 h 0.9991 2.7 68 %
12 h 0.9978 4.2 65 %

Pipeline (live, hourly)

forecast_cron.py loads pickled models, predicts +1/4/8/12 h, writes to rhein_forecast.worms_arx_forecast. Future-segment GeoJSON is interpolated for the time-slider animation.

Phase 6.5 β€” Leaflet iframe for per-segment colouring

Problem

Grafana 13's geomap panel cannot render GeoJSON LineStrings with per-feature colours β€” all features inherit the panel-level style colour.

Solution

External Leaflet front-end embedded in an iframe, rendering the styled GeoJSON directly with per-feature stroke properties.

Grafana configuration

Phase-3 dashboard panel id=5 is now a Text panel in HTML mode with <iframe src="/maps/welle.html">. Same-origin (both behind Caddy on rhein-pegel.duckdns.org), no X-Frame-Options issues.

Live URL

https://rhein-pegel.duckdns.org/maps/welle.html

Forecast model comparison

Model Skill +12h Skill +24h Comments
Persistence 0 % (def.) 0 % Baseline
ARX SARIMAX (Phase 3.2) n/a n/a Bug in lead differentiation
Direct-method ARX (Phase 6b) 65 % n/a Lead 1-12 h only
Direct-method ARX-v2 (Phase 6c) 64 % 63 % Lead 1-24 h, 11 stations
ARX-v4 (Phase 7.1) 65 % 65 % + DWD precipitation past
ARX-v5 (Phase 7.1b) 70 % 68 % + ICON forecast (perfect-foresight upper bound)
ARX-v6 (Phase 7.1c) 68 % 65 % + TIGGE reforecast (honest skill)

Phase 6c β€” Extension to 11 gauges + 24-h forecast + flood warner

Motivation

Phase 6b had only 6 stations and 4 lead times. Extending to 11 stations (4 additional southern gauges Iffezheim/Plittersdorf/Maxau/Philippsburg) and 7 lead times {1, 4, 8, 12, 16, 20, 24} h.

4 additional southern gauges

Full 8-year backfill 2018-2026, ~2 M new data points in rhein_raw.

Direct-method ARX-v2

Same architecture as Phase 6b, 102 features with extended lag windows: Iffezheim 0-28, Plittersdorf 0-24 (step 2), Maxau 0-20, Speyer 0-14, Mannheim 0-6, Worms 0-3.

Results v2 (hold-out 2024-2026)

Lead NSE RMSE (cm) Skill
1 h 1.0000 0.4 60 %
4 h 0.9999 1.2 70 %
8 h 0.9994 2.4 70 %
12 h 0.9985 3.6 69 %
16 h 0.9974 4.8 69 %
20 h 0.9959 6.0 68 %
24 h 0.9938 7.4 64 %

Phase 6c map (Maxau β†’ Mainz + Altrhein)

154 segments (136 main Rhine + 18 Altrhein), spatial linear interpolation between the 7 official gauges (no longer time-shifted from a single station). Real longitudinal water-level profile.

⚠ PNP limitation

405 cm at Maxau is not directly comparable to 87 cm at Worms β€” different gauge zeros. The Worms threshold-based colouring is therefore "rough indication only" in the Maxau region. Phase 6d addresses this with local MNW/MHW scaling.

Pushover flood warner

hw_alert.py (every 15 min): checks current Worms + forecast against thresholds 430/500/640 cm, sends Pushover alert (priority 0/1/2). 6 h cooldown per threshold. State stored in /app/data/hw_alert_state.json.

Pipeline overview (Phase 6c, 2026-05-05)

poller.py          (every 15 min)  β†’ rhein_raw   (live observations 11 stations)
propagate_v2.py    (every 15 min)  β†’ rhein_derived.rhein_segment_v2 + GeoJSON
brightsky_poller.py (hourly)       β†’ rhein_raw.weather + rhein_forecast.weather_forecast
forecast_cron_v2.py (hourly)       β†’ rhein_forecast.worms_arx_v2_forecast (1-24 h)
hw_alert.py         (every 15 min) β†’ Pushover

Phase 6d β€” Local MNW/MHW colour scale

Problem with Phase 6c

Worms thresholds applied uniformly across all 154 map segments produced incorrect colours at Maxau (low water at local 405 cm coloured as mean water on the Worms scale, which would correspond to 1300+ cm at Worms β€” physically nonsense).

Solution: ratio-based colouring per segment

For each main-Rhine segment, compute a normalised flood index:

$$r = \frac{W - \text{MNW}\text{local}}{\text{MHW}\text{local} - \text{MNW}_\text{local}}$$

where MNW/MHW values are linearly interpolated along the Rhine-km axis between the 7 official gauges (whose MNW/MHW are queried from PEGELONLINE station meta).

Altrhein keeps Worms-Boathouse scale

The Altrhein navigability is bound to the Worms gauge anyway β€” keeping the original 8-colour scale (purple/cyan/green/yellow/orange/red/dark red) for the navigability-related thresholds 100 / 150 / 200 / 430 / 460 / 500 / 510 / 550 / 640 cm.

Pushover bias correction

hw_alert.py applies bias correction from the 2024 backtest. Pushover message shows corrected value Β± RMSE range. Bias values (cm) at flood: {1: +0.1, 4: +0.2, 8: +1.2, 12: +2.8, 16: +4.8, 20: +7.1, 24: +9.7}.

Phase 6f β€” Muskingum sub-routing for the map

Motivation

Phase 6c spatial interpolation between gauges is geometrically simple but not physical β€” a flood wave travels south-to-north through the reach, gauges record at different times. Linear interpolation smears the wave.

Solution: Muskingum routing per gauge reach

Eight Muskingum reaches calibrated on mean-water-level anomaly: NSE 0.97-0.99, RMSE 6.7-16.3 cm. Implied celerity 0.8-3.1 m/s per reach.

Scripts: fit_muskingum.py, propagate_v3.py, /app/data/muskingum_v3_params.pkl. Cron now driven by propagate_v3.py.

MW-anomaly trick (PNP correction)

To work around different gauge zeros, all Muskingum routing is done on the anomaly $W - \text{MW}$ (deviation from mean water). Result transferred back to absolute height via $\text{MW}_\text{target}$ at the destination gauge.

Calibration of 8 reaches (train 2018-2024)

Reach NSE RMSE $K$ (h) $x$ Implied $c$ (m/s)
Iffezheim β†’ Plittersdorf 0.99 6.7 1.5 0.01 0.7
Plittersdorf β†’ Maxau 0.99 8.2 4.6 0.05 1.3
Maxau β†’ Speyer 0.98 9.1 6.0 0.10 1.8
Speyer β†’ Mannheim 0.98 11.5 7.0 0.13 1.0
Mannheim β†’ Worms 0.97 14.7 4.5 0.05 1.2
Worms β†’ Nierstein 0.97 16.3 12.0 0.30 0.9
Nierstein β†’ Mainz 0.99 7.5 5.0 0.05 1.0

Sub-routing per map segment

Within each map segment, Muskingum sub-routing with $K_\text{seg} = \max(0.5, \alpha \cdot K_\text{full})$ where $\alpha$ is the segment fraction of the full reach. Flood waves now propagate physically correctly south-to-north.

Validation Phase 6b/c β€” Backtest June 2024 flood

Time series plot

ARX-v2 forecast for the hold-out flood event (peak 695 cm at 2024-06-04 03:00). Cross-validated against the actual Worms record. /maps/backtest_hw2024.png.

Skill per lead time

/maps/backtest_hw2024_skill.png β€” RMSE 0.6-27.3 cm at lead 1-24 h.

Threshold crossings β€” flood-warner simulation

Threshold Crossed at Pre-fire warning time
430 cm 2024-06-03 17:30 + 17 h at lead 24 h
500 cm 2024-06-04 00:00 + 12 h at lead 12 h
640 cm 2024-06-04 02:00 + 9 h at lead 8 h

All three thresholds triggered the alert with 9-17 h pre-fire warning, zero false alarms.

Statistics in the flood regime (Worms-obs > 400 cm, n = 278-301)

Lead RMSE (cm) bias (cm) MAPE-HW
1 h 0.6 +0.1 0.001
4 h 1.7 +0.2 0.003
8 h 4.4 +1.2 0.008
12 h 8.7 +2.8 0.014
16 h 14.2 +4.8 0.024
20 h 20.1 +7.1 0.034
24 h 27.3 +9.7 0.046

Phase 6g β€” Multi-event backtest (all 11 floods 2018-2026)

Methodology

backtest_multi.py iterates over all 11 identified flood events 2018-2026 (Worms-W > 450 cm, gap > 14 days). For each event, runs ARX-v2 forecasts at lead 1-24 h on the surrounding Β±5-day window and computes pre-fire warning rate, skill, and bias.

Result β€” flood-warner trigger statistics

Threshold Events Pre-fire-Hits Misses Pre-fire rate Mean lead time
430 cm 11 10 0 (1 not crossed) 95 % 16-21 h
500 cm 8 8 0 100 % 12-15 h
640 cm 1 1 0 100 % 9 h

Skill per lead time (averaged over 11 events)

/maps/backtest_multi_skill.png β€” robust bias/RMSE values from 11-event statistics. More representative than the single 2024 event analysis.

Phase 6h β€” Conformal prediction with empirical residual quantiles

Problem with Phase 6d bias correction

A point bias correction is asymmetric in the flood regime β€” observation distribution is right-skewed (rare extremely high peaks), so the residual $W^\text{obs} - W^\text{pred}$ has a heavier upper tail than lower.

Conformal-prediction approach

arx_v2_conformal.py extends the v2 pickle with empirical training-residual quantiles in the HW subset (q05/q50/q95). Asymmetric 90 % prediction interval at +24 h: q05 = βˆ’13.5, q95 = +29.9 cm (right-skewed).

Pushover message: 90 % PI instead of RMSE assumption

hw_alert.py uses the RES_Q table. Trigger condition: $w_\text{pred} + q_{95}(\text{lead}) \geq \text{threshold}$ (conservative). Pushover message shows the 90 % range, not a Gaussian RMSE assumption.

Phase 6e β€” Time-slider animation

welle.html extended with a slider from βˆ’12 h to +24 h, auto-play option. Backend animation_state.py (cron every 15 min) produces a JSON snapshot with 37 hourly slices Γ— 154 segments at /maps/welle_animation_state.json. ANIM mode toggle in the map.

The animation makes the flood wave visibly travel from Maxau to Mainz (typical timing ~13 h south-to-north for the entire reach).

Phase 7 roadmap (open scientific improvements)

Phase 7.1 β€” DWD precipitation forecast as ARX feature

Status: v4 (past precipitation) and v5 (ICON forecast with perfect-foresight) deployed; v6 (TIGGE reforecast, honest skill) deployed 2026-05-06.

Phase 7.2 β€” Saint-Venant routing (instead of Muskingum)

Replace the linear Muskingum scheme with the full Saint-Venant 1-D shallow-water equations. Expected gain: better representation of flood-wave celerity dispersion (Muskingum has constant $c$ within a reach).

Phase 7.3 β€” BfG-WaVo benchmark comparison

The German Federal Institute for Hydrology (BfG) operates an authoritative water-level forecast (WaVo). No public forecast API yet; benchmark against published BfG bulletins manually.

Phase 7.4 β€” True quantile regression instead of conformal

Replace the empirical-quantile conformal approach with proper quantile-regression heads, per lead time.

Phase 7.1 β€” DWD precipitation feature (ARX-v4)

Data source: Brightsky API (DWD wrapper)

3 stations along the Black Forest crest (the Rhine flood-source region): - Karlsruhe (49.04, 8.40) - Freudenstadt (48.46, 8.41) - Triberg (48.13, 8.23)

Feature engineering

PRECIP_<STATION>_<W>h for $W \in {24, 48, 72}$: rolling-window cumulative precipitation, including the current hour. 9 additional features (3 stations Γ— 3 windows).

Skill comparison v3 (no precipitation) vs v4 (with precipitation)

Lead v3 RMSE v4 RMSE Improvement
24 h 7.77 7.25 βˆ’7 %

New leads enabled by v4: +36 h NSE 0.98, +48 h NSE 0.95, +72 h NSE 0.86.

Operational pipeline

Brightsky polls hourly, ARX-v4 runs hourly forecast cron at :10.

Phase 7.1b β€” DWD-ICON precipitation forecast as feature (open at the time)

Currently v4 uses only historical precipitation (cumulative windows). Next improvement: include the DWD-ICON precipitation forecast (0-72 h) as additional features. Conceptually harder because the model would need training with "what did the ICON forecast look like X hours ago" β€” historical ICON archives exist (DWD CDC) but require non-trivial pipeline (GRIB2 files). β†’ realised in v5 with perfect-foresight surrogate.

Phase 7.1b β€” DWD-ICON precipitation forecast as ARX feature (v5)

Idea

Phase 7.1 (v4) used only historical precipitation cumulants. Phase 7.1b (v5) adds forward-looking cumulant windows (0..+24 h, 0..+48 h, 0..+72 h per weather station = 9 additional future features).

Methodological challenge: training with perfect-foresight

No historical DWD-ICON forecast archives are stored locally (Brightsky keeps only current forecasts). Therefore training uses perfect-foresight:

$$\text{PRECIP_KARLSRUHE_FUT_24h}(t) = \sum_{i=t+1}^{t+24} P_\text{KARLSRUHE}^\text{obs}(i)$$

This is physically not correct for a live model β€” the actual 24-h-future precipitation is known only 24 h later. In the live system (forecast_cron_v5.py), these features are filled from the DWD-ICON forecast stored in rhein_forecast.weather_forecast by the Brightsky poller.

Consequence: the test-skill on hold-out 2024-2026 is an upper bound. In live operation the skill will be slightly lower, since ICON-D2 precipitation forecasts have a typical RMSE of 1-2 mm/24 h. A genuine Phase 7.1c would train with DWD-CDC ICON archives β€” laborious (GRIB2 files, separate pipeline).

Skill comparison v4 vs v5 (hold-out 2024-2026)

Lead v4 RMSE v5 RMSE reduction v4 skill v5 skill gain
+1 h 0.42 0.42 β€” 60 % 60 % β€”
+4 h 1.18 1.18 β€” 70 % 70 % β€”
+8 h 2.31 2.30 βˆ’0 % 70 % 70 % β€”
+12 h 3.40 3.35 βˆ’1 % 69 % 70 % + 1
+16 h 4.54 4.40 βˆ’3 % 69 % 70 % + 1
+20 h 5.79 5.46 βˆ’6 % 68 % 69 % + 1
+24 h 7.25 6.60 βˆ’9 % 65 % 68 % + 3
+36 h 13.45 11.10 βˆ’17 % 55 % 63 % + 8
+48 h 21.27 16.15 βˆ’24 % 42 % 56 % + 14
+72 h 36.28 24.26 βˆ’33 % 25 % 50 % + 25

Big gain at long leads: +72 h skill doubled (25 β†’ 50 %), RMSE reduced by a third. Three-day forecasts become operationally usable for the first time.

Live forecast (now)

State 2026-05-06 16:30 UTC with ICON precipitation forecast:

KARLSRUHE     future-precip 24h = 11.6 mm
FREUDENSTADT  future-precip 24h = 7.1 mm
TRIBERG       future-precip 24h = 7.0 mm
Lead v4 (no forecast) v5 (with ICON) difference
+24 h 107.7 cm 112.8 cm +5.1 cm
+48 h 114.0 cm 126.8 cm +12.8 cm
+72 h 117.1 cm 127.6 cm +10.5 cm

V5 predicts higher levels because the model "anticipates" upcoming precipitation activity.

Pipeline

brightsky_poller.py (cron 30 *)
  β”œβ”€ rhein_raw.weather (DWD observations)
  └─ rhein_forecast.weather_forecast (DWD-ICON forecast 0-72h)

arx_worms_v5.py (one-off, pickled)
  Train 2018-2024 with perfect-foresight precipitation

forecast_cron_v5.py (cron 20 *)
  Past precipitation from rhein_raw.weather (cumulative βˆ’24/βˆ’48/βˆ’72 h)
  Future precipitation from rhein_forecast.weather_forecast (cumulative +24/+48/+72 h)
  β†’ rhein_forecast.worms_arx_v5_forecast (lead 1..72 h)

Phase 7.1c β€” TIGGE reforecast as honest forecast features (v6)

Idea

Phase 7.1b (v5) trained with perfect-foresight: real future precipitation observations replaced the (not-archived-at-the-time) ICON forecast. Physically incorrect β€” the resulting test-skill is an upper bound, because the model "knew" during training what would actually happen later.

Phase 7.1c closes this methodological gap: training with real historical forecast archives downloaded from the TIGGE dataset on the ECMWF Data Store (ECDS).

Data source: TIGGE on ECDS

Backfill pipeline

tigge_pilot.py / tigge_backfill.py (one-off)
  β”œβ”€ cdsapi request per month 2018-01..2024-12
  β”œβ”€ GRIB2 β†’ cfgrib β†’ xarray
  β”œβ”€ spatial extraction: nearest grid point per station
  └─ rhein.weather_forecast_archive (measurement tigge_forecast,
                                     tags: station, origin, lead_h,
                                     time = init_time)

84 months Γ— 60 inits Γ— 13 lead steps Γ— 3 stations = ~196 000 data points, ~30 MB. Backfill 2026-05-06 21:00–23:16 UTC, all 84 months successful.

Lookup logic in training

For each training hour $t$, tigge_to_fut_features(...) finds:

  1. the most recent init run with init_time ≀ t (max 12 h old β†’ matches the live latency of ICON)
  2. $\Delta h = t - \text{init_time}$, then linear interpolation of the cumulative values between the available lead steps
  3. future feature PRECIP_<STATION>_FUT_<W>h := tp(min(72, Ξ”+W)) βˆ’ tp(Ξ”)

Feature names identical to v5 β†’ direct skill comparison.

Remaining methodological caveats

Caveat State v5 (Phase 7.1b) State v6 (Phase 7.1c)
Perfect-foresight in training βœ— present βœ“ lifted
Constant model version across training period n/a (observations) ⚠ TIGGE-IFS version drifts since 2018 (operational ECMWF releases)
Train-model = live-model βœ“ (both Brightsky/ICON) ⚠ Train: TIGGE-ECMWF-IFS / Live: Brightsky-DWD-ICON

The remaining caveats are milder: TIGGE-IFS drift acts only across years, but is methodologically much closer to the real live-model than the perfect-foresight substitute. Clean constant-version reforecast models (EFAS reforecast on EWDS) are reserved for Phase 7.1d, currently frozen (state 2024-11-11).

Skill comparison v5 (perfect-foresight) vs v6 (real forecasts) β€” hold-out 2024-2026

Lead v5 RMSE v6 RMSE Ξ” v5 skill v6 skill skill loss
+1 h 0.42 0.40 βˆ’0.02 60 % 63 % +3 (sic!)
+4 h 1.18 1.25 +0.07 70 % 69 % βˆ’1
+8 h 2.30 2.51 +0.21 70 % 69 % βˆ’1
+12 h 3.35 3.74 +0.39 70 % 68 % βˆ’2
+16 h 4.40 4.99 +0.59 70 % 68 % βˆ’2
+20 h 5.46 6.28 +0.82 69 % 67 % βˆ’2
+24 h 6.60 7.71 +1.11 68 % 65 % βˆ’3
+36 h 11.10 12.93 +1.83 63 % 59 % βˆ’4
+48 h 16.15 18.59 +2.44 56 % 52 % βˆ’4
+72 h 24.26 27.85 +3.59 50 % 45 % βˆ’5

Assessment:

Live pipeline (Phase 7.1c)

brightsky_poller.py (unchanged, cron 30 *)
  └─ rhein_forecast.weather_forecast (DWD-ICON forecast 0-72h, live)

arx_worms_v6.py (one-off, 2026-05-06)
  Train 2018-2024 with TIGGE reforecast features
  β†’ /app/data/worms_arx_v6_models.pkl (42 kB, 10 lead models)

forecast_cron_v6.py (cron 25 *, parallel to v5 cron 20 *)
  Past precipitation from rhein_raw.weather (same code as v5)
  Future precipitation from rhein_forecast.weather_forecast (Brightsky-DWD-ICON live)
  β†’ rhein_forecast.worms_arx_v6_forecast (lead 1..72 h)

Cron slot 25 instead of 20, so v5 and v6 run side-by-side β€” live-forecast comparison visible in the dashboard.

Status (2026-05-06)