Guided Flow Policy: Learning from High-Value Actions in Offline Reinforcement Learning
Franki Nguimatsia Tiofack*, Théotime Le Hellard*, Fabian Schramm*, Nicolas Perrin-Gilbert, Justin Carpentier
* Equal contribution
Published at the Fourteenth International Conference on Learning Representations (ICLR), 2026
We introduce Guided Flow Policy (GFP), which couples a multi-step flow-matching policy with a distilled one-step actor to focus on learning from high-value actions in offline reinforcement learning.
