Guided Flow Policy: Learning from High-Value Actions in Offline Reinforcement Learning
Franki Nguimatsia Tiofack*, Théotime Le Hellard*, Fabian Schramm*, Nicolas Perrin-Gilbert, Justin Carpentier
Published on arXiv, 2025
We introduce Guided Flow Policy (GFP), which couples a multi-step flow-matching policy with a distilled one-step actor to focus on learning from high-value actions in offline reinforcement learning.
Recommended citation: Tiofack, F. N., Le Hellard, T., Schramm, F., Perrin-Gilbert, N., & Carpentier, J. (2025). "Guided Flow Policy: Learning from High-Value Actions in Offline Reinforcement Learning." arXiv preprint arXiv:2512.03973. https://arxiv.org/abs/2512.03973
