Domain Randomization for Sim-to-Real

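Randomizing the simulator's physics parameters during training makes the learned policy robust, so that it transfers to the real system.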

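π* = argmax_π E_{ξ ~ p(ξ)} [ J(π; ξ) ]
where ξ denotes the physics parameters sampled from the randomization distribution p(ξ), and J(π; ξ) is the expected return of policy π in the environment with parameters ξ.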
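The objective above can be approximated by averaging returns over sampled environments. The following is a minimal sketch under stated assumptions, not a reference implementation: the 1-D point-mass dynamics, the parameter ranges, the fixed damping term, and the names `sample_params`, `rollout`, and `randomized_objective` are all hypothetical, and the "policy" is reduced to a single proportional gain chosen from a small grid.

```python
import random

def sample_params(rng):
    # Hypothetical randomization ranges for the simulator's physics parameters.
    return {"mass": rng.uniform(0.5, 2.0), "friction": rng.uniform(0.0, 0.5)}

def rollout(gain, params, steps=200, dt=0.05):
    """Return of a proportional-derivative policy driving a 1-D point mass to x = 0."""
    x, v, ret = 1.0, 0.0, 0.0
    for _ in range(steps):
        force = -gain * x - 1.0 * v                      # policy action (fixed damping of 1.0)
        acc = (force - params["friction"] * v) / params["mass"]
        v += acc * dt                                    # semi-implicit Euler step
        x += v * dt
        ret -= x * x * dt                                # reward: negative squared distance
    return ret

def randomized_objective(gain, n_envs=64, seed=0):
    """Monte Carlo estimate of E_xi[J(pi; xi)] over randomized environments."""
    rng = random.Random(seed)                            # same seed: every gain sees the same envs
    returns = [rollout(gain, sample_params(rng)) for _ in range(n_envs)]
    return sum(returns) / n_envs

# Crude argmax over a small, hypothetical set of candidate policies (gains).
candidate_gains = [0.5, 1.0, 2.0, 4.0, 8.0]
best_gain = max(candidate_gains, key=randomized_objective)
print("best gain under randomization:", best_gain)
```

Reusing the same seed for every candidate means all gains are scored on the same sampled environments, which lowers the variance of the comparison. In practice the grid search would be replaced by a reinforcement learning algorithm, with parameters resampled at each episode.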
