MPPI

Overview of the algorithm and its variants


Problem Formulation

Consider a general nonlinear system with discrete dynamics and cost function of the following form:

\[\newcommand{\vb}[1]{ {\bf #1} } \newcommand{\PP}[1]{\left(#1\right)} \newcommand{\R}{\mathbb{R}} \newcommand{\expf}[1]{\exp\PP{#1}} \newcommand{\normal}[1]{\mathcal{N}\PP{#1}} \newcommand{\abs}[1]{\left|#1\right|} \newcommand{\Expectation}[2][]{\mathbb{E}_{#1}\left[#2\right]} \newcommand{\J}{\vb{J}} \newcommand{\Shape}{\vb{S}}\] \[\begin{align} \vb{x}_{t+1} &= \vb{F}\PP{\vb{x}_t, \vb{u}_t} \\ \vb{J}(X, U) &= \phi(\vb{x}_{T}) + \sum_{t = 0}^{T - 1}\vb{\ell}\PP{\vb{x}_t, \vb{u}_{t}} \label{eq:cost_function} \end{align}\]

where $\vb{x} \in \R^{n_x}$ is the state of dimension $n_x$, $\vb{u} \in \R^{n_u}$ is the control of dimension $n_u$, $T$ is the time horizon, $X$ is a state trajectory \(\left[\vb{x}_1, \vb{x}_2, ..., \vb{x}_T\right]\), $U$ is a control trajectory \([\vb{u}_0, \vb{u}_1, ..., \vb{u}_{T-1}]\), $\phi$ is the terminal cost, and $\vb{\ell}$ is the running cost.
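
As a concrete instance of this formulation, here is a minimal Python sketch for a hypothetical 1D point mass with state $\vb{x} = [p, v]$ and control $\vb{u} = [a]$. The dynamics, cost weights, and time step are assumptions made purely for illustration; this is not the library's C++/CUDA API.

```python
import numpy as np

DT = 0.02  # discretization step, chosen arbitrarily for this sketch

def F(x, u):
    """Discrete dynamics x_{t+1} = F(x_t, u_t) for a 1D point mass."""
    pos, vel = x
    return np.array([pos + DT * vel, vel + DT * u[0]])

def running_cost(x, u):
    """Running cost ell(x_t, u_t): penalize distance from origin and effort."""
    return x[0] ** 2 + 0.1 * x[1] ** 2 + 0.01 * u[0] ** 2

def terminal_cost(x):
    """Terminal cost phi(x_T)."""
    return 10.0 * (x[0] ** 2 + x[1] ** 2)

def J(x0, U):
    """Total trajectory cost J(X, U) = phi(x_T) + sum_t ell(x_t, u_t)."""
    x, cost = x0, 0.0
    for u in U:  # U has shape (T, n_u)
        cost += running_cost(x, u)
        x = F(x, u)
    return cost + terminal_cost(x)
```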

MPPI Algorithm Overview

Model Predictive Path Integral (MPPI) is a stochastic optimal control algorithm that minimizes the cost function \eqref{eq:cost_function} through the use of sampling. We start by sampling control trajectories, run each through the dynamics to create a corresponding state trajectory, and then evaluate each state and control trajectory pair with the cost function. Each trajectory’s cost is then passed through the exponential transform,

\[\begin{align} S(\vb{J};\lambda) = \expf{-\frac{1}{\lambda} \vb{J}}, \end{align}\]

where $\lambda$ is the inverse temperature. Finally, a weighted average of the sampled trajectories produces the optimal control trajectory. The update law for $\mathcal{U}^{*}_t$, the optimal control at time $t$, is

\[\begin{align} \mathcal{U}^{*}_t &= \sum_{m=1}^{M} \frac{\expf{-\frac{1}{\lambda} \vb{J}\PP{X^m,V^m}}\vb{v}^m_t}{\sum_{j=1}^{M}\expf{-\frac{1}{\lambda} \vb{J}\PP{X^j,V^j}}}\\ &= \vb{u}_t + \sum_{m=1}^{M} \frac{\expf{-\frac{1}{\lambda} \vb{J}\PP{X^m,V^m}}\epsilon^m_t}{\sum_{j=1}^{M}\expf{-\frac{1}{\lambda} \vb{J}\PP{X^j,V^j}}}, \label{eq:mppi_update_rule} \end{align}\]

where $V^m$ is the $m$-th sampled control trajectory and $\vb{v}^m_t = \vb{u}_t + \epsilon^m_t$ is its control at time $t$, sampled around the previous optimal control $\vb{u}_t$ with $\epsilon^m_t \sim \normal{0, \sigma^2}$. Sampling in the control space ensures that the trajectories are dynamically feasible and allows us to use non-differentiable dynamics and cost functions. In practice, we have found that finding the smallest sampled cost, $\rho$, and subtracting it from all the costs before applying the exponential transform helps to robustify the optimization; since the weights are normalized, this shift leaves the update unchanged while preventing numerical underflow. Pseudocode for the algorithm is sketched below.
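
The following minimal NumPy sketch shows one MPPI iteration implementing the update rule \eqref{eq:mppi_update_rule}, including the $\rho$ baseline subtraction. It reuses the `J` from the point-mass sketch above; the sample count, noise scale, and temperature defaults are arbitrary assumptions, and this is illustrative rather than the library's C++/CUDA implementation.

```python
def mppi_step(x0, U, M=1024, sigma=1.0, lam=1.0, rng=None):
    """One MPPI iteration; returns the updated control trajectory U*.

    x0    : current state
    U     : previous optimal control trajectory, shape (T, n_u)
    M     : number of sampled trajectories
    sigma : standard deviation of the control perturbations
    lam   : inverse temperature lambda
    """
    rng = np.random.default_rng() if rng is None else rng
    T, n_u = U.shape
    eps = rng.normal(0.0, sigma, size=(M, T, n_u))  # epsilon^m_t
    V = U[None, :, :] + eps                         # v^m_t = u_t + epsilon^m_t

    # Roll out each sampled control trajectory and evaluate its cost J.
    costs = np.array([J(x0, V[m]) for m in range(M)])

    # Subtract the smallest sampled cost rho before exponentiating; the
    # common factor exp(rho / lam) cancels in the normalized weights, so
    # this only prevents numerical underflow.
    rho = costs.min()
    w = np.exp(-(costs - rho) / lam)
    w /= w.sum()

    # Weighted average of the perturbations (second form of the update rule).
    return U + np.einsum("m,mtj->tj", w, eps)
```

In receding-horizon operation, one would execute the first control $\mathcal{U}^{*}_0$, shift the trajectory forward one step, and repeat the optimization from the newly observed state.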

Derivation and other algorithms

The original derivation of MPPI took a path-integral approach [1]. Later papers derived MPPI from information-theoretic [2], stochastic search [3], and mirror descent [4] perspectives. A Tube-based MPPI controller [5] was also created to improve robustness to state disturbances. It uses a tracking controller to drive the real system back toward a nominal system that ignores state disturbances that would cause large costs. Both the real and nominal trajectories are computed with MPPI, while the tracking controller is an iterative Linear Quadratic Regulator (iLQR). In this setup, the tracking controller is always the one sending controls to the system, and since MPPI is unaware of the tracking controller, the two can end up fighting each other. To address this, Robust MPPI (RMPPI) was developed in [6], [7], which applies the tracking-controller feedback within the samples MPPI uses. RMPPI also contains other changes that, taken together, provide an upper bound on how quickly the cost can grow due to disturbances. Our library contains implementations of these algorithmic improvements, as different controllers are the best choice in different scenarios.

References

[1] G. Williams, P. Drews, B. Goldfain, J. M. Rehg, and E. A. Theodorou, “Aggressive Driving with Model Predictive Path Integral Control,” in 2016 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2016, pp. 1433–1440. [Online]. Available: https://ieeexplore.ieee.org/document/7487277/

[2] G. Williams, P. Drews, B. Goldfain, J. M. Rehg, and E. A. Theodorou, “Information-Theoretic Model Predictive Control: Theory and Applications to Autonomous Driving,” IEEE Transactions on Robotics, vol. 34, no. 6, pp. 1603–1622, 2018. [Online]. Available: https://ieeexplore.ieee.org/document/8558663

[3] Z. Wang, O. So, K. Lee, and E. A. Theodorou, “Adaptive Risk Sensitive Model Predictive Control with Stochastic Search,” in Proceedings of the 3rd Conference on Learning for Dynamics and Control. PMLR, 2021, pp. 510–522. [Online]. Available: https://proceedings.mlr.press/v144/wang21b.html

[4] N. Wagener, C.-A. Cheng, J. Sacks, and B. Boots, “An Online Learning Approach to Model Predictive Control,” in Proceedings of Robotics: Science and Systems, Freiburg im Breisgau, Germany, Jun. 2019. [Online]. Available: https://www.roboticsproceedings.org/rss15/p33.pdf

[5] G. Williams, B. Goldfain, P. Drews, K. Saigol, J. Rehg, and E. Theodorou, “Robust Sampling Based Model Predictive Control with Sparse Objective Information,” in Robotics: Science and Systems XIV. Robotics: Science and Systems Foundation, Jun. 2018. [Online]. Available: http://www.roboticsproceedings.org/rss14/p42.pdf

[6] M. Gandhi, B. Vlahov, J. Gibson, G. Williams, and E. A. Theodorou, “Robust Model Predictive Path Integral Control: Analysis and Performance Guarantees,” IEEE Robotics and Automation Letters, vol. 6, no. 2, pp. 1423–1430, Feb. 2021. [Online]. Available: https://arxiv.org/abs/2102.09027v1

[7] G. R. Williams, “Model Predictive Path Integral Control: Theoretical Foundations and Applications to Autonomous Driving,” Ph.D. dissertation, Georgia Institute of Technology, 2019.
