Getting Started

Some simple examples of how to use MPPI-Generic

\[\newcommand{\vb}[1]{ {\bf #1} } \newcommand{\PP}[1]{\left(#1\right)} \newcommand{\R}{\mathbb{R}} \newcommand{\expf}[1]{\exp\PP{#1}} \newcommand{\normal}[1]{\mathcal{N}\PP{#1}} \newcommand{\abs}[1]{\left|#1\right|} \newcommand{\Expectation}[2][]{\mathbb{E}_{#1}\left[#2\right]} \newcommand{\J}{\vb{J}} \newcommand{\Shape}{\vb{S}}\]

Library Class Description

This library is made up of 6 major types of classes:

Dynamics
Cost Functions
Controllers
Sampling Distributions
Feedback Controllers
Plants

The Dynamics and Cost Function classes are self-evident and are classes describing the $\vb{F}$, $\ell$, and $\phi$ functions described here. The Controller class finds the optimal control sequence $U$ that minimizes the cost $\vb{J}(X, U)$ using algorithms such as MPPI. The Sampling Distributions are used by the Controller class to generate the control samples used for determining the optimal control sequence. The Feedback Controller class determines what feedback controller is used to help push the system back towards the desired trajectory. Unless otherwise specified, the Feedback Controllers in code examples are instantiated but turned off by default. Finally, Plants are a MPC wrapper around a given controller and are where the interface methods in and out of the controller are generally defined. For example, a common-use case of MPPI is on a robotics platform running Robot Operating System (ROS) [1]. The Plant would then be where you would implement your ROS subscribers to information such as state, ROS publishers of the control output, and the necessary methods to convert from ROS messages to MPPI-Generic equivalents. Each class type has their own parameter structures which encapsulate the adjustable parameters of each specific instantiation of the class. For example, the cartpole dynamics parameters include mass and length of the pendulum whereas a double integrator dynamics system has no additional parameters.

For those who want to use pre-existing Dynamics and Cost Functions, the MPPI-Generic library provides a large variety. Pre-existing Dynamics include a quadcopter, 2D double integrator, cartpole, differential-drive robot [2], various Autorally models [3], and various models learned for a Polaris RZR X [4]. The various Cost Functions included are mostly specific to each Dynamics model but we do provide a Dynamics-agnostic quadratic cost as well.

There are two ways to use the MPPI-Generic library: stand-alone or as a MPC controller.

Stand-alone Usage

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
#include <mppi/controllers/MPPI/mppi_controller.cuh>
#include <mppi/cost_functions/cartpole/cartpole_quadratic_cost.cuh>
#include <mppi/dynamics/cartpole/cartpole_dynamics.cuh>
#include <mppi/feedback_controllers/DDP/ddp.cuh>

const int NUM_TIMESTEPS = 100;
const int NUM_ROLLOUTS = 2048;

using DYN_T = CartpoleDynamics;
using COST_T = CartpoleQuadraticCost;
using FB_T = DDPFeedback<DYN_T, NUM_TIMESTEPS>;
using SAMPLING_T = mppi::sampling_distributions::GaussianDistribution<DYN_T::DYN_PARAMS_T>;
using CONTROLLER_T = VanillaMPPIController<DYN_T, COST_T, FB_T, NUM_TIMESTEPS, NUM_ROLLOUTS, SAMPLING_T>;
using CONTROLLER_PARAMS_T = CONTROLLER_T::TEMPLATED_PARAMS;

int main(int argc, char** argv) {
  float dt = 0.02;
  std::shared_ptr<DYN_T> dynamics = std::make_shared<DYN_T>();  // set up dynamics
  std::shared_ptr<COST_T> cost = std::make_shared<COST_T>();    // set up cost
  // set up feedback controller
  std::shared_ptr<FB_T> fb_controller = std::make_shared<FB_T>(dynamics.get(), dt);
  // set up sampling distribution
  SAMPLING_T::SAMPLING_PARAMS_T sampler_params;
  std::fill(sampler_params.std_dev, sampler_params.std_dev + DYN_T::CONTROL_DIM, 1.0);
  std::shared_ptr<SAMPLING_T> sampler = std::make_shared<SAMPLING_T>(sampler_params);

  // set up MPPI Controller
  CONTROLLER_PARAMS_T controller_params;
  controller_params.dt_ = dt;
  controller_params.lambda_ = 1.0;
  controller_params.dynamics_rollout_dim_ = dim3(64, DYN_T::STATE_DIM, 1);
  controller_params.cost_rollout_dim_ = dim3(NUM_TIMESTEPS, 1, 1);
  std::shared_ptr<CONTROLLER_T> controller = std::make_shared<CONTROLLER_T>(
      dynamics.get(), cost.get(), fb_controller.get(), sampler.get(), controller_params);

  DYN_T::state_array x = dynamics->getZeroState();  // set up initial state
  // calculate control
  controller->computeControl(x, 1);
  auto control_sequence = controller->getControlSeq();
  std::cout << "Control Sequence:\n" << control_sequence << std::endl;
  return 0;
}

Listing 1: Minimal example used in a stand-alone fashion

If you just wanted to run a single iteration of MPPI in order to get an optimal control sequence, the code in Lst 1 provides a minimal working example.

#include <mppi/controllers/MPPI/mppi_controller.cuh>
#include <mppi/cost_functions/cartpole/cartpole_quadratic_cost.cuh>
#include <mppi/dynamics/cartpole/cartpole_dynamics.cuh>
#include <mppi/feedback_controllers/DDP/ddp.cuh>

Breaking it down, we start by including the controller, dynamics, cost function, and feedback controller headers used in this example. Again, by default, the feedback controller will not be used but it is required for the MPPI controller. In this example, we are using a cartpole system with a quadratic cost function to swing up the pole of the cart. The controller we will be using is the standard MPPI controller.

const int NUM_TIMESTEPS = 100;
const int NUM_ROLLOUTS = 2048;

Next, we define the number of timesteps $T$ and the number of samples, $M$, we will be using for this example.

using DYN_T = CartpoleDynamics;
using COST_T = CartpoleQuadraticCost;
using FB_T = DDPFeedback<DYN_T, NUM_TIMESTEPS>;
using SAMPLING_T = mppi::sampling_distributions::GaussianDistribution<DYN_T::DYN_PARAMS_T>;
using CONTROLLER_T = VanillaMPPIController<DYN_T, COST_T, FB_T, NUM_TIMESTEPS, NUM_ROLLOUTS, SAMPLING_T>;
using CONTROLLER_PARAMS_T = CONTROLLER_T::TEMPLATED_PARAMS;

We then set up aliases for each class type to make it easier to follow.

  float dt = 0.02;
  std::shared_ptr<DYN_T> dynamics = std::make_shared<DYN_T>();  // set up dynamics
  std::shared_ptr<COST_T> cost = std::make_shared<COST_T>();    // set up cost
  // set up feedback controller
  std::shared_ptr<FB_T> fb_controller = std::make_shared<FB_T>(dynamics.get(), dt);
  // set up sampling distribution
  SAMPLING_T::SAMPLING_PARAMS_T sampler_params;
  std::fill(sampler_params.std_dev, sampler_params.std_dev + DYN_T::CONTROL_DIM, 1.0);
  std::shared_ptr<SAMPLING_T> sampler = std::make_shared<SAMPLING_T>(sampler_params);

Then in the main method, we first set up the $\Delta t$ we would like to use followed by creating Dynamics, Cost, Feedback Controller, and Sampling Distribution variables. We could set up different versions of this problem by adjusting the parameters of the Dynamics or Cost Function but for now, we will leave them to the default. The Feedback Controller is setup with the dynamics and dt used. We then setup the Gaussian Sampling Distribution with a standard deviation of $1.0$ in each control dimension.

  // set up MPPI Controller
  CONTROLLER_PARAMS_T controller_params;
  controller_params.dt_ = dt;
  controller_params.lambda_ = 1.0;
  controller_params.dynamics_rollout_dim_ = dim3(64, DYN_T::STATE_DIM, 1);
  controller_params.cost_rollout_dim_ = dim3(NUM_TIMESTEPS, 1, 1);
  std::shared_ptr<CONTROLLER_T> controller = std::make_shared<CONTROLLER_T>(
      dynamics.get(), cost.get(), fb_controller.get(), sampler.get(), controller_params);

The final setup step is to create the MPPI controller. We first construct a parameters variable, controller_params, and ensure we set the $\Delta t$, $\lambda$, and parallelization parameters appropriately. From there, we can then construct the controller and pass in the dynamics, cost function, feedback controller, sampling distribution, and controller params.

  DYN_T::state_array x = dynamics->getZeroState();  // set up initial state
  // calculate control
  controller->computeControl(x, 1);
  auto control_sequence = controller->getControlSeq();
  std::cout << "Control Sequence:\n" << control_sequence << std::endl;

We create an initial state of the cartpole and then ask MPPI to compute an optimal control sequence starting from that state with computeControl(). The optimal control sequence is then returned as an Eigen::Matrix from getControLSequence() and printed out to the terminal.

MPC Usage

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
#pragma once
#include <mppi/core/base_plant.hpp>
#include <mppi/dynamics/cartpole/cartpole_dynamics.cuh>

template <class CONTROLLER_T>
class SimpleCartpolePlant : public BasePlant<CONTROLLER_T>
{
public:
  using control_array = typename CartpoleDynamics::control_array;
  using state_array = typename CartpoleDynamics::state_array;
  using output_array = typename CartpoleDynamics::output_array;

  SimpleCartpolePlant(std::shared_ptr<CONTROLLER_T> controller, int hz, int optimization_stride)
    : BasePlant<CONTROLLER_T>(controller, hz, optimization_stride)
  {
    system_dynamics_ = std::make_shared<CartpoleDynamics>(1.0, 1.0, 1.0);
  }

  void pubControl(const control_array& u)
  {
    state_array state_derivative;
    output_array dynamics_output;
    state_array prev_state = current_state_;
    float t = this->state_time_;
    float dt = this->controller_->getDt();
    system_dynamics_->step(prev_state, current_state_, state_derivative, u, dynamics_output, t, dt);
    current_time_ += dt;
  }

  void pubNominalState(const state_array& s)
  {
  }

  void pubFreeEnergyStatistics(MPPIFreeEnergyStatistics& fe_stats)
  {
  }

  int checkStatus()
  {
    return 0;
  }

  double getCurrentTime()
  {
    return current_time_;
  }

  double getPoseTime()
  {
    return this->state_time_;
  }

  double getAvgLoopTime() const
  {
    return this->avg_loop_time_ms_;
  }

  double getLastOptimizationTime() const
  {
    return this->optimization_duration_;
  }

  state_array current_state_ = state_array::Zero();

protected:
  std::shared_ptr<CartpoleDynamics> system_dynamics_;
  double current_time_ = 0.0;
};

Listing 2: Basic Plant implementation that interacts with a virtual Cartpole Dynamics system stored internal to the Plant.

When using MPPI in a MPC fashion, we need to use a Plant wrapper around our controller. The Plant houses methods to obtain new data such as state, calculate the optimal control sequence at a given rate using the latest information available, and provide the latest control to the external system while providing the necessary tooling to ensure there are no race conditions. As this is the class that provides the interaction between the algorithm and the actual system, it is a component that has to be modified for every use case. For this example, we will implement a plant inheriting from BasePlant that houses the external system completely internal to the class. Specifically, we will write our plant to run the dynamics inside pubControl() in order to produce a new state. We shall then call updateState() at a different fixed rate from the controller re-planning rate to show that the capability of the code base.

In Lst. 2, we can see a simple implementation of a Plant. SimpleCartpolePlant instantiates a CartpoleDynamics variable upon creation, overwrites the required virtual methods from BasePlant, and sets up the dynamics update to occur within pubControl(). Looking at the constructor, we pass a shared pointer to a Controller, an integer representing the controller replanning rate, and the minimum timestep we want to adjust the control trajectory by when performing multiple optimal control calculations to the base Plant constructor, and then create our stand-in system dynamics. pubControl() is where we send the control to the system and so in this case, we create necessary extra variables and then pass the current state $\vb{x}_{t}$ and control $\vb{u}_t$ as prev_state and u respectively to the Dynamics’ step() method to get the next state, $\vb{x}_{t+1}$, in the variable current_state_. We also update the current time to show that system has moved forward in time. Looking at this class, a potential issue arises as it is templated on the controller which in turn might not use CartpoleDynamics as its Dynamics class. This can be easily remedied by replacing any reference to CarpoleDynamics with CONTROLLER_T::TEMPLATED_DYNAMICS to make this plant work with the Dynamics used by the instantiated Controller.

Now that we have written our specialized Plant class, we can then go back to the minimal example and make some modifications to use the controller in a MPC fashion, shown in Lst. 3. For this example, we will just run a simple for loop that calls the Plant’s runControlIteration() and updateState() methods to simulate a receiving a new state from the system and then calculating a new optimal control sequence from it. The updateState() method calls pubControl() internally so the system state and the current time will get updated as this runs. For real-time scenarios, there is also the runControlLoop() Plant method which can be launched in a separate thread and calls runControlIteration() internally at the specified replanning rate.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
#include <mppi/controllers/MPPI/mppi_controller.cuh>
#include <mppi/cost_functions/cartpole/cartpole_quadratic_cost.cuh>
#include <mppi/dynamics/cartpole/cartpole_dynamics.cuh>
#include <mppi/feedback_controllers/DDP/ddp.cuh>
#include <mppi_paper_example/plants/cartpole_plant.hpp>

const int NUM_TIMESTEPS = 100;
const int NUM_ROLLOUTS = 2048;
const int DYN_BLOCK_X = 64;
using DYN_T = CartpoleDynamics;
const int DYN_BLOCK_Y = DYN_T::STATE_DIM;
using COST_T = CartpoleQuadraticCost;
using FB_T = DDPFeedback<DYN_T, NUM_TIMESTEPS>;
using SAMPLING_T = mppi::sampling_distributions::GaussianDistribution<DYN_T::DYN_PARAMS_T>;
using CONTROLLER_T = VanillaMPPIController<DYN_T, COST_T, FB_T, NUM_TIMESTEPS, NUM_ROLLOUTS, SAMPLING_T>;
using CONTROLLER_PARAMS_T = CONTROLLER_T::TEMPLATED_PARAMS;

using PLANT_T = SimpleCartpolePlant<CONTROLLER_T>;

int main(int argc, char** argv)
{
  float dt = 0.02;
  DYN_T dynamics;                     // set up dynamics
  COST_T cost;                        // set up cost
  FB_T fb_controller(&dynamics, dt);  // set up feedback controller
  // set up sampling distribution
  SAMPLING_T sampler;
  auto sampler_params = sampler.getParams();
  std::fill(sampler_params.std_dev, sampler_params.std_dev + DYN_T::CONTROL_DIM, 1.0);
  sampler.setParams(sampler_params);

  // set up MPPI Controller
  CONTROLLER_PARAMS_T controller_params;
  controller_params.dt_ = dt;
  controller_params.lambda_ = 1.0;
  controller_params.dynamics_rollout_dim_ = dim3(DYN_BLOCK_X, DYN_BLOCK_Y, 1);
  controller_params.cost_rollout_dim_ = dim3(96, 1, 1);
  std::shared_ptr<CONTROLLER_T> controller =
      std::make_shared<CONTROLLER_T>(&dynamics, &cost, &fb_controller, &sampler, controller_params);

  // Create plant
  PLANT_T plant(controller, (1.0 / dt), 1);

  std::atomic<bool> alive(true);
  for (int t = 0; t < 10000; t++)
  {
    plant.updateState(plant.current_state_, (t + 1) * dt);
    plant.runControlIteration(&alive);
  }

  std::cout << "Avg Optimization time: " << plant.getAvgOptimizationTime() << " ms" << std::endl;
  std::cout << "Last Optimization time: " << plant.getLastOptimizationTime() << " ms" << std::endl;
  std::cout << "Avg Loop time: " << plant.getAvgLoopTime() << " ms" << std::endl;
  std::cout << "Avg Optimization Hz: " << 1.0 / (plant.getAvgOptimizationTime() * 1e-3) << " Hz" << std::endl;

  auto control_sequence = controller->getControlSeq();
  std::cout << "State: \n" << plant.current_state_.transpose() << std::endl;
  std::cout << "Control Sequence:\n" << control_sequence << std::endl;
  return 0;
}

Listing 3: Minimal example used in a MPC fashion

References

[1] S. Macenski, T. Foote, B. Gerkey, C. Lalancette, and W. Woodall, “Robot Operating System 2: Design, architecture, and uses in the wild,” Science Robotics, vol. 7, no. 66, p. eabm6074, May 2022. [Online]. Available: https://www.science.org/doi/10.1126/scirobotics.abm6074

[2] J.-P. Laumond et al., Robot Motion Planning and Control. Springer, 1998, vol. 229.

[3] G. Williams, P. Drews, B. Goldfain, J. M. Rehg, and E. A. Theodorou, “Information-Theoretic Model Predictive Control: Theory and Applications to Autonomous Driving,” IEEE Transactions on Robotics, vol. 34, no. 6, pp. 1603-1622, 2018. [Online]. Available: https://ieeexplore.ieee.org/document/8558663

[4] J. Gibson, B. Vlahov, D. Fan, P. Spieler, D. Pastor, A.-a. Agha-mohammadi, and E. A. Theodorou, “A Multi-step Dynamics Modeling Framework For Autonomous Driving In Multiple Environments,” in 2023 IEEE International Conference on Robotics and Automation (ICRA). IEEE, May 2023, pp. 7959-7965. [Online]. Available: https://ieeexplore.ieee.org/document/10161330