Working with ODEs
Physical systems can typically be modeled with differential equations, that is, equations that include derivatives. Forces, and hence Newton's laws, can be expressed as derivatives, as can Maxwell's equations, so differential equations can describe most physics problems. A differential equation describes how a system changes as a function of its current state, effectively defining the state transition. Systems of differential equations can be written in matrix/vector form:

ẋ = Ax
where x is the state vector, A is the state transition matrix determined by the physical dynamics, and ẋ (or dx/dt) is the change of state with a change in time. Essentially, matrix A acts on state x to advance it a small step in time. This formulation is typically used for linear equations (where the elements of A contain no state variables), but it can also be used for nonlinear equations, where the elements of A may contain state variables, which can lead to the complex behavior described above. This equation describes how an environment or system develops over time, starting from a particular initial condition. In mathematics these are known as initial value problems, since evaluating how the system will develop requires specifying an initial state.
The above expression describes a particular class of differential equations, ordinary differential equations (ODEs), where all derivatives are taken with respect to a single variable, usually time but occasionally space. The dot denotes dx/dt, the change of state with an incremental change in time. ODEs are well studied, and a wide range of analytical solution approaches exists for linear ODE systems. Analytical solutions express the solution in terms of variables, making them more flexible for exploring the entire behavior of the system. Nonlinear ODEs have fewer analytical approaches available, although certain classes of systems do admit analytical solutions. For the most part, though, nonlinear (and some linear) ODEs are best solved by simulation, where the solution is determined as numerical values at each time step.
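To make the analytical case concrete, consider a minimal sketch (our illustration, not from the original article, with assumed matrix values): a linear system ẋ = Ax has the closed-form solution x(t) = e^(At) x(0), which SciPy's matrix exponential can evaluate directly at any time, with no time stepping required:

#illustrative sketch with assumed values: closed-form solution of a linear ODE
import numpy as np
from scipy.linalg import expm
A = np.array([[0., 1.], [-2., -0.5]]) #assumed damped-oscillator dynamics
x0 = np.array([1., 0.]) #initial condition
for t in (0., 1., 5.):
    print(t, expm(A*t) @ x0) #analytical state at time t, no time stepping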
Simulation is based on finding an approximation to the differential equation, often by transforming it into an algebraic equation with a known accuracy over a small change in time. Computers can then take many small steps through time to show how the system develops. Many algorithms are available to perform this calculation, such as Matlab's ode45 or Python SciPy's solve_ivp. These algorithms take an ODE and a starting point (initial condition), automatically determine the optimal step size, and advance through the system until the specified completion time.
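As a minimal sketch of how such a solver is typically invoked (using an assumed toy ODE, not the article's system):

#sketch: solve x' = -x**3 from x(0)=1 over t in [0,10]; solve_ivp picks the step sizes
from scipy.integrate import solve_ivp
def f(t, x):
    return -x**3 #right-hand side of the ODE
sol = solve_ivp(f, (0., 10.), [1.]) #(function, time span, initial condition)
print(sol.t[-1], sol.y[0, -1]) #state at the final time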
If we can apply the correct control inputs to an ODE system, we can often drive it to a desired state. As discussed last time, RL provides an approach to determining the correct inputs for nonlinear systems. To develop RL, we will again use the gym environment, but this time we will create a custom gym environment based on our own ODE. Following the Gym documentation, we create an observation space that covers our state space and an action space for the control space. We initialize/reset the gym to an arbitrary point within the state space (though we must be cautious here: for some systems, not every desired final state is reachable from every initial state). In the gym step function, we step over a short time horizon in our ODE, applying the algorithm's estimated input, using Python SciPy's solve_ivp function. solve_ivp calls a function that contains the particular ODE we are working with. The code is available on GitHub. The init and reset functions are straightforward: init creates an observation space for each system state, and reset sets a random starting point for each of those variables within the domain, at a minimum distance from the origin. In the step function, note the solve_ivp line that calls the actual dynamics, solving the dynamics ODE over a short time step and passing the applied control K.
#taken from https://www.gymlibrary.dev/content/environment_creation/
#create gym for Moore-Greitzer Model
#action space: continuous +/- 10.0 float , maybe make scale to mu
#observation space: -30,30 x2 float for x,y
#reward: -1*(x^2+y^2)^1/2 (try to drive to 0)
#Moore-Greitzer model:
from os import path
from typing import Optional
import numpy as np
import math
import scipy
from scipy.integrate import solve_ivp
import gymnasium as gym
from gymnasium import spaces
from gymnasium.envs.classic_control import utils
from gymnasium.error import DependencyNotInstalled
import dynamics #local library containing formulas for solve_ivp
from dynamics import MGM
class MGMEnv(gym.Env):
    #no render modes
    def __init__(self, render_mode=None, size=30):
        self.observation_space = spaces.Box(low=-size+1, high=size-1, shape=(2,), dtype=float)
        self.action_space = spaces.Box(-10, 10, shape=(1,), dtype=float)
        #need to update action to normal distribution

    def _get_obs(self):
        return self.state

    def reset(self, seed: Optional[int] = None, options=None):
        #need below to seed self.np_random
        super().reset(seed=seed)
        #start at random x1, x2 at a minimum distance from the origin
        x = self.np_random.uniform(-8., 8.)
        while -2.5 < x < 2.5:
            x = self.np_random.uniform(-8., 8.)
        y = self.np_random.uniform(-8., 8.)
        while -2.5 < y < 2.5:
            y = self.np_random.uniform(-8., 8.)
        self.state = np.array((x, y))
        observation = self._get_obs()
        return observation, {}

    def step(self, action):
        u = action.item()
        #advance the dynamics over a short time step, passing the applied control u
        result = solve_ivp(MGM, (0, 0.05), self.state, args=(u,))
        x1 = result.y[0, -1]
        x2 = result.y[1, -1]
        self.state = np.array((x1.item(), x2.item()))
        done = False
        observation = self._get_obs()
        reward = -math.sqrt(x1.item()**2)  #+x2.item()**2)
        truncated = False  #placeholder for future expansion/limits if solution diverges
        return observation, reward, done, truncated, {}
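As a quick smoke test of the environment (our sketch, assuming the class above and the dynamics library are importable), the standard Gymnasium interaction loop applies:

#hypothetical smoke test for MGMEnv, not from the original post
env = MGMEnv()
obs, info = env.reset(seed=0)
for _ in range(5):
    action = env.action_space.sample() #random control input
    obs, reward, done, truncated, info = env.step(action)
    print(obs, reward)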
The dynamics of the Moore-Greitzer model (MGM) function are shown below. This implementation is based on the solve_ivp documentation. Limits are placed on the states to avoid divergence of the solution; if the system reaches those limits, the reward will be low, pushing the algorithm to revise its control approach. Creating ODE gyms based on this template should be simple: resize the observation space to match the dimensions of the ODE system and update the dynamics equation as needed.
def MGM(t, A, K):
    #nonlinear approximation of surge/stall dynamics of a gas turbine engine per the Moore-Greitzer model from
    #"Output-Feedback Control of Nonlinear Systems Using Control Contraction Metrics and Convex Optimization"
    #by Manchester and Slotine
    #2D system, x1 is mass flow, x2 is pressure increase
    x1, x2 = A
    #clamp the states to avoid divergence of the solution
    if x1 > 20: x1 = 20.
    elif x1 < -20: x1 = -20.
    if x2 > 20: x2 = 20.
    elif x2 < -20: x2 = -20.
    dx1 = -x2 - 1.5*x1**2 - 0.5*x1**3
    dx2 = x1 + K
    return np.array((dx1, dx2))
For this example, we are using an ODE based on the Moore-Greitzer model (MGM), which describes the surge and stall dynamics of a gas turbine engine¹. This equation describes coupled damped oscillations between mass flow and engine pressure. The goal of the controller is to quickly damp the oscillations to 0 by controlling engine pressure. The MGM has "motivated substantial development of nonlinear control design," making it an interesting test case for the SAC and GP approaches. The code describing the equation can be found on GitHub. Three other nonlinear ODEs are also listed there. The Van der Pol oscillator is a classical nonlinear oscillating system based on the dynamics of electronic systems. The Lorenz attractor is a seemingly simple ODE system that can produce chaotic behavior: results so sensitive to initial conditions that any infinitesimally small difference in the starting point will, in an uncontrolled system, lead to widely divergent states. The third is a mean-field ODE system provided by Duriez/Brunton/Noack that describes the development of complex interactions of stable and unstable waves as an approximation to turbulent fluid flow.
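For example, a Van der Pol right-hand side in the same (t, state, control) signature might look like the following (a sketch; the repository's actual version may differ, and the mu value and additive control entry are assumptions):

def VDP(t, A, K):
    #Van der Pol oscillator sketch in the same call signature as MGM
    mu = 1.0 #nonlinear damping strength (assumed)
    x1, x2 = A
    dx1 = x2
    dx2 = mu*(1.0 - x1**2)*x2 - x1 + K #control K assumed to enter additively
    return np.array((dx1, dx2))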
To avoid repeating the analysis of the last article, we simply present the results here, noting that once again the GP approach produced a better controller in less computational time than the SAC/neural-network approach. The figures below show the oscillations of the uncontrolled system, of the system under the GP controller, and of the system under the SAC controller.
Both algorithms improve on the uncontrolled dynamics. We see that while the SAC controller acts faster (in approximately 20 time steps), its accuracy is lower. The GP controller takes a little longer to act but provides smooth behavior in both states. Furthermore, as before, GP converged in fewer iterations than SAC.
We have seen that gyms can easily be adapted to enable training of RL algorithms on ODE systems, we have briefly discussed how powerful ODEs can be for describing, and thus exploring, RL control of physical dynamics, and we have seen again that GP produces better results. However, we have not yet attempted to optimize either algorithm; we have simply run with, essentially, baseline guesses at each algorithm's parameters. We will address that shortcoming now by expanding the MGM study.