This example shows how to generate a reinforcement learning reward function from a Simulink Design Optimization model verification block.
For this example, open the Simulink model levelcheckblock.slx, which contains a Check Step Response Characteristics block named level check.
Generate the reward function code from the specifications in the level check block using generateRewardFunction. The code is displayed in the MATLAB Editor.
generateRewardFunction("levelcheckblock/level check")
For this example, the code is saved in the MATLAB function file myblockrewardfcn.m.
Display the generated reward function.
function reward = myblockrewardfcn(x,t)
% MYBLOCKREWARDFCN generates rewards from Simulink block specifications.
%
% x : Input of levelcheckblock/level check
% t : Simulation time (s)

% Reinforcement Learning Toolbox
% 27-May-2021 16:45:27

%#codegen

%% Specifications from levelcheckblock/level check
block1_initialvalue = 1;
block1_finalvalue = 2;
block1_steptime = 0;
block1_steprange = block1_finalvalue - block1_initialvalue;
block1_minrise = block1_initialvalue + block1_steprange * 80/100;
block1_maxsettling = block1_initialvalue + block1_steprange * (1+2/100);
block1_minsettling = block1_initialvalue + block1_steprange * (1-2/100);
block1_maxovershoot = block1_initialvalue + block1_steprange * (1+10/100);
block1_minundershoot = block1_initialvalue - block1_steprange * 5/100;
if t >= block1_steptime
    if block1_initialvalue <= block1_finalvalue
        % Step up: overshoot and settling bounds above, rise bounds below
        block1_upperboundtimes = [0,5; 5,max(5+1,t+1)];
        block1_upperboundamplitudes = [block1_maxovershoot,block1_maxovershoot;
            block1_maxsettling,block1_maxsettling];
        block1_lowerboundtimes = [0,2; 2,5; 5,max(5+1,t+1)];
        block1_lowerboundamplitudes = [block1_minundershoot,block1_minundershoot;
            block1_minrise,block1_minrise;
            block1_minsettling,block1_minsettling];
    else
        % Step down: the upper and lower bound roles are swapped
        block1_upperboundtimes = [0,2; 2,5; 5,max(5+1,t+1)];
        block1_upperboundamplitudes = [block1_minundershoot,block1_minundershoot;
            block1_minrise,block1_minrise;
            block1_minsettling,block1_minsettling];
        block1_lowerboundtimes = [0,5; 5,max(5+1,t+1)];
        block1_lowerboundamplitudes = [block1_maxovershoot,block1_maxovershoot;
            block1_maxsettling,block1_maxsettling];
    end
    % Evaluate each upper bound segment at time t (NaN outside the segment)
    block1_xmax = zeros(1,size(block1_upperboundtimes,1));
    for idx = 1:numel(block1_xmax)
        tseg = block1_upperboundtimes(idx,:);
        xseg = block1_upperboundamplitudes(idx,:);
        block1_xmax(idx) = interp1(tseg,xseg,t,'linear',NaN);
    end
    if all(isnan(block1_xmax))
        block1_xmax = Inf;
    else
        block1_xmax = max(block1_xmax,[],'omitnan');
    end

    % Evaluate each lower bound segment at time t (NaN outside the segment)
    block1_xmin = zeros(1,size(block1_lowerboundtimes,1));
    for idx = 1:numel(block1_xmin)
        tseg = block1_lowerboundtimes(idx,:);
        xseg = block1_lowerboundamplitudes(idx,:);
        block1_xmin(idx) = interp1(tseg,xseg,t,'linear',NaN);
    end
    if all(isnan(block1_xmin))
        block1_xmin = -Inf;
    else
        block1_xmin = max(block1_xmin,[],'omitnan');
    end
else
    % Before the step time, no bounds are active
    block1_xmin = -Inf;
    block1_xmax = Inf;
end
%% Penalty function weight (specify nonnegative)
weight = 1;

%% Compute penalty
% Penalty is computed for violation of linear bound constraints.
%
% To compute exterior bound penalty, use the exteriorPenalty function and
% specify the penalty method as 'step' or 'quadratic'.
%
% Alternatively, use the hyperbolicPenalty or barrierPenalty function for
% computing hyperbolic and barrier penalties.
%
% For more information, see help for these functions.
penalty = sum(exteriorPenalty(x,block1_xmin,block1_xmax,'step'));

%% Compute reward
reward = -weight * penalty;
end
The generated reward function takes as input arguments the current value of the verification block input signals and the simulation time. A negative reward is calculated using a weighted penalty that acts whenever the current block input signals violate the linear bound constraints defined in the verification block.
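As a rough check of this behavior, the following sketch evaluates the reward at a few sample points. It assumes that myblockrewardfcn.m is on the MATLAB path and that Reinforcement Learning Toolbox is installed. The sample times and signal values are illustrative assumptions, not part of the original example.

```matlab
% Sample (time, signal) pairs; the values are illustrative only.
t = [1 3 6 6];
x = [1.5 1.5 2.0 2.2];

r = zeros(size(t));
for k = 1:numel(t)
    % Each call evaluates the bounds that are active at time t(k).
    r(k) = myblockrewardfcn(x(k),t(k));
    fprintf("t = %g s, x = %g, reward = %g\n",t(k),x(k),r(k));
end
```

With the specifications above, the point at t = 3 lies below the 80% rise bound of 1.8, and the point at t = 6 with x = 2.2 lies above the 2% settling bound of 2.02. Both should therefore receive a negative reward, while the other two points should incur no penalty.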
The generated reward function is a starting point for reward design. You can tune the weights or use a different penalty function to define a more appropriate reward for your reinforcement learning agent.
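One possible way to explore such changes is sketched below. It compares the penalty functions named in the generated code comments for a single out-of-bounds sample. The bounds are the settling bounds from this example. The sample value and the weight are illustrative assumptions, not recommended settings, and the hyperbolic and barrier penalties are called with their default parameters.

```matlab
% Settling bounds from this example (active for t > 5 s).
xmin = 1.98;
xmax = 2.02;

x = 2.2;       % illustrative sample above the upper bound
weight = 2;    % illustrative weight, not a recommended value

% Exterior penalties: nonzero only when x is outside [xmin, xmax].
pStep = exteriorPenalty(x,xmin,xmax,'step');
pQuad = exteriorPenalty(x,xmin,xmax,'quadratic');

% Smooth alternatives, evaluated with default parameters.
pHyp = hyperbolicPenalty(x,xmin,xmax);
pBar = barrierPenalty(x,xmin,xmax);

% Reward using the quadratic penalty instead of the step penalty.
rewardQuad = -weight * pQuad;
fprintf("step = %g, quadratic = %g, hyperbolic = %g, barrier = %g\n", ...
    pStep,pQuad,pHyp,pBar);
```

A quadratic penalty grows with the size of the violation, which can give the agent a more informative signal than the constant step penalty. To apply a change like this in the generated function, edit the penalty and weight lines in myblockrewardfcn.m.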