create a two-dimensional grid world for reinforcement learning
Since R2019a
description
examples
create grid world environment
For this example, consider a 5-by-5 grid world with the following rules:
The grid world is bounded by borders and has four possible actions (North = 1, South = 2, East = 3, West = 4).
The agent begins from cell [2,1] (second row, first column).
The agent receives a reward of 10 if it reaches the terminal state at cell [5,5] (blue).
The environment contains a special jump from cell [2,4] to cell [4,4] with a reward of 5.
The agent is blocked by obstacles in cells [3,3], [3,4], [3,5], and [4,3] (black cells).
All other actions result in a reward of -1.

First, create a GridWorld object using the createGridWorld function.
gw = createGridWorld(5,5)
gw =
  GridWorld with properties:
                GridSize: [5 5]
            CurrentState: "[1,1]"
                  States: [25x1 string]
                 Actions: [4x1 string]
                       T: [25x25x4 double]
                       R: [25x25x4 double]
          ObstacleStates: [0x1 string]
          TerminalStates: [0x1 string]
    ProbabilityTolerance: 8.8818e-16
Now, set the initial, terminal, and obstacle states.
gw.CurrentState = '[2,1]';
gw.TerminalStates = '[5,5]';
gw.ObstacleStates = ["[3,3]";"[3,4]";"[3,5]";"[4,3]"];
Update the state transition matrix for the obstacle states, and set the jump rule so that every action taken in cell [2,4] leads to cell [4,4].
updateStateTranstionForObstacles(gw)
gw.T(state2idx(gw,"[2,4]"),:,:) = 0;
gw.T(state2idx(gw,"[2,4]"),state2idx(gw,"[4,4]"),:) = 1;
Next, define the rewards in the reward transition matrix.
nS = numel(gw.States);
nA = numel(gw.Actions);
gw.R = -1*ones(nS,nS,nA);
gw.R(state2idx(gw,"[2,4]"),state2idx(gw,"[4,4]"),:) = 5;
gw.R(:,state2idx(gw,gw.TerminalStates),:) = 10;
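As a quick sanity check on the jump rule, the following sketch rebuilds the same grid world under a separate variable name (gwCheck, chosen here only for illustration) and reads back the transition and reward entries for the jump from [2,4] to [4,4]. It assumes Reinforcement Learning Toolbox is installed and uses only the functions already shown in this example.

```matlab
% Rebuild the example grid world under a separate, illustrative name.
gwCheck = createGridWorld(5,5);
gwCheck.TerminalStates = '[5,5]';
gwCheck.ObstacleStates = ["[3,3]";"[3,4]";"[3,5]";"[4,3]"];
updateStateTranstionForObstacles(gwCheck)

% Indices of the jump source and destination states.
sFrom = state2idx(gwCheck,"[2,4]");
sTo   = state2idx(gwCheck,"[4,4]");

% Jump rule: every action taken in [2,4] leads to [4,4].
gwCheck.T(sFrom,:,:) = 0;
gwCheck.T(sFrom,sTo,:) = 1;

% Rewards: -1 everywhere, 5 for the jump, 10 for reaching the terminal state.
nS = numel(gwCheck.States);
nA = numel(gwCheck.Actions);
gwCheck.R = -1*ones(nS,nS,nA);
gwCheck.R(sFrom,sTo,:) = 5;
gwCheck.R(:,state2idx(gwCheck,gwCheck.TerminalStates),:) = 10;

% Read back one value per action for the jump transition.
jumpProb   = squeeze(gwCheck.T(sFrom,sTo,:))
jumpReward = squeeze(gwCheck.R(sFrom,sTo,:))
```

Each vector has one entry per action, so a value that differs from the others would point to an indexing mistake in the assignments.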
Now, use rlMDPEnv to create a grid world environment using the GridWorld object gw.
env = rlMDPEnv(gw)
env =
  rlMDPEnv with properties:
       Model: [1x1 rl.env.GridWorld]
    ResetFcn: []
you can visualize the grid world environment using the plot function.
plot(env)
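Beyond plotting, you can drive the environment by hand to see how an action maps to an observation and a reward. The sketch below is a minimal illustration, not part of the original example: gwSim and envSim are names introduced here, and it assumes that ResetFcn is expected to return the index of the starting state and that actions are passed to step as the numeric codes listed in the rules above (East = 3).

```matlab
% Build a plain 5-by-5 grid world and wrap it in an environment.
gwSim  = createGridWorld(5,5);
envSim = rlMDPEnv(gwSim);

% Assumption: ResetFcn returns the index of the state to start from.
envSim.ResetFcn = @() state2idx(gwSim,"[2,1]");

% Reset the environment, then take one step East (action 3).
obs = reset(envSim);
[nextObs,reward,isDone] = step(envSim,3);

% Show the observation before and after the step, and the step result.
disp([obs nextObs])
disp(reward)
disp(isDone)
```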

input arguments
m — number of rows of the grid world
scalar
number of rows of the grid world, specified as a scalar.
n — number of columns of the grid world
scalar
number of columns of the grid world, specified as a scalar.
moves — action names
'Standard' (default) | 'Kings'
Action names, specified as either 'Standard' or 'Kings'. When moves is set to:
'Standard', the actions are ['N';'S';'E';'W'].
'Kings', the actions are ['N';'S';'E';'W';'NE';'NW';'SE';'SW'].
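To see the effect of the moves argument, the following sketch creates two small grid worlds and compares their action sets. The variable names and the 3-by-4 grid size are arbitrary choices made for illustration.

```matlab
% Same grid size, two different action sets.
gwStd   = createGridWorld(3,4);           % moves defaults to 'Standard'
gwKings = createGridWorld(3,4,'Kings');   % adds the four diagonal moves

% Number of actions in each grid world.
nStd   = numel(gwStd.Actions);
nKings = numel(gwKings.Actions);

% The third dimension of T follows the number of actions.
sizeT = size(gwKings.T);

disp(gwKings.Actions')
```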
output arguments
gw — two-dimensional grid world
GridWorld object
Two-dimensional grid world, returned as a GridWorld object with the properties listed below. For more information, see Create Custom Grid World Environments.
GridSize — size of the grid world
[m,n] vector
Size of the grid world, specified as an [m,n] vector.
CurrentState — name of the current state
string
Name of the current state, specified as a string.
Actions — action names
string vector
Action names, specified as a string vector. The length of the Actions vector is determined by the moves argument.
Actions is a string vector of length:
Four, if moves is specified as 'Standard'.
Eight, if moves is specified as 'Kings'.
T — state transition matrix
3-D array
State transition matrix, specified as a 3-D array, which determines the possible movements of the agent in an environment. The state transition matrix T is a probability matrix that indicates how likely the agent is to move from the current state s to any possible next state s' by performing action a. T is given by:
T(s,s',a) = probability(s'|s,a)
T is:
A K-by-K-by-4 array, if moves is specified as 'Standard'. Here, K = m*n.
A K-by-K-by-8 array, if moves is specified as 'Kings'.
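The indexing convention for T can be sketched as follows: the first index is the current state, the second is the next state, and the third is the action. The variable names are introduced here for illustration, and the lookup of the action index by name assumes the default 'Standard' action names.

```matlab
% Fresh grid world with the default 'Standard' moves.
gwT = createGridWorld(5,5);

% Current state, candidate next state, and action, all as indices.
s     = state2idx(gwT,"[2,1]");
sNext = state2idx(gwT,"[2,2]");
a     = find(gwT.Actions == "E");

% Probability of landing in sNext when taking action a in state s.
p = gwT.T(s,sNext,a)

% Total probability mass over all next states for this state and action.
rowSum = sum(gwT.T(s,:,a))
```

For T to describe a valid probability distribution, the values along the second dimension should sum to 1 for every state and action, within ProbabilityTolerance.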
R — reward transition matrix
3-D array
Reward transition matrix, specified as a 3-D array, which determines how much reward the agent receives after performing an action in the environment. R has the same shape and size as the state transition matrix T. The reward r for moving from state s to state s' by performing action a is given by:
r = R(s,s',a)
R is:
A K-by-K-by-4 array, if moves is specified as 'Standard'. Here, K = m*n.
A K-by-K-by-8 array, if moves is specified as 'Kings'.
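Because R and T share the same shape, they can be combined element by element. As a minimal sketch (the variable names and the uniform -1 reward are illustrative assumptions), the expected immediate reward for each state-action pair is the sum over next states s' of T(s,s',a)*R(s,s',a):

```matlab
% Small grid world with a uniform step reward of -1.
gwR = createGridWorld(3,3);
nS  = numel(gwR.States);
nA  = numel(gwR.Actions);
gwR.R = -1*ones(nS,nS,nA);

% Weight each reward by its transition probability and sum over next states
% (dimension 2). The result has one row per state and one column per action.
expR = squeeze(sum(gwR.T .* gwR.R, 2));

size(expR)
```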
ObstacleStates — state names that cannot be reached in the grid world
string vector
State names that cannot be reached in the grid world, specified as a string vector.
TerminalStates — terminal state names in the grid world
string vector
Terminal state names in the grid world, specified as a string vector.
version history
Introduced in R2019a