shapley values
since r2021a
description
the shapley value of a feature for a query point explains the deviation of the prediction for the query point from the average prediction, due to the feature. for each query point, the sum of the shapley values for all features corresponds to the total deviation of the prediction from the average.
you can create a shapley object for a machine learning model with a
specified query point (querypoint). the software creates an object and
computes the shapley values of all features for the query point.
use the shapley values to explain the contribution of individual features to a prediction
at the specified query point. use the plot function to
create a bar graph of the shapley values. you can compute the shapley values for another query
point by using the fit
function.
creation
syntax
description
explainer = shapley(___,'QueryPoint',queryPoint) also computes the Shapley values for the query point queryPoint and stores the computed Shapley values in the ShapleyValues property of explainer. You can specify queryPoint in addition to any of the input argument combinations in the previous syntaxes.
explainer = shapley(___,Name,Value) specifies additional options using one or more name-value arguments. For example, specify 'UseParallel',true to compute Shapley values in parallel.
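The syntaxes above can be sketched as follows. This is a minimal illustration, assuming the release documented on this page (property names such as ShapleyValues may differ in later releases); the carsmall data set and the fitrtree model are illustrative choices, not part of the syntax description.

```matlab
% Sketch of the creation syntaxes; data set and model are illustrative choices.
load carsmall                                   % sample data shipped with the toolbox
tbl = rmmissing(table(Horsepower,Weight,MPG));  % drop rows with missing values
mdl = fitrtree(tbl,'MPG');                      % full model: stores its training data

% Create the explainer and compute Shapley values for one query point.
explainer = shapley(mdl,'QueryPoint',tbl(1,:));

% Same call with an additional name-value option.
explainer2 = shapley(mdl,'QueryPoint',tbl(1,:),'UseParallel',false);

disp(explainer.ShapleyValues)                   % one row per predictor
```

The query point here is a single-row table that still contains the response MPG; as described under queryPoint, shapley ignores variables that are not predictors.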
input arguments
blackbox — machine learning model to be interpreted
regression model object | classification model object | function handle
machine learning model to be interpreted, specified as a full or compact regression or classification model object or a function handle.
Full or compact model object: You can specify a full or compact regression or classification model object, which has a predict object function. The software uses the predict function to compute Shapley values.
If you specify a model object that does not contain predictor data (for example, a compact model), then you must provide the predictor data using X.
When you train a model, use a numeric matrix or table for the predictor data where rows correspond to individual observations.
Regression model object
| Supported model | Full or compact regression model object |
|---|---|
| Ensemble of regression models | RegressionBaggedEnsemble |
| Gaussian kernel regression model using random feature expansion | RegressionKernel |
| Gaussian process regression | RegressionGP |
| Generalized additive model | |
| Linear regression for high-dimensional data | RegressionLinear |
| Neural network regression model | |
| Regression tree | RegressionTree, CompactRegressionTree |
| Support vector machine regression | RegressionSVM, CompactRegressionSVM |
Classification model object
| Supported model | Full or compact classification model object |
|---|---|
| Discriminant analysis classifier | |
| Multiclass model for support vector machines or other classifiers | ClassificationECOC, CompactClassificationECOC |
| Ensemble of learners for classification | |
| Gaussian kernel classification model using random feature expansion | ClassificationKernel |
| Generalized additive model | |
| k-nearest neighbor classifier | ClassificationKNN |
| Linear classification model | ClassificationLinear |
| Multiclass naive Bayes model | CompactClassificationNaiveBayes |
| Neural network classifier | |
| Support vector machine classifier for one-class and binary classification | ClassificationSVM, CompactClassificationSVM |
| Binary decision tree for multiclass classification | CompactClassificationTree |
Function handle: You can specify a function handle that accepts predictor data and returns a column vector containing a prediction for each observation in the predictor data. The prediction is a predicted response for regression or a predicted score of a single class for classification. You must provide the predictor data using X.
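A function handle that returns the score of a single class might look like the sketch below. The logistic regression model from fitglm, the fisheriris data, and the choice of "versicolor" as the class of interest are assumptions made for illustration only. The predict function of a binomial generalized linear model returns one probability per row, which matches the column-vector requirement described above.

```matlab
load fisheriris                                   % meas (150x4 numeric), species (labels)
y = double(strcmp(species,'versicolor'));         % 1 for versicolor, 0 otherwise
glm = fitglm(meas,y,'Distribution','binomial');   % logistic regression (illustrative model)

% Column vector with one predicted probability (score of one class) per row.
f = @(X) predict(glm,X);

% A function handle carries no predictor data, so X (here meas) is required.
explainer = shapley(f,meas,'QueryPoint',meas(1,:));
disp(explainer.ShapleyValues)
```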
x — predictor data
numeric matrix | table
predictor data, specified as a numeric matrix or table. each row of
x corresponds to one observation, and each column corresponds
to one variable.
for a numeric matrix:
The variables that make up the columns of X must have the same order as the predictor variables that trained blackbox, stored in blackbox.X.
If you trained blackbox using a table, then X can be a numeric matrix if the table contains all numeric predictor variables.
For a table:
If you trained blackbox using a table (for example, tbl), then all predictor variables in X must have the same variable names and data types as those in tbl. However, the column order of X does not need to correspond to the column order of tbl.
If you trained blackbox using a numeric matrix, then the predictor names in blackbox.PredictorNames and the corresponding predictor variable names in X must be the same. To specify predictor names during training, use the PredictorNames name-value argument. All predictor variables in X must be numeric vectors.
X can contain additional variables (response variables, observation weights, and so on), but shapley ignores them.
shapley does not support multicolumn variables or cell arrays other than cell arrays of character vectors.
if blackbox is a model object that does not contain predictor
data or a function handle, you must provide x. if
blackbox is a full machine learning model object and you
specify this argument, then shapley does not use the predictor
data in blackbox; it uses the specified predictor data
only.
data types: single | double
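As a sketch of the rule that a model without stored predictor data needs X, the following compacts a regression tree and then passes the training matrix explicitly. The data set and model type are illustrative assumptions.

```matlab
load carsmall
X = [Horsepower Weight];
Y = MPG;
keep = ~any(isnan([X Y]),2);      % keep complete rows only
X = X(keep,:);
Y = Y(keep);

mdl  = fitrtree(X,Y);             % full model: stores X
cmdl = compact(mdl);              % compact model: training data removed

% The compact model holds no predictor data, so X must be supplied.
explainer = shapley(cmdl,X,'QueryPoint',X(1,:));
disp(size(explainer.X))           % the explainer stores the supplied data
```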
querypoint — query point
row vector of numeric values | single-row table
query point at which shapley explains a prediction, specified
as a row vector of numeric values or a single-row table.
for a row vector of numeric values:
for a single-row table:
If you trained blackbox using a table (for example, tbl), then all predictor variables in queryPoint must have the same variable names and data types as those in tbl. However, the column order of queryPoint does not need to correspond to the column order of tbl.
If you trained blackbox using a numeric matrix, then the predictor names in blackbox.PredictorNames and the corresponding predictor variable names in queryPoint must be the same. To specify predictor names during training, use the PredictorNames name-value argument. All predictor variables in queryPoint must be numeric vectors.
queryPoint can contain additional variables (response variables, observation weights, and so on), but shapley ignores them.
shapley does not support multicolumn variables or cell arrays other than cell arrays of character vectors.
if querypoint contains nans for continuous
predictors and 'method' is
'conditional', then the shapley values (shapleyvalues) in the returned object are nans.
otherwise, shapley handles nans in
querypoint in the same way as blackbox
(the predict object function of blackbox or
the function handle specified by blackbox).
example: blackbox.x(1,:) specifies the query point as the first
observation of the predictor data in the full machine learning model
blackbox.
data types: single | double | table
specify optional pairs of arguments as
name1=value1,...,namen=valuen, where name is
the argument name and value is the corresponding value.
name-value arguments must appear after other arguments, but the order of the
pairs does not matter.
before r2021a, use commas to separate each name and value, and enclose
name in quotes.
example: shapley(blackbox,'querypoint',q,'method','conditional')
creates a shapley object and computes the shapley values for the query
point q using the extension to the kernel shap
algorithm.
categoricalpredictors — categorical predictors list
vector of positive integers | logical vector | character matrix | string array | cell array of character vectors | 'all'
categorical predictors list, specified as one of the values in this table.
| value | description |
|---|---|
| Vector of positive integers | Each entry in the vector is an index value indicating that the corresponding predictor is categorical. The index values are between 1 and p, where p is the number of predictors used to train the model. |
| Logical vector | A true entry means that the corresponding predictor is categorical. |
| Character matrix | Each row of the matrix is the name of a predictor variable. The names must match the variable names of the predictor data in the form of a table. Pad the names with extra blanks so each row of the character matrix has the same length. |
| String array or cell array of character vectors | Each element in the array is the name of a predictor variable. The names must match the variable names of the predictor data in the form of a table. |
| 'all' | All predictors are categorical. |
If you specify blackbox as a function handle, then shapley identifies categorical predictors from the predictor data X. If the predictor data is in a table, shapley assumes that a variable is categorical if it is a logical vector, unordered categorical vector, character array, string array, or cell array of character vectors. If the predictor data is a matrix, shapley assumes that all predictors are continuous. To identify any other predictors as categorical predictors, specify them by using the CategoricalPredictors name-value argument.
If you specify blackbox as a regression or classification model object, then shapley identifies categorical predictors by using the CategoricalPredictors property of the model object.
shapley supports an ordered categorical predictor when
blackbox supports ordered categorical predictors and you
specify 'method'
as 'interventional'.
example: 'categoricalpredictors','all'
data types: single | double | logical | char | string | cell
maxnumsubsets — maximum number of predictor subsets
min(2^m,1024) where m is
the number of predictors (default) | positive integer
maximum number of predictor subsets to use for shapley value computation, specified as a positive integer.
for details on how shapley chooses the subsets to use,
see computational cost.
this argument is valid only when shapley uses the kernel shap
algorithm or the extension to the kernel shap algorithm. if you set the
maxnumsubsets argument when method is
'interventional', the software uses the kernel shap algorithm.
for more information, see algorithms.
example: 'maxnumsubsets',100
data types: single | double
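The default value min(2^M,1024) can be evaluated directly. The predictor counts below are arbitrary illustrations; the value for M = 6 matches the NumSubsets value of 64 displayed in the examples on this page, which use six predictors.

```matlab
M = [3 6 10 12];                          % example numbers of predictors
defaultMaxNumSubsets = min(2.^M,1024);    % default cap on the number of subsets
disp(defaultMaxNumSubsets)                % 8 64 1024 1024
```

For ten or fewer predictors the default covers every possible subset; beyond that, the cap of 1024 applies.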
method — shapley value computation algorithm
'interventional' (default) | 'conditional'
since r2023a
shapley value computation algorithm, specified as
'interventional' or 'conditional'.
'interventional' (default): shapley computes the Shapley values with an interventional value function. shapley offers three interventional algorithms: Kernel SHAP [1], Linear SHAP [1], and Tree SHAP [2]. The software selects an algorithm based on the machine learning model blackbox and other specified options. For details, see Interventional Algorithms.
'conditional': shapley uses the extension to the Kernel SHAP algorithm [3] with a conditional value function.
the method
property stores the name of the selected algorithm. for more information, see algorithms.
before r2023a: you can specify this argument as
'interventional-kernel' or
'conditional-kernel'. shapley supports
the kernel shap algorithm and the extension of the kernel shap algorithm.
example: 'method','conditional'
data types: char | string
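A short sketch of how the Method argument relates to the Method property, assuming R2023a or later (where 'interventional' and 'conditional' are the accepted values). The data set and model are illustrative choices, and the algorithm chosen for the interventional case depends on the model and options, so the code only displays it.

```matlab
load carsmall
tbl = rmmissing(table(Horsepower,Weight,MPG));
mdl = fitrtree(tbl,'MPG');
q = tbl(1,:);                                        % query point

eInt  = shapley(mdl,'QueryPoint',q);                 % default: 'interventional'
eCond = shapley(mdl,'QueryPoint',q,'Method','conditional');

disp(eInt.Method)    % name of the interventional algorithm the software selected
disp(eCond.Method)   % 'conditional-kernel', per the Method property description
```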
useparallel — flag to run in parallel
false (default) | true
Flag to run in parallel, specified as true or false. If you specify 'UseParallel',true, the shapley function parallelizes its for-loop iterations. The loop runs in parallel when you have Parallel Computing Toolbox™.
this argument is valid only when shapley uses the tree shap
algorithm for an ensemble of trees, the kernel shap algorithm, or the extension to
the kernel shap algorithm.
example: 'useparallel',true
data types: logical
properties
blackboxmodel — machine learning model to be interpreted
regression model object | classification model object | function handle
this property is read-only.
machine learning model to be interpreted, specified as a regression or classification model object or a function handle.
the blackbox
argument sets this property.
blackboxfitted — prediction for query point computed by machine learning model
scalar
this property is read-only.
prediction for the query point computed by the machine learning model (blackboxmodel), specified as a scalar.
If BlackboxModel is a model object, then BlackboxFitted is a predicted response for regression or a classified label for classification.
If BlackboxModel is a function handle, then BlackboxFitted is a value returned by the function handle, either a predicted response for regression or a predicted score of a single class for classification.
categoricalpredictors — categorical predictor indices
vector of positive integers | []
this property is read-only.
categorical predictor
indices, specified as a vector of positive integers. categoricalpredictors
contains index values indicating that the corresponding predictors are categorical. the index
values are between 1 and p, where p is the number of
predictors used to train the model. if none of the predictors are categorical, then this
property is empty ([]).
If you specify blackbox using a function handle, then shapley identifies categorical predictors from the predictor data X. If you specify the CategoricalPredictors name-value argument, then the argument sets this property.
If you specify blackbox as a regression or classification model object, then shapley determines this property by using the CategoricalPredictors property of the model object.
shapley supports an ordered categorical predictor when
blackbox supports ordered categorical predictors and when you
specify 'method' as
'interventional'.
intercept — average prediction
numeric vector | numeric scalar
average prediction, averaged over the predictor data x,
specified as a numeric vector or numeric scalar.
If BlackboxModel is a classification model object, then Intercept is a vector of the average classification scores for each class.
If BlackboxModel is a regression model object, then Intercept is a scalar of the average response.
If BlackboxModel is a function handle, then Intercept is a scalar of the average function evaluation.
for a query point, the sum of the shapley values for all features corresponds to the
total deviation of the prediction from the average
(intercept).
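The relationship between Intercept, the Shapley values, and the prediction can be checked numerically for a regression model. This is a sketch under assumed data and model choices; the two printed numbers are expected to agree up to numerical tolerance, and the ShapleyValue column name follows the regression output shown in the examples on this page.

```matlab
load carsmall
tbl = rmmissing(table(Horsepower,Weight,MPG));
mdl = fitrtree(tbl,'MPG');
explainer = shapley(mdl,'QueryPoint',tbl(1,:));

% Average prediction plus the Shapley values of all predictors.
total = explainer.Intercept + sum(explainer.ShapleyValues.ShapleyValue);

fprintf('Intercept + sum of Shapley values: %.4f\n',total);
fprintf('Model prediction (BlackboxFitted): %.4f\n',explainer.BlackboxFitted);
```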
method — shapley value computation algorithm
'interventional-linear' | 'interventional-tree' | 'interventional-kernel' | 'conditional-kernel'
this property is read-only.
shapley value computation algorithm, specified as
'interventional-linear', 'interventional-tree',
'interventional-kernel', or
'conditional-kernel'.
'interventional-linear': shapley uses the Linear SHAP algorithm [1] with an interventional value function. That is, shapley computes interventional Shapley values using the estimated coefficients for linear models.
'interventional-tree': shapley uses the Tree SHAP algorithm [2] with an interventional value function.
'interventional-kernel': shapley uses the Kernel SHAP algorithm [1] with an interventional value function.
'conditional-kernel': shapley uses the extension to the Kernel SHAP algorithm [3] with a conditional value function.
the method
argument of shapley or the method
argument of fit sets this property.
for more information, see algorithms.
numsubsets — number of predictor subsets
positive integer
this property is read-only.
number of predictor subsets to use for shapley value computation, specified as a positive integer.
the maxnumsubsets
argument of shapley or the maxnumsubsets
argument of fit sets this property.
for details on how shapley chooses the subsets to use, see
computational cost.
querypoint — query point
row vector of numeric values | single-row table
this property is read-only.
query point at which shapley explains a prediction using the
shapley values (shapleyvalues), specified as a row vector of numeric values or single-row
table.
the querypoint
argument of shapley or the querypoint
argument of fit sets this property.
shapleyvalues — shapley values for query point
table
this property is read-only.
shapley values for the query point (querypoint),
specified as a table.
for regression, the table has two columns. the first column contains the predictor variable names, and the second column contains the shapley values of the predictors.
for classification, the table has two or more columns, depending on the number of classes in
blackboxmodel. the first column contains the predictor variable names, and the rest of the columns contain the shapley values of the predictors for each class.
x — predictor data
numeric matrix | table
this property is read-only.
predictor data, specified as a numeric matrix or table.
each row of x corresponds to one observation, and each column
corresponds to one variable.
if an observation contains nans for continuous predictors and
method is
'conditional-kernel', then shapley does not use
the observation for the shapley value computation. otherwise,
shapley handles nans in x
in the same way as blackboxmodel (the predict
object function of blackboxmodel or the function handle specified
by blackboxmodel).
shapley stores all observations, including the rows with missing
values, in this property.
examples
compute shapley values when creating shapley object
train a classification model and create a shapley object. when you create a shapley object, specify a query point so that the software computes the shapley values for the query point. then create a bar graph of the shapley values by using the object function plot.
load the creditrating_historical data set. the data set contains customer ids and their financial ratios, industry labels, and credit ratings.
tbl = readtable('CreditRating_Historical.dat');
Display the first three rows of the table.
head(tbl,3)
id wc_ta re_ta ebit_ta mve_bvtd s_ta industry rating
_____ _____ _____ _______ ________ _____ ________ ______
62394 0.013 0.104 0.036 0.447 0.142 3 {'bb'}
48608 0.232 0.335 0.062 1.969 0.281 8 {'a' }
42444 0.311 0.367 0.074 1.935 0.366 1 {'a' }
Train a blackbox model of credit ratings by using the fitcecoc function. Use the variables from the second through seventh columns in tbl as the predictor variables. A recommended practice is to specify the class names to set the order of the classes.
blackbox = fitcecoc(tbl,'Rating', ...
    'PredictorNames',tbl.Properties.VariableNames(2:7), ...
    'CategoricalPredictors','Industry', ...
    'ClassNames',{'AAA' 'AA' 'A' 'BBB' 'BB' 'B' 'CCC'});
create a shapley object that explains the prediction for the last observation. specify a query point so that the software computes shapley values and stores them in the shapleyvalues property.
querypoint = tbl(end,:)
querypoint=1×8 table
id wc_ta re_ta ebit_ta mve_bvtd s_ta industry rating
_____ _____ _____ _______ ________ ____ ________ ______
73104 0.239 0.463 0.065 2.924 0.34 2 {'aa'}
explainer = shapley(blackbox,'QueryPoint',querypoint)
Warning: Computation can be slow because the predictor data has over 1000 observations. Use a smaller sample of the training set or specify 'UseParallel' as true for faster computation.
explainer =
shapley with properties:
blackboxmodel: [1x1 classificationecoc]
querypoint: [1x8 table]
blackboxfitted: {'aa'}
shapleyvalues: [6x8 table]
numsubsets: 64
x: [3932x6 table]
categoricalpredictors: 6
method: 'interventional-kernel'
intercept: [-1.7642 -1.3677 -1.0980 -1.0645 -1.4758 -2.1268 -2.3909]
as the warning message indicates, the computation can be slow because the predictor data has over 1000 observations. for faster computation, use a smaller sample of the training set or specify 'useparallel' as true.
for a classification model, shapley computes shapley values using the predicted class score for each class. display the values in the shapleyvalues property.
explainer.ShapleyValues
ans=6×8 table
predictor aaa aa a bbb bb b ccc
__________ _________ __________ ___________ __________ ___________ __________ __________
"wc_ta" 0.051507 0.022531 0.0093463 0.0017109 -0.027655 -0.041443 -0.039882
"re_ta" 0.16772 0.094211 0.051629 -0.011019 -0.087919 -0.20974 -0.29463
"ebit_ta" 0.0011995 0.00052588 0.00041919 0.00011866 -0.00066237 -0.0013347 -0.0011824
"mve_bvtd" 1.3417 1.3082 0.61472 -0.11247 -0.6555 -0.86908 -0.68547
"s_ta" -0.013059 -0.0091049 -0.00031099 -0.0028624 -0.00019227 0.0016759 -0.0024149
"industry" -0.10142 -0.048668 0.0036522 0.081542 0.092657 0.10464 0.15888
the shapleyvalues property contains the shapley values of all features for each class.
plot the shapley values for the predicted class by using the plot function.
plot(explainer)

the horizontal bar graph shows the shapley values for all variables, sorted by their absolute values. each shapley value explains the deviation of the score for the query point from the average score of the predicted class, due to the corresponding variable.
create shapley object and compute shapley values using fit
train a regression model and create a shapley object. when you create a shapley object, if you do not specify a query point, then the software does not compute shapley values. use the object function fit to compute the shapley values for the specified query point. then create a bar graph of the shapley values by using the object function plot.
load the carbig data set, which contains measurements of cars made in the 1970s and early 1980s.
load carbig
Create a table containing the predictor variables Acceleration, Cylinders, and so on, as well as the response variable MPG.
tbl = table(Acceleration,Cylinders,Displacement,Horsepower,Model_Year,Weight,MPG);
Removing missing values in a training set can help reduce memory consumption and speed up training for the fitrkernel function. Remove missing values in tbl.
tbl = rmmissing(tbl);
Train a blackbox model of MPG by using the fitrkernel function.
rng('default') % For reproducibility
mdl = fitrkernel(tbl,'MPG','CategoricalPredictors',[2 5]);
create a shapley object. specify the data set tbl, because mdl does not contain training data.
explainer = shapley(mdl,tbl)
explainer =
shapley with properties:
blackboxmodel: [1x1 regressionkernel]
querypoint: []
blackboxfitted: []
shapleyvalues: []
numsubsets: 64
x: [392x7 table]
categoricalpredictors: [2 5]
method: 'interventional-kernel'
intercept: 22.6202
explainer stores the training data tbl in the x property.
compute the shapley values of all predictor variables for the first observation in tbl.
querypoint = tbl(1,:)
querypoint=1×7 table
acceleration cylinders displacement horsepower model_year weight mpg
____________ _________ ____________ __________ __________ ______ ___
12 8 307 130 70 3504 18
explainer = fit(explainer,querypoint);
for a regression model, shapley computes shapley values using the predicted response, and stores them in the shapleyvalues property. display the values in the shapleyvalues property.
explainer.ShapleyValues
ans=6×2 table
predictor shapleyvalue
______________ ____________
"acceleration" -0.1561
"cylinders" -0.18306
"displacement" -0.34203
"horsepower" -0.27291
"model_year" -0.2926
"weight" -0.32402
plot the shapley values for the query point by using the plot function.
plot(explainer)

the horizontal bar graph shows the shapley values for all variables, sorted by their absolute values. each shapley value explains the deviation of the prediction for the query point from the average, due to the corresponding variable.
specify blackbox model using function handle
train a regression model and create a shapley object using a function handle to the predict function of the model. use the object function fit to compute the shapley values for the specified query point. then plot the shapley values by using the object function plot.
load the carbig data set, which contains measurements of cars made in the 1970s and early 1980s.
load carbig
Create a table containing the predictor variables Acceleration, Cylinders, and so on.
tbl = table(Acceleration,Cylinders,Displacement,Horsepower,Model_Year,Weight);
Train a blackbox model of MPG by using the TreeBagger function.
rng('default') % For reproducibility
mdl = TreeBagger(100,tbl,MPG,'Method','regression','CategoricalPredictors',[2 5]);
shapley does not support a TreeBagger object directly, so you cannot specify the first input argument (blackbox model) of shapley as a TreeBagger object. Instead, you can use a function handle to the predict function. You can also specify options of the predict function using name-value arguments of the function.
Create the function handle to the predict function of the TreeBagger object mdl. Specify the array of tree indices to use as 1:50.
f = @(tbl) predict(mdl,tbl,'Trees',1:50);
Create a shapley object using the function handle f. When you specify a blackbox model as a function handle, you must provide the predictor data. tbl includes categorical predictors (Cylinders and Model_Year) with the double data type. By default, shapley does not treat variables with the double data type as categorical predictors. Specify the second (Cylinders) and fifth (Model_Year) variables as categorical predictors.
explainer = shapley(f,tbl,'CategoricalPredictors',[2 5]);
explainer = fit(explainer,tbl(1,:));
Plot the Shapley values.
plot(explainer)

more about
shapley values
in game theory, the shapley value of a player is the average marginal contribution of the player in a cooperative game. in the context of machine learning prediction, the shapley value of a feature for a query point explains the contribution of the feature to a prediction (response for regression or score of each class for classification) at the specified query point.
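The game-theoretic definition can be illustrated with a brute-force calculation. The sketch below uses a hypothetical three-player game (the payoff function v is invented for illustration) and averages each player's marginal contribution over all orderings in which the players can join the coalition. It is a conceptual illustration only, not the algorithm that shapley uses.

```matlab
% Payoff of a hypothetical 3-player game. S is a logical row vector that
% marks which players are in the coalition. Players 1 and 2 earn a bonus
% of 6 when they cooperate.
v = @(S) 10*S(1) + 20*S(2) + 30*S(3) + 6*(S(1) && S(2));

n = 3;
orders = perms(1:n);                 % all n! orderings of the players
phi = zeros(1,n);
for k = 1:size(orders,1)
    S = false(1,n);                  % start from the empty coalition
    for p = orders(k,:)
        before = v(S);
        S(p) = true;                 % player p joins the coalition
        phi(p) = phi(p) + (v(S) - before);   % marginal contribution of p
    end
end
phi = phi / size(orders,1);          % average over all orderings

disp(phi)                            % 13 23 30
disp(sum(phi) - v(true(1,n)))        % 0
```

Players 1 and 2 split the cooperation bonus equally, and the Shapley values sum to the payoff of the full coalition, which mirrors the way feature contributions sum to the deviation of a prediction from the average.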
the shapley value of a feature for a query point is the contribution of the feature to the deviation from the average prediction. for a query point, the sum of the shapley values for all features corresponds to the total deviation of the prediction from the average. that is, the sum of the average prediction and the shapley values for all features corresponds to the prediction for the query point.
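In symbols, the statements above can be written as follows. The notation is an assumption introduced here for illustration: $M$ is the number of features, $\mathcal{M} = \{1,\dots,M\}$ is the set of feature indices, $f$ is the model, $x$ is the query point, and $v_x(S)$ is a value function giving the expected prediction when only the features in $S$ are known, with $v_x(\emptyset) = \mathbb{E}[f(X)]$ and $v_x(\mathcal{M}) = f(x)$.

```latex
% Shapley value of feature i: weighted average of its marginal contributions
\varphi_i(v_x) = \frac{1}{M} \sum_{S \subseteq \mathcal{M} \setminus \{i\}}
    \binom{M-1}{|S|}^{-1} \bigl( v_x(S \cup \{i\}) - v_x(S) \bigr)

% Additivity: the average prediction plus all Shapley values gives the prediction
f(x) = \varphi_0 + \sum_{i=1}^{M} \varphi_i(v_x), \qquad \varphi_0 = \mathbb{E}[f(X)]
```

Here $\varphi_0$ plays the role of the Intercept property, and the second line restates that the Shapley values account for the total deviation $f(x) - \varphi_0$.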
for more details, see shapley values for machine learning model.
references
[1] Lundberg, Scott M., and S. Lee. "A Unified Approach to Interpreting Model Predictions." Advances in Neural Information Processing Systems 30 (2017): 4765–774.
[2] Lundberg, Scott M., G. Erion, H. Chen, et al. "From Local Explanations to Global Understanding with Explainable AI for Trees." Nature Machine Intelligence 2 (January 2020): 56–67.
[3] Aas, Kjersti, Martin Jullum, and Anders Løland. "Explaining Individual Predictions When Features Are Dependent: More Accurate Approximations to Shapley Values." Artificial Intelligence 298 (September 2021).
extended capabilities
automatic parallel support
accelerate code by automatically running computation in parallel using parallel computing toolbox™.
to run in parallel, set the useparallel name-value argument to
true in the call to this function.
For more general information about parallel computing, see the Parallel Computing Toolbox documentation.
version history
Introduced in R2021a
R2023a: shapley supports the Linear SHAP and Tree SHAP algorithms
shapley supports the linear shap [1] algorithm for linear models and the tree
shap [2] algorithm for tree models and ensemble
models of tree learners.
if you specify the method name-value
argument as 'interventional' (default), shapley selects
an algorithm based on the machine learning model type of blackbox. the
method property
stores the name of the selected algorithm.
r2023a: values of the method name-value argument have changed
the supported values of the method name-value
argument have changed from 'interventional-kernel' and
'conditional-kernel' to 'interventional' and
'conditional', respectively.