YAMNet neural network
Since R2020b
Syntax
Description
Examples
Download YAMNet
Download and unzip the Audio Toolbox™ model for YAMNet.
Type yamnet in the Command Window. If the Audio Toolbox model for YAMNet is not installed, the function provides a link to the location of the network weights. To download the model, click the link. Unzip the file to a location on the MATLAB path.
Alternatively, execute the following commands to download and unzip the YAMNet model to your temporary directory.
downloadfolder = fullfile(tempdir,'yamnetdownload');
loc = websave(downloadfolder,'https://ssd.mathworks.com/supportfiles/audio/yamnet.zip');
yamnetlocation = tempdir;
unzip(loc,yamnetlocation)
addpath(fullfile(yamnetlocation,'yamnet'))
Check that the installation is successful by typing yamnet in the Command Window. If the network is installed, the function returns a SeriesNetwork (Deep Learning Toolbox) object.
yamnet
ans =
  SeriesNetwork with properties:
         Layers: [86×1 nnet.cnn.layer.Layer]
     InputNames: {'input_1'}
    OutputNames: {'Sound'}
Load Pretrained YAMNet
Load a pretrained YAMNet convolutional neural network and examine the layers and classes.
Use yamnet to load the pretrained YAMNet network. The output net is a SeriesNetwork (Deep Learning Toolbox) object.
net = yamnet
net =
  SeriesNetwork with properties:
         Layers: [86×1 nnet.cnn.layer.Layer]
     InputNames: {'input_1'}
    OutputNames: {'Sound'}
View the network architecture using the Layers property. The network has 86 layers. There are 28 layers with learnable weights: 27 convolutional layers and 1 fully connected layer.
net.Layers
ans =
  86×1 Layer array with layers:
     1   'input_1'                    Image Input              96×64×1 images
     2   'conv2d'                     Convolution              32 3×3×1 convolutions with stride [2 2] and padding 'same'
     3   'b'                          Batch Normalization      Batch normalization with 32 channels
     4   'activation'                 ReLU                     ReLU
     5   'depthwise_conv2d'           Grouped Convolution      32 groups of 1 3×3×1 convolutions with stride [1 1] and padding 'same'
     6   'L11'                        Batch Normalization      Batch normalization with 32 channels
     7   'activation_1'               ReLU                     ReLU
     8   'conv2d_1'                   Convolution              64 1×1×32 convolutions with stride [1 1] and padding 'same'
     9   'L12'                        Batch Normalization      Batch normalization with 64 channels
    10   'activation_2'               ReLU                     ReLU
    11   'depthwise_conv2d_1'         Grouped Convolution      64 groups of 1 3×3×1 convolutions with stride [2 2] and padding 'same'
    12   'L21'                        Batch Normalization      Batch normalization with 64 channels
    13   'activation_3'               ReLU                     ReLU
    14   'conv2d_2'                   Convolution              128 1×1×64 convolutions with stride [1 1] and padding 'same'
    15   'L22'                        Batch Normalization      Batch normalization with 128 channels
    16   'activation_4'               ReLU                     ReLU
    17   'depthwise_conv2d_2'         Grouped Convolution      128 groups of 1 3×3×1 convolutions with stride [1 1] and padding 'same'
    18   'L31'                        Batch Normalization      Batch normalization with 128 channels
    19   'activation_5'               ReLU                     ReLU
    20   'conv2d_3'                   Convolution              128 1×1×128 convolutions with stride [1 1] and padding 'same'
    21   'L32'                        Batch Normalization      Batch normalization with 128 channels
    22   'activation_6'               ReLU                     ReLU
    23   'depthwise_conv2d_3'         Grouped Convolution      128 groups of 1 3×3×1 convolutions with stride [2 2] and padding 'same'
    24   'L41'                        Batch Normalization      Batch normalization with 128 channels
    25   'activation_7'               ReLU                     ReLU
    26   'conv2d_4'                   Convolution              256 1×1×128 convolutions with stride [1 1] and padding 'same'
    27   'L42'                        Batch Normalization      Batch normalization with 256 channels
    28   'activation_8'               ReLU                     ReLU
    29   'depthwise_conv2d_4'         Grouped Convolution      256 groups of 1 3×3×1 convolutions with stride [1 1] and padding 'same'
    30   'L51'                        Batch Normalization      Batch normalization with 256 channels
    31   'activation_9'               ReLU                     ReLU
    32   'conv2d_5'                   Convolution              256 1×1×256 convolutions with stride [1 1] and padding 'same'
    33   'L52'                        Batch Normalization      Batch normalization with 256 channels
    34   'activation_10'              ReLU                     ReLU
    35   'depthwise_conv2d_5'         Grouped Convolution      256 groups of 1 3×3×1 convolutions with stride [2 2] and padding 'same'
    36   'L61'                        Batch Normalization      Batch normalization with 256 channels
    37   'activation_11'              ReLU                     ReLU
    38   'conv2d_6'                   Convolution              512 1×1×256 convolutions with stride [1 1] and padding 'same'
    39   'L62'                        Batch Normalization      Batch normalization with 512 channels
    40   'activation_12'              ReLU                     ReLU
    41   'depthwise_conv2d_6'         Grouped Convolution      512 groups of 1 3×3×1 convolutions with stride [1 1] and padding 'same'
    42   'L71'                        Batch Normalization      Batch normalization with 512 channels
    43   'activation_13'              ReLU                     ReLU
    44   'conv2d_7'                   Convolution              512 1×1×512 convolutions with stride [1 1] and padding 'same'
    45   'L72'                        Batch Normalization      Batch normalization with 512 channels
    46   'activation_14'              ReLU                     ReLU
    47   'depthwise_conv2d_7'         Grouped Convolution      512 groups of 1 3×3×1 convolutions with stride [1 1] and padding 'same'
    48   'L81'                        Batch Normalization      Batch normalization with 512 channels
    49   'activation_15'              ReLU                     ReLU
    50   'conv2d_8'                   Convolution              512 1×1×512 convolutions with stride [1 1] and padding 'same'
    51   'L82'                        Batch Normalization      Batch normalization with 512 channels
    52   'activation_16'              ReLU                     ReLU
    53   'depthwise_conv2d_8'         Grouped Convolution      512 groups of 1 3×3×1 convolutions with stride [1 1] and padding 'same'
    54   'L91'                        Batch Normalization      Batch normalization with 512 channels
    55   'activation_17'              ReLU                     ReLU
    56   'conv2d_9'                   Convolution              512 1×1×512 convolutions with stride [1 1] and padding 'same'
    57   'L92'                        Batch Normalization      Batch normalization with 512 channels
    58   'activation_18'              ReLU                     ReLU
    59   'depthwise_conv2d_9'         Grouped Convolution      512 groups of 1 3×3×1 convolutions with stride [1 1] and padding 'same'
    60   'L101'                       Batch Normalization      Batch normalization with 512 channels
    61   'activation_19'              ReLU                     ReLU
    62   'conv2d_10'                  Convolution              512 1×1×512 convolutions with stride [1 1] and padding 'same'
    63   'L102'                       Batch Normalization      Batch normalization with 512 channels
    64   'activation_20'              ReLU                     ReLU
    65   'depthwise_conv2d_10'        Grouped Convolution      512 groups of 1 3×3×1 convolutions with stride [1 1] and padding 'same'
    66   'L111'                       Batch Normalization      Batch normalization with 512 channels
    67   'activation_21'              ReLU                     ReLU
    68   'conv2d_11'                  Convolution              512 1×1×512 convolutions with stride [1 1] and padding 'same'
    69   'L112'                       Batch Normalization      Batch normalization with 512 channels
    70   'activation_22'              ReLU                     ReLU
    71   'depthwise_conv2d_11'        Grouped Convolution      512 groups of 1 3×3×1 convolutions with stride [2 2] and padding 'same'
    72   'L121'                       Batch Normalization      Batch normalization with 512 channels
    73   'activation_23'              ReLU                     ReLU
    74   'conv2d_12'                  Convolution              1024 1×1×512 convolutions with stride [1 1] and padding 'same'
    75   'L122'                       Batch Normalization      Batch normalization with 1024 channels
    76   'activation_24'              ReLU                     ReLU
    77   'depthwise_conv2d_12'        Grouped Convolution      1024 groups of 1 3×3×1 convolutions with stride [1 1] and padding 'same'
    78   'L131'                       Batch Normalization      Batch normalization with 1024 channels
    79   'activation_25'              ReLU                     ReLU
    80   'conv2d_13'                  Convolution              1024 1×1×1024 convolutions with stride [1 1] and padding 'same'
    81   'L132'                       Batch Normalization      Batch normalization with 1024 channels
    82   'activation_26'              ReLU                     ReLU
    83   'global_average_pooling2d'   Global Average Pooling   Global average pooling
    84   'dense'                      Fully Connected          521 fully connected layer
    85   'softmax'                    Softmax                  softmax
    86   'Sound'                      Classification Output    crossentropyex with 'Speech' and 520 other classes
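The layer counts quoted above can be checked by hand against the listing. The following is a small arithmetic sketch in plain MATLAB, not a query against the network itself; the variable names are illustrative, and the block counts are simply read off the layer names in the listing.

```matlab
% Block counts read off the layer listing above.
numStandardConv  = 1;    % 'conv2d'
numDepthwiseConv = 13;   % 'depthwise_conv2d' through 'depthwise_conv2d_12'
numPointwiseConv = 13;   % 'conv2d_1' through 'conv2d_13'

numConv = numStandardConv + numDepthwiseConv + numPointwiseConv;

% In the listing, each convolution is followed by a batch normalization
% layer and a ReLU layer, so each block contributes three layers.
layersPerConvBlock = 3;

% Remaining layers: image input, global average pooling, fully connected,
% softmax, and classification output.
numOtherLayers = 5;

numLayers    = numConv*layersPerConvBlock + numOtherLayers;
numLearnable = numConv + 1;   % convolutions plus the fully connected layer

fprintf("Convolutional layers: %d\n",numConv)
fprintf("Total layers: %d\n",numLayers)
fprintf("Layers counted as having learnable weights: %d\n",numLearnable)
```

The totals agree with the 86 layers and 28 weighted layers stated in the text.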
To view the names of the classes learned by the network, view the Classes property of the classification output layer (the final layer). View the first 10 classes by specifying the first 10 elements.
net.Layers(end).Classes(1:10)
ans = 10×1 categorical
     Speech
     Child speech, kid speaking
     Conversation
     Narration, monologue
     Babbling
     Speech synthesizer
     Shout
     Bellow
     Whoop
     Yell
Use analyzeNetwork (Deep Learning Toolbox) to visually explore the network.
analyzeNetwork(net)

YAMNet was released with a corresponding sound class ontology, which you can explore using the yamnetGraph object.
ygraph = yamnetGraph;
p = plot(ygraph);
layout(p,'layered')
The ontology graph plots all 521 possible sound classes. Plot a subgraph of the sounds related to respiratory sounds.
allrespiratorysounds = dfsearch(ygraph,"Respiratory sounds");
ygraphspeech = subgraph(ygraph,allrespiratorysounds);
plot(ygraphspeech)
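To see what the dfsearch and subgraph calls do without loading the ontology, here is a minimal sketch on a toy directed graph. The node names and the graph itself are hypothetical and are not taken from the YAMNet ontology; the sketch assumes only the base MATLAB digraph, dfsearch, and subgraph functions.

```matlab
% Hypothetical toy ontology; these node names are illustrative only.
parents  = ["Sounds" "Sounds" "Breathing" "Breathing" "Music"];
children = ["Breathing" "Music" "Snoring" "Cough" "Drum"];
toygraph = digraph(parents,children);

% Depth-first search returns the start node and every node reachable from it.
reachable = dfsearch(toygraph,"Breathing");

% Keep only those nodes and the edges between them.
toysubgraph = subgraph(toygraph,reachable);
disp(toysubgraph.Nodes.Name)
```

The subgraph contains "Breathing" and its two descendants, which mirrors how the example above isolates one branch of the sound class ontology.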
Classify Sounds Using YAMNet
Read in an audio signal to classify it.
[audioin,fs] = audioread("TrainWhistle-16-44p1-mono-9secs.wav");
Plot and listen to the audio signal.
t = (0:numel(audioin)-1)/fs;
plot(t,audioin)
xlabel("Time (s)")
ylabel("Amplitude")
axis tight

% To play the sound, call soundsc(audioin,fs)
YAMNet requires you to preprocess the audio signal to match the input format used to train the network. The preprocessing steps include resampling the audio signal and computing an array of mel spectrograms. To learn more about mel spectrograms, see melSpectrogram. Use yamnetPreprocess to preprocess the signal and extract the mel spectrograms to be passed to YAMNet. Visualize one of these spectrograms chosen at random.
spectrograms = yamnetPreprocess(audioin,fs);
arbitraryspect = spectrograms(:,:,1,randi(size(spectrograms,4)));
surf(arbitraryspect,EdgeColor="none")
view([90 -90])
xlabel("Mel Band")
ylabel("Frame")
title("Mel Spectrogram for YAMNet")
axis tight

Create a YAMNet neural network. Using the yamnet function requires installing the pretrained YAMNet network. If the network is not installed, the function provides a link to download the pretrained model. Call classify with the network on the preprocessed mel spectrogram images.
net = yamnet;
classes = classify(net,spectrograms);
Calling classify returns a label for each of the spectrogram images in the input. Classify the sound as the most frequently occurring label in the output of classify.
mysound = mode(classes)
mysound = categorical
     Whistle
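The final step is a majority vote over the per-spectrogram labels. The sketch below illustrates that vote on a hand-written label vector; the labels are hypothetical and are not the output of classify. It also uses the second output of mode to report how many spectrograms agreed with the winning label, which can serve as a rough indication of how consistent the per-frame decisions were.

```matlab
% Hypothetical per-spectrogram labels, standing in for the output of classify.
framelabels = categorical(["Whistle" "Train" "Whistle" "Whistle" "Train horn"]);

% mode returns the most frequent label and the number of times it occurs.
[winner,count] = mode(framelabels);

% Fraction of spectrograms that voted for the winning label.
share = count/numel(framelabels);
fprintf("%s (%d of %d spectrograms)\n",string(winner),count,numel(framelabels))
```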
Transfer Learning Using YAMNet
Download and unzip the air compressor data set [1]. This data set consists of recordings from air compressors in a healthy state or one of 7 faulty states.
url = 'https://www.mathworks.com/supportfiles/audio/aircompressordataset/aircompressordataset.zip';
downloadfolder = fullfile(tempdir,'aircompressordataset');
datasetlocation = tempdir;
if ~exist(fullfile(tempdir,'aircompressordataset'),'dir')
    loc = websave(downloadfolder,url);
    unzip(loc,fullfile(tempdir,'aircompressordataset'))
end
Create an audioDatastore object to manage the data and split it into train and validation sets.
ads = audioDatastore(downloadfolder,'IncludeSubfolders',true,'LabelSource','foldernames');
[adstrain,adsvalidation] = splitEachLabel(ads,0.8,0.2);
Read an audio file from the datastore and save the sample rate for later use. Reset the datastore to return the read pointer to the beginning of the data set. Listen to the audio signal and plot the signal in the time domain.
[x,fileinfo] = read(adstrain);
fs = fileinfo.SampleRate;
reset(adstrain)
sound(x,fs)
figure
t = (0:size(x,1)-1)/fs;
plot(t,x)
xlabel('Time (s)')
title("State = " + string(fileinfo.Label))
axis tight

Extract mel spectrograms from the train set using yamnetPreprocess. There are multiple spectrograms for each audio signal. Replicate the labels so that they are in one-to-one correspondence with the spectrograms.
emptylabelvector = adstrain.Labels;
emptylabelvector(:) = [];
trainfeatures = [];
trainlabels = emptylabelvector;
while hasdata(adstrain)
    [audioin,fileinfo] = read(adstrain);
    features = yamnetPreprocess(audioin,fileinfo.SampleRate);
    numspectrums = size(features,4);
    trainfeatures = cat(4,trainfeatures,features);
    trainlabels = cat(2,trainlabels,repmat(fileinfo.Label,1,numspectrums));
end
Extract features from the validation set and replicate the labels.
validationfeatures = [];
validationlabels = emptylabelvector;
while hasdata(adsvalidation)
    [audioin,fileinfo] = read(adsvalidation);
    features = yamnetPreprocess(audioin,fileinfo.SampleRate);
    numspectrums = size(features,4);
    validationfeatures = cat(4,validationfeatures,features);
    validationlabels = cat(2,validationlabels,repmat(fileinfo.Label,1,numspectrums));
end
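The label replication in the loops above can be tried in isolation. The following sketch uses made-up file labels and made-up spectrogram counts in place of the datastore and yamnetPreprocess; only the repmat and cat pattern is the same as in the example.

```matlab
% Hypothetical file labels and spectrogram counts, for illustration only.
filelabels = categorical(["Healthy" "Bearing"]);
spectrogramsperfile = [3 2];

% Start from an empty categorical row that shares the same categories.
replicated = filelabels(1,[]);

for k = 1:numel(filelabels)
    % One copy of the file label for every spectrogram from that file.
    copies = repmat(filelabels(k),1,spectrogramsperfile(k));
    replicated = cat(2,replicated,copies);
end

disp(replicated)
```

After the loop, the label vector has one entry per spectrogram, which is the shape that training on individual spectrograms expects.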
The air compressor data set has only eight classes, whereas the pretrained YAMNet network classifies 521.
Read in YAMNet and convert it to a layerGraph (Deep Learning Toolbox).
If the YAMNet pretrained network is not installed on your machine, execute the following commands to download and unzip the YAMNet model to your temporary directory.
downloadfolder = fullfile(tempdir,'yamnetdownload');
loc = websave(downloadfolder,'https://ssd.mathworks.com/supportfiles/audio/yamnet.zip');
yamnetlocation = tempdir;
unzip(loc,yamnetlocation)
addpath(fullfile(yamnetlocation,'yamnet'))
After you read in YAMNet and convert it to a layerGraph (Deep Learning Toolbox), replace the final fullyConnectedLayer (Deep Learning Toolbox) and the final classificationLayer (Deep Learning Toolbox) to reflect the new task.
uniquelabels = unique(adstrain.Labels);
numlabels = numel(uniquelabels);
net = yamnet;
lgraph = layerGraph(net.Layers);
newdenselayer = fullyConnectedLayer(numlabels,"Name","dense");
lgraph = replaceLayer(lgraph,"dense",newdenselayer);
newclassificationlayer = classificationLayer("Name","Sounds","Classes",uniquelabels);
lgraph = replaceLayer(lgraph,"Sound",newclassificationlayer);
To define training options, use trainingOptions (Deep Learning Toolbox).
minibatchsize = 128;
validationfrequency = floor(numel(trainlabels)/minibatchsize);
options = trainingOptions('adam', ...
    'InitialLearnRate',3e-4, ...
    'MaxEpochs',2, ...
    'MiniBatchSize',minibatchsize, ...
    'Shuffle','every-epoch', ...
    'Plots','training-progress', ...
    'Verbose',false, ...
    'ValidationData',{single(validationfeatures),validationlabels}, ...
    'ValidationFrequency',validationfrequency);
To train the network, use trainNetwork (Deep Learning Toolbox).
aircompressornet = trainNetwork(trainfeatures,trainlabels,lgraph,options);

Save the trained network to aircompressornet.mat. You can now use this pretrained network by loading the aircompressornet.mat file.
save aircompressornet.mat aircompressornet
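The validation frequency in the training options is set to the number of mini-batches in one pass over the training data, so validation runs about once per epoch. The arithmetic can be sketched as follows. The spectrogram count here is a hypothetical round number, not the size of the air compressor data set, and the sketch assumes that trainNetwork discards the final partial mini-batch of each epoch.

```matlab
% Hypothetical sizes for illustration; not taken from the data set.
numtrainspectrograms = 10000;
minibatchsize = 128;
maxepochs = 2;

% Assuming the final partial mini-batch of each epoch is discarded,
% one epoch contains floor(N/minibatchsize) iterations.
iterationsperepoch = floor(numtrainspectrograms/minibatchsize);
validationfrequency = iterationsperepoch;

% Iterations at which this validation frequency would trigger.
totaliterations = iterationsperepoch*maxepochs;
validationiterations = validationfrequency:validationfrequency:totaliterations;

fprintf("Iterations per epoch: %d\n",iterationsperepoch)
fprintf("Validation at iterations: %s\n",mat2str(validationiterations))
```

A smaller validation frequency gives a finer validation curve in the training-progress plot at the cost of extra computation per epoch.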
References
[1] Verma, Nishchal K., et al. “Intelligent Condition Based Monitoring Using Acoustic Signals for Air Compressors.” IEEE Transactions on Reliability, vol. 65, no. 1, Mar. 2016, pp. 291–309. doi:10.1109/tr.2015.2459684.
Output Arguments
net — Pretrained YAMNet neural network
SeriesNetwork object
Pretrained YAMNet neural network, returned as a SeriesNetwork (Deep Learning Toolbox) object.
References
[1] Gemmeke, Jort F., et al. “Audio Set: An Ontology and Human-Labeled Dataset for Audio Events.” 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2017, pp. 776–80. doi:10.1109/icassp.2017.7952261.
[2] Hershey, Shawn, et al. “CNN Architectures for Large-Scale Audio Classification.” 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, 2017, pp. 131–35. doi:10.1109/icassp.2017.7952132.
Extended Capabilities
C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.
Usage notes and limitations:
Only the activations and predict object functions are supported.
To create a SeriesNetwork object for code generation, see the MATLAB Coder documentation.
GPU Code Generation
Generate CUDA® code for NVIDIA® GPUs using GPU Coder™.
Usage notes and limitations:
Only the activations, classify, predict, predictAndUpdateState, and resetState object functions are supported.
To create a SeriesNetwork object for code generation, see the GPU Coder documentation.
Version History
Introduced in R2020b