streamline audio feature extraction
since r2019b
description
audiofeatureextractor encapsulates multiple audio feature
extractors into a streamlined and modular implementation.
creation
description
afe = audioFeatureExtractor() creates an audio feature extractor with default property values.
afe = audioFeatureExtractor(Name=Value) specifies nondefault properties for afe using one or more name-value arguments.
properties
main properties
window — analysis window
hamming(1024,"periodic") (default) | real vector
analysis window, specified as a real vector.
data types: single | double
overlaplength — overlap length of adjacent analysis windows
512 (default) | integer in the range [0, numel(Window))
Overlap length of adjacent analysis windows, specified as an integer in the range [0, numel(Window)).
data types: single | double
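The relationship between the window length, the overlap length, and the number of analysis frames can be sketched in plain MATLAB. This is a minimal sketch that assumes the extractor analyzes only complete windows (no padding of the final partial frame); under that assumption it reproduces the hop counts reported for the first two files in the dataset example on this page. The variable names are illustrative only.

```matlab
% Sketch: number of analysis frames for a given window and overlap.
% Assumption: only complete windows are analyzed (no end padding).
winLength     = 1024;                      % numel(Window), the default
overlapLength = 512;                       % the default OverlapLength
hopLength     = winLength - overlapLength; % samples between window starts

numSamples = [539648 227497];              % file lengths from the dataset example
numHops    = floor((numSamples - winLength)/hopLength) + 1
```

Increasing OverlapLength shortens the hop and therefore increases the number of rows in the output of extract.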
fftlength — fft length
[] (default) | positive integer
FFT length, specified as a positive integer. The default value of []
means that the FFT length is equal to the window length, numel(Window).
data types: single | double
samplerate — input sample rate (hz)
44100 (default) | positive scalar
input sample rate in hz, specified as a positive scalar.
data types: single | double
spectraldescriptorinput — input to spectral descriptors
"linearspectrum" (default) | "melspectrum" | "barkspectrum" | "erbspectrum"
input to spectral descriptors, specified as "linearspectrum",
"melspectrum", "barkspectrum", or
"erbspectrum".
The spectral descriptors affected by this property are:
spectralCentroid
spectralCrest
spectralDecrease
spectralEntropy
spectralFlatness
spectralFlux
spectralKurtosis
spectralRolloffPoint
spectralSkewness
spectralSlope
spectralSpread
The spectrum input to the spectral descriptors is the same as the output from the corresponding feature.
For example, if you set SpectralDescriptorInput to "barkSpectrum" and spectralCentroid to true, then afe returns the centroid of the default Bark spectrum.
[audioin,fs] = audioread("Counting-16-44p1-mono-15secs.wav");
afe = audioFeatureExtractor(SampleRate=fs, ...
    SpectralDescriptorInput="barkSpectrum", ...
    spectralCentroid=true);
barkspectralcentroid = extract(afe,audioin);
If you configure barkSpectrum using setExtractorParameters, then the nondefault Bark spectrum is the input to the spectral descriptors. For example, if you call setExtractorParameters(afe,"barkSpectrum",NumBands=40), then afe returns the centroid of a 40-band Bark spectrum.
setExtractorParameters(afe,"barkSpectrum",NumBands=40)
bark40spectralcentroid = extract(afe,audioin);
Data types: char | string
featurevectorlength — number of features output from extract
positive integer
this property is read-only.
total number of features output from extract for the current
object configuration, specified as a positive integer.
FeatureVectorLength is equal to the second dimension of the
output from the extract function.
data types: single | double
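As a quick check of the relationship described above, the following sketch compares FeatureVectorLength with the size of the extract output. The input is a synthetic tone rather than a shipped audio file, and the sample rate and feature selection are arbitrary choices for illustration.

```matlab
% Sketch: FeatureVectorLength matches the second dimension of the extract output.
fs = 16e3;
t  = (0:fs-1)'/fs;
x  = sin(2*pi*220*t) + 0.01*randn(fs,1);   % 1 s synthetic test signal (illustrative)

afe = audioFeatureExtractor(SampleRate=fs,mfcc=true,pitch=true);
features = extract(afe,x);

numFeatures  = size(features,2);
isConsistent = (numFeatures == afe.FeatureVectorLength)
```

Because the property is read-only, it can be used to preallocate buffers before any audio is processed.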
features to extract
linearspectrum — extract linear spectrum
false (default) | true
extract the one-sided linear spectrum, specified as true or
false.
To set parameters of the linear spectrum extraction, use setExtractorParameters:
setExtractorParameters(afe,"linearSpectrum",Name=Value)
FrequencyRange: Frequency range of the extracted spectrum in Hz, specified as a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].
SpectrumType: Spectrum type, specified as "power" or "magnitude". If unspecified, SpectrumType defaults to "power".
WindowNormalization: Apply window normalization, specified as true or false. If unspecified, WindowNormalization defaults to true.
data types: logical
melspectrum — extract mel spectrum
false (default) | true
extract the one-sided mel spectrum, specified as true or
false.
To set parameters of the mel spectrum extraction, use setExtractorParameters:
setExtractorParameters(afe,"melSpectrum",Name=Value)
FrequencyRange: Frequency range of the extracted spectrum in Hz, specified as a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].
SpectrumType: Spectrum type, specified as "power" or "magnitude". If unspecified, SpectrumType defaults to "power".
NumBands: Number of mel bands, specified as an integer. If unspecified, NumBands defaults to 32.
FilterBankNormalization: Normalization applied to bandpass filters, specified as "bandwidth", "area", or "none". If unspecified, FilterBankNormalization defaults to "bandwidth".
WindowNormalization: Apply window normalization, specified as true or false. If unspecified, WindowNormalization defaults to true.
FilterBankDesignDomain: Domain in which the filter bank is designed, specified as either "linear" or "warped". If unspecified, FilterBankDesignDomain defaults to "linear".
data types: logical
barkspectrum — extract bark spectrum
false (default) | true
extract the one-sided bark spectrum, specified as true or
false.
To set parameters of the Bark spectrum extraction, use setExtractorParameters:
setExtractorParameters(afe,"barkSpectrum",Name=Value)
FrequencyRange: Frequency range of the extracted spectrum in Hz, specified as a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].
SpectrumType: Spectrum type, specified as "power" or "magnitude". If unspecified, SpectrumType defaults to "power".
NumBands: Number of Bark bands, specified as an integer. If unspecified, NumBands defaults to 32.
FilterBankNormalization: Normalization applied to bandpass filters, specified as "bandwidth", "area", or "none". If unspecified, FilterBankNormalization defaults to "bandwidth".
WindowNormalization: Apply window normalization, specified as true or false. If unspecified, WindowNormalization defaults to true.
FilterBankDesignDomain: Domain in which the filter bank is designed, specified as either "linear" or "warped". If unspecified, FilterBankDesignDomain defaults to "linear".
data types: logical
erbspectrum — extract erb spectrum
false (default) | true
extract the one-sided erb spectrum, specified as true or
false.
To set parameters of the ERB spectrum extraction, use setExtractorParameters:
setExtractorParameters(afe,"erbSpectrum",Name=Value)
FrequencyRange: Frequency range of the extracted spectrum in Hz, specified as a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].
SpectrumType: Spectrum type, specified as "power" or "magnitude". If unspecified, SpectrumType defaults to "power".
NumBands: Number of ERB bands, specified as an integer. If unspecified, NumBands defaults to ceil(hz2erb(FrequencyRange(2))-hz2erb(FrequencyRange(1))).
FilterBankNormalization: Normalization applied to bandpass filters, specified as "bandwidth", "area", or "none". If unspecified, FilterBankNormalization defaults to "bandwidth".
WindowNormalization: Apply window normalization, specified as true or false. If unspecified, WindowNormalization defaults to true.
data types: logical
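The default sizes of the four spectrum features can be tallied from the parameter defaults listed above. The sketch below assumes a 44.1 kHz sample rate and the default window, so that the one-sided linear spectrum has numel(Window)/2 + 1 bins; the resulting total agrees with the 620 features per hop reported in the dataset example on this page. The variable names are illustrative only.

```matlab
% Sketch: expected feature count when all four spectra use default parameters.
fs  = 44.1e3;
afe = audioFeatureExtractor(SampleRate=fs, ...
    linearSpectrum=true,melSpectrum=true, ...
    barkSpectrum=true,erbSpectrum=true);

numLinear = numel(afe.Window)/2 + 1;         % one-sided spectrum, FFT length = window length
numMel    = 32;                              % default NumBands for melSpectrum
numBark   = 32;                              % default NumBands for barkSpectrum
numErb    = ceil(hz2erb(fs/2) - hz2erb(0));  % default NumBands for erbSpectrum

expectedLength = numLinear + numMel + numBark + numErb
actualLength   = afe.FeatureVectorLength
```

Changing NumBands or FrequencyRange with setExtractorParameters changes the corresponding term in this tally.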
mfcc — extract mel-frequency cepstral coefficients (mfcc)
false (default) | true
extract mel-frequency cepstral coefficients (mfcc), specified as
true or false.
To set parameters of the MFCC extraction, use setExtractorParameters:
setExtractorParameters(afe,"mfcc",Name=Value)
NumCoeffs: Number of coefficients returned for each window, specified as a positive integer. If unspecified, NumCoeffs defaults to 13.
DeltaWindowLength: Delta window length, specified as an odd integer greater than 2. If unspecified, DeltaWindowLength defaults to 9. This parameter affects the mfccDelta and mfccDeltaDelta features.
Rectification: Type of nonlinear rectification, specified as "log" or "cubic-root".
The mel-frequency cepstral coefficients are calculated using the melSpectrum.
data types: logical
mfccdelta — extract delta of mfcc
false (default) | true
extract delta of mfcc, specified as true or
false.
the delta mfcc is calculated based on the extracted mfcc. parameters set on
mfcc affect mfccdelta.
data types: logical
mfccdeltadelta — extract delta-delta of mfcc
false (default) | true
extract delta-delta of mfcc, specified as true or
false.
the delta-delta mfcc is calculated based on the extracted mfcc. parameters set on
mfcc affect mfccdeltadelta.
data types: logical
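Because the delta features inherit their configuration from mfcc, a single call to setExtractorParameters is enough to reconfigure all three. The following is a minimal sketch; the values NumCoeffs=20 and DeltaWindowLength=5 and the 16 kHz sample rate are arbitrary choices for illustration.

```matlab
% Sketch: parameters set on "mfcc" also govern mfccDelta and mfccDeltaDelta.
fs  = 16e3;
afe = audioFeatureExtractor(SampleRate=fs, ...
    mfcc=true,mfccDelta=true,mfccDeltaDelta=true);

% Request 20 coefficients and a shorter (odd) delta window.
setExtractorParameters(afe,"mfcc",NumCoeffs=20,DeltaWindowLength=5);

% Each of the three features now occupies NumCoeffs columns of the output.
idx = info(afe);
coeffCounts = [numel(idx.mfcc) numel(idx.mfccDelta) numel(idx.mfccDeltaDelta)]
```

The same pattern applies to gtcc, gtccDelta, and gtccDeltaDelta.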
gtcc — extract gammatone cepstral coefficients (gtcc)
false (default) | true
extract gammatone cepstral coefficients (gtcc), specified as
true or false.
To set parameters of the GTCC extraction, use setExtractorParameters:
setExtractorParameters(afe,"gtcc",Name=Value)
NumCoeffs: Number of coefficients returned for each window, specified as a positive integer. If unspecified, NumCoeffs defaults to 13.
DeltaWindowLength: Delta window length, specified as an odd integer greater than 2. If unspecified, DeltaWindowLength defaults to 9. This parameter affects the gtccDelta and gtccDeltaDelta features.
Rectification: Type of nonlinear rectification, specified as "log" or "cubic-root".
The gammatone cepstral coefficients are calculated using the erbSpectrum.
data types: logical
gtccdelta — extract delta of gtcc
false (default) | true
extract delta of gtcc, specified as true or
false.
the delta gtcc is calculated based on the extracted gtcc. parameters set on
gtcc affect gtccdelta.
data types: logical
gtccdeltadelta — extract delta-delta of gtcc
false (default) | true
extract delta-delta of gtcc, specified as true or
false.
the delta-delta gtcc is calculated based on the extracted gtcc. parameters set on
gtcc affect gtccdeltadelta.
data types: logical
spectralcentroid — extract spectral centroid
false (default) | true
extract spectral centroid, specified as true or
false.
The spectral centroid is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
data types: logical
spectralcrest — extract spectral crest
false (default) | true
extract spectral crest, specified as true or
false.
The spectral crest is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
data types: logical
spectraldecrease — extract spectral decrease
false (default) | true
extract spectral decrease, specified as true or
false.
The spectral decrease is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
data types: logical
spectralentropy — extract spectral entropy
false (default) | true
extract spectral entropy, specified as true or
false.
The spectral entropy is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
data types: logical
spectralflatness — extract spectral flatness
false (default) | true
extract spectral flatness, specified as true or
false.
The spectral flatness is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
data types: logical
spectralflux — extract spectral flux
false (default) | true
extract spectral flux, specified as true or
false.
The spectral flux is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
To set parameters of the spectral flux extraction, use setExtractorParameters:
setExtractorParameters(afe,"spectralFlux",Name=Value)
NormType: Norm type used to calculate the spectral flux, specified as 1 or 2. If unspecified, NormType defaults to 2.
data types: logical
spectralkurtosis — extract spectral kurtosis
false (default) | true
extract spectral kurtosis, specified as true or
false.
The spectral kurtosis is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
data types: logical
spectralrolloffpoint — extract spectral rolloff point
false (default) | true
extract spectral rolloff point, specified as true or
false.
The spectral rolloff point is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
To set parameters of the spectral rolloff point extraction, use setExtractorParameters:
setExtractorParameters(afe,"spectralRolloffPoint",Name=Value)
Threshold: Threshold of the rolloff point, specified as a scalar in the range (0, 1). If unspecified, Threshold defaults to 0.95.
data types: logical
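The two spectral descriptors that accept parameters, spectralFlux and spectralRolloffPoint, are configured in the same way as the spectrum features. The sketch below is illustrative only: the noise input, the 16 kHz sample rate, the choice of the mel spectrum as descriptor input, and the parameter values are all assumptions made for the example.

```matlab
% Sketch: configure spectral flux and rolloff point computed on the mel spectrum.
fs  = 16e3;
x   = randn(fs,1);                          % 1 s of noise as a stand-in signal
afe = audioFeatureExtractor(SampleRate=fs, ...
    SpectralDescriptorInput="melSpectrum", ...
    spectralFlux=true,spectralRolloffPoint=true);

setExtractorParameters(afe,"spectralFlux",NormType=1);
setExtractorParameters(afe,"spectralRolloffPoint",Threshold=0.85);

features = extract(afe,x);
idx      = info(afe);
rolloff  = features(:,idx.spectralRolloffPoint);  % one value per analysis window
```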
spectralskewness — extract spectral skewness
false (default) | true
extract spectral skewness, specified as true or
false.
The spectral skewness is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
data types: logical
spectralslope — extract spectral slope
false (default) | true
extract spectral slope, specified as true or
false.
The spectral slope is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
data types: logical
spectralspread — extract spectral spread
false (default) | true
extract spectral spread, specified as true or
false.
The spectral spread is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
data types: logical
pitch — extract pitch
false (default) | true
extract pitch, specified as true or
false.
To set parameters of the pitch extraction, use setExtractorParameters:
setExtractorParameters(afe,"pitch",Name=Value)
Method: Method used to calculate the pitch, specified as "PEF", "NCF", "CEP", "LHS", or "SRH". If unspecified, Method defaults to "NCF". For a description of available pitch extraction methods, see the pitch function.
Range: Range within which to search for the pitch in Hz, specified as a two-element row vector of increasing values. If unspecified, Range defaults to [50,400].
MedianFilterLength: Median filter length used to smooth pitch estimates over time, specified as a positive integer. If unspecified, MedianFilterLength defaults to 1 (no median filtering).
data types: logical
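As an illustration of the pitch parameters above, the following sketch narrows the search range and enables median smoothing while keeping the default estimation method. The 200 Hz test tone and the specific Range and MedianFilterLength values are assumptions chosen for the example, not recommended settings.

```matlab
% Sketch: narrow the pitch search range and smooth the estimates over time.
fs = 44.1e3;
t  = (0:fs-1)'/fs;
x  = sin(2*pi*200*t);                       % 1 s synthetic 200 Hz tone (illustrative)

afe = audioFeatureExtractor(SampleRate=fs,pitch=true);
setExtractorParameters(afe,"pitch",Range=[80 350],MedianFilterLength=3);

f0 = extract(afe,x);                        % one pitch estimate (Hz) per analysis window
medianF0 = median(f0);
```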
harmonicratio — extract harmonic ratio
false (default) | true
extract harmonic ratio, specified as true or
false.
data types: logical
zerocrossrate — extract zero-crossing rate
false (default) | true
extract zero-crossing rate, specified as true or
false.
To set parameters of the zero-crossing rate extraction, use setExtractorParameters:
setExtractorParameters(afe,"zerocrossrate",Name=Value)
Method: Method for computing the zero-crossing rate, specified as "difference" or "comparison". If unspecified, Method defaults to "difference". For more information, see the zerocrossrate function.
Level: Signal level for which the crossing rate is computed, specified as a real scalar. audioFeatureExtractor subtracts the Level value from the signal and then finds the zero crossings. If unspecified, Level defaults to 0.
Threshold: Threshold above and below the Level value over which the crossing rate is computed, specified as a real scalar. audioFeatureExtractor sets all the values of the input in the range [-Threshold, Threshold] to 0 and then finds the zero crossings. If unspecified, Threshold defaults to 0.
TransitionEdge: Transitions to include when counting zero crossings, specified as "falling", "rising", or "both". If you specify "falling", only negative-going transitions are counted. If you specify "rising", only positive-going transitions are counted. If unspecified, TransitionEdge defaults to "both".
ZeroPositive: Sign convention, specified as a logical scalar. If you specify ZeroPositive as true, then 0 is considered positive. If you specify ZeroPositive as false, then audioFeatureExtractor considers 0, -1, and 1 to have distinct signs following the convention of the sign function. If unspecified, ZeroPositive defaults to false.
data types: logical
shorttimeenergy — extract short-time energy
false (default) | true
extract short-time energy, specified as true or
false. the short-time energy is computed using
ste = sum(xbw.^2,1),
where xbw is the buffered and windowed
signal.
example: chirp function
Generate a chirp sampled at 1 kHz for 3 seconds. The instantaneous frequency is 100 Hz at t = 0 and crosses 200 Hz at t = 1 second. Divide the signal into 103-sample segments with 43 samples of overlap between adjoining segments. Window each segment with a periodic Hamming window.
fs = 1e3;
x = chirp(0:1/fs:3,100,1,200)';
win = hamming(103,"periodic");
nover = 43;
[xb,~] = buffer(x,length(win),nover,"nodelay");
xbw = xb.*win;
Compute the short-time energy using the definition.
edef = sum(xbw.^2,1)';
Use audioFeatureExtractor to compute the short-time energy.
eafe = extract(audioFeatureExtractor(shortTimeEnergy=true, ...
    SampleRate=fs,Window=win,OverlapLength=nover),x);
Verify that both procedures give the same short-time energy.
dff = max(abs(eafe-edef))
dff = 0
data types: logical
object functions
| extract | Extract audio features |
| setExtractorParameters | Set nondefault parameter values for individual feature extractors |
| info | Output mapping and individual feature extractor parameters |
| generateMATLABFunction | Create MATLAB function compatible with C/C++ code generation |
| plotFeatures | Plot extracted audio features |
examples
extract multiple audio features
read in an audio signal.
[audioin,fs] = audioread("Counting-16-44p1-mono-15secs.wav");
Create an audioFeatureExtractor object that extracts the MFCC, delta MFCC, delta-delta MFCC, pitch, spectral centroid, zero-crossing rate, and short-time energy of the signal. Use a 30 ms analysis window with 20 ms overlap.
afe = audioFeatureExtractor( ...
    SampleRate=fs, ...
    Window=hamming(round(0.03*fs),"periodic"), ...
    OverlapLength=round(0.02*fs), ...
    mfcc=true, ...
    mfccDelta=true, ...
    mfccDeltaDelta=true, ...
    pitch=true, ...
    spectralCentroid=true, ...
    zerocrossrate=true, ...
    shortTimeEnergy=true);
call extract to extract the audio features from the audio signal.
features = extract(afe,audioin);
use info to determine which column of the feature extraction matrix corresponds to the requested pitch extraction.
idx = info(afe)
idx = struct with fields:
mfcc: [1 2 3 4 5 6 7 8 9 10 11 12 13]
mfccdelta: [14 15 16 17 18 19 20 21 22 23 24 25 26]
mfccdeltadelta: [27 28 29 30 31 32 33 34 35 36 37 38 39]
spectralcentroid: 40
pitch: 41
zerocrossrate: 42
shorttimeenergy: 43
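The index structure returned by info can be used directly to slice the feature matrix, which avoids hard-coding column numbers. The sketch below uses a synthetic noise signal and a smaller feature set than the example above; both are assumptions made to keep the snippet self-contained.

```matlab
% Sketch: use the info output to pull named features out of the feature matrix.
fs  = 16e3;
x   = randn(2*fs,1);                        % 2 s of noise as a stand-in signal
afe = audioFeatureExtractor(SampleRate=fs, ...
    mfcc=true,spectralCentroid=true,shortTimeEnergy=true);

features = extract(afe,x);
idx      = info(afe);

mfccs  = features(:,idx.mfcc);              % numHops-by-13 with default NumCoeffs
energy = features(:,idx.shortTimeEnergy);   % numHops-by-1
```

Indexing through idx keeps downstream code valid if features are later enabled or disabled, because the column positions are looked up rather than assumed.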
plot the detected pitch over time.
t = linspace(0,size(audioin,1)/fs,size(features,1));
plot(t,features(:,idx.pitch))
title("Pitch")
xlabel("Time (s)")
ylabel("Frequency (Hz)")

plot the zero-crossing rate over time.
plot(t,features(:,idx.zerocrossrate))
title("Zero-Crossing Rate")
xlabel("Time (s)")

plot the short-time energy over time.
plot(t,features(:,idx.shortTimeEnergy))
title("Short-Time Energy")
xlabel("Time (s)")

extract features from dataset
create an audio datastore that points to audio samples included with audio toolbox®.
folder = fullfile(matlabroot,"toolbox","audio","samples");
ads = audioDatastore(folder);
Find all files that correspond to a sample rate of 44.1 kHz and then subset the datastore.
keepfile = cellfun(@(x)contains(x,"44p1"),ads.Files);
ads = subset(ads,keepfile);
Convert the data to a tall array. Tall arrays are evaluated only when you request them explicitly using gather. MATLAB® automatically optimizes the queued calculations by minimizing the number of passes through the data. If you have Parallel Computing Toolbox™, you can spread the calculations across multiple workers. The audio data is represented as an M-by-1 tall cell array, where M is the number of files in the audio datastore.
adstall = tall(ads)
starting parallel pool (parpool) using the 'local' profile ...
connected to the parallel pool (number of workers: 6).
adstall =
m×1 tall cell array
{ 539648×1 double}
{ 227497×1 double}
{ 8000×1 double}
{ 685056×1 double}
{ 882688×2 double}
{1115760×2 double}
{ 505200×2 double}
{3195904×2 double}
: :
: :
create an audiofeatureextractor object to extract the mel spectrum, bark spectrum, erb spectrum, and linear spectrum from each audio file. use the default analysis window and overlap length for the spectrum extraction.
afe = audioFeatureExtractor(SampleRate=44.1e3, ...
    melSpectrum=true, ...
    barkSpectrum=true, ...
    erbSpectrum=true, ...
    linearSpectrum=true);
Define a cellfun function so that audio features are extracted from each cell of the tall array. Call gather to evaluate the tall array.
specstall = cellfun(@(x)extract(afe,x),adstall,UniformOutput=false);
specs = gather(specstall);
Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: Completed in 14 sec
Evaluation completed in 14 sec
The specs variable returned from gather is a numFiles-by-1 cell array, where numFiles is the number of files in the datastore. Each element of the cell array is a numHops-by-numFeatures-by-numChannels array, where the number of hops and the number of channels depend on the length and number of channels of the audio file, and the number of features is the number of features requested from the audio data.
numfiles = numel(specs)
numfiles = 12
[numhops1,numfeaturesfile1,numchanelsfile1] = size(specs{1})
numhops1 = 1053
numfeaturesfile1 = 620
numchanelsfile1 = 1
[numhops2,numfeaturesfile2,numchanelsfile2] = size(specs{2})
numhops2 = 443
numfeaturesfile2 = 620
numchanelsfile2 = 1
visualize extracted audio features
use plotfeatures to visualize audio features extracted with an audiofeatureextractor object.
read in an audio signal from a file.
[audioin,fs] = audioread("Counting-16-44p1-mono-15secs.wav");
Create an audioFeatureExtractor object that extracts the gammatone cepstral coefficients (GTCCs) and the delta of the GTCCs. Set the SampleRate property to the sample rate of the audio signal, and use the default values for the other properties.
afe = audioFeatureExtractor(SampleRate=fs,gtcc=true,gtccDelta=true);
plot the features extracted from the audio signal.
plotfeatures(afe,audioin)

algorithms
the audiofeatureextractor creates a feature extraction pipeline based on
your selected features. to reduce computations, audiofeatureextractor reuses
intermediary representations and outputs some intermediate representations as features.

for example, to create an object that extracts the centroid of the bark spectrum, the flux
of the bark spectrum, the pitch, the harmonic ratio, and the delta-delta of the mfcc, specify
the audiofeatureextractor as
follows.
afe = audioFeatureExtractor( ...
    SpectralDescriptorInput="barkSpectrum", ...
    spectralCentroid=true, ...
    spectralFlux=true, ...
    pitch=true, ...
    harmonicRatio=true, ...
    mfccDeltaDelta=true)
afe =
audiofeatureextractor with properties:
properties
window: [1024×1 double]
overlaplength: 512
samplerate: 44100
fftlength: []
spectraldescriptorinput: 'barkspectrum'
enabled features
mfccdeltadelta, spectralcentroid, spectralflux, pitch, harmonicratio
disabled features
linearspectrum, melspectrum, barkspectrum, erbspectrum, mfcc, mfccdelta
gtcc, gtccdelta, gtccdeltadelta, spectralcrest, spectraldecrease, spectralentropy
spectralflatness, spectralkurtosis, spectralrolloffpoint, spectralskewness, spectralslope, spectralspread
to extract a feature, set the corresponding property to true.
for example, obj.mfcc = true, adds mfcc to the list of enabled features.
note
because audiofeatureextractor reuses intermediary representations, the
features output from audiofeatureextractor might not correspond with the
default configuration of features output by corresponding individual feature
extractors.
extended capabilities
C/C++ code generation
Generate C and C++ code using MATLAB® Coder™.
Usage notes and limitations:
You cannot generate code directly from audioFeatureExtractor. You can generate C/C++ code from the function returned by generateMATLABFunction.
Functions returned by generateMATLABFunction that compute an auditory spectrum (mel, Bark, ERB) support optimized code generation using single instruction, multiple data (SIMD) instructions. For more information about SIMD code generation, see the MATLAB Coder documentation.
zerocrossrate: Code generation does not support disabling dynamic memory allocation when the input is multichannel.
gpu arrays
accelerate code by running on a graphics processing unit (gpu) using parallel computing toolbox™.
this function fully supports gpu arrays. for more information, see run matlab functions on a gpu (parallel computing toolbox).
version history
Introduced in R2019b
R2023a: Generate optimized C/C++ code for computing auditory spectrum
Functions returned by generateMATLABFunction that compute an auditory spectrum (mel, Bark, ERB) support optimized C/C++ code generation using single instruction, multiple data (SIMD) instructions.
R2022b: Visualize extracted features
Use the plotFeatures object function to visualize extracted audio features.
r2020b: computation of deltas and delta-deltas
The audioDelta function is now used to compute mfccDelta,
mfccDeltaDelta, gtccDelta, and gtccDeltaDelta. The audioDelta algorithm has a
different startup behavior than the previous algorithm. The default window length used to
compute the deltas has changed from 2 to 9. A delta
window length of 2 is no longer supported.
see also
audioDatastore | audioDataAugmenter