class definitions, and skeletons of helpfiles/testfiles for (buf)mfcc and (buf)melbands

nix
Pierre Alexandre Tremblay 7 years ago
parent 0fb414af7d
commit e125ad692a

@ -0,0 +1,25 @@
FluidBufMFCC{
*process { arg server, source, startFrame = 0, numFrames = -1, startChan = 0, numChans = -1, features, numCoefs = 13, numBands = 40, minFreq = 20, maxFreq = 20000, winSize = 1024, hopSize = -1, fftSize = -1, action;
var maxFFTSize = if (fftSize == -1) {winSize.nextPowerOfTwo} {fftSize};
source = source.asUGenInput;
features = features.asUGenInput;
source.isNil.if {"FluidBufMFCC: Invalid source buffer".throw};
features.isNil.if {"FluidBufMFCC: Invalid features buffer".throw};
server = server ? Server.default;
//NB For wrapped versions of NRT classes, we set the params for maxima to
//whatever has been passed in language-side (e.g. maxFFTSize still exists as a parameter for the server plugin, but makes less sense here: it just needs to be set to a legal value)
// the same goes for maxNumCoefs, which is passed numCoefs in this case
forkIfNeeded{
server.sendMsg(\cmd, \BufMFCC, source, startFrame, numFrames, startChan, numChans, features, numCoefs, numBands, minFreq, maxFreq, numCoefs, winSize, hopSize, fftSize, maxFFTSize);
server.sync;
features = server.cachedBufferAt(features); features.updateInfo; server.sync;
action.value(features);
};
}
}

@ -0,0 +1,25 @@
FluidBufMelBands{
*process { arg server, source, startFrame = 0, numFrames = -1, startChan = 0, numChans = -1, features, numBands = 40, minFreq = 20, maxFreq = 20000, winSize = 1024, hopSize = -1, fftSize = -1, action;
var maxFFTSize = if (fftSize == -1) {winSize.nextPowerOfTwo} {fftSize};
source = source.asUGenInput;
features = features.asUGenInput;
source.isNil.if {"FluidBufMelBands: Invalid source buffer".throw};
features.isNil.if {"FluidBufMelBands: Invalid features buffer".throw};
server = server ? Server.default;
//NB For wrapped versions of NRT classes, we set the params for maxima to
//whatever has been passed in language-side (e.g. maxFFTSize still exists as a parameter for the server plugin, but makes less sense here: it just needs to be set to a legal value)
// the same goes for maxNumBands, which is passed numBands in this case
forkIfNeeded{
server.sendMsg(\cmd, \BufMelBands, source, startFrame, numFrames, startChan, numChans, features, numBands, minFreq, maxFreq, numBands, winSize, hopSize, fftSize, maxFFTSize);
server.sync;
features = server.cachedBufferAt(features); features.updateInfo; server.sync;
action.value(features);
};
}
}

@ -0,0 +1,20 @@
FluidMFCC : MultiOutUGen {
*kr { arg in = 0, numCoefs = 13, numBands = 40, minFreq = 20, maxFreq = 20000, maxNumCoefs = 40, winSize = 1024, hopSize = -1, fftSize = -1, maxFFTSize = 16384;
^this.multiNew('control', in.asAudioRateInput(this), numCoefs, numBands, minFreq, maxFreq, maxNumCoefs, winSize, hopSize, fftSize, maxFFTSize);
}
init {arg ...theInputs;
inputs = theInputs;
^this.initOutputs(inputs.at(5),rate);
}
checkInputs {
if(inputs.at(5).rate != 'scalar') {
^(": maxNumCoefs cannot be modulated.");
};
if(inputs.at(9).rate != 'scalar') {
^(": maxFFTSize cannot be modulated.");
};
^this.checkValidInputs;
}
}

@ -0,0 +1,20 @@
FluidMelBands : MultiOutUGen {
*kr { arg in = 0, numBands = 40, minFreq = 20, maxFreq = 20000, maxNumBands = 40, winSize = 1024, hopSize = -1, fftSize = -1, maxFFTSize = 16384;
^this.multiNew('control', in.asAudioRateInput(this), numBands, minFreq, maxFreq, maxNumBands, winSize, hopSize, fftSize, maxFFTSize);
}
init {arg ...theInputs;
inputs = theInputs;
^this.initOutputs(inputs.at(4),rate);
}
checkInputs {
if(inputs.at(4).rate != 'scalar') {
^(": maxNumBands cannot be modulated.");
};
if(inputs.at(8).rate != 'scalar') {
^(": maxFFTSize cannot be modulated.");
};
^this.checkValidInputs;
}
}

@ -0,0 +1,131 @@
TITLE:: FluidBufMFCC
SUMMARY:: Seven Spectral Shape Descriptors on a Buffer
CATEGORIES:: Libraries>FluidDecomposition
RELATED:: Guides/FluCoMa, Guides/FluidDecomposition, Classes/SpecCentroid, Classes/SpecFlatness, Classes/SpecCentroid, Classes/SpecPcile
DESCRIPTION::
This class implements seven of the most popular spectral shape descriptors, computed on a linear scale for both amplitude and frequency. It is part of the Fluid Decomposition Toolkit of the FluCoMa project.FOOTNOTE:: This was made possible thanks to the FluCoMa project ( http://www.flucoma.org/ ) funded by the European Research Council ( https://erc.europa.eu/ ) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 725899).::
The descriptors are:
LIST::
##the first four statistical moments (https://en.wikipedia.org/wiki/Moment_(mathematics) ), more commonly known as:
LIST::
## the spectral centroid (1) in spectral bins as units. This is the point that splits the spectrum in 2 halves of equal energy. It is the weighted average of the magnitude spectrum.
## the spectral spread (2) in spectral bins. This is the standard deviation of the spectrum envelope, or the average of the distance to the centroid.
## the normalised skewness (3) as a ratio. This indicates how tilted the spectral curve is in relation to the middle of the spectral frame, i.e. half of the Nyquist frequency. If the bulk of the energy is below the bin representing that frequency, i.e. the central bin of the magnitude spectrum, the skewness is positive.
## the normalised kurtosis (4) as a ratio. This indicates how focused the spectral curve is. If it is peaky, the kurtosis is high.
::
## the rolloff (5) in bin number. This indicates the bin under which 95% of the energy is included.
## the flatness (6) in dB. This is the ratio of the geometric mean of the magnitudes to their arithmetic mean. It yields a very approximate measure of how noisy a signal is.
## the crest (7) in dB. This is the ratio of the loudest magnitude to the RMS of the whole frame. A high number is an indication of a loud peak poking out from the overall spectral curve.
The drawings in Peeters 2003 (http://recherche.ircam.fr/anasyn/peeters/ARTICLES/Peeters_2003_cuidadoaudiofeatures.pdf) are useful, as are the commented examples below. For the mathematically-inclined reader, the tutorials and code offered here (https://www.audiocontentanalysis.org/) are interesting to further the understanding.
::
The process will return a multichannel buffer with seven channels per input channel, one for each of the 7 shapes. Each frame of the buffer holds one value per descriptor, and a new frame is computed every hopSize samples.
CLASSMETHODS::
METHOD:: process
This is the method that calls for the spectral shape descriptors to be calculated on a given source buffer.
ARGUMENT:: server
The server on which the buffers to be processed are allocated.
ARGUMENT:: source
The index of the buffer to use as the source material to be described through the various descriptors. The different channels of multichannel buffers will be processed sequentially.
ARGUMENT:: startFrame
Where in the source buffer the process should start, in samples.
ARGUMENT:: numFrames
How many frames should be processed.
ARGUMENT:: startChan
For a multichannel source buffer, which channel should be processed first.
ARGUMENT:: numChans
For a multichannel source buffer, how many channels should be processed.
ARGUMENT:: features
The destination buffer for the 7 spectral features describing the spectral shape.
ARGUMENT:: numCoefs
(describe argument here)
ARGUMENT:: numBands
(describe argument here)
ARGUMENT:: minFreq
(describe argument here)
ARGUMENT:: maxFreq
(describe argument here)
ARGUMENT:: winSize
The window size. As sinusoidal estimation relies on spectral frames, we need to decide what precision we give it spectrally and temporally, in line with Gabor Uncertainty principles. http://www.subsurfwiki.org/wiki/Gabor_uncertainty
ARGUMENT:: hopSize
The window hop size. As sinusoidal estimation relies on spectral frames, we need to move the window forward. It can be any size but low overlap will create audible artefacts.
ARGUMENT:: fftSize
The inner FFT/IFFT size. It should be at least 4 samples long, at least the size of the window, and a power of 2. Making it larger allows an oversampling of the spectral precision.
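As a minimal sketch of how the language side derives a legal maxFFTSize when fftSize is left at -1: this mirrors the first line of the process method in the class definition. The name ~resolveMaxFFTSize is hypothetical and is not part of the class.
CODE::
(
// hypothetical helper: same logic as the maxFFTSize variable in *process
~resolveMaxFFTSize = { |winSize = 1024, fftSize = -1|
	if (fftSize == -1) { winSize.nextPowerOfTwo } { fftSize }
};
~resolveMaxFFTSize.(1024).postln;       // 1024: already a power of 2
~resolveMaxFFTSize.(1000).postln;       // 1024: rounded up to the next power of 2
~resolveMaxFFTSize.(1024, 4096).postln; // 4096: an explicit fftSize is passed through
)
::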
ARGUMENT:: action
A Function to be evaluated once the offline process has finished and all Buffer's instance variables have been updated on the client side. The function will be passed [features] as an argument.
RETURNS::
Nothing, as the destination buffer is declared in the function call.
EXAMPLES::
code::
// create some buffers
(
b = Buffer.read(s,File.realpath(FluidBufSpectralShape.class.filenameSymbol).dirname.withTrailingSlash ++ "../AudioFiles/Nicol-LoopE-M.wav");
c = Buffer.new(s);
)
// run the process with basic parameters
(
Routine{
t = Main.elapsedTime;
FluidBufSpectralShape.process(s, b, features: c);
(Main.elapsedTime - t).postln;
}.play
)
// listen to the source and look at the buffer
b.play;
c.plot(minval:-5, maxval:250)
::
STRONG::A stereo buffer example.::
CODE::
// load two very different files
(
b = Buffer.read(s,File.realpath(FluidBufSpectralShape.class.filenameSymbol).dirname.withTrailingSlash ++ "../AudioFiles/Tremblay-SA-UprightPianoPedalWide.wav");
c = Buffer.read(s,File.realpath(FluidBufSpectralShape.class.filenameSymbol).dirname.withTrailingSlash ++ "../AudioFiles/Tremblay-AaS-AcousticStrums-M.wav");
)
// composite one on left one on right as test signals
FluidBufCompose.process(s, c, numFrames:b.numFrames, startFrame:555000,destStartChan:1, destination:b)
b.play
// create a buffer as destinations
c = Buffer.new(s);
//run the process on them
(
Routine{
t = Main.elapsedTime;
FluidBufSpectralShape.process(s, b, features: c);
(Main.elapsedTime - t).postln;
}.play
)
// look at the buffer: 7 shapes for left, then 7 shapes for right
c.plot(minval:-25, maxval:150)
::
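STRONG::A sketch calling FluidBufMFCC itself.::

This is a hedged sketch of the same workflow using the process method of FluidBufMFCC, with only the arguments declared in its class definition. It assumes that the server s is booted, that the BufMFCC plugin is installed, and that the test file used above is available. The layout of the features buffer is not described in this skeleton, so the sketch only posts the size of what comes back.
CODE::
// create some buffers (same test file as above)
(
b = Buffer.read(s,File.realpath(FluidBufSpectralShape.class.filenameSymbol).dirname.withTrailingSlash ++ "../AudioFiles/Nicol-LoopE-M.wav");
c = Buffer.new(s);
)
// run the process; the action receives the updated features buffer
(
FluidBufMFCC.process(s, b, features: c, numCoefs: 13, numBands: 40, action: { |features|
	// inspect the result rather than assume its layout
	[features.numFrames, features.numChannels].postln;
});
)
::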

@ -0,0 +1,128 @@
TITLE:: FluidBufMelBands
SUMMARY:: Seven Spectral Shape Descriptors on a Buffer
CATEGORIES:: Libraries>FluidDecomposition
RELATED:: Guides/FluCoMa, Guides/FluidDecomposition, Classes/SpecCentroid, Classes/SpecFlatness, Classes/SpecCentroid, Classes/SpecPcile
DESCRIPTION::
This class implements seven of the most popular spectral shape descriptors, computed on a linear scale for both amplitude and frequency. It is part of the Fluid Decomposition Toolkit of the FluCoMa project.FOOTNOTE:: This was made possible thanks to the FluCoMa project ( http://www.flucoma.org/ ) funded by the European Research Council ( https://erc.europa.eu/ ) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 725899).::
The descriptors are:
LIST::
##the first four statistical moments (https://en.wikipedia.org/wiki/Moment_(mathematics) ), more commonly known as:
LIST::
## the spectral centroid (1) in spectral bins as units. This is the point that splits the spectrum in 2 halves of equal energy. It is the weighted average of the magnitude spectrum.
## the spectral spread (2) in spectral bins. This is the standard deviation of the spectrum envelope, or the average of the distance to the centroid.
## the normalised skewness (3) as a ratio. This indicates how tilted the spectral curve is in relation to the middle of the spectral frame, i.e. half of the Nyquist frequency. If the bulk of the energy is below the bin representing that frequency, i.e. the central bin of the magnitude spectrum, the skewness is positive.
## the normalised kurtosis (4) as a ratio. This indicates how focused the spectral curve is. If it is peaky, the kurtosis is high.
::
## the rolloff (5) in bin number. This indicates the bin under which 95% of the energy is included.
## the flatness (6) in dB. This is the ratio of the geometric mean of the magnitudes to their arithmetic mean. It yields a very approximate measure of how noisy a signal is.
## the crest (7) in dB. This is the ratio of the loudest magnitude to the RMS of the whole frame. A high number is an indication of a loud peak poking out from the overall spectral curve.
The drawings in Peeters 2003 (http://recherche.ircam.fr/anasyn/peeters/ARTICLES/Peeters_2003_cuidadoaudiofeatures.pdf) are useful, as are the commented examples below. For the mathematically-inclined reader, the tutorials and code offered here (https://www.audiocontentanalysis.org/) are interesting to further the understanding.
::
The process will return a multichannel buffer with seven channels per input channel, one for each of the 7 shapes. Each frame of the buffer holds one value per descriptor, and a new frame is computed every hopSize samples.
CLASSMETHODS::
METHOD:: process
This is the method that calls for the spectral shape descriptors to be calculated on a given source buffer.
ARGUMENT:: server
The server on which the buffers to be processed are allocated.
ARGUMENT:: source
The index of the buffer to use as the source material to be described through the various descriptors. The different channels of multichannel buffers will be processed sequentially.
ARGUMENT:: startFrame
Where in the source buffer the process should start, in samples.
ARGUMENT:: numFrames
How many frames should be processed.
ARGUMENT:: startChan
For a multichannel source buffer, which channel should be processed first.
ARGUMENT:: numChans
For a multichannel source buffer, how many channels should be processed.
ARGUMENT:: features
The destination buffer for the 7 spectral features describing the spectral shape.
ARGUMENT:: numBands
(describe argument here)
ARGUMENT:: minFreq
(describe argument here)
ARGUMENT:: maxFreq
(describe argument here)
ARGUMENT:: winSize
The window size. As sinusoidal estimation relies on spectral frames, we need to decide what precision we give it spectrally and temporally, in line with Gabor Uncertainty principles. http://www.subsurfwiki.org/wiki/Gabor_uncertainty
ARGUMENT:: hopSize
The window hop size. As sinusoidal estimation relies on spectral frames, we need to move the window forward. It can be any size but low overlap will create audible artefacts.
ARGUMENT:: fftSize
The inner FFT/IFFT size. It should be at least 4 samples long, at least the size of the window, and a power of 2. Making it larger allows an oversampling of the spectral precision.
ARGUMENT:: action
A Function to be evaluated once the offline process has finished and all Buffer's instance variables have been updated on the client side. The function will be passed [features] as an argument.
RETURNS::
Nothing, as the destination buffer is declared in the function call.
EXAMPLES::
code::
// create some buffers
(
b = Buffer.read(s,File.realpath(FluidBufSpectralShape.class.filenameSymbol).dirname.withTrailingSlash ++ "../AudioFiles/Nicol-LoopE-M.wav");
c = Buffer.new(s);
)
// run the process with basic parameters
(
Routine{
t = Main.elapsedTime;
FluidBufSpectralShape.process(s, b, features: c);
(Main.elapsedTime - t).postln;
}.play
)
// listen to the source and look at the buffer
b.play;
c.plot(minval:-5, maxval:250)
::
STRONG::A stereo buffer example.::
CODE::
// load two very different files
(
b = Buffer.read(s,File.realpath(FluidBufSpectralShape.class.filenameSymbol).dirname.withTrailingSlash ++ "../AudioFiles/Tremblay-SA-UprightPianoPedalWide.wav");
c = Buffer.read(s,File.realpath(FluidBufSpectralShape.class.filenameSymbol).dirname.withTrailingSlash ++ "../AudioFiles/Tremblay-AaS-AcousticStrums-M.wav");
)
// composite one on left one on right as test signals
FluidBufCompose.process(s, c, numFrames:b.numFrames, startFrame:555000,destStartChan:1, destination:b)
b.play
// create a buffer as destinations
c = Buffer.new(s);
//run the process on them
(
Routine{
t = Main.elapsedTime;
FluidBufSpectralShape.process(s, b, features: c);
(Main.elapsedTime - t).postln;
}.play
)
// look at the buffer: 7 shapes for left, then 7 shapes for right
c.plot(minval:-25, maxval:150)
::
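STRONG::A sketch calling FluidBufMelBands itself.::

This is a hedged sketch of the same workflow using the process method of FluidBufMelBands, with only the arguments declared in its class definition. It assumes that the server s is booted, that the BufMelBands plugin is installed, and that the test file used above is available. The band count and frequency range are illustrative values, and the sketch only posts the size of the returned buffer rather than assume its layout.
CODE::
// create some buffers (same test file as above)
(
b = Buffer.read(s,File.realpath(FluidBufSpectralShape.class.filenameSymbol).dirname.withTrailingSlash ++ "../AudioFiles/Nicol-LoopE-M.wav");
c = Buffer.new(s);
)
// run the process on a restricted frequency range (illustrative values)
(
FluidBufMelBands.process(s, b, features: c, numBands: 20, minFreq: 100, maxFreq: 8000, action: { |features|
	// inspect the result rather than assume its layout
	[features.numFrames, features.numChannels].postln;
});
)
::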

@ -0,0 +1,258 @@
TITLE:: FluidMFCC
SUMMARY:: Seven Spectral Shape Descriptors in Real-Time
CATEGORIES:: Libraries>FluidDecomposition
RELATED:: Guides/FluCoMa, Guides/FluidDecomposition, Classes/SpecCentroid, Classes/SpecFlatness, Classes/SpecCentroid, Classes/SpecPcile
DESCRIPTION::
This class implements seven of the most popular spectral shape descriptors, computed on a linear scale for both amplitude and frequency. It is part of the Fluid Decomposition Toolkit of the FluCoMa project.FOOTNOTE:: This was made possible thanks to the FluCoMa project ( http://www.flucoma.org/ ) funded by the European Research Council ( https://erc.europa.eu/ ) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 725899).::
The descriptors are:
LIST::
##the first four statistical moments (https://en.wikipedia.org/wiki/Moment_(mathematics) ), more commonly known as:
LIST::
## the spectral centroid (1) in spectral bins as units. This is the point that splits the spectrum in 2 halves of equal energy. It is the weighted average of the magnitude spectrum.
## the spectral spread (2) in spectral bins. This is the standard deviation of the spectrum envelope, or the average of the distance to the centroid.
## the normalised skewness (3) as a ratio. This indicates how tilted the spectral curve is in relation to the middle of the spectral frame, i.e. half of the Nyquist frequency. If the bulk of the energy is below the bin representing that frequency, i.e. the central bin of the magnitude spectrum, the skewness is positive.
## the normalised kurtosis (4) as a ratio. This indicates how focused the spectral curve is. If it is peaky, the kurtosis is high.
::
## the rolloff (5) in bin number. This indicates the bin under which 95% of the energy is included.
## the flatness (6) in dB. This is the ratio of the geometric mean of the magnitudes to their arithmetic mean. It yields a very approximate measure of how noisy a signal is.
## the crest (7) in dB. This is the ratio of the loudest magnitude to the RMS of the whole frame. A high number is an indication of a loud peak poking out from the overall spectral curve.
The drawings in Peeters 2003 (http://recherche.ircam.fr/anasyn/peeters/ARTICLES/Peeters_2003_cuidadoaudiofeatures.pdf) are useful, as are the commented examples below. For the mathematically-inclined reader, the tutorials and code offered here (https://www.audiocontentanalysis.org/) are interesting to further the understanding.
::
The process will return a multichannel control stream with the seven values, which will be repeated if no change happens within the algorithm, i.e. when the hopSize is larger than the server's kr period.
CLASSMETHODS::
METHOD:: kr
The audio rate in, control rate out version of the object.
ARGUMENT:: in
The audio to be processed.
ARGUMENT:: numCoefs
(describe argument here)
ARGUMENT:: numBands
(describe argument here)
ARGUMENT:: minFreq
(describe argument here)
ARGUMENT:: maxFreq
(describe argument here)
ARGUMENT:: maxNumCoefs
(describe argument here)
ARGUMENT:: winSize
The window size. As sinusoidal estimation relies on spectral frames, we need to decide what precision we give it spectrally and temporally, in line with Gabor Uncertainty principles. http://www.subsurfwiki.org/wiki/Gabor_uncertainty
ARGUMENT:: hopSize
The window hop size. As sinusoidal estimation relies on spectral frames, we need to move the window forward. It can be any size but low overlap will create audible artefacts. The -1 default value will default to half of winSize (overlap of 2).
ARGUMENT:: fftSize
The inner FFT/IFFT size. It should be at least 4 samples long, at least the size of the window, and a power of 2. Making it larger allows an oversampling of the spectral precision. The -1 default value will default to winSize.
ARGUMENT:: maxFFTSize
The largest FFT size that can be used, for which memory is allocated at instantiation time. This cannot be modulated.
RETURNS::
A 7-channel KR signal with the seven spectral shape descriptors. The latency is winSize.
EXAMPLES::
code::
//create a monitoring bus for the descriptors
b = Bus.new(\control,0,7);
//create a monitoring window for the values
(
w = Window("Frequency Monitor", Rect(10, 10, 220, 190)).front;
c = Array.fill(7, {arg i; StaticText(w, Rect(10, i * 25 + 10, 135, 20)).background_(Color.grey(0.7)).align_(\right)});
c[0].string = ("Centroid: ");
c[1].string = ("Spread: ");
c[2].string = ("Skewness: ");
c[3].string = ("Kurtosis: ");
c[4].string = ("Rolloff: ");
c[5].string = ("Flatness: ");
c[6].string = ("Crest: ");
a = Array.fill(7, {arg i;
StaticText(w, Rect(150, i * 25 + 10, 60, 20)).background_(Color.grey(0.7)).align_(\center);
});
)
//run the window updating routine.
(
r = Routine {
{
b.get({ arg val;
{
if(w.isClosed.not) {
val.do({arg item,index;
a[index].string = item.round(0.01)})
}
}.defer
});
0.01.wait;
}.loop
}.play
)
//play a simple sound to observe the values
(
{
var source;
source = BPF.ar(WhiteNoise.ar(), 330, 55/330);
Out.kr(b,FluidSpectralShape.kr(source));
source.dup;
}.play;
)
::
STRONG::A commented tutorial on how each descriptor behaves with test signals: ::
CODE::
// as above, create a monitoring bus for the descriptors
b = Bus.new(\control,0,7);
//again, create a monitoring window for the values
(
w = Window("Frequency Monitor", Rect(10, 10, 220, 190)).front;
c = Array.fill(7, {arg i; StaticText(w, Rect(10, i * 25 + 10, 135, 20)).background_(Color.grey(0.7)).align_(\right)});
c[0].string = ("Centroid: ");
c[1].string = ("Spread: ");
c[2].string = ("Skewness: ");
c[3].string = ("Kurtosis: ");
c[4].string = ("Rolloff: ");
c[5].string = ("Flatness: ");
c[6].string = ("Crest: ");
a = Array.fill(7, {arg i;
StaticText(w, Rect(150, i * 25 + 10, 60, 20)).background_(Color.grey(0.7)).align_(\center);
});
)
// this time, update a little more slowly, and convert to Hz the 3 descriptors published in bins by the algorithm.
(
r = Routine {
{
b.get({ arg val;
{
if(w.isClosed.not) {
val.do({arg item,index;
if ((index < 2) || (index == 4))
{
a[index].string = (item * s.sampleRate / 1024).round(0.01);
} {
a[index].string = item.round(0.01);
};
})
}
}.defer
});
0.2.wait;
}.loop
}.play
)
// first, a sine wave
(
x = {
arg freq=220;
var source;
source = SinOsc.ar(freq,mul:0.1);
Out.kr(b, VarLag.kr(FluidSpectralShape.kr(source),1024/s.sampleRate));
source.dup;
}.play;
)
// at 220, the centroid is on the frequency, the spread is narrow, but as wide as the FFT Hann window ripples, the skewness is high as we are low and therefore far left of the middle bin (aka half-Nyquist), the Kurtosis is incredibly high as we have a very peaky spectrum. The rolloff is slightly higher than the frequency, taking into account the FFT windowing ripples, the flatness is incredibly low, as we have one peak and not much else, and the crest is quite high, because most of the energy is in a few peaky bins.
x.set(\freq, 440)
// at 440, the skewness has changed (we are nearer the middle of the spectrogram) and the Kurtosis too, although it is still so high it is quite in the same order of magnitude. The rest is stable, as expected.
x.set(\freq, 11000)
// at 11kHz, kurtosis is still in the thousands, but skewness is almost null, as expected.
x.free
// second, broadband noise
(
x = {
arg type = 0;
var source;
source = Select.ar(type,[WhiteNoise.ar(0.1),PinkNoise.ar(0.1)]);
Out.kr(b, VarLag.kr(FluidSpectralShape.kr(source),1024/s.sampleRate));
source.dup;
}.play;
)
// white noise has an even distribution of energy across the linear spectrum, so we would expect a centroid in the middle bin (aka half-Nyquist) with a spread covering the full range (+/- a quarter-Nyquist), with a skewness almost null since we are centered, and a very low Kurtosis since we are flat. The rolloff should be almost at Nyquist, the flatness as high as it gets, and the crest quite low.
x.set(\type, 1)
// pink noise has a drop of 3dB per octave across the spectrum, so we would, by comparison, expect a lower centroid, a slightly higher skewness and kurtosis, a lower rolloff, a slightly lower flatness and a higher crest for the larger low-end energy.
x.free
// third, bands of noise
(
x = {
arg type = 0;
var source, chain;
chain = FFT(LocalBuf(1024), WhiteNoise.ar(0.5));
chain = chain.pvcollect(1024, {arg mag,phase;[mag,phase]},5,11,1);
source = Select.ar(type,[
BPF.ar(BPF.ar(WhiteNoise.ar(0.5),330,0.666),330,0.666),
IFFT(chain)]);
Out.kr(b, VarLag.kr(FluidSpectralShape.kr(source),1024/s.sampleRate));
source.dup;
}.play;
)
// a second-order bandpass filter on whitenoise, centred on 330Hz with one octave bandwidth, gives us a centroid quite high. This is due to the exponential behaviour of the filter, with a gentle slope. Observe the spectral analyser:
s.freqscope
// at first it seems quite centred, but then flip the argument FrqScl to lin(ear) and observe how high the spectrum goes. If we set it to a brickwall spectral filter tuned on the same frequencies:
x.set(\type, 1)
// we have a much narrower register, and our centroid and spread, as well as the kurtosis and flatness, agree with this reading.
x.free
//fourth, equally spaced sines
(
x = {
arg freq = 220;
var source;
source = Mix.fill(7, {arg ind; SinOsc.ar(freq + (ind * (220 / 6)), 0, 0.02)});
Out.kr(b,FluidSpectralShape.kr(source));
source.dup;
}.play;
)
// this example shows a similar result to the brickwall spectral bandpass above. If we move the central frequency nearer the half-Nyquist:
x.set(\freq, 8800)
// we can observe that the linear spread is kept the same, since there is the same linear distance in Hz between our frequencies. Skewness is a good indication here of where we are in the spectrum with the shape.
::
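STRONG::A sketch calling FluidMFCC itself.::

This is a hedged sketch of the kr method of FluidMFCC, based only on its class definition: the init method creates one output per maxNumCoefs, so the control bus below is sized to match. It assumes that the server s is booted and that the MFCC plugin is installed. The name ~mfccBus is illustrative.
CODE::
(
// 13 control channels, matching maxNumCoefs below
~mfccBus = Bus.control(s, 13);
x = {
	var source;
	source = SinOsc.ar(220, mul: 0.1);
	// maxNumCoefs sets the number of outputs and cannot be modulated
	Out.kr(~mfccBus, FluidMFCC.kr(source, numCoefs: 13, maxNumCoefs: 13));
	source.dup;
}.play;
)
// poll the bus once: an array of 13 values is posted
~mfccBus.get({ arg vals; vals.postln });
// clean up
x.free; ~mfccBus.free;
::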

@ -0,0 +1,255 @@
TITLE:: FluidMelBands
SUMMARY:: Seven Spectral Shape Descriptors in Real-Time
CATEGORIES:: Libraries>FluidDecomposition
RELATED:: Guides/FluCoMa, Guides/FluidDecomposition, Classes/SpecCentroid, Classes/SpecFlatness, Classes/SpecCentroid, Classes/SpecPcile
DESCRIPTION::
This class implements seven of the most popular spectral shape descriptors, computed on a linear scale for both amplitude and frequency. It is part of the Fluid Decomposition Toolkit of the FluCoMa project.FOOTNOTE:: This was made possible thanks to the FluCoMa project ( http://www.flucoma.org/ ) funded by the European Research Council ( https://erc.europa.eu/ ) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 725899).::
The descriptors are:
LIST::
##the first four statistical moments (https://en.wikipedia.org/wiki/Moment_(mathematics) ), more commonly known as:
LIST::
## the spectral centroid (1) in spectral bins as units. This is the point that splits the spectrum in 2 halves of equal energy. It is the weighted average of the magnitude spectrum.
## the spectral spread (2) in spectral bins. This is the standard deviation of the spectrum envelope, or the average of the distance to the centroid.
## the normalised skewness (3) as a ratio. This indicates how tilted the spectral curve is in relation to the middle of the spectral frame, i.e. half of the Nyquist frequency. If the bulk of the energy is below the bin representing that frequency, i.e. the central bin of the magnitude spectrum, the skewness is positive.
## the normalised kurtosis (4) as a ratio. This indicates how focused the spectral curve is. If it is peaky, the kurtosis is high.
::
## the rolloff (5) in bin number. This indicates the bin under which 95% of the energy is included.
## the flatness (6) in dB. This is the ratio of the geometric mean of the magnitudes to their arithmetic mean. It yields a very approximate measure of how noisy a signal is.
## the crest (7) in dB. This is the ratio of the loudest magnitude to the RMS of the whole frame. A high number is an indication of a loud peak poking out from the overall spectral curve.
The drawings in Peeters 2003 (http://recherche.ircam.fr/anasyn/peeters/ARTICLES/Peeters_2003_cuidadoaudiofeatures.pdf) are useful, as are the commented examples below. For the mathematically-inclined reader, the tutorials and code offered here (https://www.audiocontentanalysis.org/) are interesting to further the understanding.
::
The process will return a multichannel control stream with the seven values, which will be repeated if no change happens within the algorithm, i.e. when the hopSize is larger than the server's kr period.
CLASSMETHODS::
METHOD:: kr
The audio rate in, control rate out version of the object.
ARGUMENT:: in
The audio to be processed.
ARGUMENT:: numBands
(describe argument here)
ARGUMENT:: minFreq
(describe argument here)
ARGUMENT:: maxFreq
(describe argument here)
ARGUMENT:: maxNumBands
(describe argument here)
ARGUMENT:: winSize
The window size. As sinusoidal estimation relies on spectral frames, we need to decide what precision we give it spectrally and temporally, in line with Gabor Uncertainty principles. http://www.subsurfwiki.org/wiki/Gabor_uncertainty
ARGUMENT:: hopSize
The window hop size. As sinusoidal estimation relies on spectral frames, we need to move the window forward. It can be any size but low overlap will create audible artefacts. The -1 default value will default to half of winSize (overlap of 2).
ARGUMENT:: fftSize
The inner FFT/IFFT size. It should be at least 4 samples long, at least the size of the window, and a power of 2. Making it larger allows an oversampling of the spectral precision. The -1 default value will default to winSize.
ARGUMENT:: maxFFTSize
The largest FFT size that can be used, for which memory is allocated at instantiation time. This cannot be modulated.
RETURNS::
A 7-channel KR signal with the seven spectral shape descriptors. The latency is winSize.
EXAMPLES::
code::
//create a monitoring bus for the descriptors
b = Bus.new(\control,0,7);
//create a monitoring window for the values
(
w = Window("Frequency Monitor", Rect(10, 10, 220, 190)).front;
c = Array.fill(7, {arg i; StaticText(w, Rect(10, i * 25 + 10, 135, 20)).background_(Color.grey(0.7)).align_(\right)});
c[0].string = ("Centroid: ");
c[1].string = ("Spread: ");
c[2].string = ("Skewness: ");
c[3].string = ("Kurtosis: ");
c[4].string = ("Rolloff: ");
c[5].string = ("Flatness: ");
c[6].string = ("Crest: ");
a = Array.fill(7, {arg i;
StaticText(w, Rect(150, i * 25 + 10, 60, 20)).background_(Color.grey(0.7)).align_(\center);
});
)
//run the window updating routine.
(
r = Routine {
{
b.get({ arg val;
{
if(w.isClosed.not) {
val.do({arg item,index;
a[index].string = item.round(0.01)})
}
}.defer
});
0.01.wait;
}.loop
}.play
)
//play a simple sound to observe the values
(
{
var source;
source = BPF.ar(WhiteNoise.ar(), 330, 55/330);
Out.kr(b,FluidSpectralShape.kr(source));
source.dup;
}.play;
)
::
STRONG::A commented tutorial on how each descriptor behaves with test signals: ::
CODE::
// as above, create a monitoring bus for the descriptors
b = Bus.new(\control,0,7);
//again, create a monitoring window for the values
(
w = Window("Frequency Monitor", Rect(10, 10, 220, 190)).front;
c = Array.fill(7, {arg i; StaticText(w, Rect(10, i * 25 + 10, 135, 20)).background_(Color.grey(0.7)).align_(\right)});
c[0].string = ("Centroid: ");
c[1].string = ("Spread: ");
c[2].string = ("Skewness: ");
c[3].string = ("Kurtosis: ");
c[4].string = ("Rolloff: ");
c[5].string = ("Flatness: ");
c[6].string = ("Crest: ");
a = Array.fill(7, {arg i;
StaticText(w, Rect(150, i * 25 + 10, 60, 20)).background_(Color.grey(0.7)).align_(\center);
});
)
// this time, update a little more slowly, and convert to Hz the 3 descriptors published in bins by the algorithm.
(
r = Routine {
{
b.get({ arg val;
{
if(w.isClosed.not) {
val.do({arg item,index;
if ((index < 2) || (index == 4))
{
a[index].string = (item * s.sampleRate / 1024).round(0.01);
} {
a[index].string = item.round(0.01);
};
})
}
}.defer
});
0.2.wait;
}.loop
}.play
)
// first, a sine wave
(
x = {
arg freq=220;
var source;
source = SinOsc.ar(freq,mul:0.1);
Out.kr(b, VarLag.kr(FluidSpectralShape.kr(source),1024/s.sampleRate));
source.dup;
}.play;
)
// at 220, the centroid is on the frequency, the spread is narrow, but as wide as the FFT Hann window ripples, the skewness is high as we are low and therefore far left of the middle bin (aka half-Nyquist), the Kurtosis is incredibly high as we have a very peaky spectrum. The rolloff is slightly higher than the frequency, taking into account the FFT windowing ripples, the flatness is incredibly low, as we have one peak and not much else, and the crest is quite high, because most of the energy is in a few peaky bins.
x.set(\freq, 440)
// at 440, the skewness has changed (we are nearer the middle of the spectrogram) and the Kurtosis too, although it is still so high it is quite in the same order of magnitude. The rest is stable, as expected.
x.set(\freq, 11000)
// at 11kHz, kurtosis is still in the thousands, but skewness is almost null, as expected.
x.free
// second, broadband noise
(
x = {
arg type = 0;
var source;
source = Select.ar(type,[WhiteNoise.ar(0.1),PinkNoise.ar(0.1)]);
Out.kr(b, VarLag.kr(FluidSpectralShape.kr(source),1024/s.sampleRate));
source.dup;
}.play;
)
// white noise has an even distribution of energy across the linear spectrum, so we would expect a centroid in the middle bin (aka half-Nyquist) with a spread covering the full range (+/- a quarter-Nyquist), with a skewness almost null since we are centered, and a very low Kurtosis since we are flat. The rolloff should be almost at Nyquist, the flatness as high as it gets, and the crest quite low.
x.set(\type, 1)
// pink noise has a drop of 3dB per octave across the spectrum, so we would, by comparison, expect a lower centroid, a slightly higher skewness and kurtosis, a lower rolloff, a slightly lower flatness and a higher crest for the larger low-end energy.
x.free
// third, bands of noise
(
x = {
arg type = 0;
var source, chain;
chain = FFT(LocalBuf(1024), WhiteNoise.ar(0.5));
chain = chain.pvcollect(1024, {arg mag,phase;[mag,phase]},5,11,1);
source = Select.ar(type,[
BPF.ar(BPF.ar(WhiteNoise.ar(0.5),330,0.666),330,0.666),
IFFT(chain)]);
Out.kr(b, VarLag.kr(FluidSpectralShape.kr(source),1024/s.sampleRate));
source.dup;
}.play;
)
// a second-order bandpass filter on whitenoise, centred on 330Hz with one octave bandwidth, gives us a centroid quite high. This is due to the exponential behaviour of the filter, with a gentle slope. Observe the spectral analyser:
s.freqscope
// at first it seems quite centred, but then flip the argument FrqScl to lin(ear) and observe how high the spectrum goes. If we set it to a brickwall spectral filter tuned on the same frequencies:
x.set(\type, 1)
// we have a much narrower register, and our centroid and spread, as well as the kurtosis and flatness, agree with this reading.
x.free
//fourth, equally spaced sines
(
x = {
arg freq = 220;
var source;
source = Mix.fill(7, {arg ind; SinOsc.ar(freq + (ind * (220 / 6)), 0, 0.02)});
Out.kr(b,FluidSpectralShape.kr(source));
source.dup;
}.play;
)
// this example shows a similar result to the brickwall spectral bandpass above. If we move the central frequency nearer the half-Nyquist:
x.set(\freq, 8800)
// we can observe that the linear spread is kept the same, since there is the same linear distance in Hz between our frequencies. Skewness is a good indication here of where we are in the spectrum with the shape.
::
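STRONG::A sketch calling FluidMelBands itself.::

This is a hedged sketch of the kr method of FluidMelBands, based only on its class definition: the init method creates one output per maxNumBands, so the control bus below is sized to match. It assumes that the server s is booted and that the MelBands plugin is installed. The name ~melBus is illustrative.
CODE::
(
// 40 control channels, matching maxNumBands below
~melBus = Bus.control(s, 40);
x = {
	var source;
	source = PinkNoise.ar(0.1);
	// maxNumBands sets the number of outputs and cannot be modulated
	Out.kr(~melBus, FluidMelBands.kr(source, numBands: 40, maxNumBands: 40));
	source.dup;
}.play;
)
// poll the bus once: an array of 40 values is posted
~melBus.get({ arg vals; vals.postln });
// clean up
x.free; ~melBus.free;
::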