From c4a4d9ecaa99117463b091b8c9512d4726c0d157 Mon Sep 17 00:00:00 2001 From: Pierre Alexandre Tremblay Date: Sun, 19 May 2019 21:53:49 +0100 Subject: [PATCH] (buf) melbands helpfile and demo (plus typo in ampslice) --- .../Classes/FluidBufMelBands.schelp | 44 +++------ .../HelpSource/Classes/FluidMelBands.schelp | 92 +++++++++++-------- .../Classes/FluidSpectralShape.schelp | 2 +- 3 files changed, 70 insertions(+), 68 deletions(-) diff --git a/release-packaging/HelpSource/Classes/FluidBufMelBands.schelp b/release-packaging/HelpSource/Classes/FluidBufMelBands.schelp index 8398ad4..3175ba8 100644 --- a/release-packaging/HelpSource/Classes/FluidBufMelBands.schelp +++ b/release-packaging/HelpSource/Classes/FluidBufMelBands.schelp @@ -1,29 +1,13 @@ TITLE:: FluidBufMelBands -SUMMARY:: Seven Spectral Shape Descriptors on a Buffer +SUMMARY:: A Perceptually Spread Spectral Contour Descriptor on a Buffer CATEGORIES:: Libraries>FluidDecomposition -RELATED:: Guides/FluCoMa, Guides/FluidDecomposition, Classes/SpecCentroid, Classes/SpecFlatness, Classes/SpecCentroid, Classes/SpecPcile +RELATED:: Guides/FluCoMa, Guides/FluidDecomposition, Classes/FluidBufMFCC DESCRIPTION:: -This class implements seven of the most popular spectral shape descriptors, computed on a linear scale for both amplitude and frequency. It is part of the Fluid Decomposition Toolkit of the FluCoMa project.FOOTNOTE:: This was made possible thanks to the FluCoMa project ( http://www.flucoma.org/ ) funded by the European Research Council ( https://erc.europa.eu/ ) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 725899).:: - -The descriptors are: -LIST:: -##the four first statistical moments (https://en.wikipedia.org/wiki/Moment_(mathematics) ), more commonly known as: - LIST:: - ## the spectral centroid (1) in spectral bins as units. This is the point that splits the spectrum in 2 halves of equal energy. It is the weighted average of the magnitude spectrum. 
- ## the spectral spread (2) in spectral bins. This is the standard deviation of the spectrum envelop, or the average of the distance to the centroid. - ## the normalised skewness (3) as ratio. This indicates how tilted is the spectral curve in relation to the middle of the spectral frame, i.e. half of the Nyquist frequency. If it is below the bin representing that frequency, i.e. the central bin of the magnitude spectrum, it is positive. - ## the normalised kurtosis (4) as ratio. This indicates how focused is the spectral curve. If it is peaky, it is high. - :: - ## the rolloff (5) in bin number. This indicates the bin under which 95% of the energy is included. - ## the flatness (6) in dB. This is the ratio of geometric mean of the magnitude, over the arithmetic mean of the magnitudes. It yields a very approximate measure on how noisy a signal is. - ## the crest (7) in dB. This is the ratio of the loudest magnitude over the RMS of the whole frame. A high number is an indication of a loud peak poking out from the overal spectral curve. - - The drawings in Peeters 2003 (http://recherche.ircam.fr/anasyn/peeters/ARTICLES/Peeters_2003_cuidadoaudiofeatures.pdf) are useful, as are the commented examples below. For the mathematically-inclined reader, the tutorials and code offered here (https://www.audiocontentanalysis.org/) are interesting to further the understanding. -:: +This class implements a spectral shape descriptor where the amplitude is given for a number of equally spread perceptual bands. The spread is based on the Mel scale (https://en.wikipedia.org/wiki/Mel_scale), which is one of the first attempts to mimic pitch perception scientifically. This implementation allows the range and number of bands to be selected dynamically. 
It is part of the Fluid Decomposition Toolkit of the FluCoMa project.FOOTNOTE:: This was made possible thanks to the FluCoMa project ( http://www.flucoma.org/ ) funded by the European Research Council ( https://erc.europa.eu/ ) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 725899).:: - The process will return a multichannel buffer with the seven channels per input channel, each containing the 7 shapes. Each sample represents a value, which is every hopSize. +The process will return a single multichannel buffer of STRONG::numBands:: channels per input channel. Each frame holds one analysis value, with one frame every hopSize samples. CLASSMETHODS:: @@ -52,19 +36,19 @@ ARGUMENT:: features The destination buffer for the 7 spectral features describing the spectral shape. ARGUMENT:: numBands -(describe argument here) + The number of bands that will be perceptually equally distributed between STRONG::minFreq:: and STRONG::maxFreq::. It will decide how many channels are produced per channel of the source. ARGUMENT:: minFreq -(describe argument here) + The lower boundary of the lowest band of the model, in Hz. ARGUMENT:: maxFreq -(describe argument here) + The upper boundary of the highest band of the model, in Hz. ARGUMENT:: winSize - The window size. As sinusoidal estimation relies on spectral frames, we need to decide what precision we give it spectrally and temporally, in line with Gabor Uncertainty principles. http://www.subsurfwiki.org/wiki/Gabor_uncertainty + The window size. As spectral description relies on spectral frames, we need to decide what precision we give it spectrally and temporally, in line with Gabor Uncertainty principles. http://www.subsurfwiki.org/wiki/Gabor_uncertainty ARGUMENT:: hopSize - The window hope size. As sinusoidal estimation relies on spectral frames, we need to move the window forward. It can be any size but low overlap will create audible artefacts. + The window hop size. 
As spectral description relies on spectral frames, we need to move the window forward. It can be any size but low overlap will create audible artefacts. ARGUMENT:: fftSize The inner FFT/IFFT size. It should be at least 4 samples long, at least the size of the window, and a power of 2. Making it larger allows an oversampling of the spectral precision. @@ -80,7 +64,7 @@ EXAMPLES:: code:: // create some buffers ( -b = Buffer.read(s,File.realpath(FluidBufSpectralShape.class.filenameSymbol).dirname.withTrailingSlash ++ "../AudioFiles/Nicol-LoopE-M.wav"); +b = Buffer.read(s,File.realpath(FluidBufMelBands.class.filenameSymbol).dirname.withTrailingSlash ++ "../AudioFiles/Nicol-LoopE-M.wav"); c = Buffer.new(s); ) @@ -88,7 +72,7 @@ c = Buffer.new(s); ( Routine{ t = Main.elapsedTime; - FluidBufMelBands.process(s, b, features: c,numBands:10); + FluidBufMelBands.process(s, b, features: c, numBands:10); (Main.elapsedTime - t).postln; }.play ) @@ -118,11 +102,11 @@ c = Buffer.new(s); ( Routine{ t = Main.elapsedTime; - FluidBufSpectralShape.process(s, b, features: c); + FluidBufMelBands.process(s, b, features: c, numBands:10); (Main.elapsedTime - t).postln; }.play ) -// look at the buffer: 7shapes for left, then 7 shapes for right -c.plot(minval:-25, maxval:150) +// look at the buffer: 10 bands for left, then 10 bands for right +c.plot(minval:0, maxval:100) :: \ No newline at end of file diff --git a/release-packaging/HelpSource/Classes/FluidMelBands.schelp b/release-packaging/HelpSource/Classes/FluidMelBands.schelp index e28773e..4e3ffbc 100644 --- a/release-packaging/HelpSource/Classes/FluidMelBands.schelp +++ b/release-packaging/HelpSource/Classes/FluidMelBands.schelp @@ -1,28 +1,12 @@ TITLE:: FluidMelBands -SUMMARY:: Seven Spectral Shape Descriptors in Real-Time +SUMMARY:: A Perceptually Spread Spectral Contour Descriptor in Real-Time CATEGORIES:: Libraries>FluidDecomposition -RELATED:: Guides/FluCoMa, Guides/FluidDecomposition, Classes/SpecCentroid, Classes/SpecFlatness, 
Classes/SpecCentroid, Classes/SpecPcile +RELATED:: Guides/FluCoMa, Guides/FluidDecomposition, Classes/FluidMFCC DESCRIPTION:: -This class implements seven of the most popular spectral shape descriptors, computed on a linear scale for both amplitude and frequency. It is part of the Fluid Decomposition Toolkit of the FluCoMa project.FOOTNOTE:: This was made possible thanks to the FluCoMa project ( http://www.flucoma.org/ ) funded by the European Research Council ( https://erc.europa.eu/ ) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 725899).:: - -The descriptors are: -LIST:: -##the four first statistical moments (https://en.wikipedia.org/wiki/Moment_(mathematics) ), more commonly known as: - LIST:: - ## the spectral centroid (1) in spectral bins as units. This is the point that splits the spectrum in 2 halves of equal energy. It is the weighted average of the magnitude spectrum. - ## the spectral spread (2) in spectral bins. This is the standard deviation of the spectrum envelop, or the average of the distance to the centroid. - ## the normalised skewness (3) as ratio. This indicates how tilted is the spectral curve in relation to the middle of the spectral frame, i.e. half of the Nyquist frequency. If it is below the bin representing that frequency, i.e. the central bin of the magnitude spectrum, it is positive. - ## the normalised kurtosis (4) as ratio. This indicates how focused is the spectral curve. If it is peaky, it is high. - :: - ## the rolloff (5) in bin number. This indicates the bin under which 95% of the energy is included. - ## the flatness (6) in dB. This is the ratio of geometric mean of the magnitude, over the arithmetic mean of the magnitudes. It yields a very approximate measure on how noisy a signal is. - ## the crest (7) in dB. This is the ratio of the loudest magnitude over the RMS of the whole frame. A high number is an indication of a loud peak poking out from the overal spectral curve. 
- - The drawings in Peeters 2003 (http://recherche.ircam.fr/anasyn/peeters/ARTICLES/Peeters_2003_cuidadoaudiofeatures.pdf) are useful, as are the commented examples below. For the mathematically-inclined reader, the tutorials and code offered here (https://www.audiocontentanalysis.org/) are interesting to further the understanding. -:: +This class implements a spectral shape descriptor where the amplitude is given for a number of equally spread perceptual bands. The spread is based on the Mel scale (https://en.wikipedia.org/wiki/Mel_scale), which is one of the first attempts to mimic pitch perception scientifically. This implementation allows the range and number of bands to be selected dynamically. It is part of the Fluid Decomposition Toolkit of the FluCoMa project.FOOTNOTE:: This was made possible thanks to the FluCoMa project ( http://www.flucoma.org/ ) funded by the European Research Council ( https://erc.europa.eu/ ) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 725899).:: - The process will return a multichannel control steam with the seven values, which will be repeated if no change happens within the algorythm, i.e. when the hopSize is larger than the server's kr period. +The process will return a multichannel control stream of size STRONG::maxNumBands::, which will be repeated if no change happens within the algorithm, i.e. when the hopSize is larger than the server's kr period. CLASSMETHODS:: @@ -33,13 +17,13 @@ ARGUMENT:: in The audio to be processed. ARGUMENT:: numBands -(describe argument here) + The number of bands that will be perceptually equally distributed between STRONG::minFreq:: and STRONG::maxFreq::. It is limited by the STRONG::maxNumBands:: parameter. When the number is smaller than the maximum, the output is zero-padded. ARGUMENT:: minFreq -(describe argument here) + The lower boundary of the lowest band of the model, in Hz. 
ARGUMENT:: maxFreq -(describe argument here) + The upper boundary of the highest band of the model, in Hz. ARGUMENT:: maxNumBands The maximum number of Mel bands that can be modelled. This sets the number of channels of the output, and therefore cannot be modulated. @@ -48,7 +32,7 @@ ARGUMENT:: winSize The window size. As sinusoidal estimation relies on spectral frames, we need to decide what precision we give it spectrally and temporally, in line with Gabor Uncertainty principles. http://www.subsurfwiki.org/wiki/Gabor_uncertainty ARGUMENT:: hopSize - The window hope size. As sinusoidal estimation relies on spectral frames, we need to move the window forward. It can be any size but low overlap will create audible artefacts. The -1 default value will default to half of winSize (overlap of 2). + The window hop size. As sinusoidal estimation relies on spectral frames, we need to move the window forward. It can be any size but low overlap will create audible artefacts. The -1 default value will default to half of winSize (overlap of 2). ARGUMENT:: fftSize The inner FFT/IFFT size. It should be at least 4 samples long, at least the size of the window, and a power of 2. Making it larger allows an oversampling of the spectral precision. The -1 default value will default to windowSize. @@ -57,12 +41,11 @@ ARGUMENT:: maxFFTSize How large can the FFT be, by allocating memory at instantiation time. This cannot be modulated. RETURNS:: - A 7-channel KR signal with the seven spectral shape descriptors. The latency is winSize. + A KR signal of STRONG::maxNumBands:: channels, giving the measured amplitudes for each band. The latency is winSize. EXAMPLES:: - code:: //create a monitoring bus for the descriptors b = Bus.new(\control,0,40); @@ -74,11 +57,10 @@ w = Window("Mel Bands Monitor", Rect(10, 10, 620, 320)).front; a = MultiSliderView(w,Rect(10, 10, 600, 300)).elasticMode_(1).isFilled_(1); ) -//run the wondow updating routine. +//run the window updating routine. 
( r = Routine { { - b.get({ arg val; { if(w.isClosed.not) { @@ -88,18 +70,54 @@ r = Routine { }); 0.01.wait; }.loop - }.play ) //play a simple sound to observe the values ( - { - var source; - // source = SinOsc.ar(220,0,0.1); - source = BPF.ar(WhiteNoise.ar(), 330, 55/330); - Out.kr(b,FluidMelBands.kr(source,maxNumBands:40)); - source.dup; - }.play; +x = { + var source = SinOsc.ar(LFTri.kr(0.1).exprange(80,800),0,0.1); + Out.kr(b,FluidMelBands.kr(source,maxNumBands:40) / 50); + source.dup; +}.play; +) + +// free this source +x.free + +// load a more exciting one +c = Buffer.read(s,File.realpath(FluidMelBands.class.filenameSymbol).dirname.withTrailingSlash ++ "../AudioFiles/Tremblay-AaS-SynthTwoVoices-M.wav"); + +// analyse with parameters to be changed +( +x = {arg bands = 40, low = 20, high = 20000; + var source = PlayBuf.ar(1,c,loop:1); + Out.kr(b,FluidMelBands.kr(source, bands, low, high, 40) / 10); + source.dup; +}.play; ) -:: \ No newline at end of file + +// observe the number of bands. The unused ones at the top are not updated +x.set(\bands,20) + +// back to the full range +x.set(\bands,40) + +// focus all the bands on a mid range +x.set(\low,320, \high, 8000) + +// focusing on the low end shows the fft resolution issue. One could restart the analysis with a larger fft to show more precision +x.set(\low,20, \high, 160) + +// back to full range +x.set(\low,20, \high, 20000) + +// free everything +x.free;b.free;c.free;r.stop; +:: + +STRONG::A musical example:: + +CODE:: +// todo: port the Max one +:: diff --git a/release-packaging/HelpSource/Classes/FluidSpectralShape.schelp b/release-packaging/HelpSource/Classes/FluidSpectralShape.schelp index 3977f16..5dc28c4 100644 --- a/release-packaging/HelpSource/Classes/FluidSpectralShape.schelp +++ b/release-packaging/HelpSource/Classes/FluidSpectralShape.schelp @@ -74,7 +74,7 @@ a = Array.fill(7, {arg i; }); ) -//run the wondow updating routine. +//run the window updating routine. ( r = Routine { {