In mode 2, the threshold of the high part for the percussive filter. That threshold applies to all frequencies above percThreshFreq2. The threshold between percThreshFreq1 and percThreshFreq2 is interpolated between percThreshAmp1 and percThreshAmp2. How much more powerful (in dB) the percussive median filter needs to be than the harmonic median filter for this bin to be counted as percussive.
ARGUMENT:: winSize
ARGUMENT:: windowSize
The window size in samples. As HPSS relies on spectral frames, we need to decide what precision we give it spectrally and temporally, in line with Gabor Uncertainty principles. http://www.subsurfwiki.org/wiki/Gabor_uncertainty
A flag to switch the computation of TruePeak. On by default, removing it makes the algorithm more CPU efficient by reverting to a simple absolute peak of the frame.
ARGUMENT:: winSize
ARGUMENT:: windowSize
The size of the window on which the computation is done. By default 1024 to be similar with all other FluCoMa objects, the EBU specifies other values as per the examples below.
The highest boundary of the highest band of the model, in Hz.
ARGUMENT:: winSize
ARGUMENT:: windowSize
The window size. As MFCC computation relies on spectral frames, we need to decide what precision we give it spectrally and temporally, in line with Gabor Uncertainty principles. http://www.subsurfwiki.org/wiki/Gabor_uncertainty
ARGUMENT:: hopSize
@ -109,6 +109,6 @@ Routine{
}.play
)
// look at the buffer: 5 coefs for left, then 5 coefs for right
// look at the buffer: 5 coefs for left, then 5 coefs for right (the first of each is linked to the loudness)
The highest boundary of the highest band of the model, in Hz.
ARGUMENT:: winSize
ARGUMENT:: windowSize
The window size. As spectral description relies on spectral frames, we need to decide what precision we give it spectrally and temporally, in line with Gabor Uncertainty principles. http://www.subsurfwiki.org/wiki/Gabor_uncertainty
ARGUMENT:: hopSize
@ -108,5 +108,5 @@ Routine{
)
// look at the buffer: 10 bands for left, then 10 bands for right
The FluidBufNMF object decomposes the spectrum of a sound into a number of components using Non-Negative Matrix Factorisation (NMF) footnote:: Lee, Daniel D., and H. Sebastian Seung. 1999. ‘Learning the Parts of Objects by Non-Negative Matrix Factorization’. Nature 401 (6755): 788–91. https://doi.org/10.1038/44565.
::. NMF has been a popular technique in signal processing research for things like source separation and transcription footnote:: Smaragdis and Brown, Non-Negative Matrix Factorization for Polyphonic Music Transcription.::, although its creative potential is so far relatively unexplored.
The algorithm takes a buffer in and divides it into a number of components, determined by the rank argument. It works iteratively, by trying to find a combination of spectral templates ('bases') and envelopes ('activations') that yield the original magnitude spectrogram when added together. By and large, there is no unique answer to this question (i.e. there are different ways of accounting for an evolving spectrum in terms of some set of templates and envelopes). In its basic form, NMF is a form of unsupervised learning: it starts with some random data and then converges towards something that minimizes the distance between its generated data and the original:it tends to converge very quickly at first and then level out. Fewer iterations mean less processing, but also less predictable results.
The algorithm takes a buffer in and divides it into a number of components, determined by the STRONG::Components:: argument. It works iteratively, by trying to find a combination of spectral templates ('bases') and envelopes ('activations') that yield the original magnitude spectrogram when added together. By and large, there is no unique answer to this question (i.e. there are different ways of accounting for an evolving spectrum in terms of some set of templates and envelopes). In its basic form, NMF is a form of unsupervised learning: it starts with some random data and then converges towards something that minimizes the distance between its generated data and the original:it tends to converge very quickly at first and then level out. Fewer iterations mean less processing, but also less predictable results.
The object can return either or all of the following: LIST::
## a spectral contour of each component in the form of a magnitude spectrogram (called a basis in NMF lingo);
@ -20,8 +20,8 @@ The bases and activations can be used to make a kind of vocoder based on what NM
Some additional options and flexibility can be found through combinations of the basesMode and actMode arguments. If these flags are set to 1, the object expects to be supplied with pre-formed spectra (or envelopes) that will be used as seeds for the decomposition, providing more guided results. When set to 2, the supplied buffers won't be updated, so become templates to match against instead. Note that having both bases and activations set to 2 doesn't make sense, so the object will complain.
If supplying pre-formed data, it's up to the user to make sure that the supplied buffers are the right size: LIST::
## bases must be STRONG::(fft size / 2) + 1:: frames and STRONG::(rank * input channels):: channels
## activations must be STRONG::(input frames / hopSize) + 1:: frames and STRONG::(rank * input channels):: channels
## bases must be STRONG::(fft size / 2) + 1:: frames and STRONG::(components * input channels):: channels
## activations must be STRONG::(input frames / hopSize) + 1:: frames and STRONG::(components * input channels):: channels
::
In this implementation, the components are reconstructed by masking the original spectrum, such that they will sum to yield the original sound.
@ -57,7 +57,7 @@ ARGUMENT:: numChans
For multichannel srcBuf, how many channel should be processed.
ARGUMENT:: destination
The index of the buffer where the different reconstructed ranks will be reconstructed. The buffer will be resized to STRONG::rank * numChannelsProcessed:: channels and STRONG::sourceDuration:: lenght. If STRONG::nil:: is provided, the reconstruction will not happen.
The index of the buffer where the different reconstructed components will be reconstructed. The buffer will be resized to STRONG::components * numChannelsProcessed:: channels and STRONG::sourceDuration:: lenght. If STRONG::nil:: is provided, the reconstruction will not happen.
ARGUMENT:: bases
The index of the buffer where the different bases will be written to and/or read from: the behaviour is set in the following argument. If STRONG::nil:: is provided, no bases will be returned.
@ -65,7 +65,7 @@ ARGUMENT:: bases
ARGUMENT:: basesMode
This flag decides of how the basis buffer passed as the previous argument is treated.
table::
## 0 || The bases are seeded randomly, and the resulting ones will be written after the process in the passed buffer. The buffer is resized to STRONG::rank * numChannelsProcessed:: channels and STRONG::(fftSize / 2 + 1):: lenght.
## 0 || The bases are seeded randomly, and the resulting ones will be written after the process in the passed buffer. The buffer is resized to STRONG::components * numChannelsProcessed:: channels and STRONG::(fftSize / 2 + 1):: lenght.
## 1 || The passed buffer is considered as seed for the bases. Its dimensions should match the values above. The resulting bases will replace the seed ones.
## 2 || The passed buffer is considered as a template for the bases, and will therefore not change. Its bases should match the values above.
::
@ -76,18 +76,18 @@ ARGUMENT:: activations
ARGUMENT:: actMode
This flag decides of how the activation buffer passed as the previous argument is treated.
table::
## 0 || The activations are seeded randomly, and the resulting ones will be written after the process in the passed buffer. The buffer is resized to STRONG::rank * numChannelsProcessed:: channels and STRONG::(sourceDuration / hopsize + 1):: lenght.
## 0 || The activations are seeded randomly, and the resulting ones will be written after the process in the passed buffer. The buffer is resized to STRONG::components * numChannelsProcessed:: channels and STRONG::(sourceDuration / hopsize + 1):: lenght.
## 1 || The passed buffer is considered as seed for the activations. Its dimensions should match the values above. The resulting activations will replace the seed ones.
## 2 || The passed buffer is considered as a template for the activations, and will therefore not change. Its dimensions should match the values above.
::
ARGUMENT:: rank
ARGUMENT:: components
The number of elements the NMF algorithm will try to divide the spectrogram of the source in.
ARGUMENT:: numIter
ARGUMENT:: iterations
The NMF process is iterative, trying to converge to the smallest error in its factorisation. The number of iterations will decide how many times it tries to adjust its estimates. Higher numbers here will be more CPU expensive, lower numbers will be more unpredictable in quality.
ARGUMENT:: winSize
ARGUMENT:: windowSize
The window size. As NMF relies on spectral frames, we need to decide what precision we give it spectrally and temporally, in line with Gabor Uncertainty principles. http://www.subsurfwiki.org/wiki/Gabor_uncertainty
ARGUMENT:: hopSize
@ -96,10 +96,10 @@ ARGUMENT:: hopSize
ARGUMENT:: fftSize
The inner FFT/IFFT size. It should be at least 4 samples long, at least the size of the window, and a power of 2. Making it larger allows an oversampling of the spectral precision.
ARGUMENT:: winType
ARGUMENT:: windowType
The inner FFT/IFFT windowing type (not implemented yet)
ARGUMENT:: randSeed
ARGUMENT:: randomSeed
The NMF process needs to seed its starting point. If specified, the same values will be used. The default of -1 will randomly assign them. (not implemented yet)
The size of a smoothing filter that is applied on the novelty curve. A larger filter filter size allows for cleaner cuts on very sharp changes.
ARGUMENT:: winSize
ARGUMENT:: windowSize
The window size. As novelty estimation relies on spectral frames, we need to decide what precision we give it spectrally and temporally, in line with Gabor Uncertainty principles. http://www.subsurfwiki.org/wiki/Gabor_uncertainty
The thresholding of a new slice. Value ranges are different for each function, from 0 upwards.
ARGUMENT:: debounce
ARGUMENT:: minSliceLength
The minimum duration of a slice in number of hopSize.
ARGUMENT:: filterSize
@ -61,11 +61,11 @@ ARGUMENT:: filterSize
ARGUMENT:: frameDelta
For certain functions (HFC, SpectralFlux, MKL, Cosine), the distance does not have to be computed between consecutive frames. By default (0) it is, otherwise this sets the distane between the comparison window in samples.
ARGUMENT:: winSize
ARGUMENT:: windowSize
The window size. As spectral differencing relies on spectral frames, we need to decide what precision we give it spectrally and temporally, in line with Gabor Uncertainty principles. http://www.subsurfwiki.org/wiki/Gabor_uncertainty
ARGUMENT:: hopSize
The window hop size. As spectral differencing relies on spectral frames, we need to move the window forward. It can be any size but low overlap will create audible artefacts. The -1 default value will default to half of winSize (overlap of 2).
The window hop size. As spectral differencing relies on spectral frames, we need to move the window forward. It can be any size but low overlap will create audible artefacts. The -1 default value will default to half of windowSize (overlap of 2).
ARGUMENT:: fftSize
The inner FFT/IFFT size. It should be at least 4 samples long, at least the size of the window, and a power of 2. Making it larger allows an oversampling of the spectral precision. The -1 default value will default to windowSize.
## 2 || YinFFT: Implements the frequency domain version of the YIN algorithm, as described in FOOTNOTE::P. M. Brossier, "Automatic Annotation of Musical Audio for Interactive Applications.” QMUL, London, UK, 2007. :: See also https://essentia.upf.edu/documentation/reference/streaming_PitchYinFFT.html
::
ARGUMENT:: winSize
ARGUMENT:: windowSize
The window size. As sinusoidal estimation relies on spectral frames, we need to decide what precision we give it spectrally and temporally, in line with Gabor Uncertainty principles. http://www.subsurfwiki.org/wiki/Gabor_uncertainty
ARGUMENT:: hopSize
@ -80,9 +80,7 @@ Routine{
)
// look at the analysis
c.plot(minval:0, maxval:400)
// plot with a different range to appreciate the confidence:
c.plot(minval:0, maxval:1)
c.plot(separately:true)
// The values are interleaved [pitch,confidence] in the buffer as they are on 2 channels: to get to the right frame, divide the SR of the input by the hopSize, then multiply by 2 because of the channel interleaving
// here we are querying from one frame before (the signal starts at 8192, which is frame 16 (8192/512), therefore starting the query at frame 15, which is index 30.
@ -117,7 +115,7 @@ Routine{
)
// look at the buffer: [pitch,confidence] for left then [pitch,confidence] for right
c.plot(minval:0, maxval:1500)
c.plot(separately:true)
::
STRONG::A musical example.::
@ -151,7 +149,7 @@ d.do({
e.postln;
//granulate only the frames that are in our buffer
// We need to convert our indices to frame start. Their position was (index * hopSize) - (winSize) in FluidBufPitch
// We need to convert our indices to frame start. Their position was (index * hopSize) - (windowSize) in FluidBufPitch
The weight of the frequency proximity of a peak when trying to associate it to an existing track (relative to magWeight - suggested between 0 to 1)
ARGUMENT:: winSize
ARGUMENT:: windowSize
The window size. As sinusoidal estimation relies on spectral frames, we need to decide what precision we give it spectrally and temporally, in line with Gabor Uncertainty principles. http://www.subsurfwiki.org/wiki/Gabor_uncertainty
The destination buffer for the 7 spectral features describing the spectral shape.
ARGUMENT:: winSize
ARGUMENT:: windowSize
The window size. As spectral shape estimation relies on spectral frames, we need to decide what precision we give it spectrally and temporally, in line with Gabor Uncertainty principles. http://www.subsurfwiki.org/wiki/Gabor_uncertainty
The threshold of the offset of the smoothed error function. As it proceeds backwards in time, it allows tight ending of the identification of the anomaly.
ARGUMENT:: winSize
ARGUMENT:: windowSize
The averaging window of the error detection function. It needs smoothing as it is very jittery. The longer the window, the less precise, but the less false positives.
ARGUMENT:: debounce
ARGUMENT:: clumpLength
The window size in sample within which positive detections will be clumped together to avoid overdetecting in time.
The threshold of the offset of the smoothed error function. As it proceeds backwards in time, it allows tight ending of the identification of the anomaly.
ARGUMENT:: winSize
ARGUMENT:: windowSize
The averaging window of the error detection function. It needs smoothing as it is very jittery. The longer the window, the less precise, but the less false positive.
ARGUMENT:: debounce
ARGUMENT:: clumpLength
The window size in sample within which positive detections will be clumped together to avoid overdetecting in time.
ARGUMENT:: action
@ -135,7 +135,7 @@ d = Buffer.new(s); e = Buffer.new(s);
In mode 2, the threshold of the high part for the percussive filter. That threshold applies to all frequencies above percThreshFreq2. The threshold between percThreshFreq1 and percThreshFreq2 is interpolated between percThreshAmp1 and percThreshAmp2. How much more powerful (in dB) the percussive median filter needs to be than the harmonic median filter for this bin to be counted as percussive.
ARGUMENT:: winSize
ARGUMENT:: windowSize
The window size. As sinusoidal estimation relies on spectral frames, we need to decide what precision we give it spectrally and temporally, in line with Gabor Uncertainty principles. http://www.subsurfwiki.org/wiki/Gabor_uncertainty
ARGUMENT:: hopSize
The window hop size. As sinusoidal estimation relies on spectral frames, we need to move the window forward. It can be any size but low overlap will create audible artefacts. The -1 default value will default to half of winSize (overlap of 2).
The window hop size. As sinusoidal estimation relies on spectral frames, we need to move the window forward. It can be any size but low overlap will create audible artefacts. The -1 default value will default to half of windowSize (overlap of 2).
ARGUMENT:: fftSize
The inner FFT/IFFT size. It should be at least 4 samples long, at least the size of the window, and a power of 2. Making it larger allows an oversampling of the spectral precision. The -1 default value will default to windowSize.
@ -85,7 +85,7 @@ ARGUMENT:: maxPercFilterSize
How large can the percussive filter be modulated to (percFilterSize), by allocating memory at instantiation time. This cannot be modulated.
RETURNS::
An array of three audio streams: [0] is the harmonic part extracted, [1] is the percussive part extracted, [2] is the rest. The latency between the input and the output is ((harmFilterSize - 1) * hopSize) + winSize) samples.
An array of three audio streams: [0] is the harmonic part extracted, [1] is the percussive part extracted, [2] is the rest. The latency between the input and the output is ((harmFilterSize - 1) * hopSize) + windowSize) samples.
Discussion::
HPSS works by using median filters on the spectral magnitudes of a sound. It hinges on a simple modelling assumption that tonal components will tend to yield concentrations of energy across time, spread out in frequency, and percussive components will manifest as concentrations of energy across frequency, spread out in time. By using median filters across time and frequency respectively, we get initial esitmates of the tonal-ness / transient-ness of a point in time and frequency. These are then combined into 'masks' that are applied to the orginal spectral data in order to produce a separation.
@ -112,6 +112,6 @@ b = Buffer.read(s,File.realpath(FluidHPSS.class.filenameSymbol).dirname.withTrai
A flag to switch the computation of TruePeak. On by default, removing it makes the algorithm more CPU efficient by reverting to a simple absolute peak of the frame.
ARGUMENT:: winSize
ARGUMENT:: windowSize
The size of the window on which the computation is done. By default 1024 to be similar with all other FluCoMa objects, the EBU specifies other values as per the examples below.
ARGUMENT:: hopSize
How much the buffered window moves forward, in samples. By default 512 to be similar with all other FluCoMa objects, the EBU specifies other values as per the examples below.
ARGUMENT:: maxWinSize
How large can the winSize be, by allocating memory at instantiation time. This cannot be modulated.
ARGUMENT:: maxwindowSize
How large can the windowSize be, by allocating memory at instantiation time. This cannot be modulated.
RETURNS::
A 2-channel KR signal with the [pitch, confidence] descriptors. The latency is winSize.
A 2-channel KR signal with the [pitch, confidence] descriptors. The latency is windowSize.
EXAMPLES::
@ -88,7 +88,7 @@ x.free
x = {
arg freq=220, type = 1, noise = 0;
var source = PinkNoise.ar(noise) + Select.ar(type,[DC.ar(),SinOsc.ar(freq,mul:0.1), VarSaw.ar(freq,mul:0.1), Saw.ar(freq,0.1), Pulse.ar(freq,mul:0.1)]);
The maximum number of cepstral coefficients that can be computed. This sets the number of channels of the output, and therefore cannot be modulated.
ARGUMENT:: winSize
ARGUMENT:: windowSize
The window size. As MFCC computation relies on spectral frames, we need to decide what precision we give it spectrally and temporally, in line with Gabor Uncertainty principles. http://www.subsurfwiki.org/wiki/Gabor_uncertainty
ARGUMENT:: hopSize
The window hop size. As MFCC computation relies on spectral frames, we need to move the window forward. It can be any size but low overlap will create audible artefacts. The -1 default value will default to half of winSize (overlap of 2).
The window hop size. As MFCC computation relies on spectral frames, we need to move the window forward. It can be any size but low overlap will create audible artefacts. The -1 default value will default to half of windowSize (overlap of 2).
ARGUMENT:: fftSize
The inner FFT/IFFT size. It should be at least 4 samples long, at least the size of the window, and a power of 2. Making it larger allows an oversampling of the spectral precision. The -1 default value will default to windowSize.
@ -44,7 +44,7 @@ ARGUMENT:: maxFFTSize
How large can the FFT be, by allocating memory at instantiation time. This cannot be modulated.
RETURNS::
A KR signal of STRONG::maxNumCoefs:: channels. The latency is winSize.
A KR signal of STRONG::maxNumCoefs:: channels. The latency is windowSize.
The maximum number of Mel bands that can be modelled. This sets the number of channels of the output, and therefore cannot be modulated.
ARGUMENT:: winSize
ARGUMENT:: windowSize
The window size. As sinusoidal estimation relies on spectral frames, we need to decide what precision we give it spectrally and temporally, in line with Gabor Uncertainty principles. http://www.subsurfwiki.org/wiki/Gabor_uncertainty
ARGUMENT:: hopSize
The window hop size. As sinusoidal estimation relies on spectral frames, we need to move the window forward. It can be any size but low overlap will create audible artefacts. The -1 default value will default to half of winSize (overlap of 2).
The window hop size. As sinusoidal estimation relies on spectral frames, we need to move the window forward. It can be any size but low overlap will create audible artefacts. The -1 default value will default to half of windowSize (overlap of 2).
ARGUMENT:: fftSize
The inner FFT/IFFT size. It should be at least 4 samples long, at least the size of the window, and a power of 2. Making it larger allows an oversampling of the spectral precision. The -1 default value will default to windowSize.
@ -41,7 +41,7 @@ ARGUMENT:: maxFFTSize
How large can the FFT be, by allocating memory at instantiation time. This cannot be modulated.
RETURNS::
A KR signal of STRONG::maxNumBands:: channels, giving the measure amplitudes for each band. The latency is winSize.
A KR signal of STRONG::maxNumBands:: channels, giving the measure amplitudes for each band. The latency is windowSize.
@ -6,7 +6,7 @@ RELATED:: Guides/FluCoMa, Guides/FluidDecomposition, Classes/FluidBufNMF, Classe
DESCRIPTION::
The FluidNMFFilter object decomposes and resynthesises an incoming audio signal against a set of spectral templates using an slimmed-down version of Nonnegative Matrix Factorisation (NMF) footnote:: Lee, Daniel D., and H. Sebastian Seung. 1999. ‘Learning the Parts of Objects by Non-Negative Matrix Factorization’. Nature 401 (6755): 788–91. https://doi.org/10.1038/44565. ::
It outputs at AR the resynthesis of the best factorisation. The spectral templates are presumed to have been produced by the offline NMF process (link::Classes/FluidBufNMF::), and must be the correct size with respect to the FFT settings being used (FFT size / 2 + 1 frames long). The rank of the decomposition is determined by the number of channels in the supplied buffer of templates, up to a maximum set by the STRONG::maxRank:: parameter.
It outputs at AR the resynthesis of the best factorisation. The spectral templates are presumed to have been produced by the offline NMF process (link::Classes/FluidBufNMF::), and must be the correct size with respect to the FFT settings being used (FFT size / 2 + 1 frames long). The rank of the decomposition (number of components) is determined by the number of channels in the supplied buffer of templates, up to a maximum set by the STRONG::maxComponents:: parameter.
NMF has been a popular technique in signal processing research for things like source separation and transcription footnote:: Smaragdis and Brown, Non-Negative Matrix Factorization for Polyphonic Music Transcription.::, although its creative potential is so far relatively unexplored. It works iteratively, by trying to find a combination of amplitudes ('activations') that yield the original magnitude spectrogram of the audio input when added together. By and large, there is no unique answer to this question (i.e. there are different ways of accounting for an evolving spectrum in terms of some set of templates and envelopes). In its basic form, NMF is a form of unsupervised learning: it starts with some random data and then converges towards something that minimizes the distance between its generated data and the original:it tends to converge very quickly at first and then level out. Fewer iterations mean less processing, but also less predictable results.
@ -20,25 +20,25 @@ FluidBufNMF is part of the Fluid Decomposition Toolkit of the FluCoMa project. f
CLASSMETHODS::
METHOD:: ar
The real-time processing method. It takes an audio input, and will yield a audio stream in the form of a multichannel array of size STRONG::maxRank:: . If the bases buffer has fewer than maxRank channels, the remaining outputs will be zeroed.
The real-time processing method. It takes an audio input, and will yield a audio stream in the form of a multichannel array of size STRONG::maxComponents:: . If the bases buffer has fewer than maxComponents channels, the remaining outputs will be zeroed.
ARGUMENT:: in
The signal input to the factorisation process.
ARGUMENT:: bases
The server index of the buffer containing the different bases that the input signal will be matched against. Bases must be STRONG::(fft size / 2) + 1:: frames. If the buffer has more than STRONG::maxRank:: channels, the excess will be ignored.
The server index of the buffer containing the different bases that the input signal will be matched against. Bases must be STRONG::(fft size / 2) + 1:: frames. If the buffer has more than STRONG::maxComponents:: channels, the excess will be ignored.
ARGUMENT::maxRank
ARGUMENT::maxComponents
The maximum number of elements the NMF algorithm will try to divide the spectrogram of the source in. This dictates the number of output channels for the ugen. This cannot be modulated.
ARGUMENT:: numIter
ARGUMENT:: iterations
The NMF process is iterative, trying to converge to the smallest error in its factorisation. The number of iterations will decide how many times it tries to adjust its estimates. Higher numbers here will be more CPU intensive, lower numbers will be more unpredictable in quality.
ARGUMENT:: winSize
ARGUMENT:: windowSize
The number of samples that are analysed at a time. A lower number yields greater temporal resolution, at the expense of spectral resoultion, and vice-versa.
ARGUMENT:: hopSize
The window hop size. As NMF relies on spectral frames, we need to move the window forward. It can be any size but low overlap will create audible artefacts. The -1 default value will default to half of winSize (overlap of 2).
The window hop size. As NMF relies on spectral frames, we need to move the window forward. It can be any size but low overlap will create audible artefacts. The -1 default value will default to half of windowSize (overlap of 2).
ARGUMENT:: fftSize
The FFT/IFFT size. It should be at least 4 samples long, at least the size of the window, and a power of 2. Making it larger allows an oversampling of the spectral precision. The -1 default value will default to windowSize.
@ -79,9 +79,9 @@ d.plot
d.play //////(beware !!!! loud!!!)
(
// separate them in 2 ranks
// separate them in 2 components
Routine {
FluidBufNMF.process(s, d, bases: e, rank:2);
FluidBufNMF.process(s, d, bases: e, components:2);
@ -6,7 +6,7 @@ RELATED:: Guides/FluCoMa, Guides/FluidDecomposition, Classes/FluidBufNMF, Classe
DESCRIPTION::
The FluidNMFMatch object matches an incoming audio signal against a set of spectral templates using an slimmed-down version of Nonnegative Matrix Factorisation (NMF) footnote:: Lee, Daniel D., and H. Sebastian Seung. 1999. ‘Learning the Parts of Objects by Non-Negative Matrix Factorization’. Nature 401 (6755): 788–91. https://doi.org/10.1038/44565. ::
It outputs at kr the degree of detected match for each template (the activation amount, in NMF-terms). The spectral templates are presumed to have been produced by the offline NMF process (link::Classes/FluidBufNMF::), and must be the correct size with respect to the FFT settings being used (FFT size / 2 + 1 frames long). The rank of the decomposition is determined by the number of channels in the supplied buffer of templates, up to a maximum set by the STRONG::maxRank:: parameter.
It outputs at kr the degree of detected match for each template (the activation amount, in NMF-terms). The spectral templates are presumed to have been produced by the offline NMF process (link::Classes/FluidBufNMF::), and must be the correct size with respect to the FFT settings being used (FFT size / 2 + 1 frames long). The rank of the decomposition (number of components) is determined by the number of channels in the supplied buffer of templates, up to a maximum set by the STRONG::maxComponents:: parameter.
NMF has been a popular technique in signal processing research for things like source separation and transcription footnote:: Smaragdis and Brown, Non-Negative Matrix Factorization for Polyphonic Music Transcription.::, although its creative potential is so far relatively unexplored. It works iteratively, by trying to find a combination of amplitudes ('activations') that yield the original magnitude spectrogram of the audio input when added together. By and large, there is no unique answer to this question (i.e. there are different ways of accounting for an evolving spectrum in terms of some set of templates and envelopes). In its basic form, NMF is a form of unsupervised learning: it starts with some random data and then converges towards something that minimizes the distance between its generated data and the original:it tends to converge very quickly at first and then level out. Fewer iterations mean less processing, but also less predictable results.
@ -20,25 +20,25 @@ FluidBufNMF is part of the Fluid Decomposition Toolkit of the FluCoMa project. f
CLASSMETHODS::
METHOD:: kr
The real-time processing method. It takes an audio or control input, and will yield a control stream in the form of a multichannel array of size STRONG::maxRank:: . If the bases buffer has fewer than maxRank channels, the remaining outputs will be zeroed.
The real-time processing method. It takes an audio or control input, and will yield a control stream in the form of a multichannel array of size STRONG::maxComponents:: . If the bases buffer has fewer than maxComponents channels, the remaining outputs will be zeroed.
ARGUMENT:: in
The signal input to the factorisation process.
ARGUMENT:: bases
The server index of the buffer containing the different bases that the input signal will be matched against. Bases must be STRONG::(fft size / 2) + 1:: frames. If the buffer has more than STRONG::maxRank:: channels, the excess will be ignored.
The server index of the buffer containing the different bases that the input signal will be matched against. Bases must be STRONG::(fft size / 2) + 1:: frames. If the buffer has more than STRONG::maxComponents:: channels, the excess will be ignored.
ARGUMENT::maxRank
ARGUMENT::maxComponents
The maximum number of elements the NMF algorithm will try to divide the spectrogram of the source in. This dictates the number of output channelsfor the ugen. This cannot be modulated.
ARGUMENT:: numIter
ARGUMENT:: iterations
The NMF process is iterative, trying to converge to the smallest error in its factorisation. The number of iterations will decide how many times it tries to adjust its estimates. Higher numbers here will be more CPU intensive, lower numbers will be more unpredictable in quality.
ARGUMENT:: winSize
ARGUMENT:: windowSize
The number of samples that are analysed at a time. A lower number yields greater temporal resolution, at the expense of spectral resoultion, and vice-versa.
ARGUMENT:: hopSize
The window hop size. As NMF relies on spectral frames, we need to move the window forward. It can be any size but low overlap will create audible artefacts. The -1 default value will default to half of winSize (overlap of 2).
The window hop size. As NMF relies on spectral frames, we need to move the window forward. It can be any size but low overlap will create audible artefacts. The -1 default value will default to half of windowSize (overlap of 2).
ARGUMENT:: fftSize
The FFT/IFFT size. It should be at least 4 samples long, at least the size of the window, and a power of 2. Making it larger allows an oversampling of the spectral precision. The -1 default value will default to windowSize.
@ -79,9 +79,9 @@ d.plot
d.play //////(beware !!!! loud!!!)
(
// separate them in 2 ranks
// separate them in 2 components
Routine {
FluidBufNMF.process(s, d, bases: e, rank:2);
FluidBufNMF.process(s, d, bases: e, components:2);
The thresholding of a new slice. Value ranges are different for each function, from 0 upwards.
ARGUMENT:: debounce
ARGUMENT:: minSliceLength
The minimum duration of a slice in number of hopSize.
ARGUMENT:: filterSize
@ -44,11 +44,11 @@ ARGUMENT:: filterSize
ARGUMENT:: frameDelta
For certain functions (HFC, SpectralFlux, MKL, Cosine), the distance does not have to be computed between consecutive frames. By default (0) it is, otherwise this sets the distane between the comparison window in samples.
ARGUMENT:: winSize
ARGUMENT:: windowSize
The window size. As sinusoidal estimation relies on spectral frames, we need to decide what precision we give it spectrally and temporally, in line with Gabor Uncertainty principles. http://www.subsurfwiki.org/wiki/Gabor_uncertainty
ARGUMENT:: hopSize
The window hop size. As sinusoidal estimation relies on spectral frames, we need to move the window forward. It can be any size but low overlap will create audible artefacts. The -1 default value will default to half of winSize (overlap of 2).
The window hop size. As sinusoidal estimation relies on spectral frames, we need to move the window forward. It can be any size but low overlap will create audible artefacts. The -1 default value will default to half of windowSize (overlap of 2).
ARGUMENT:: fftSize
The inner FFT/IFFT size. It should be at least 4 samples long, at least the size of the window, and a power of 2. Making it larger allows an oversampling of the spectral precision. The -1 default value will default to windowSize.
@ -57,7 +57,7 @@ ARGUMENT:: maxFFTSize
How large can the FFT be, by allocating memory at instantiation time. This cannot be modulated.
RETURNS::
An audio stream with impulses at detected transients. The latency between the input and the output is winSize at maximum.
An audio stream with impulses at detected transients. The latency between the input and the output is windowSize at maximum.
EXAMPLES::
@ -65,7 +65,7 @@ code::
//load some sounds
b = Buffer.read(s,File.realpath(FluidOnsetSlice.class.filenameSymbol).dirname.withTrailingSlash ++ "../AudioFiles/Nicol-LoopE-M.wav");
// basic param (the process add a latency of winSize samples
// basic param (the process add a latency of windowSize samples
## 2 || YinFFT: Implements the frequency domain version of the YIN algorithm, as described in FOOTNOTE::P. M. Brossier, "Automatic Annotation of Musical Audio for Interactive Applications.” QMUL, London, UK, 2007. :: See also https://essentia.upf.edu/documentation/reference/streaming_PitchYinFFT.html
::
ARGUMENT:: winSize
ARGUMENT:: windowSize
The window size. As sinusoidal estimation relies on spectral frames, we need to decide what precision we give it spectrally and temporally, in line with Gabor Uncertainty principles. http://www.subsurfwiki.org/wiki/Gabor_uncertainty
ARGUMENT:: hopSize
The window hop size. As sinusoidal estimation relies on spectral frames, we need to move the window forward. It can be any size but low overlap will create audible artefacts. The -1 default value will default to half of winSize (overlap of 2).
The window hop size. As sinusoidal estimation relies on spectral frames, we need to move the window forward. It can be any size but low overlap will create audible artefacts. The -1 default value will default to half of windowSize (overlap of 2).
ARGUMENT:: fftSize
The inner FFT/IFFT size. It should be at least 4 samples long, at least the size of the window, and a power of 2. Making it larger allows an oversampling of the spectral precision. The -1 default value will default to windowSize.
@ -37,7 +37,7 @@ ARGUMENT:: maxFFTSize
How large can the FFT be, by allocating memory at instantiation time. This cannot be modulated.
RETURNS::
A 2-channel KR signal with the [pitch, confidence] descriptors. The latency is winSize.
A 2-channel KR signal with the [pitch, confidence] descriptors. The latency is windowSize.
EXAMPLES::
@ -103,7 +103,7 @@ x.set(\type, 3)
// adding noise shows the comparative sturdiness of the spectral pitch tracker
x.set(\noise, 0.05)
//if latency is no issue, getting a higher winSize will stabilise the algorithm even more
//if latency is no issue, getting a higher windowSize will stabilise the algorithm even more
The size of the buffered window to be analysed, in samples. It will add that much latency to the signal.
ARGUMENT:: hopSize
How much the buffered window moves forward, in samples. The -1 default value will default to half of winSize (overlap of 2).
How much the buffered window moves forward, in samples. The -1 default value will default to half of windowSize (overlap of 2).
ARGUMENT:: fftSize
How large will the FFT be, zero-padding the buffer to the right size, which should be bigger than the windowSize argument, bigger than 4 samples, and should be a power of 2. This is a way to oversample the FFT for extra precision. The -1 default value will default to windowSize.
The weight of the frequency proximity of a peak when trying to associate it to an existing track (relative to magWeight - suggested between 0 to 1)
ARGUMENT:: winSize
ARGUMENT:: windowSize
The window size. As sinusoidal estimation relies on spectral frames, we need to decide what precision we give it spectrally and temporally, in line with Gabor Uncertainty principles. http://www.subsurfwiki.org/wiki/Gabor_uncertainty
ARGUMENT:: hopSize
The window hop size. As sinusoidal estimation relies on spectral frames, we need to move the window forward. It can be any size but low overlap will create audible artefacts. The -1 default value will default to half of winSize (overlap of 2).
The window hop size. As sinusoidal estimation relies on spectral frames, we need to move the window forward. It can be any size but low overlap will create audible artefacts. The -1 default value will default to half of windowSize (overlap of 2).
ARGUMENT:: fftSize
The inner FFT/IFFT size. It should be at least 4 samples long, at least the size of the window, and a power of 2. Making it larger allows an oversampling of the spectral precision. The -1 default value will default to windowSize.
@ -49,7 +49,7 @@ ARGUMENT:: maxFFTSize
How large can the FFT be, by allocating memory at instantiation time. This cannot be modulated.
RETURNS::
An array of two audio streams: [0] is the harmonic part extracted, [1] is the rest. The latency between the input and the output is (( hopSize * minTrackLen) + winSize) samples.
An array of two audio streams: [0] is the harmonic part extracted, [1] is the rest. The latency between the input and the output is (( hopSize * minTrackLen) + windowSize) samples.
EXAMPLES::
@ -59,11 +59,11 @@ CODE::
b = Buffer.read(s,File.realpath(FluidSines.class.filenameSymbol).dirname.withTrailingSlash ++ "../AudioFiles/Tremblay-AaS-SynthTwoVoices-M.wav");
// run with large parameters - left is sinusoidal model, right is residual
The window size. As spectral shape estimation relies on spectral frames, we need to decide what precision we give it spectrally and temporally, in line with Gabor Uncertainty principles. http://www.subsurfwiki.org/wiki/Gabor_uncertainty
ARGUMENT:: hopSize
The window hop size. As spectral shape estimation relies on spectral frames, we need to move the window forward. It can be any size but low overlap will create audible artefacts. The -1 default value will default to half of winSize (overlap of 2).
The window hop size. As spectral shape estimation relies on spectral frames, we need to move the window forward. It can be any size but low overlap will create audible artefacts. The -1 default value will default to half of windowSize (overlap of 2).
ARGUMENT:: fftSize
The inner FFT/IFFT size. It should be at least 4 samples long, at least the size of the window, and a power of 2. Making it larger allows an oversampling of the spectral precision. The -1 default value will default to windowSize.
@ -45,7 +45,7 @@ ARGUMENT:: maxFFTSize
How large can the FFT be, by allocating memory at instantiation time. This cannot be modulated.
RETURNS::
A 7-channel KR signal with the seven spectral shape descriptors. The latency is winSize.
A 7-channel KR signal with the seven spectral shape descriptors. The latency is windowSize.
The threshold of the offset of the smoothed error function. As it proceeds backwards in time, it allows tight ending of the identification of the anomaly.
ARGUMENT:: winSize
ARGUMENT:: windowSize
The averaging window of the error detection function. It needs smoothing as it is very jittery. The longer the window, the less precise, but the less false positives.
ARGUMENT:: debounce
ARGUMENT:: clumpLength
The window size in sample within with positive detections will be clumped together to avoid overdetecting in time.
ARGUMENT:: minSlice
ARGUMENT:: minSliceLength
The minimum duration of a slice in samples.
RETURNS::
@ -57,14 +57,14 @@ b = Buffer.read(s,File.realpath(FluidTransientSlice.class.filenameSymbol).dirnam
The threshold of the offset of the smoothed error function. As it proceeds backwards in time, it allows tight ending of the identification of the anomaly.
ARGUMENT:: winSize
ARGUMENT:: windowSize
The averaging window of the error detection function. It needs smoothing as it is very jittery. The longer the window, the less precise, but the less false positive.
ARGUMENT:: debounce
ARGUMENT:: clumpLength
The window size in sample within which positive detections will be clumped together to avoid overdetecting in time.