A FluidBufCompose object provides a flexible utility for combining the contents of buffers on the server. It can be used for things like mixing down multichannel buffers, or converting from left-right stereo to mid-side. It is used extensively in all the example code of LINK::Guides/FluidDecomposition:: as part of the FluCoMa project. footnote::
This was made possible thanks to the FluCoMa project ( http://www.flucoma.org/ ) funded by the European Research Council ( https://erc.europa.eu/ ) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 725899).::
-At its most simple, the object copies the content of a source buffer into a destination buffer. The flexibility comes from the various flags controlling which portions and channels of the sources to use, and by applying gains (which can be positive or negative) to the source data and the portion of the destination that would be overwritten.
+At its simplest, the object copies the content of a source buffer into a destination buffer. The flexibility comes from the various arguments controlling which portion and channels of the source to use, and from the gains (which can be positive or negative) applied to the source data and to the portion of the destination that would be overwritten.
The algorithm takes a source buffer and writes the result into the destination buffer. Both buffer arguments can point to the same buffer, which gives great flexibility in transforming and reshaping.
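The overwrite and sum behaviour can be modelled in a few lines. The sketch below is a plain Python model of the semantics described here. It is not FluCoMa's implementation or its sclang API. It makes three assumptions: a buffer is a list of channels (each a list of samples), every written sample becomes old * destGain + new * gain, and the destination is zero-padded when the copied region overflows it. The function and its snake_case arguments are hypothetical stand-ins for the arguments documented below, and channel wrap-around is not modelled.

```python
def compose(src, dst, start_frame=0, num_frames=-1, start_chan=0,
            num_chans=-1, gain=1.0, dest_start_frame=0, dest_start_chan=0,
            dest_gain=0.0):
    """Hypothetical model of a compose call; mutates and returns dst.

    Buffers are lists of channels, each channel a list of floats.
    """
    # -1 means "everything from the start point onwards".
    if num_frames == -1:
        num_frames = len(src[0]) - start_frame
    if num_chans == -1:
        num_chans = len(src) - start_chan

    # Grow the destination with zeros if the written region overflows it.
    needed_frames = dest_start_frame + num_frames
    needed_chans = dest_start_chan + num_chans
    width = max([len(ch) for ch in dst] + [needed_frames])
    for ch in dst:
        ch.extend([0.0] * (width - len(ch)))
    while len(dst) < needed_chans:
        dst.append([0.0] * width)

    # Scale what was already there by dest_gain, then add the scaled source.
    for c in range(num_chans):
        s = src[start_chan + c]
        d = dst[dest_start_chan + c]
        for i in range(num_frames):
            j = dest_start_frame + i
            d[j] = d[j] * dest_gain + s[start_frame + i] * gain
    return dst


# Usage: left-right stereo to mid-side, M = (L + R) / 2 and S = (L - R) / 2.
stereo = [[1.0, 0.5], [0.0, 0.5]]  # [left, right]
mid_side = []
compose(stereo, mid_side, start_chan=0, num_chans=1, gain=0.5)
compose(stereo, mid_side, start_chan=1, num_chans=1, gain=0.5, dest_gain=1.0)
compose(stereo, mid_side, start_chan=0, num_chans=1, gain=0.5, dest_start_chan=1)
compose(stereo, mid_side, start_chan=1, num_chans=1, gain=-0.5,
        dest_start_chan=1, dest_gain=1.0)
print(mid_side)  # [[0.5, 0.5], [0.5, 0.0]]
```

Setting dest_gain to 1.0 in the second and fourth calls is what turns the overwrite into a sum. A mixdown to mono would rely on the same mechanism.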
@ -21,38 +21,38 @@ METHOD:: process
ARGUMENT:: server
The server on which the buffers to be processed are allocated.
-ARGUMENT:: srcBufNum
+ARGUMENT:: source
The bufNum of the source buffer.
-ARGUMENT:: startAt
+ARGUMENT:: startFrame
The starting point (in samples) from which to copy in the source buffer.
-ARGUMENT:: nFrames
+ARGUMENT:: numFrames
The duration (in samples) to copy from the source buffer. The default (-1) copies the full length of the buffer.
ARGUMENT:: startChan
The first channel from which to copy in the source buffer.
-ARGUMENT:: nChans
+ARGUMENT:: numChans
The number of channels to copy from the source buffer. This parameter will wrap around the number of channels in the source buffer. The default (-1) copies all of the buffer's channels.
-ARGUMENT:: srcGain
+ARGUMENT:: gain
The gain applied to the samples to be copied from the source buffer.
-ARGUMENT:: dstBufNum
+ARGUMENT:: destination
The bufNum of the destination buffer.
-ARGUMENT:: dstStartAt
+ARGUMENT:: destStartFrame
The time offset (in samples) in the destination buffer at which to start writing the source. The destination buffer will be resized if the portion to copy overflows it.
-ARGUMENT:: dstStartChan
+ARGUMENT:: destStartChan
The channel offset in the destination buffer at which to start writing the source. The destination buffer will be resized if the number of channels to copy overflows it.
-ARGUMENT:: dstGain
+ARGUMENT:: destGain
The gain applied to the samples in the region of the destination buffer over which the source is to be copied. The default value (0) overwrites that section of the destination buffer, and a value of 1.0 sums the source with the material already present.
ARGUMENT:: action
-A Function to be evaluated once the offline process has finished and dstBufNum instance variables have been updated on the client side. The function will be passed dstBufNum as an argument.
+A Function to be evaluated once the offline process has finished and the destination's instance variables have been updated on the client side. The function will be passed destination as an argument.
RETURNS::
Nothing, as the destination buffer is declared in the function call.
@ -71,29 +71,29 @@ d = Buffer.new(s);
)
// with basic params (basic summing of each full buffer in all dimensions)
The algorithm takes a buffer in and divides it into two or three outputs, depending on the mode: LIST::
## a harmonic component;
## a percussive component;
-## a residual of the previous two if the flag is set to inter-dependant thresholds. See the modeFlag below.::
+## a residual of the previous two if maskingMode is set to inter-dependent thresholds (mode 2). See maskingMode below.::
It is part of the Fluid Decomposition Toolkit of the FluCoMa project. footnote::
This was made possible thanks to the FluCoMa project ( http://www.flucoma.org/ ) funded by the European Research Council ( https://erc.europa.eu/ ) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 725899).
@ -31,68 +31,68 @@ This is the method that calls for the HPSS to be calculated on a given source bu
ARGUMENT:: server
The server on which the buffers to be processed are allocated.
-ARGUMENT:: srcBufNum
+ARGUMENT:: source
The index of the buffer to use as the source material. The channels of multichannel buffers will be processed sequentially.
-ARGUMENT:: startAt
-Where in the srcBuf should the NMF process start, in samples.
+ARGUMENT:: startFrame
+Where in the source buffer the HPSS process should start, in samples.
-ARGUMENT:: nFrames
+ARGUMENT:: numFrames
How many frames should be processed.
ARGUMENT:: startChan
For a multichannel source buffer, the channel at which to start processing.
-ARGUMENT:: nChans
+ARGUMENT:: numChans
For a multichannel source buffer, how many channels should be processed.
-ARGUMENT:: harmBufNum
+ARGUMENT:: harmonic
The index of the buffer where the extracted harmonic component will be reconstructed.
-ARGUMENT:: percBufNum
+ARGUMENT:: percussive
The index of the buffer where the extracted percussive component will be reconstructed.
-ARGUMENT:: resBufNum
+ARGUMENT:: residual
The index of the buffer where the residual component will be reconstructed in mode 2.
-ARGUMENT:: hFiltSize
+ARGUMENT:: harmFilterSize
The size, in spectral frames, of the median filter for the harmonic component. Must be an odd number, >= 3.
-ARGUMENT:: pFiltSize
+ARGUMENT:: percFilterSize
The size, in spectral bins, of the median filter for the percussive component. Must be an odd number, >= 3.
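Both filter-size rules can be checked before launching a process. The helper below is hypothetical and not part of FluCoMa; it only encodes the rule stated above. An odd size lets the median window be centred on the frame or bin being filtered, with the same number of neighbours on each side.

```python
def valid_median_filter_size(size):
    """True if size is an odd integer >= 3, the rule stated for both filter sizes."""
    return isinstance(size, int) and size >= 3 and size % 2 == 1


for candidate in (1, 3, 4, 17):
    print(candidate, valid_median_filter_size(candidate))
```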
-ARGUMENT:: modeFlag
+ARGUMENT:: maskingMode
The way the masking is applied to the original spectrogram (0, 1 or 2).
table::
## 0 || The traditional soft mask used in Fitzgerald's original method of 'Wiener-inspired' filtering. Complementary soft masks are made for the harmonic and percussive parts by allocating some fraction of each time-frequency point to each. This provides the fewest artefacts, but the weakest separation. The two resulting buffers will sum to exactly the original material.
-## 1 || Relative mode - Better separation, with more artefacts. The harmonic mask is constructed using a binary decision, based on whether a threshold is exceeded at a given time-frequency point (these are set using htf1, hta1, htf2, hta2, see below). The percussive mask is then formed as the inverse of the harmonic one, meaning that as above, the two components will sum to the original sound.
+## 1 || Relative mode: better separation, with more artefacts. The harmonic mask is constructed using a binary decision, based on whether a threshold is exceeded at a given time-frequency point (these are set using harmThreshFreq1, harmThreshAmp1, harmThreshFreq2, harmThreshAmp2, see below). The percussive mask is then formed as the inverse of the harmonic one, meaning that, as above, the two components will sum to the original sound.
## 2 || Inter-dependent mode: thresholds can be varied independently, but are coupled in effect. Binary masks are made for each of the harmonic and percussive components, and the masks are converted to soft masks at the end so that everything null-sums even if the parameters are independent; this is what makes this mode harder to control. The binary masks aren't guaranteed to cover the whole sound; in this case, the 'leftovers' will be placed into a third buffer.
::
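The difference between the soft masks of mode 0 and the binary masks of mode 1 can be sketched for a single time-frequency point, given the outputs h and p of the harmonic and percussive median filters. This is an illustrative model, not FluCoMa's code. Three details are assumptions here: the power of 2 in the soft mask (the usual Wiener-style choice), the 20 * log10 comparison of magnitudes in dB, and the strict 'greater than' test. Mode 2 is not modelled.

```python
import math


def soft_masks(h, p, power=2.0):
    """Mode 0 style: complementary soft masks that always sum to 1."""
    hp, pp = h ** power, p ** power
    if hp + pp == 0.0:
        return 0.5, 0.5  # silent point: split evenly (an arbitrary choice)
    mask_h = hp / (hp + pp)
    return mask_h, 1.0 - mask_h


def relative_masks(h, p, threshold_db, floor=1e-12):
    """Mode 1 style: binary harmonic mask; the percussive mask is its inverse."""
    ratio_db = 20.0 * math.log10(max(h, floor) / max(p, floor))
    mask_h = 1.0 if ratio_db > threshold_db else 0.0
    return mask_h, 1.0 - mask_h


# A point where the harmonic filter output is 10x the percussive one (+20 dB).
print(soft_masks(10.0, 1.0))            # mostly harmonic, but not exclusively
print(relative_masks(10.0, 1.0, 6.0))   # (1.0, 0.0): 20 dB exceeds 6 dB
print(relative_masks(10.0, 1.0, 30.0))  # (0.0, 1.0): 20 dB does not reach 30 dB
```

In both functions the two masks sum to 1, which is why modes 0 and 1 reconstruct the original sound when the components are added back together.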
-ARGUMENT:: htf1
+ARGUMENT:: harmThreshFreq1
In modes 1 and 2, the frequency of the low part of the threshold for the harmonic filter (0-1).
-ARGUMENT:: hta1
-In modes 1 and 2, the threshold of the low part for the harmonic filter. That threshold applies to all frequencies up to htf1: how much more powerful (in dB) the harmonic median filter needs to be than the percussive median filter for this bin to be counted as harmonic.
+ARGUMENT:: harmThreshAmp1
+In modes 1 and 2, the threshold of the low part for the harmonic filter. It applies to all frequencies up to harmThreshFreq1, and sets how much more powerful (in dB) the harmonic median filter needs to be than the percussive median filter for a bin to be counted as harmonic.
-ARGUMENT:: htf2
+ARGUMENT:: harmThreshFreq2
In modes 1 and 2, the frequency of the high part of the threshold for the harmonic filter (0-1).
-ARGUMENT:: hta2
-In modes 1 and 2, the threshold of the high part for the harmonic filter. That threshold applies to all frequencies above htf2. The threshold between htf1 and htf2 is interpolated between hta1 and hta2. How much more powerful (in dB) the harmonic median filter needs to be than the percussive median filter for this bin to be counted as harmonic.
+ARGUMENT:: harmThreshAmp2
+In modes 1 and 2, the threshold of the high part for the harmonic filter. It applies to all frequencies above harmThreshFreq2; between harmThreshFreq1 and harmThreshFreq2, the threshold is interpolated between harmThreshAmp1 and harmThreshAmp2. It sets how much more powerful (in dB) the harmonic median filter needs to be than the percussive median filter for a bin to be counted as harmonic.
-ARGUMENT:: ptf1
+ARGUMENT:: percThreshFreq1
In mode 2, the frequency of the low part of the threshold for the percussive filter (0-1).
-ARGUMENT:: pta1
-In mode 2, the threshold of the low part for the percussive filter. That threshold applies to all frequencies up to ptf1. How much more powerful (in dB) the percussive median filter needs to be than the harmonic median filter for this bin to be counted as percussive.
+ARGUMENT:: percThreshAmp1
+In mode 2, the threshold of the low part for the percussive filter. It applies to all frequencies up to percThreshFreq1, and sets how much more powerful (in dB) the percussive median filter needs to be than the harmonic median filter for a bin to be counted as percussive.
-ARGUMENT:: ptf2
+ARGUMENT:: percThreshFreq2
In mode 2, the frequency of the high part of the threshold for the percussive filter (0-1).
-ARGUMENT:: pta2
-In mode 2, the threshold of the high part for the percussive filter. That threshold applies to all frequencies above ptf2. The threshold between ptf1 and ptf2 is interpolated between pta1 and pta2. How much more powerful (in dB) the percussive median filter needs to be than the harmonic median filter for this bin to be counted as percussive.
+ARGUMENT:: percThreshAmp2
+In mode 2, the threshold of the high part for the percussive filter. It applies to all frequencies above percThreshFreq2; between percThreshFreq1 and percThreshFreq2, the threshold is interpolated between percThreshAmp1 and percThreshAmp2. It sets how much more powerful (in dB) the percussive median filter needs to be than the harmonic median filter for a bin to be counted as percussive.
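The two-point thresholds above can be sketched as a function of normalised frequency. The text only says that the threshold is interpolated between the two points. Linear interpolation, and placing bin k of n at normalised frequency k / (n - 1), are assumptions of this sketch, and the function names are hypothetical.

```python
def threshold_db(freq, freq1, amp1, freq2, amp2):
    """Threshold (dB) at normalised frequency freq in 0-1.

    amp1 applies up to freq1, amp2 above freq2, with a straight line between.
    """
    if freq <= freq1:
        return amp1
    if freq >= freq2:
        return amp2
    t = (freq - freq1) / (freq2 - freq1)
    return amp1 + t * (amp2 - amp1)


def thresholds_per_bin(num_bins, freq1, amp1, freq2, amp2):
    """Assumed mapping: bin k sits at normalised frequency k / (num_bins - 1)."""
    return [threshold_db(k / (num_bins - 1), freq1, amp1, freq2, amp2)
            for k in range(num_bins)]


print(thresholds_per_bin(5, 0.25, 0.0, 0.75, 12.0))  # [0.0, 0.0, 6.0, 12.0, 12.0]
```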
ARGUMENT:: winSize
The window size in samples. As HPSS relies on spectral frames, we need to decide what precision we give it spectrally and temporally, in line with the Gabor uncertainty principle: a longer window gives finer frequency resolution but coarser time resolution. http://www.subsurfwiki.org/wiki/Gabor_uncertainty
@ -104,7 +104,7 @@ ARGUMENT:: fftSize
The inner FFT/IFFT size. It should be at least 4 samples long; at least the size of the window; and a power of 2. Making it larger than the window size provides interpolation in frequency.
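The fftSize rules, and the trade-off set by winSize, can be illustrated with two small hypothetical helpers (not FluCoMa API). Bin spacing is sampleRate / fftSize, so zero-padding beyond the window gives more closely spaced bins. The window length (winSize / sampleRate) still sets the time span each frame covers. The sample rate of 44100 Hz is only an example.

```python
def fft_size_problems(fft_size, win_size):
    """List which of the stated fftSize rules are broken (empty list = fine)."""
    problems = []
    if fft_size < 4:
        problems.append("fftSize must be at least 4")
    if fft_size < win_size:
        problems.append("fftSize must be at least winSize")
    if fft_size <= 0 or (fft_size & (fft_size - 1)) != 0:
        problems.append("fftSize must be a power of 2")
    return problems


def analysis_resolution(win_size, fft_size, sample_rate):
    """Bin spacing in Hz and window duration in ms for one spectral frame."""
    return sample_rate / fft_size, 1000.0 * win_size / sample_rate


print(fft_size_problems(1024, 1024))  # []
print(fft_size_problems(1000, 1024))  # ['fftSize must be at least winSize', 'fftSize must be a power of 2']
print(analysis_resolution(1024, 2048, 44100))
```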
ARGUMENT:: action
-A Function to be evaluated once the offline process has finished and all Buffer's instance variables have been updated on the client side. The function will be passed [harmBufNum, percBufNum, resBufNum] as an argument.
+A Function to be evaluated once the offline process has finished and all the Buffers' instance variables have been updated on the client side. The function will be passed [harmonic, percussive, residual] as an argument.
RETURNS::
Nothing, as the various destination buffers are declared in the function call.
@ -112,7 +112,7 @@ RETURNS::
Discussion::
HPSS works by using median filters on the spectral magnitudes of a sound. It hinges on a simple modelling assumption. Tonal components tend to appear as energy that persists across time at a few distinct frequencies. Percussive components tend to appear as energy that spreads across many frequencies but is brief in time. By using median filters across time and frequency respectively, we get initial estimates of the tonal-ness / transient-ness of each point in time and frequency. These are then combined into 'masks' that are applied to the original spectral data in order to produce a separation.
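The two median filters can be sketched on a toy magnitude spectrogram. This plain Python version is an illustration, not FluCoMa's implementation. In it, spec[t][k] is assumed to hold the magnitude of frame t and bin k. The harmonic estimate takes a median across neighbouring frames (harmFilterSize) and the percussive one across neighbouring bins (percFilterSize). Windows are simply truncated at the edges.

```python
from statistics import median


def median_estimates(spec, harm_filter_size, perc_filter_size):
    """Harmonic and percussive estimates from a magnitude spectrogram.

    The harmonic estimate is a median across neighbouring frames (time);
    the percussive estimate is a median across neighbouring bins (frequency).
    Windows are truncated at the spectrogram edges (an assumption here).
    """
    n_frames, n_bins = len(spec), len(spec[0])
    hh, ph = harm_filter_size // 2, perc_filter_size // 2
    harm = [[median(spec[u][k]
                    for u in range(max(0, t - hh), min(n_frames, t + hh + 1)))
             for k in range(n_bins)]
            for t in range(n_frames)]
    perc = [[median(spec[t][j]
                    for j in range(max(0, k - ph), min(n_bins, k + ph + 1)))
             for k in range(n_bins)]
            for t in range(n_frames)]
    return harm, perc


# A steady tone in bin 1 (persists in time) plus a click in frame 2 (spans all bins).
spec = [[0.0, 1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0, 0.0],
        [1.0, 1.0, 1.0, 1.0, 1.0],
        [0.0, 1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0, 0.0]]
harm, perc = median_estimates(spec, 3, 3)
print(harm[2][3], perc[2][3])  # 0.0 1.0: the click survives only the percussive filter
print(harm[0][1], perc[0][1])  # 1.0 0.0: the tone survives only the harmonic filter
```

These two estimates are the h and p inputs that the masking modes combine.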
-The modeFlag parameter provides different approaches to combinging estimates and producing masks. Some settings (especially in modes 1 & 2) will provide better separation but with more artefacts. These can, in principle, be ameliorated by applying smoothing filters to the masks before transforming back to the time-domain (not yet implemented).
+The maskingMode parameter provides different approaches to combining estimates and producing masks. Some settings (especially in modes 1 and 2) will provide better separation but with more artefacts. These can, in principle, be ameliorated by applying smoothing filters to the masks before transforming back to the time domain (not yet implemented).
The FluidBufNMF object decomposes the spectrum of a sound into a number of components using Non-Negative Matrix Factorisation (NMF) footnote:: Lee, Daniel D., and H. Sebastian Seung. 1999. ‘Learning the Parts of Objects by Non-Negative Matrix Factorization’. Nature 401 (6755): 788–91. https://doi.org/10.1038/44565.
::. NMF has been a popular technique in signal processing research for things like source separation and transcription footnote:: Smaragdis and Brown, Non-Negative Matrix Factorization for Polyphonic Music Transcription.::, although its creative potential is so far relatively unexplored.
-The algorithm takes a buffer in and divides it into a number of components, determined by the rank argument. It works iteratively, by trying to find a combination of spectral templates ('dictionaries') and envelopes ('activations') that yield the original magnitude spectrogram when added together. By and large, there is no unique answer to this question (i.e. there are different ways of accounting for an evolving spectrum in terms of some set of templates and envelopes). In its basic form, NMF is a form of unsupervised learning: it starts with some random data and then converges towards something that minimizes the distance between its generated data and the original:it tends to converge very quickly at first and then level out. Fewer iterations mean less processing, but also less predictable results.
+The algorithm takes a buffer in and divides it into a number of components, determined by the rank argument. It works iteratively, by trying to find a combination of spectral templates ('bases') and envelopes ('activations') whose products, summed over all components, approximate the original magnitude spectrogram. By and large, there is no unique answer to this question (i.e. there are different ways of accounting for an evolving spectrum in terms of some set of templates and envelopes). In its basic form, NMF is a form of unsupervised learning: it starts with some random data and then converges towards something that minimizes the distance between its generated data and the original. It tends to converge very quickly at first and then level out. Fewer iterations mean less processing, but also less predictable results.
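The iterative search can be sketched with Lee and Seung's multiplicative update rules for the squared Euclidean distance. The text does not state which cost function and update rules FluCoMa actually uses, so treat this as an illustration of the technique. The function names are hypothetical, and matrices are plain nested lists: v is bins x frames, the bases w are bins x rank and the activations h are rank x frames.

```python
import random


def matmul(a, b):
    """Plain-list matrix product: a is n x m, b is m x p."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def nmf(v, rank, num_iter=100, seed=0, eps=1e-9):
    """Factorise non-negative v into w (bins x rank) and h (rank x frames)."""
    rng = random.Random(seed)
    n_bins, n_frames = len(v), len(v[0])
    # Random non-negative starting point, as in the unsupervised case.
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n_bins)]
    h = [[rng.random() + eps for _ in range(n_frames)] for _ in range(rank)]
    for _ in range(num_iter):
        # h <- h * (w^T v) / (w^T w h); eps only avoids division by zero.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[r][t] * num[r][t] / (den[r][t] + eps) for t in range(n_frames)]
             for r in range(rank)]
        # w <- w * (v h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(matmul(w, h), ht)
        w = [[w[f][r] * num[f][r] / (den[f][r] + eps) for r in range(rank)]
             for f in range(n_bins)]
    return w, h


def squared_error(v, w, h):
    approx = matmul(w, h)
    return sum((v[i][j] - approx[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


# A spectrogram made of a single template times a single envelope.
v = [[1.0, 0.0, 3.0],
     [2.0, 0.0, 6.0]]
w, h = nmf(v, rank=1, num_iter=50)
print(squared_error(v, w, h) < 1e-6)  # True: one component explains it exactly
```

The seed fixes the random starting point in this sketch. A different seed can land on a different, equally valid factorisation, which reflects the non-uniqueness mentioned above.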
The object can return any or all of the following: LIST::
-## a spectral contour of each component in the form of a magnitude spectrogram (called a dictionary in NMF lingo);
+## a spectral contour of each component in the form of a magnitude spectrum (called a basis in NMF lingo);
## an amplitude envelope of each component in the form of gains for each consecutive frame of the underlying spectrogram (called an activation in NMF lingo);
## an audio reconstruction of each component in the time domain. ::
-The dictionaries and activations can be used to make a kind of vocoder based on what NMF has 'learned' from the original data. Alternatively, taking the matrix product of a dictionary and an activation will yield a synthetic magnitude spectrogram of a component (which could be reconsructed, given some phase informaiton from somewhere).
+The bases and activations can be used to make a kind of vocoder based on what NMF has 'learned' from the original data. Alternatively, taking the matrix product of a basis and an activation will yield a synthetic magnitude spectrogram of a component (which could be reconstructed, given some phase information from somewhere).
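For one component, that matrix product is an outer product of a basis and its activation. The sketch below assumes a single input channel and indexes both buffers frame-first, as in the buffer sizes listed below. Bases have one frame per bin and one channel per component; activations have one frame per analysis frame and one channel per component. Summing all components' spectrograms recovers the full product of bases and activations. The function name is hypothetical, and only magnitudes are produced.

```python
def component_spectrogram(bases, activations, r):
    """Magnitude spectrogram mag[k][t] of component r.

    bases[k][r] is bin k of component r's template; activations[t][r] is
    frame t of component r's envelope (both indexed frame-first, like buffers).
    """
    return [[bases[k][r] * activations[t][r] for t in range(len(activations))]
            for k in range(len(bases))]


bases = [[1.0, 0.0],
         [2.0, 1.0]]          # 2 bins x 2 components
activations = [[1.0, 0.0],
               [0.0, 2.0],
               [3.0, 0.0]]    # 3 frames x 2 components
parts = [component_spectrogram(bases, activations, r) for r in range(2)]
total = [[parts[0][k][t] + parts[1][k][t] for t in range(3)] for k in range(2)]
print(total)  # [[1.0, 0.0, 3.0], [2.0, 2.0, 6.0]]
```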
-Some additional options and flexibility can be found through combinations of the dictFlag and actFlag arguments. If these flags are set to 1, the object expects to be supplied with pre-formed spectra (or envelopes) that will be used as seeds for the decomposition, providing more guided results. When set to 2, the supplied buffers won't be updated, so become templates to match against instead. Note that having both dictionaries and activations set to 2 doesn't make sense, so the object will complain.
+Some additional options and flexibility can be found through combinations of the basesMode and actMode arguments. If these are set to 1, the object expects to be supplied with pre-formed spectra (or envelopes) that will be used as seeds for the decomposition, providing more guided results. When set to 2, the supplied buffers won't be updated, and so become templates to match against instead. Note that having both bases and activations set to 2 doesn't make sense, so the object will complain.
If supplying pre-formed data, it's up to the user to make sure that the supplied buffers are the right size: LIST::
-## dictionaries must be STRONG::(fft size / 2) + 1:: frames and STRONG::(rank * input channels):: channels
+## bases must be STRONG::(fft size / 2) + 1:: frames and STRONG::(rank * input channels):: channels
## activations must be STRONG::(input frames / hopSize) + 1:: frames and STRONG::(rank * input channels):: channels
::
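The sizes above can be computed before allocating seed buffers. This hypothetical helper returns (frames, channels) pairs. The text does not say how input frames / hopSize is rounded, so the floor division used here is an assumption.

```python
def nmf_buffer_shapes(fft_size, rank, input_channels, input_frames, hop_size):
    """(frames, channels) expected for pre-formed bases and activations."""
    channels = rank * input_channels
    bases = (fft_size // 2 + 1, channels)
    activations = (input_frames // hop_size + 1, channels)
    return bases, activations


print(nmf_buffer_shapes(1024, 3, 2, 44100, 512))  # ((513, 6), (87, 6))
```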
@ -41,39 +41,39 @@ This is the method that calls for the factorisation to be calculated on a given
ARGUMENT:: server
The server on which the buffers to be processed are allocated.
-ARGUMENT:: srcBufNum
+ARGUMENT:: source
The index of the buffer to use as the source material to be decomposed through the NMF process. The different channels of multichannel buffers will be processed sequentially.
-ARGUMENT:: startAt
+ARGUMENT:: startFrame
Where in the source buffer the NMF process should start, in samples.
-ARGUMENT:: nFrames
+ARGUMENT:: numFrames
How many frames should be processed.
ARGUMENT:: startChan
For a multichannel source buffer, which channel should be processed first.
-ARGUMENT:: nChans
+ARGUMENT:: numChans
For a multichannel source buffer, how many channels should be processed.
-ARGUMENT:: dstBufNum
+ARGUMENT:: destination
The index of the buffer where the different reconstructed components will be written. The buffer will be resized to STRONG::rank * numChannelsProcessed:: channels and STRONG::sourceDuration:: length. If STRONG::nil:: is provided, the reconstruction will not happen.
-ARGUMENT:: dictBufNum
-The index of the buffer where the different dictionaries will be written to and/or read from: the behaviour is set in the following argument. If STRONG::nil:: is provided, no dictionary will be returned.
+ARGUMENT:: bases
+The index of the buffer where the different bases will be written to and/or read from: the behaviour is set in the following argument. If STRONG::nil:: is provided, no bases will be returned.
-ARGUMENT:: dictFlag
-This flag decides of how the dictionnary buffer passed as the previous argument is treated.
+ARGUMENT:: basesMode
+This flag decides how the bases buffer passed as the previous argument is treated.
table::
-## 0 || The dictionaries are seeded randomly, and the resulting ones will be written after the process in the passed buffer. The buffer is resized to STRONG::rank * numChannelsProcessed:: channels and STRONG::(fftSize / 2 + 1):: lenght.
-## 1 || The passed buffer is considered as seed for the dictionaries. Its dimensions should match the values above. The resulting dictionaries will replace the seed ones.
-## 2 || The passed buffer is considered as a template for the dictionaries, and will therefore not change. Its dictionaries should match the values above.
+## 0 || The bases are seeded randomly, and the resulting ones will be written to the passed buffer after the process. The buffer is resized to STRONG::rank * numChannelsProcessed:: channels and STRONG::(fftSize / 2 + 1):: length.
+## 1 || The passed buffer is considered as a seed for the bases. Its dimensions should match the values above. The resulting bases will replace the seed ones.
+## 2 || The passed buffer is considered as a template for the bases, and will therefore not change. Its dimensions should match the values above.
::
-ARGUMENT:: actBufNum
+ARGUMENT:: activations
The index of the buffer where the different activations will be written to and/or read from: the behaviour is set in the following argument. If STRONG::nil:: is provided, no activations will be returned.
-ARGUMENT:: actFlag
+ARGUMENT:: actMode
This flag decides how the activation buffer passed as the previous argument is treated.
table::
## 0 || The activations are seeded randomly, and the resulting ones will be written to the passed buffer after the process. The buffer is resized to STRONG::rank * numChannelsProcessed:: channels and STRONG::(sourceDuration / hopSize + 1):: length.
@ -84,12 +84,9 @@ ARGUMENT:: actFlag
ARGUMENT:: rank
The number of elements the NMF algorithm will try to divide the spectrogram of the source into.
-ARGUMENT:: nIter
+ARGUMENT:: numIter
The NMF process is iterative, trying to converge to the smallest error in its factorisation. The number of iterations will decide how many times it tries to adjust its estimates. Higher numbers here will be more CPU expensive; lower numbers will be more unpredictable in quality.
-ARGUMENT:: sortFlag
-This allows to choose between the different methods of sorting the ranks in order to get similar sonic qualities on a given rank (not implemented yet)
ARGUMENT:: winSize
The window size. As NMF relies on spectral frames, we need to decide what precision we give it spectrally and temporally, in line with the Gabor uncertainty principle: a longer window gives finer frequency resolution but coarser time resolution. http://www.subsurfwiki.org/wiki/Gabor_uncertainty
@ -106,7 +103,7 @@ ARGUMENT:: randSeed
The NMF process needs to seed its starting point. If a seed is specified, the same starting values will be used on every run. The default of -1 will assign them randomly. (not implemented yet)
ARGUMENT:: action
A Function to be evaluated once the offline process has finished and all Buffer's instance variables have been updated on the client side. The function will be passed [dstBufNum, dictBufNum, actBufNum] as an argument.
A Function to be evaluated once the offline process has finished and all Buffer's instance variables have been updated on the client side. The function will be passed [destination, bases, activations] as an argument.
RETURNS::
Nothing, as the various destination buffers are declared in the function call.
@ -18,31 +18,31 @@ This is the method that calls for the slicing to be calculated on a given source
ARGUMENT:: server
The server on which the buffers to be processed are allocated.
ARGUMENT:: srcBufNum
ARGUMENT:: source
The index of the buffer to use as the source material to be sliced through novelty identification. The different channels of multichannel buffers will be summed.
ARGUMENT:: startAt
ARGUMENT:: startFrame
Where in the srcBuf the slicing process should start, in samples.
ARGUMENT:: nFrames
ARGUMENT:: numFrames
How many frames should be processed.
ARGUMENT:: startChan
For multichannel srcBuf, which channel should be processed.
ARGUMENT:: nChans
ARGUMENT:: numChans
For multichannel srcBuf, how many channels should be summed.
ARGUMENT:: indBufNum
ARGUMENT:: indices
The index of the buffer where the indices (in samples) of the estimated starting points of slices will be written. The first and last points are always the boundary points of the analysis.
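Because the first and last indices are the boundaries of the analysis, consecutive pairs of indices delimit the slices. A Python sketch, assuming the buffer contents have already been fetched to the client as a list of sample positions; the function names are hypothetical.

```python
def indices_to_slices(indices):
    """Pair consecutive indices (in samples) into (start, end) slices.
    The first and last indices are the boundaries of the analysis."""
    return list(zip(indices, indices[1:]))


def slice_durations(indices, sample_rate=44100.0):
    """Duration of each slice, in seconds."""
    return [(end - start) / sample_rate for start, end in indices_to_slices(indices)]


print(indices_to_slices([0, 22050, 30000, 88200]))
```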
ARGUMENT:: kernSize
The granularity of the window in which the algorithm looks for change, in samples. A small value makes it sensitive to short-term changes, and a large value makes it look for long-term changes.
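The novelty computation is not spelled out above. One common approach is Foote's method, which slides a checkerboard kernel along the diagonal of a self-similarity matrix of spectral frames; the kernel size sets how far back and forward the comparison reaches. The Python sketch below assumes that method, measures the kernel in frames rather than samples and uses cosine similarity; it is an illustration, not FluidBufNoveltySlice's implementation, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def novelty_curve(frames, kernel_size):
    """Slide a kernel_size x kernel_size checkerboard kernel (kernel_size even)
    along the diagonal of the self-similarity matrix of the frames."""
    half = kernel_size // 2
    n = len(frames)
    ssm = [[cosine_similarity(frames[i], frames[j]) for j in range(n)] for i in range(n)]
    curve = []
    for centre in range(n):
        score = 0.0
        for i in range(-half, half):
            for j in range(-half, half):
                a, b = centre + i, centre + j
                if 0 <= a < n and 0 <= b < n:
                    # +1 where both frames lie on the same side of the centre,
                    # -1 where they lie on opposite sides
                    sign = 1.0 if (i < 0) == (j < 0) else -1.0
                    score += sign * ssm[a][b]
        curve.append(score)
    return curve


frames = [[1.0, 0.0]] * 8 + [[0.0, 1.0]] * 8  # the spectrum changes at frame 8
curve = novelty_curve(frames, kernel_size=4)
print(curve.index(max(curve)))  # 8
```

A larger kernel averages similarity over more frames on each side, so it responds to longer-term changes and ignores brief fluctuations.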
ARGUMENT:: thresh
ARGUMENT:: threshold
The normalised threshold, between 0 and 1, above which a point on the novelty curve is considered a segmentation point.
ARGUMENT:: filtSize
ARGUMENT:: filterSize
The size of a smoothing filter that is applied to the novelty curve. A larger filter size allows for cleaner cuts on very sharp changes.
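The threshold and filterSize arguments act on the novelty curve once it has been computed. A Python sketch, assuming a centred moving-average smoothing filter followed by normalisation to 0-1 and simple local-maximum peak picking (none of which is specified above); the function names are hypothetical.

```python
def smooth(curve, filter_size):
    """Centred moving average over filter_size points (shrinks at the edges)."""
    half = filter_size // 2
    out = []
    for i in range(len(curve)):
        window = curve[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def normalise(curve):
    """Rescale the curve to the range 0-1."""
    lo, hi = min(curve), max(curve)
    if hi == lo:
        return [0.0] * len(curve)
    return [(x - lo) / (hi - lo) for x in curve]


def pick_peaks(curve, threshold, filter_size=1):
    """Indices of local maxima of the smoothed, normalised curve above threshold."""
    c = normalise(smooth(curve, filter_size))
    return [i for i in range(1, len(c) - 1)
            if c[i] > threshold and c[i] >= c[i - 1] and c[i] > c[i + 1]]


print(pick_peaks([0, 0, 1, 0, 0, 0, 5, 0, 0], threshold=0.5))  # [6]
```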
ARGUMENT:: winSize
@ -55,7 +55,7 @@ ARGUMENT:: fftSize
The inner FFT/IFFT size. It should be a power of 2, at least 4 samples long, and at least the size of the window. Making it larger than the window oversamples the spectrum.
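The constraints on fftSize can be checked with a few lines of Python; `check_fft_size` and `smallest_fft_size` are hypothetical helpers, not part of FluCoMa.

```python
def check_fft_size(fft_size, win_size):
    """True if fft_size is a power of 2, at least 4, and at least win_size."""
    is_power_of_two = fft_size > 0 and (fft_size & (fft_size - 1)) == 0
    return is_power_of_two and fft_size >= 4 and fft_size >= win_size


def smallest_fft_size(win_size):
    """The smallest fftSize satisfying the constraints for a given window."""
    size = 4
    while size < win_size:
        size *= 2
    return size


print(check_fft_size(1000, 1000), smallest_fft_size(1000))  # False 1024
```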
ARGUMENT:: action
A Function to be evaluated once the offline process has finished and indBufNum instance variables have been updated on the client side. The function will be passed indBufNum as an argument.
A Function to be evaluated once the offline process has finished and indices instance variables have been updated on the client side. The function will be passed indices as an argument.
RETURNS::
Nothing, as the various destination buffers are declared in the function call.
This class implements many spectral-based onset detection functions, most of them taken from the literature (http://www.dafx.ca/proceedings/papers/p_133.pdf). Some are already available in SuperCollider's LINK::Classes/Onsets:: object, but not as offline processes. It is part of the Fluid Decomposition Toolkit of the FluCoMa project.footnote::This was made possible thanks to the FluCoMa project ( http://www.flucoma.org/ ) funded by the European Research Council ( https://erc.europa.eu/ ) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 725899).::
The process will return a buffer containing the indices (in samples) of the estimated starting points of the different slices.
CLASSMETHODS::
METHOD:: process
This is the method that calls for the slicing to be calculated on a given source buffer.
ARGUMENT:: server
The server on which the buffers to be processed are allocated.
ARGUMENT:: source
The index of the buffer to use as the source material to be sliced through onset detection. The different channels of multichannel buffers will be summed.
ARGUMENT:: startFrame
Where in the source buffer the slicing process should start, in samples.
ARGUMENT:: numFrames
How many frames should be processed.
ARGUMENT:: startChan
For multichannel sources, which channel should be processed.
ARGUMENT:: numChans
For multichannel sources, how many channels should be summed.
ARGUMENT:: indices
The index of the buffer where the indices (in samples) of the estimated starting points of slices will be written. The first and last points are always the boundary points of the analysis.
ARGUMENT:: function
The function used to derive a difference curve between spectral frames. It can be any of the following:
TABLE::
##0 || Energy || thresholds on (sum of squares of magnitudes / nBins) (like Onsets \power)
##1 || HFC || thresholds on (sum of (squared magnitudes * binNum) / nBins)
##2 || SpectralFlux || thresholds on (difference in magnitude between consecutive frames, half rectified)
##3 || MKL || thresholds on (sum of log of magnitude ratio per bin) (or equivalent: sum of difference of the log magnitude per bin) (like Onsets \mkl)
##4 || IS || (WILL PROBABLY BE REMOVED) Itakura-Saito divergence (see literature)
##5 || Cosine || thresholds on (cosine distance between comparison frames)
##6 || PhaseDev || takes the past 2 frames, projects them to the current one as anticipated for a steady state, then computes the sum of the differences, on which it thresholds (like Onsets \phase)
##7 || WPhaseDev || same as PhaseDev, but weighted by the magnitude in order to remove the chaotic noise floor (like Onsets \wphase)
##8 || ComplexDev || same as PhaseDev, but in the complex domain: the anticipated amplitude is considered steady and the phase is projected, then a complex subtraction is done with the actual present frame. The sum of magnitudes is used to threshold (like Onsets \complex)
##9 || RComplexDev || same as above, but rectified (like Onsets \rcomplex)
::
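A minimal Python sketch of five of the functions in the table, operating on lists of magnitudes (one list per spectral frame). The exact scaling, HFC bin numbering starting at 0, the epsilon guarding MKL against log(0), and measuring the comparison distance in frames (the hypothetical `lag` parameter) are assumptions; the function names are hypothetical too.

```python
import math

EPS = 1e-10


def energy(cur, prev=None):
    """Sum of squared magnitudes / number of bins."""
    return sum(m * m for m in cur) / len(cur)


def hfc(cur, prev=None):
    """High-frequency content: squared magnitudes weighted by bin number."""
    return sum(m * m * k for k, m in enumerate(cur)) / len(cur)


def spectral_flux(cur, prev):
    """Half-rectified magnitude increase between the two frames."""
    return sum(max(0.0, c - p) for c, p in zip(cur, prev))


def mkl(cur, prev):
    """Sum over bins of the log magnitude ratio (EPS avoids log(0))."""
    return sum(math.log((c + EPS) / (p + EPS)) for c, p in zip(cur, prev))


def cosine_distance(cur, prev):
    dot = sum(c * p for c, p in zip(cur, prev))
    norm = math.sqrt(sum(c * c for c in cur)) * math.sqrt(sum(p * p for p in prev))
    return 1.0 - dot / norm if norm > 0 else 0.0


def detection_curve(frames, fn, lag=1):
    """Apply fn to each frame and the frame lag frames earlier."""
    return [fn(frames[i], frames[i - lag]) for i in range(lag, len(frames))]


frames = [[0.0] * 4, [1.0] * 4, [1.0] * 4]  # silence, then a steady sound
print(detection_curve(frames, spectral_flux))  # [4.0, 0.0]
```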
ARGUMENT:: threshold
The threshold above which a new slice is detected. Value ranges differ for each function, from 0 upwards.
ARGUMENT:: debounce
The minimum duration of a slice, in hops (multiples of hopSize).
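One way to turn a detection curve into slice points with threshold and debounce, as a Python sketch: each rising crossing of the threshold is reported as an onset unless it falls within debounce hops of the previous onset. The rising-edge rule and the function name are assumptions, not FluidBufOnsetSlice's documented behaviour.

```python
def onsets_from_curve(curve, threshold, debounce, hop_size):
    """Onset positions in samples from a detection curve with one value per hop."""
    onsets = []
    last = None    # frame of the last accepted onset
    above = False  # was the previous frame above the threshold?
    for frame, value in enumerate(curve):
        if value > threshold and not above:
            if last is None or frame - last >= debounce:
                onsets.append(frame * hop_size)
                last = frame
        above = value > threshold
    return onsets


curve = [0, 5, 5, 0, 5, 0, 0, 0, 5, 0]
print(onsets_from_curve(curve, threshold=1, debounce=4, hop_size=512))  # [512, 4096]
```

The crossing at frame 4 is dropped because it comes only 3 hops after the onset at frame 1.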
ARGUMENT:: filterSize
The size of a smoothing filter that is applied to the novelty curve. A larger filter size allows for cleaner cuts on very sharp changes.
ARGUMENT:: frameDelta
For certain functions (HFC, SpectralFlux, MKL, Cosine), the distance does not have to be computed between consecutive frames. By default (0) it is; otherwise, this sets the distance between the comparison windows, in samples.
ARGUMENT:: winSize
The window size. As spectral differencing relies on spectral frames, we need to decide what precision we give it spectrally and temporally, in line with the Gabor uncertainty principle. http://www.subsurfwiki.org/wiki/Gabor_uncertainty
ARGUMENT:: hopSize
The window hop size. As spectral differencing relies on spectral frames, we need to move the window forward. It can be any size, but low overlap will create audible artefacts. The default value of -1 sets it to half of winSize (overlap of 2).
ARGUMENT:: fftSize
The inner FFT/IFFT size. It should be a power of 2, at least 4 samples long, and at least the size of the window. Making it larger than the window oversamples the spectrum. The default value of -1 sets it to winSize.
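The -1 defaults of hopSize and fftSize can be resolved as below, in a Python sketch; rounding an odd winSize down when halving it is an assumption, and the function name is hypothetical.

```python
def resolve_sizes(win_size, hop_size=-1, fft_size=-1):
    """Apply the -1 defaults (hop = winSize / 2, fft = winSize) and
    return (hop_size, fft_size, overlap)."""
    if hop_size == -1:
        hop_size = win_size // 2
    if fft_size == -1:
        fft_size = win_size
    overlap = win_size / hop_size
    return hop_size, fft_size, overlap


print(resolve_sizes(1024))  # (512, 1024, 2.0)
```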
ARGUMENT:: action
A Function to be evaluated once the offline process has finished and indices instance variables have been updated on the client side. The function will be passed indices as an argument.
RETURNS::
Nothing, as the various destination buffers are declared in the function call.
EXAMPLES::
CODE::
(
//prep some buffers
b = Buffer.read(s,File.realpath(FluidBufOnsetSlice.class.filenameSymbol).dirname.withTrailingSlash ++ "../AudioFiles/Nicol-LoopE-M.wav");
The server on which the buffers to be processed are allocated.
ARGUMENT:: srcBufNum
ARGUMENT:: source
The index of the buffer to use as the source material to be decomposed through the NMF process. The different channels of multichannel buffers will be processing sequentially.
The index of the buffer to use as the source material to be decomposed through the sinusoidal modelling process. The different channels of multichannel buffers will be processed sequentially.
ARGUMENT:: startAt
ARGUMENT:: startFrame
Where in the srcBuf the process should start, in samples.
ARGUMENT:: nFrames
ARGUMENT:: numFrames
How many frames should be processed.
ARGUMENT:: startChan
For multichannel srcBuf, which channel should be processed first.
ARGUMENT:: nChans
ARGUMENT:: numChans
For multichannel srcBuf, how many channels should be processed.
ARGUMENT:: sineBufNum
ARGUMENT:: sines
The index of the buffer where the extracted sinusoidal component will be reconstructed.
ARGUMENT:: resBufNum
ARGUMENT:: residual
The index of the buffer where the residual of the sinusoidal component will be reconstructed.
ARGUMENT:: bw
ARGUMENT:: bandwidth
The width, in bins, of the fragment of the FFT window that is considered a normal deviation for a potential continuous sinusoidal track. It affects CPU cost: a wider bandwidth is more accurate but more computationally expensive.
ARGUMENT:: thresh
ARGUMENT:: threshold
The normalised cross-correlation threshold, between 0 and 1, above which a peak is considered a sinusoidal component.
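A Python sketch of the threshold test, assuming a zero-lag normalised cross-correlation between a fragment of the magnitude spectrum (bandwidth bins wide) and a reference peak shape; for non-negative magnitudes this score lies between 0 and 1. The reference used here, the relative magnitudes of a Hann window's main lobe for a sinusoid centred on a bin, is an illustrative assumption, as are the names.

```python
import math

# Hypothetical reference: relative Hann main-lobe magnitudes, 5 bins wide.
HANN_PEAK = [0.0, 0.5, 1.0, 0.5, 0.0]


def normalised_cross_correlation(fragment, reference):
    """Zero-lag normalised cross-correlation of two equal-length fragments."""
    dot = sum(f * r for f, r in zip(fragment, reference))
    norm = math.sqrt(sum(f * f for f in fragment) * sum(r * r for r in reference))
    return dot / norm if norm > 0 else 0.0


def is_sinusoidal(fragment, reference, threshold):
    return normalised_cross_correlation(fragment, reference) >= threshold


print(is_sinusoidal([0.0, 1.0, 2.0, 1.0, 0.0], HANN_PEAK, 0.9))  # True: same shape, louder
print(is_sinusoidal([1.0] * 5, HANN_PEAK, 0.9))  # False: flat, noise-like fragment
```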
ARGUMENT:: minTrackLen
@ -67,7 +67,7 @@ ARGUMENT:: fftSize
The inner FFT/IFFT size. It should be a power of 2, at least 4 samples long, and at least the size of the window. Making it larger than the window oversamples the spectrum.
ARGUMENT:: action
A Function to be evaluated once the offline process has finished and all Buffer's instance variables have been updated on the client side. The function will be passed [sineBufNum, resBufNum] as an argument.
A Function to be evaluated once the offline process has finished and all Buffer's instance variables have been updated on the client side. The function will be passed [sines, residual] as an argument.
RETURNS::
Nothing, as the various destination buffers are declared in the function call.
The server on which the buffers to be processed are allocated.
ARGUMENT:: srcBufNum
ARGUMENT:: source
The index of the buffer to use as the source material to be sliced through transient identification. The different channels of multichannel buffers will be summed.
ARGUMENT:: startAt
ARGUMENT:: startFrame
Where in the srcBuf the slicing process should start, in samples.
ARGUMENT:: nFrames
ARGUMENT:: numFrames
How many frames should be processed.
ARGUMENT:: startChan
For multichannel srcBuf, which channel should be processed.
ARGUMENT:: nChans
ARGUMENT:: numChans
For multichannel srcBuf, how many channels should be summed.
ARGUMENT:: indBufNum
ARGUMENT:: indices
The index of the buffer where the indices (in samples) of the estimated starting points of slices will be written. The first and last points are always the boundary points of the analysis.
ARGUMENT:: order
@ -64,7 +64,7 @@ ARGUMENT:: minSlice
The minimum duration of a slice in samples.
ARGUMENT:: action
A Function to be evaluated once the offline process has finished and indBufNum instance variables have been updated on the client side. The function will be passed indBufNum as an argument.
A Function to be evaluated once the offline process has finished and indices instance variables have been updated on the client side. The function will be passed indices as an argument.
RETURNS::
Nothing, as the destination buffer is declared in the function call.
@ -23,25 +23,25 @@ This is the method that calls for the transient extraction to be performed on a
ARGUMENT:: server
The server on which the buffers to be processed are allocated.
ARGUMENT:: srcBufNum
ARGUMENT:: source
The index of the buffer to use as the source material to be decomposed through the transient extraction process. The different channels of multichannel buffers will be processed sequentially.
ARGUMENT:: startAt
ARGUMENT:: startFrame
Where in the srcBuf the process should start, in samples.
ARGUMENT:: nFrames
ARGUMENT:: numFrames
How many frames should be processed.
ARGUMENT:: startChan
For multichannel srcBuf, which channel should be processed first.
ARGUMENT:: nChans
ARGUMENT:: numChans
For multichannel srcBuf, how many channels should be processed.
ARGUMENT:: transBufNum
ARGUMENT:: transients
The index of the buffer where the extracted transient component will be reconstructed.
ARGUMENT:: resBufNum
ARGUMENT:: residual
The index of the buffer where the estimated continuous component will be reconstructed.
ARGUMENT:: order
@ -69,7 +69,7 @@ ARGUMENT:: debounce
The window size, in samples, within which positive detections will be clumped together to avoid over-detecting in time.
ARGUMENT:: action
A Function to be evaluated once the offline process has finished and all Buffer's instance variables have been updated on the client side. The function will be passed [transBufNum, resBufNum] as an argument.
A Function to be evaluated once the offline process has finished and all Buffer's instance variables have been updated on the client side. The function will be passed [transients, residual] as an argument.
RETURNS::
Nothing, as the various destination buffers are declared in the function call.
The algorithm takes an audio input and divides it into two or three outputs, depending on the mode: LIST::
## a harmonic component;
## a percussive component;
## a residual of the previous two, if maskingMode is set to inter-dependent thresholds. See maskingMode below.::
It is part of the Fluid Decomposition Toolkit of the FluCoMa project. footnote::
It is part of the Fluid Decomposition Toolkit of the FluCoMa project. footnote::
This was made possible thanks to the FluCoMa project ( http://www.flucoma.org/ ) funded by the European Research Council ( https://erc.europa.eu/ ) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 725899).
This was made possible thanks to the FluCoMa project ( http://www.flucoma.org/ ) funded by the European Research Council ( https://erc.europa.eu/ ) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 725899).
@ -27,44 +27,44 @@ METHOD:: ar
ARGUMENT:: in
The input to be processed.
ARGUMENT:: hFiltSize
ARGUMENT:: harmFilterSize
The size, in spectral frames, of the median filter for the harmonic component. Must be an odd number, >= 3.
ARGUMENT:: pFiltSize
ARGUMENT:: percFilterSize
The size, in spectral bins, of the median filter for the percussive component. Must be an odd number, >= 3.
ARGUMENT:: modeFlag
ARGUMENT:: maskingMode
The way the masking is applied to the original spectrogram. (0,1,2)
table::
## 0 || The traditional soft mask used in Fitzgerald's original method of 'Wiener-inspired' filtering. Complementary soft masks are made for the harmonic and percussive parts by allocating some fraction of each time-frequency point to each. This provides the fewest artefacts, but the weakest separation. The two resulting outputs will sum to exactly the original material.
## 1 || Relative mode - Better separation, with more artefacts. The harmonic mask is constructed using a binary decision, based on whether a threshold is exceeded at a given time-frequency point (these are set using htf1, hta1, htf2, hta2, see below). The percussive mask is then formed as the inverse of the harmonic one, meaning that as above, the two components will sum to the original sound.
## 1 || Relative mode - Better separation, with more artefacts. The harmonic mask is constructed using a binary decision, based on whether a threshold is exceeded at a given time-frequency point (these are set using harmThreshFreq1, harmThreshAmp1, harmThreshFreq2, harmThreshAmp2, see below). The percussive mask is then formed as the inverse of the harmonic one, meaning that as above, the two components will sum to the original sound.
## 2 || Inter-dependent mode - Thresholds can be varied independently, but are coupled in effect. Binary masks are made for each of the harmonic and percussive components, and the masks are converted to soft masks at the end so that everything null-sums even if the parameters are independent; this is what makes the mode harder to control. The masks are not guaranteed to cover the whole sound; in this case the 'leftovers' are placed into a third output.
::
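The difference between modes 0 and 1 can be sketched per time-frequency point, given the outputs of the harmonic and percussive median filters. In this Python sketch the soft mask uses a Wiener-style ratio of squared magnitudes, and the binary mask uses a single dB threshold for every bin (the real mode varies it with frequency); both choices, and the function names, are assumptions rather than FluidHPSS internals. In both cases the two masks add up to 1 at every point, which is why the two outputs sum to the original.

```python
import math

EPS = 1e-12


def soft_masks(h, p, power=2):
    """Mode 0 sketch: complementary Wiener-style soft masks."""
    mh = [hh ** power / (hh ** power + pp ** power + EPS) for hh, pp in zip(h, p)]
    return mh, [1.0 - m for m in mh]


def relative_masks(h, p, thresh_db):
    """Mode 1 sketch: a point is harmonic if the harmonic median exceeds the
    percussive median by more than thresh_db; the percussive mask is the inverse."""
    mh = [1.0 if 20 * math.log10((hh + EPS) / (pp + EPS)) > thresh_db else 0.0
          for hh, pp in zip(h, p)]
    return mh, [1.0 - m for m in mh]


harm_median = [1.0, 1.0, 0.0]  # harmonic median filter output, per point
perc_median = [1.0, 0.0, 1.0]  # percussive median filter output, per point
print(soft_masks(harm_median, perc_median)[0])
print(relative_masks(harm_median, perc_median, 0.0)[0])  # [0.0, 1.0, 0.0]
```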
ARGUMENT:: htf1
ARGUMENT:: harmThreshFreq1
In modes 1 and 2, the frequency of the low part of the threshold for the harmonic filter (0-1)
ARGUMENT:: hta1
ARGUMENT:: harmThreshAmp1
In modes 1 and 2, the threshold of the low part for the harmonic filter. That threshold applies to all frequencies up to htf1: how much more powerful (in dB) the harmonic median filter needs to be than the percussive median filter for this bin to be counted as harmonic.
In modes 1 and 2, the threshold of the low part for the harmonic filter. That threshold applies to all frequencies up to harmThreshFreq1: how much more powerful (in dB) the harmonic median filter needs to be than the percussive median filter for this bin to be counted as harmonic.
ARGUMENT:: htf2
ARGUMENT:: harmThreshFreq2
In modes 1 and 2, the frequency of the high part of the threshold for the harmonic filter. (0-1)
ARGUMENT:: hta2
ARGUMENT:: harmThreshAmp2
In modes 1 and 2, the threshold of the high part for the harmonic filter. That threshold applies to all frequencies above htf2. The threshold between htf1 and htf2 is interpolated between hta1 and hta2. How much more powerful (in dB) the harmonic median filter needs to be than the percussive median filter for this bin to be counted as harmonic.
In modes 1 and 2, the threshold of the high part for the harmonic filter. That threshold applies to all frequencies above harmThreshFreq2. The threshold between harmThreshFreq1 and harmThreshFreq2 is interpolated between harmThreshAmp1 and harmThreshAmp2. How much more powerful (in dB) the harmonic median filter needs to be than the percussive median filter for this bin to be counted as harmonic.
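The threshold curve described by harmThreshFreq1, harmThreshAmp1, harmThreshFreq2 and harmThreshAmp2 (and likewise by the percussive ones) can be sketched as follows, assuming linear interpolation between the two frequencies; the function name is hypothetical.

```python
def threshold_at(freq, freq1, amp1, freq2, amp2):
    """Threshold in dB at normalised frequency freq (0-1): amp1 up to freq1,
    amp2 above freq2, linearly interpolated in between."""
    if freq <= freq1:
        return amp1
    if freq >= freq2:
        return amp2
    t = (freq - freq1) / (freq2 - freq1)
    return amp1 + t * (amp2 - amp1)


# 0 dB below 0.2, 12 dB above 0.6, ramping in between.
for f in (0.1, 0.3, 0.4, 0.5, 0.7):
    print(f, round(threshold_at(f, 0.2, 0.0, 0.6, 12.0), 2))
```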
ARGUMENT:: ptf1
ARGUMENT:: percThreshFreq1
In mode 2, the frequency of the low part of the threshold for the percussive filter. (0-1)
ARGUMENT:: pta1
ARGUMENT:: percThreshAmp1
In mode 2, the threshold of the low part for the percussive filter. That threshold applies to all frequencies up to ptf1. How much more powerful (in dB) the percussive median filter needs to be than the harmonic median filter for this bin to be counted as percussive.
In mode 2, the threshold of the low part for the percussive filter. That threshold applies to all frequencies up to percThreshFreq1. How much more powerful (in dB) the percussive median filter needs to be than the harmonic median filter for this bin to be counted as percussive.
ARGUMENT:: ptf2
ARGUMENT:: percThreshFreq2
In mode 2, the frequency of the high part of the threshold for the percussive filter. (0-1)
ARGUMENT:: pta2
ARGUMENT:: percThreshAmp2
In mode 2, the threshold of the high part for the percussive filter. That threshold applies to all frequencies above ptf2. The threshold between ptf1 and ptf2 is interpolated between pta1 and pta2. How much more powerful (in dB) the percussive median filter needs to be than the harmonic median filter for this bin to be counted as percussive.
In mode 2, the threshold of the high part for the percussive filter. That threshold applies to all frequencies above percThreshFreq2. The threshold between percThreshFreq1 and percThreshFreq2 is interpolated between percThreshAmp1 and percThreshAmp2. How much more powerful (in dB) the percussive median filter needs to be than the harmonic median filter for this bin to be counted as percussive.
ARGUMENT:: winSize
The window size. As HPSS relies on spectral frames, we need to decide what precision we give it spectrally and temporally, in line with the Gabor uncertainty principle. http://www.subsurfwiki.org/wiki/Gabor_uncertainty
@ -78,21 +78,21 @@ ARGUMENT:: fftSize
ARGUMENT:: maxFFTSize
How large the FFT can be, as memory is allocated at instantiation time. This cannot be modulated.
ARGUMENT::maxHFlitSize
ARGUMENT::maxHarmFilterSize
How large can the harmonic filter be modulated to (hFiltSize), by allocating memory at instantiation time. This cannot be modulated.
How large can the harmonic filter be modulated to (harmFilterSize), by allocating memory at instantiation time. This cannot be modulated.
ARGUMENT:: maxPFiltSize
ARGUMENT:: maxPercFilterSize
How large can the percussive filter be modulated to (pFiltSize), by allocating memory at instantiation time. This cannot be modulated.
How large can the percussive filter be modulated to (percFilterSize), by allocating memory at instantiation time. This cannot be modulated.
RETURNS::
An array of three audio streams: [0] is the harmonic part extracted, [1] is the percussive part extracted, [2] is the rest. The latency between the input and the output is ((hFiltSize - 1) * hopSize) + winSize) samples.
An array of three audio streams: [0] is the harmonic part extracted, [1] is the percussive part extracted, [2] is the rest. The latency between the input and the output is ((harmFilterSize - 1) * hopSize) + winSize samples.
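The latency formula as a small Python function (hypothetical name). For example, harmFilterSize = 17, hopSize = 512 and winSize = 1024 give 16 * 512 + 1024 = 9216 samples, about 0.21 s at 44.1 kHz.

```python
def hpss_latency(harm_filter_size, hop_size, win_size, sample_rate=None):
    """Latency of ((harmFilterSize - 1) * hopSize) + winSize samples,
    or in seconds if a sample rate is given."""
    samples = (harm_filter_size - 1) * hop_size + win_size
    return samples if sample_rate is None else samples / sample_rate


print(hpss_latency(17, 512, 1024))  # 9216
```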
Discussion::
HPSS works by using median filters on the spectral magnitudes of a sound. It hinges on a simple modelling assumption: tonal components tend to yield energy that is sustained across time but concentrated in frequency, while percussive components tend to yield energy that is spread across frequency but concentrated in time. By using median filters across time and frequency respectively, we get initial estimates of the tonal-ness / transient-ness of each point in time and frequency. These are then combined into 'masks' that are applied to the original spectral data in order to produce a separation.
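The two median filters can be sketched in Python on a spectrogram stored as a list of frames, each a list of bin magnitudes. The sketch shrinks the filter at the edges; FluidHPSS's edge handling is not documented here, so that is an assumption, as are the function names.

```python
from statistics import median


def harmonic_estimate(spec, size):
    """Median of each bin over `size` neighbouring frames (across time)."""
    half = size // 2
    n = len(spec)
    return [[median([spec[t][k] for t in range(max(0, f - half), min(n, f + half + 1))])
             for k in range(len(spec[f]))]
            for f in range(n)]


def percussive_estimate(spec, size):
    """Median of each frame over `size` neighbouring bins (across frequency)."""
    half = size // 2
    return [[median(frame[max(0, k - half): k + half + 1]) for k in range(len(frame))]
            for frame in spec]


# A steady tone in bin 2 plus a click in frame 2, on a 5 x 5 spectrogram.
spec = [[1.0 if (k == 2 or f == 2) else 0.0 for k in range(5)] for f in range(5)]
for row in harmonic_estimate(spec, 3):
    print(row)  # keeps the tone, removes the click
```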
The modeFlag parameter provides different approaches to combinging estimates and producing masks. Some settings (especially in modes 1 & 2) will provide better separation but with more artefacts. These can, in principle, be ameliorated by applying smoothing filters to the masks before transforming back to the time-domain (not yet implemented).
The maskingMode parameter provides different approaches to combining estimates and producing masks. Some settings (especially in modes 1 & 2) will provide better separation but with more artefacts. These can, in principle, be ameliorated by applying smoothing filters to the masks before transforming back to the time domain (not yet implemented).
EXAMPLES::
@ -102,7 +102,7 @@ CODE::
b = Buffer.read(s,File.realpath(FluidHPSS.class.filenameSymbol).dirname.withTrailingSlash ++ "../AudioFiles/Tremblay-AaS-SynthTwoVoices-M.wav");
// run with basic parameters (left is harmonic, right is percussive)
@ -20,18 +20,18 @@ FluidBufNMF is part of the Fluid Decomposition Toolkit of the FluCoMa project. f
CLASSMETHODS::
METHOD:: kr
The real-time processing method. It takes an audio or control input, and will yield a control stream in the form of a multichannel array of size STRONG::maxRank:: . If the dictionary buffer has fewer than maxRank channels, the remaining outputs will be zeroed.
The real-time processing method. It takes an audio or control input, and will yield a control stream in the form of a multichannel array of size STRONG::maxRank:: . If the bases buffer has fewer than maxRank channels, the remaining outputs will be zeroed.
ARGUMENT:: in
The signal input to the factorisation process.
ARGUMENT:: dictBufNum
ARGUMENT:: bases
The server index of the buffer containing the different dictionaries that the input signal will be matched against. Dictionaries must be STRONG::(fft size / 2) + 1:: frames. If the buffer has more than STRONG::maxRank:: channels, the excess will be ignored.
The server index of the buffer containing the different bases that the input signal will be matched against. Bases must be STRONG::(fft size / 2) + 1:: frames. If the buffer has more than STRONG::maxRank:: channels, the excess will be ignored.
ARGUMENT::maxRank
The maximum number of elements into which the NMF algorithm will try to divide the spectrogram of the source. This dictates the number of output channels for the UGen. This cannot be modulated.
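A Python sketch of what matching against fixed bases involves: the bases stay fixed and only the activations of one frame are updated, here with a Euclidean multiplicative update (an assumption; FluidNMFMatch's update rule is not stated above). As described above, bases beyond maxRank are ignored and unused outputs are zeroed. The function name and its parameters are hypothetical.

```python
def match_frame(bases, frame, max_rank, num_iter=50, eps=1e-9):
    """Activations of one magnitude frame against fixed bases.
    bases: list of bases, each with num_bins = (fft size / 2) + 1 magnitudes."""
    bases = bases[:max_rank]  # bases beyond max_rank are ignored
    rank, num_bins = len(bases), len(frame)
    if any(len(b) != num_bins for b in bases):
        raise ValueError("each basis must have as many bins as the frame")
    h = [1.0] * rank
    for _ in range(num_iter):
        approx = [sum(h[r] * bases[r][k] for r in range(rank)) for k in range(num_bins)]
        for r in range(rank):
            # h_r <- h_r * (w_r . frame) / (w_r . approx)
            num = sum(bases[r][k] * frame[k] for k in range(num_bins))
            den = sum(bases[r][k] * approx[k] for k in range(num_bins))
            h[r] *= num / (den + eps)
    return h + [0.0] * (max_rank - rank)  # zero the unused outputs


bases = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # two toy bases, 3 bins (fft size 4)
print(match_frame(bases, [2.0, 3.0, 0.0], max_rank=4))
```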
ARGUMENT:: nIter
ARGUMENT:: numIter
The NMF process is iterative, trying to converge to the smallest error in its factorisation. The number of iterations decides how many times it adjusts its estimates. Higher values are more CPU-intensive; lower values give less predictable quality.
ARGUMENT:: winSize
@ -47,7 +47,7 @@ ARGUMENT:: maxFFTSize
How large the FFT can be; the memory for it is allocated at instantiation time. This cannot be modulated.
RETURNS::
A multichannel kr output, giving for each dictionary component the activation amount.
A multichannel kr output, giving for each basis component the activation amount.
//using this trained dictionary we can then see the activation...
//using this trained basis we can then see the activation...
(
{
var source, blips;
@ -260,7 +260,7 @@ e.plot;
::
STRONG::Pretrained piano::
CODE::
//load in the sound in and a pretrained dictionary
//load the sound and a pretrained basis
(
b = Buffer.read(s,File.realpath(FluidNMFMatch.class.filenameSymbol).dirname.withTrailingSlash ++ "../AudioFiles/Tremblay-SA-UprightPianoPedalWide.wav");
c = Buffer.read(s,File.realpath(FluidNMFMatch.class.filenameSymbol).dirname.withTrailingSlash ++ "../AudioFiles/filters/piano-dicts.wav");
@ -268,7 +268,7 @@ e.plot;
b.play
c.query
//use the pretrained dictionary to compute activations of each notes to drive the amplitude of a resynth
//use the pretrained bases to compute the activation of each note to drive the amplitude of a resynth
This class implements many spectral based onset detection algorythms, most of them taken from the literature (http://www.dafx.ca/proceedings/papers/p_133.pdf)
It is part of the Fluid Decomposition Toolkit of the FluCoMa project.footnote::This was made possible thanks to the FluCoMa project ( http://www.flucoma.org/ ) funded by the European Research Council ( https://erc.europa.eu/ ) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 725899).::
This class implements many spectral-based onset detection functions, most of them taken from the literature (http://www.dafx.ca/proceedings/papers/p_133.pdf). Some are already available in SuperCollider's LINK::Classes/Onsets:: object. It is part of the Fluid Decomposition Toolkit of the FluCoMa project.footnote::This was made possible thanks to the FluCoMa project ( http://www.flucoma.org/ ) funded by the European Research Council ( https://erc.europa.eu/ ) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 725899).::
The process will return an audio stream with sample-long impulses at the estimated starting points of the different slices.
@ -20,28 +18,31 @@ ARGUMENT:: in
The audio to be processed.
ARGUMENT:: function
0 - Energy =sum of squares of magnitudes / nBins (\power)
1 - HFC = sum of (squared mag * binNum) / nBins
2 - SpectralFlux = dif in mag between consecutive frames (half rectified)
3 - MKL = sum of log of mag ratio per bin (or: sum of dif of log mag per bin) (\mkl)
4 - IS = itakura - saito divergence
5 - Cosine = cosine distance
6 - PhaseDev = takes 2 past frames, project to current, as anticipated steady state, then compute differences (sums) (\phase)
7 - WPhaseDev = same, but weighted by magnitude to remove chaos noise floor (\wphase)
8 - ComplexDev = same as kPhaseDev, but in the complex domain - anticipated amp(steady) and phase(projected) - complex subtraction -> sum of mag (\complex)
9 - RComplexDev =same as above, but rectified (\rcomplex)
ARGUMENT:: thresh
diff for each...
The function used to derive a difference curve between spectral frames. It can be any of the following:
TABLE::
##0 || Energy || thresholds on (sum of squares of magnitudes / nBins) (like Onsets \power)
##1 || HFC || thresholds on (sum of (squared magnitudes * binNum) / nBins)
##2 || SpectralFlux || thresholds on (difference in magnitude between consecutive frames, half rectified)
##3 || MKL || thresholds on (sum of log of magnitude ratio per bin) (or equivalently: sum of difference of the log magnitude per bin) (like Onsets \mkl)
##4 || IS || (WILL PROBABLY BE REMOVED) Itakura-Saito divergence (see literature)
##5 || Cosine || thresholds on (cosine distance between comparison frames)
##6 || PhaseDev || takes the past 2 frames and projects them to the current one, as anticipated for a steady state, then computes the sum of the differences, on which it thresholds (like Onsets \phase)
##7 || WPhaseDev || same as PhaseDev, but weighted by the magnitude in order to remove the chaotic noise floor (like Onsets \wphase)
##8 || ComplexDev || same as PhaseDev, but in the complex domain: the anticipated amplitude is considered steady and the phase is projected, then a complex subtraction is done with the actual present frame; the sum of the magnitudes is used to threshold (like Onsets \complex)
##9 || RComplexDev || same as above, but rectified (like Onsets \rcomplex)
::
ARGUMENT:: threshold
The thresholding of a new slice. Value ranges are different for each function, from 0 upwards.
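To make some of the functions in the table above more concrete, here is a minimal client-side sketch of Energy, HFC, SpectralFlux and Cosine. Each one is written as a plain sclang function over Arrays of spectral magnitudes, one Array per frame. The sketch follows the one-line descriptions in the table and makes two assumptions. For HFC, bins are numbered from 0. For SpectralFlux, the half-rectified differences are summed across bins. This is an illustration, not the FluidOnsetSlice implementation, and the function names are hypothetical.

code::
(
// each frame is an Array of non-negative spectral magnitudes
// Energy: sum of squares of magnitudes / nBins
~energy = { |mags| mags.squared.sum / mags.size };
// HFC: sum of (squared magnitude * bin number) / nBins, bins counted from 0 (assumption)
~hfc = { |mags| (mags.squared * Array.series(mags.size)).sum / mags.size };
// SpectralFlux: magnitude differences between two frames, half rectified, then summed (assumption)
~flux = { |prev, cur| (cur - prev).collect { |d| d.max(0) }.sum };
// Cosine: 1 - cosine similarity between the two compared frames
~cosineDist = { |prev, cur|
	var dot = (prev * cur).sum;
	var norms = prev.squared.sum.sqrt * cur.squared.sum.sqrt;
	1 - (dot / norms)
};
)

// usage: a frame pair with a sudden rise in the top bin
~flux.([0.1, 0.2, 0.0], [0.1, 0.1, 0.9]); // 0.9, only the increase counts
::

Because SpectralFlux is half rectified, the drop in the middle bin does not contribute. This is why the curve reacts to onsets rather than to decays.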
ARGUMENT:: debounce
The minimum duration of a slice in number of hops.
The minimum duration of a slice, in number of hops (multiples of hopSize).
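Because debounce is counted in hops, the duration it represents depends on hopSize and on the sample rate. The worked conversion below assumes a hopSize of 512 samples and a sample rate of 44.1 kHz. These are illustrative values, not necessarily the defaults, and the environment variable names are hypothetical.

code::
(
~hopSize = 512;      // assumed hop size, in samples
~sampleRate = 44100; // assumed sample rate, in Hz
~debounce = 2;       // minimum slice length, in hops
~minSliceSeconds = ~debounce * ~hopSize / ~sampleRate; // 1024 samples, about 23.2 ms
// the other way round: the number of hops needed for a minimum slice of 50 ms
~debounceFor50ms = (0.05 * ~sampleRate / ~hopSize).ceil.asInteger; // 5
)
::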
ARGUMENT:: filtSize
ARGUMENT:: filterSize
The size of a smoothing filter that is applied on the novelty curve. A larger filter size allows for cleaner cuts on very sharp changes.
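The help text does not say which kind of smoothing filter is applied. To illustrate why a larger filter can give cleaner cuts, the sketch below uses a median filter, which is an assumption swapped in for illustration. A median filter removes isolated spikes narrower than about half the filter size from a novelty curve while keeping sharp steps intact. The function name ~medianSmooth is hypothetical.

code::
(
// median smoothing of a novelty curve (an Array of numbers) over a window of `size` points
~medianSmooth = { |curve, size = 3|
	var half = size.div(2);
	curve.collect { |x, i|
		var lo = (i - half).max(0);              // clip the window at the edges
		var hi = (i + half).min(curve.size - 1);
		var window = curve.copyRange(lo, hi).sort;
		window[(hi - lo).div(2)]                 // middle value (lower middle at the edges)
	}
};
)

~medianSmooth.([0, 0, 1, 0, 0], 3); // [0, 0, 0, 0, 0]: the one-frame spike is removed
~medianSmooth.([0, 0, 0, 1, 1, 1], 3); // [0, 0, 0, 1, 1, 1]: the step is kept
::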
ARGUMENT:: frameDelta
distance in samples between the comparison window (flux,mkl,kls,cosine)
For certain functions (HFC, SpectralFlux, MKL, Cosine), the comparison does not have to be made between consecutive frames. By default (0) it is; otherwise, this sets the distance, in samples, between the compared windows.
ARGUMENT:: winSize
The window size. As the analysis relies on spectral frames, we need to decide what precision we give it spectrally and temporally, in line with the Gabor uncertainty principle. http://www.subsurfwiki.org/wiki/Gabor_uncertainty
@ -56,10 +57,29 @@ ARGUMENT:: maxFFTSize
How large the FFT can be; the memory for it is allocated at instantiation time. This cannot be modulated.
RETURNS::
An audio stream with impulses at detected transients. The latency between the input and the output is winSize.
An audio stream with impulses at detected transients. The latency between the input and the output is at most winSize samples.
EXAMPLES::
code::
(some example code)
//load some sounds
b = Buffer.read(s,File.realpath(FluidOnsetSlice.class.filenameSymbol).dirname.withTrailingSlash ++ "../AudioFiles/Nicol-LoopE-M.wav");
// basic parameters (the process adds a latency of winSize samples)
The width, in bins, of the fragment of the FFT window that is considered a normal deviation for a potential continuous sinusoidal track. It affects CPU cost: a wider value is more accurate but more computationally expensive.
ARGUMENT:: thresh
ARGUMENT:: threshold
The normalised threshold, between 0 and 1, on the normalised cross-correlation for a peak to be considered a sinusoidal component.
ARGUMENT:: minTrackLen
@ -59,7 +59,7 @@ CODE::
b = Buffer.read(s,File.realpath(FluidSines.class.filenameSymbol).dirname.withTrailingSlash ++ "../AudioFiles/Tremblay-AaS-SynthTwoVoices-M.wav");
// run with large parameters - left is sinusoidal model, right is residual
// at first it seems quite centred, but then flipped the argument FrqScl to lin(ear) and observe how high the spectrum goes. If we set it to a brickwall spectral filter tuned on the same frequencies:
// at first it seems quite centred, but then flip the argument FrqScl to lin(ear) and observe how high the spectrum goes. If we set it to a brickwall spectral filter tuned on the same frequencies:
x.set(\type, 1)
@ -235,7 +235,7 @@ x = {
}.play;
)
// this example shows a similar result to the brickwall bspectral bandpass above. If we move the central frequency nearer the half-Nyquist:
// this example shows a similar result to the brickwall spectral bandpass above. If we move the central frequency nearer the half-Nyquist: