The FluidBufNMF object decomposes the spectrum of a sound into a number of components using Non-Negative Matrix Factorisation (NMF) footnote:: Lee, Daniel D., and H. Sebastian Seung. 1999. ‘Learning the Parts of Objects by Non-Negative Matrix Factorization’. Nature 401 (6755): 788–91. https://doi.org/10.1038/44565 ::. NMF has been a popular technique in signal processing research for tasks such as source separation and transcription footnote:: Smaragdis and Brown, ‘Non-Negative Matrix Factorization for Polyphonic Music Transcription’. ::, although its creative potential remains relatively unexplored.
The algorithm takes a buffer and divides it into a number of components, determined by the CODE::components:: argument. It works iteratively, trying to find a combination of spectral templates ('bases') and envelopes ('activations') that yields the original magnitude spectrogram when added together. In general, there is no unique answer to this problem (i.e. there are different ways of accounting for an evolving spectrum in terms of some set of templates and envelopes). In its basic form, NMF is a form of unsupervised learning: it starts with some random data and then converges towards something that minimises the distance between its generated data and the original. It tends to converge very quickly at first and then level out. Fewer iterations mean less processing, but also less predictable results.
The object can return any or all of the following:
LIST::
## a spectral contour of each component in the form of a magnitude spectrum (called a basis in NMF lingo);
## an amplitude envelope of each component in the form of gains for each consecutive frame of the underlying spectrogram (called an activation in NMF lingo);
## an audio reconstruction of each component in the time domain.
::
The bases and activations can be used to make a kind of vocoder based on what NMF has 'learned' from the original data. Alternatively, taking the matrix product of a basis and an activation will yield a synthetic magnitude spectrogram of a component (which could be reconstructed, given some phase information from somewhere).
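As a minimal sketch of that matrix product, the following client-side code multiplies one basis by one activation. The values and variable names are made up for illustration and are not produced by the object.
CODE::
(
// hypothetical data: a basis with 3 bins and an activation with 2 frames
var basis = [0.0, 1.0, 0.5];
var activation = [1.0, 0.5];
// outer product: one row per bin, one column per frame
var spectrogram = basis.collect { |binGain| activation * binGain };
spectrogram.postln; // [ [ 0.0, 0.0 ], [ 1.0, 0.5 ], [ 0.5, 0.25 ] ]
)
::
Summing such matrices over all components gives the model's estimate of the original magnitude spectrogram.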
Some additional options and flexibility can be found through combinations of the CODE::basesMode:: and CODE::actMode:: arguments. If these flags are set to 1, the object expects to be supplied with pre-formed spectra (or envelopes) that will be used as seeds for the decomposition, providing more guided results. When set to 2, the supplied buffers won't be updated, and so become fixed templates to match against instead. Note that setting both CODE::basesMode:: and CODE::actMode:: to 2 doesn't make sense, so the object will complain.
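As a hedged sketch of the fixed-template case, the call below keeps the bases unchanged and only learns activations. The environment variables are hypothetical: CODE::~source:: is assumed to hold the sound to analyse, and CODE::~bases:: is assumed to already contain three bases of the right size, for instance from an earlier run.
CODE::
(
// hypothetical buffers: ~source and ~bases must already exist and be filled
~activations = Buffer.new(s);
FluidBufNMF.process(s, ~source,
    bases: ~bases, basesMode: 2,            // 2: use the bases as fixed templates
    activations: ~activations, actMode: 0,  // 0: seed the activations randomly
    components: 3,
    action: { "activations computed".postln }
);
)
::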
If supplying pre-formed data, it's up to the user to make sure that the supplied buffers are the right size:
LIST::
## bases must be STRONG::(fftSize / 2) + 1:: frames and STRONG::(components * input channels):: channels
## activations must be STRONG::(input frames / hopSize) + 1:: frames and STRONG::(components * input channels):: channels
::
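The sizes above can be checked with a little arithmetic before allocating buffers. This sketch uses made-up settings and assumes the division is an integer (floor) division, which the text above does not state explicitly.
CODE::
(
// hypothetical settings, chosen only for illustration
var components = 3, inputChannels = 2;
var inputFrames = 44100, hopSize = 512, fftSize = 1024;
// assumption: integer (floor) division
var basesFrames = (fftSize div: 2) + 1;                 // 513
var activationFrames = (inputFrames div: hopSize) + 1;  // 87
var channels = components * inputChannels;              // 6
[basesFrames, activationFrames, channels].postln; // [ 513, 87, 6 ]
)
::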
In this implementation, the components are reconstructed by masking the original spectrum, such that they will sum to yield the original sound.
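One common way to obtain components that sum to the original is ratio masking. The sketch below illustrates the idea for a single spectral bin with made-up numbers; it is an assumption about the general technique, not a description of this object's internal code.
CODE::
(
// hypothetical magnitudes estimated by two components for one bin
var estimates = [1.0, 3.0];
// hypothetical magnitude of the same bin in the source
var original = 2.0;
// each mask is the component's share of the summed estimates
var masks = estimates / estimates.sum;  // [ 0.25, 0.75 ]
var masked = masks * original;          // [ 0.5, 1.5 ]
[masks, masked, masked.sum].postln; // [ [ 0.25, 0.75 ], [ 0.5, 1.5 ], 2.0 ]
)
::
Because the masks sum to 1 in every bin, the masked components add back up to the original magnitude.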
The whole process can be related to a channel vocoder where, instead of fixed bandpass filters, we get more complex filter shapes that are learned from the data, and the activations correspond to channel envelopes.
FluidBufNMF is part of the LINK::Guides/FluidCorpusManipulation::. For more explanations, learning material, and discussions on its musicianly uses, visit http://www.flucoma.org/
STRONG::Threading::
By default, this UGen spawns a new thread to avoid blocking the server command queue, so the server is free to go about its business. For a more detailed discussion of the available threading and monitoring options, including the two undocumented class methods below (CODE::.processBlocking:: and CODE::.kr::), please read the guide LINK::Guides/FluidBufMultiThreading::.
CLASSMETHODS::
METHOD:: process, processBlocking
This method calls for the factorisation to be calculated on a given source buffer. It processes the source LINK::Classes/Buffer:: on the LINK::Classes/Server::. CODE::processBlocking:: will execute directly in the server command FIFO, whereas CODE::process:: will delegate to a separate worker thread. The latter is generally only worthwhile for longer-running jobs where you don't wish to tie up the server.
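A minimal sketch of the two calls follows. It assumes a hypothetical, already loaded buffer CODE::~source:: and the default server in the variable CODE::s::, and relies only on the arguments documented below.
CODE::
(
~resynth = Buffer.new(s);
// delegates to a worker thread; the action runs when the job is done
FluidBufNMF.process(s, ~source, resynth: ~resynth, components: 2,
    action: { "process finished".postln });
)

(
// same job, executed directly in the server command FIFO
FluidBufNMF.processBlocking(s, ~source, resynth: ~resynth, components: 2,
    action: { "processBlocking finished".postln });
)
::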
ARGUMENT:: server
The LINK::Classes/Server:: on which the buffers to be processed are allocated.
ARGUMENT:: source
The index of the buffer to use as the source material to be decomposed through the NMF process. The different channels of multichannel buffers will be processed sequentially.
ARGUMENT:: startFrame
The position in the source buffer, in samples, at which the NMF process should start.
STRONG::Constraints::
LIST::
##
Minimum: 0
::
ARGUMENT:: numFrames
How many frames should be processed.
ARGUMENT:: startChan
For a multichannel source buffer, which channel should be processed first.
STRONG::Constraints::
LIST::
##
Minimum: 0
::
ARGUMENT:: numChans
For a multichannel source buffer, how many channels should be processed.
ARGUMENT:: resynth
The index of the buffer where the reconstructed components will be written. The buffer will be resized to STRONG::components * numChannelsProcessed:: channels and STRONG::sourceDuration:: length. If STRONG::nil:: is provided, the reconstruction will not happen.
ARGUMENT:: bases
The index of the buffer where the different bases will be written to and/or read from: the behaviour is set in the following argument. If STRONG::nil:: is provided, no bases will be returned.
ARGUMENT:: basesMode
This flag determines how the basis buffer passed as the previous argument is treated.
table::
## 0 || The bases are seeded randomly, and the resulting ones will be written to the passed buffer after the process. The buffer is resized to STRONG::components * numChannelsProcessed:: channels and STRONG::(fftSize / 2 + 1):: length.
## 1 || The passed buffer is used as a seed for the bases. Its dimensions should match the values above. The resulting bases will replace the seed ones.
## 2 || The passed buffer is used as a template for the bases, and will therefore not change. Its dimensions should match the values above.
::
ARGUMENT:: activations
The index of the buffer where the different activations will be written to and/or read from: the behaviour is set in the following argument. If STRONG::nil:: is provided, no activations will be returned.
ARGUMENT:: actMode
This flag determines how the activation buffer passed as the previous argument is treated.
table::
## 0 || The activations are seeded randomly, and the resulting ones will be written to the passed buffer after the process. The buffer is resized to STRONG::components * numChannelsProcessed:: channels and STRONG::(sourceDuration / hopSize + 1):: length.
## 1 || The passed buffer is used as a seed for the activations. Its dimensions should match the values above. The resulting activations will replace the seed ones.
## 2 || The passed buffer is used as a template for the activations, and will therefore not change. Its dimensions should match the values above.
::
ARGUMENT:: components
The number of elements into which the NMF algorithm will try to divide the spectrogram of the source.
STRONG::Constraints::
LIST::
##
Minimum: 1
::
ARGUMENT:: iterations
The NMF process is iterative, trying to converge to the smallest error in its factorisation. The number of iterations determines how many times it tries to adjust its estimates. Higher numbers will be more CPU expensive; lower numbers will be more unpredictable in quality.
STRONG::Constraints::
LIST::
##
Minimum: 1
::
ARGUMENT:: windowSize
The window size. As NMF relies on spectral frames, we need to decide what precision we give it spectrally and temporally, in line with Gabor Uncertainty principles. LINK::http://www.subsurfwiki.org/wiki/Gabor_uncertainty::
ARGUMENT:: hopSize
The window hop size. As NMF relies on spectral frames, we need to move the window forward. It can be any size, but low overlap will create audible artefacts. The default value of -1 sets it to half of windowSize (an overlap of 2).
ARGUMENT:: fftSize
The inner FFT/IFFT size. It should be at least 4 samples long, at least the size of the window, and a power of 2. Making it larger allows an oversampling of the spectral precision. The -1 default value will use the next power of 2 equal or above the windowSize.
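The two -1 defaults described for CODE::hopSize:: and CODE::fftSize:: can be reproduced on the client. The window size here is made up, and the code only mirrors the documented defaults rather than the object's own implementation.
CODE::
(
// hypothetical window size, deliberately not a power of 2
var windowSize = 1000;
// documented default: half of the window size
var hopSize = windowSize div: 2;
// documented default: the next power of 2 equal to or above the window size
var fftSize = windowSize.nextPowerOfTwo;
[hopSize, fftSize].postln; // [ 500, 1024 ]
)
::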
ARGUMENT:: windowType
The inner FFT/IFFT windowing type (not implemented yet)
ARGUMENT:: randomSeed
The NMF process needs to seed its starting point. If specified, the same values will be used. The default of -1 will randomly assign them. (not implemented yet)
ARGUMENT:: freeWhenDone
Free the server instance when processing is complete. Default CODE::true::
ARGUMENT:: action
A Function to be evaluated once the offline process has finished and all of the Buffers' instance variables have been updated on the client side. The function will be passed CODE::[resynth, bases, activations]:: as an argument.
RETURNS:: An instance of the processor
METHOD:: kr
Trigger the equivalent behaviour to CODE::processBlocking / process:: from a LINK::Classes/Synth::. Can be useful for expressing a sequence of buffer and data processing jobs to execute. Note that the work still executes on the server command FIFO (not the audio thread), and it is the caller's responsibility to manage the sequencing, using the CODE::done:: status of the various UGens.
ARGUMENT:: source
The index of the buffer to use as the source material to be decomposed through the NMF process. The different channels of multichannel buffers will be processed sequentially.
ARGUMENT:: startFrame
The position in the source buffer, in samples, at which the NMF process should start.
STRONG::Constraints::
LIST::
##
Minimum: 0
::
ARGUMENT:: numFrames
How many frames should be processed.
ARGUMENT:: startChan
For a multichannel source buffer, which channel should be processed first.
STRONG::Constraints::
LIST::
##
Minimum: 0
::
ARGUMENT:: numChans
For a multichannel source buffer, how many channels should be processed.
ARGUMENT:: resynth
The index of the buffer where the reconstructed components will be written. The buffer will be resized to STRONG::components * numChannelsProcessed:: channels and STRONG::sourceDuration:: length. If STRONG::nil:: is provided, the reconstruction will not happen.
ARGUMENT:: bases
The index of the buffer where the different bases will be written to and/or read from: the behaviour is set in the following argument. If STRONG::nil:: is provided, no bases will be returned.
ARGUMENT:: basesMode
This flag determines how the basis buffer passed as the previous argument is treated. The modes (0, 1 or 2) are those described for CODE::process:: above.
ARGUMENT:: activations
The index of the buffer where the different activations will be written to and/or read from: the behaviour is set in the following argument. If STRONG::nil:: is provided, no activations will be returned.
ARGUMENT:: actMode
This flag determines how the activation buffer passed as the previous argument is treated. The modes (0, 1 or 2) are those described for CODE::process:: above.
ARGUMENT:: components
The number of elements into which the NMF algorithm will try to divide the spectrogram of the source.
STRONG::Constraints::
LIST::
##
Minimum: 1
::
ARGUMENT:: iterations
The NMF process is iterative, trying to converge to the smallest error in its factorisation. The number of iterations will decide how many times it tries to adjust its estimates. Higher numbers here will be more CPU expensive, lower numbers will be more unpredictable in quality.
STRONG::Constraints::
LIST::
##
Minimum: 1
::
ARGUMENT:: windowSize
The window size. As NMF relies on spectral frames, we need to decide what precision we give it spectrally and temporally, in line with Gabor Uncertainty principles. LINK::http://www.subsurfwiki.org/wiki/Gabor_uncertainty::
ARGUMENT:: hopSize
The window hop size. As NMF relies on spectral frames, we need to move the window forward. It can be any size but low overlap will create audible artefacts.
ARGUMENT:: fftSize
The inner FFT/IFFT size. It should be at least 4 samples long, at least the size of the window, and a power of 2. Making it larger allows an oversampling of the spectral precision.
ARGUMENT:: trig
A CODE::kr:: signal that will trigger execution
ARGUMENT:: blocking
Whether to execute this process directly on the server command FIFO or delegate to a worker thread. See CODE::processBlocking/process:: for caveats.
INSTANCEMETHODS::
METHOD:: kr
Returns a UGen that reports the progress of the running task when executing in a worker thread. Calling CODE::scope:: on it gives a convenient progress monitor.
METHOD:: cancel
Cancels non-blocking processing
METHOD:: wait
When called in the context of a LINK::Classes/Routine:: (it won't work otherwise), will block execution until the processor has finished. This can be convenient for writing sequences of processes more linearly than using lots of nested actions.
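A minimal sketch of this pattern, assuming hypothetical buffers CODE::~source:: (already loaded) and CODE::~resynth:: on the default server:
CODE::
(
Routine {
    // process returns an instance of the processor
    var processor = FluidBufNMF.process(s, ~source, resynth: ~resynth, components: 2);
    // block this Routine (not the interpreter) until the job has finished
    processor.wait;
    "NMF finished, next step can start here".postln;
}.play;
)
::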
EXAMPLES::
STRONG::A didactic example::
CODE::
(
// create buffers
b = Buffer.alloc(s,44100);
c = Buffer.alloc(s, 44100);
d = Buffer.new(s);
e = Buffer.new(s);
f = Buffer.new(s);
g = Buffer.new(s);
)
// =============== decompose some sounds ===============
// let's decompose the drum loop that comes with the FluCoMa extension:
// 10 channels therefore give 70 channels: the 7 shapes of component0, then the 7 shapes of component1, etc.
~spectralshapes.query
// we then run the bufstats on them. Each channel, which had a time series (an envelope) of each descriptor, is reduced to 7 frames
~stats.query
// we then need to retrieve the values we want: the first of every 7 channels for the centroid, and the 6th frame of each, as we want the median. Because we retrieve the values in an interleaved format, the select function gets a bit tricky, but we get the following values: