|
|
TITLE:: FluidBufHPSS
|
|
|
SUMMARY:: Buffer-Based Harmonic-Percussive Source Separation Using Median Filtering
|
|
|
CATEGORIES:: Libraries>FluidDecomposition, UGens>Buffer
|
|
|
RELATED:: Guides/FluCoMa, Guides/FluidDecomposition, Guides/FluidBufMultiThreading
|
|
|
|
|
|
|
|
|
DESCRIPTION::
|
|
|
A FluidBufHPSS object performs Harmonic-Percussive Source Separation (HPSS) on the contents of a link::Classes/Buffer::. The class performs HPSS as described in its original form footnote::
|
|
|
Fitzgerald, Derry. 2010. ‘Harmonic/Percussive Separation Using Median Filtering’. In Proceedings DaFx 10. https://arrow.dit.ie/argcon/67.
|
|
|
:: as well as a variation on the extension propsoed by Driedger et al. footnote::
|
|
|
Driedger, Jonathan, Meinard Uller, and Sascha Disch. 2014. ‘Extending Harmonic-Percussive Separation of Audio Signals’. In Proc. ISMIR. http://www.terasoft.com.tw/conf/ismir2014/proceedings/T110_127_Paper.pdf.
|
|
|
::
|
|
|
|
|
|
The algorithm takes a buffer in, and divides it into two or three outputs, depending on the mode: LIST::
|
|
|
## an harmonic component;
|
|
|
## a percussive component;
|
|
|
## a residual of the previous two if the flag is set to inter-dependant thresholds. See the maskingMode below.::
|
|
|
|
|
|
It is part of the LINK:: Guides/FluidDecomposition:: of LINK:: Guides/FluCoMa::. For more explanations, learning material, and discussions on its musicianly uses, visit http://www.flucoma.org/
|
|
|
|
|
|
STRONG::Threading::
|
|
|
|
|
|
By default, this UGen spawns a new thread to avoid blocking the server command queue, so it is free to go about with its business. For a more detailed discussion of the available threading and monitoring options, including the two undocumented Class Methods below (.processBlocking and .kr) please read the guide LINK::Guides/FluidBufMultiThreading::.
|
|
|
|
|
|
CLASSMETHODS::
|
|
|
|
|
|
METHOD:: process, processBlocking
|
|
|
This is the method that calls for the HPSS to be calculated on a given source buffer.
|
|
|
|
|
|
ARGUMENT:: server
|
|
|
The server on which the buffers to be processed are allocated.
|
|
|
|
|
|
ARGUMENT:: source
|
|
|
The index of the buffer to use as the source material. The channels of multichannel buffers will be processed sequentially.
|
|
|
|
|
|
ARGUMENT:: startFrame
|
|
|
Where in the srcBuf should the HPSS process start, in samples.
|
|
|
|
|
|
ARGUMENT:: numFrames
|
|
|
How many frames should be processed.
|
|
|
|
|
|
ARGUMENT:: startChan
|
|
|
For multichannel srcBuf, which channel to start processing at.
|
|
|
|
|
|
ARGUMENT:: numChans
|
|
|
For multichannel srcBuf, how many channels should be processed.
|
|
|
|
|
|
ARGUMENT:: harmonic
|
|
|
The index of the buffer where the extracted harmonic component will be reconstructed.
|
|
|
|
|
|
ARGUMENT:: percussive
|
|
|
The index of the buffer where the extracted percussive component will be reconstructed.
|
|
|
|
|
|
ARGUMENT:: residual
|
|
|
The index of the buffer where the residual component will be reconstructed in mode 2.
|
|
|
|
|
|
ARGUMENT:: harmFilterSize
|
|
|
The size, in spectral frames, of the median filter for the harmonic component. Must be an odd number, >= 3.
|
|
|
|
|
|
ARGUMENT:: percFilterSize
|
|
|
The size, in spectral bins, of the median filter for the percussive component. Must be an odd number, >=3
|
|
|
|
|
|
ARGUMENT:: maskingMode
|
|
|
The way the masking is applied to the original spectrogram. (0,1,2)
|
|
|
table::
|
|
|
## 0 || The traditional soft mask used in Fitzgerald's original method of 'Wiener-inspired' filtering. Complimentary, soft masks are made for the harmonic and percussive parts by allocating some fraction of a point in time-frequency to each. This provides the fewest artefacts, but the weakest separation. The two resulting buffers will sum to exactly the original material.
|
|
|
## 1 || Relative mode - Better separation, with more artefacts. The harmonic mask is constructed using a binary decision, based on whether a threshold is exceeded at a given time-frequency point (these are set using harmThreshFreq1, harmThreshAmp1, harmThreshFreq2, harmThreshAmp2, see below). The percussive mask is then formed as the inverse of the harmonic one, meaning that as above, the two components will sum to the original sound.
|
|
|
|
|
|
## 2 || Inter-dependent mode - Thresholds can be varied independently, but are coupled in effect. Binary masks are made for each of the harmonic and percussive components, and the masks are converted to soft at the end so that everything null sums even if the params are independent, that is what makes it harder to control. These aren't guranteed to cover the whole sound; in this case the 'leftovers' will placed into a third buffer.
|
|
|
::
|
|
|
|
|
|
ARGUMENT:: harmThreshFreq1
|
|
|
In modes 1 and 2, the frequency of the low part of the threshold for the harmonic filter (0-1)
|
|
|
|
|
|
ARGUMENT:: harmThreshAmp1
|
|
|
In modes 1 and 2, the threshold of the low part for the harmonic filter. That threshold applies to all frequencies up to harmThreshFreq1: how much more powerful (in dB) the harmonic median filter needs to be than the percussive median filter for this bin to be counted as harmonic.
|
|
|
|
|
|
ARGUMENT:: harmThreshFreq2
|
|
|
In modes 1 and 2, the frequency of the hight part of the threshold for the harmonic filter. (0-1)
|
|
|
|
|
|
ARGUMENT:: harmThreshAmp2
|
|
|
In modes 1 and 2, the threshold of the high part for the harmonic filter. That threshold applies to all frequencies above harmThreshFreq2. The threshold between harmThreshFreq1 and harmThreshFreq2 is interpolated between harmThreshAmp1 and harmThreshAmp2. How much more powerful (in dB) the harmonic median filter needs to be than the percussive median filter for this bin to be counted as harmonic.
|
|
|
|
|
|
ARGUMENT:: percThreshFreq1
|
|
|
In mode 2, the frequency of the low part of the threshold for the percussive filter. (0-1)
|
|
|
|
|
|
ARGUMENT:: percThreshAmp1
|
|
|
In mode 2, the threshold of the low part for the percussive filter. That threshold applies to all frequencies up to percThreshFreq1. How much more powerful (in dB) the percussive median filter needs to be than the harmonic median filter for this bin to be counted as percussive.
|
|
|
|
|
|
ARGUMENT:: percThreshFreq2
|
|
|
In mode 2, the frequency of the hight part of the threshold for the percussive filter. (0-1)
|
|
|
|
|
|
ARGUMENT:: percThreshAmp2
|
|
|
In mode 2, the threshold of the high part for the percussive filter. That threshold applies to all frequencies above percThreshFreq2. The threshold between percThreshFreq1 and percThreshFreq2 is interpolated between percThreshAmp1 and percThreshAmp2. How much more powerful (in dB) the percussive median filter needs to be than the harmonic median filter for this bin to be counted as percussive.
|
|
|
|
|
|
ARGUMENT:: windowSize
|
|
|
The window size in samples. As HPSS relies on spectral frames, we need to decide what precision we give it spectrally and temporally, in line with Gabor Uncertainty principles. http://www.subsurfwiki.org/wiki/Gabor_uncertainty
|
|
|
|
|
|
ARGUMENT:: hopSize
|
|
|
The window hop size in samples. As HPSS relies on spectral frames, we need to move the window forward. It can be any size but low overlap may create audible artefacts. The -1 default value will default to half of windowSize (overlap of 2).
|
|
|
|
|
|
ARGUMENT:: fftSize
|
|
|
The inner FFT/IFFT size. It should be at least 4 samples long; at least the size of the window; and a power of 2. Making it larger than the window size provides interpolation in frequency. The -1 default value will use the next power of 2 equal or above the windowSize.
|
|
|
|
|
|
ARGUMENT:: freeWhenDone
|
|
|
Free the server instance when processing complete. Default true
|
|
|
|
|
|
ARGUMENT:: action
|
|
|
A Function to be evaluated once the offline process has finished and all Buffer's instance variables have been updated on the client side. The function will be passed [harmonic, percussive, residual] as an argument.
|
|
|
|
|
|
returns:: an instance of the processor
|
|
|
|
|
|
Discussion::
|
|
|
HPSS works by using median filters on the spectral magnitudes of a sound. It hinges on a simple modelling assumption that tonal components will tend to yield concentrations of energy across time, spread out in frequency, and percussive components will manifest as concentrations of energy across frequency, spread out in time. By using median filters across time and frequency respectively, we get initial esitmates of the tonal-ness / transient-ness of a point in time and frequency. These are then combined into 'masks' that are applied to the orginal spectral data in order to produce a separation.
|
|
|
|
|
|
The maskingMode parameter provides different approaches to combinging estimates and producing masks. Some settings (especially in modes 1 & 2) will provide better separation but with more artefacts. These can, in principle, be ameliorated by applying smoothing filters to the masks before transforming back to the time-domain (not yet implemented).
|
|
|
|
|
|
EXAMPLES::
|
|
|
|
|
|
code::
|
|
|
//load buffers
|
|
|
(
|
|
|
b = Buffer.read(s,File.realpath(FluidBufHPSS.class.filenameSymbol).dirname.withTrailingSlash ++ "../AudioFiles/Tremblay-AaS-SynthTwoVoices-M.wav");
|
|
|
c = Buffer.new(s);
|
|
|
d = Buffer.new(s);
|
|
|
e = Buffer.new(s);
|
|
|
)
|
|
|
|
|
|
// run with basic parameters
|
|
|
(
|
|
|
Routine{
|
|
|
t = Main.elapsedTime;
|
|
|
FluidBufHPSS.process(s, b, harmonic: c, percussive: d).wait;
|
|
|
(Main.elapsedTime - t).postln;
|
|
|
}.play
|
|
|
)
|
|
|
c.query
|
|
|
d.query
|
|
|
//play the harmonic
|
|
|
c.play;
|
|
|
//play the percussive
|
|
|
d.play;
|
|
|
|
|
|
//nullsumming tests
|
|
|
{(PlayBuf.ar(1,c))+(PlayBuf.ar(1,d))+(-1*PlayBuf.ar(1,b,doneAction:2))}.play
|
|
|
|
|
|
//more daring parameters, in mode 2
|
|
|
(
|
|
|
Routine{
|
|
|
t = Main.elapsedTime;
|
|
|
FluidBufHPSS.process(s, b, harmonic: c, percussive: d, residual:e, harmFilterSize:31, maskingMode:2, harmThreshFreq1: 0.005, harmThreshAmp1: 7.5, harmThreshFreq2: 0.168, harmThreshAmp2: 7.5, percThreshFreq1: 0.004, percThreshAmp1: 26.5, percThreshFreq2: 0.152, percThreshAmp2: 26.5,windowSize:4096,hopSize:512)
|
|
|
.wait;
|
|
|
(Main.elapsedTime - t).postln;
|
|
|
}.play
|
|
|
)
|
|
|
|
|
|
//play the harmonic
|
|
|
c.play;
|
|
|
//play the percussive
|
|
|
d.play;
|
|
|
//play the residual
|
|
|
e.play;
|
|
|
|
|
|
//still nullsumming
|
|
|
{PlayBuf.ar(1,c) + PlayBuf.ar(1,d) + PlayBuf.ar(1,e) - PlayBuf.ar(1,b,doneAction:2)}.play;
|
|
|
::
|
|
|
|
|
|
STRONG::A stereo buffer example.::
|
|
|
CODE::
|
|
|
|
|
|
// load two very different files
|
|
|
(
|
|
|
b = Buffer.read(s,File.realpath(FluidBufHPSS.class.filenameSymbol).dirname.withTrailingSlash ++ "../AudioFiles/Tremblay-SA-UprightPianoPedalWide.wav");
|
|
|
c = Buffer.read(s,File.realpath(FluidBufHPSS.class.filenameSymbol).dirname.withTrailingSlash ++ "../AudioFiles/Tremblay-AaS-AcousticStrums-M.wav");
|
|
|
)
|
|
|
|
|
|
// composite one on left one on right as test signals
|
|
|
(
|
|
|
Routine{
|
|
|
FluidBufCompose.process(s, c, numFrames:b.numFrames, startFrame:555000,destStartChan:1, destination:b).wait;
|
|
|
b.play
|
|
|
}.play
|
|
|
)
|
|
|
// create 2 new buffers as destinations
|
|
|
d = Buffer.new(s); e = Buffer.new(s);
|
|
|
|
|
|
//run the process on them
|
|
|
(
|
|
|
Routine{
|
|
|
t = Main.elapsedTime;
|
|
|
FluidBufHPSS.process(s, b, harmonic: d, percussive:e).wait;
|
|
|
(Main.elapsedTime - t).postln;
|
|
|
}.play
|
|
|
)
|
|
|
|
|
|
//listen: stereo preserved!
|
|
|
d.play
|
|
|
e.play
|
|
|
::
|