TITLE:: FluidKMeans summary:: Cluster data points with K-Means categories:: FluidManipulation related:: Classes/FluidDataSet, Classes/FluidLabelSet, Classes/FluidKNNClassifier, Classes/FluidKNNRegressor DESCRIPTION:: Uses the K-Means algorithm to learn clusters from a link::Classes/FluidDataSet:: https://scikit-learn.org/stable/tutorial/statistical_inference/unsupervised_learning.html#clustering-grouping-observations-together CLASSMETHODS:: METHOD:: new Construct a new K Means model on the passed server. The parameters code::numClusters:: and code::maxIter:: can be modulated on an existing instance. ARGUMENT:: server If nil will use Server.default. ARGUMENT:: numClusters The number of clusters to classify data into. ARGUMENT:: maxIter The maximum number of iterations the algorithm will use whilst fitting. INSTANCEMETHODS:: PRIVATE::k METHOD:: fit Identify code::numClusters:: clusters in a link::Classes/FluidDataSet::. It will optimise until no improvement is possible, or up to code::maxIter::, whichever comes first. Subsequent calls will continue training from the stopping point with the same conditions. ARGUMENT:: dataSet A link::Classes/FluidDataSet:: of data points. ARGUMENT:: action A function to run when fitting is complete, taking as its argument an array with the number of data points for each cluster. METHOD:: predict Given a trained object, return the cluster ID for each data point in a link::Classes/FluidDataSet:: to a link::Classes/FluidLabelSet::. ARGUMENT:: dataSet A link::Classes/FluidDataSet:: containing the data to predict. ARGUMENT:: labelSet A link::Classes/FluidLabelSet:: to retrieve the predicted clusters. ARGUMENT:: action A function to run when the server responds. METHOD:: fitPredict Run link::Classes/FluidKMeans#*fit:: and link::Classes/FluidKMeans#*predict:: in a single pass: i.e. train the model on the incoming link::Classes/FluidDataSet:: and then return the learned clustering to the passed link::Classes/FluidLabelSet:: ARGUMENT:: dataSet a link::Classes/FluidDataSet:: containing the data to fit and predict. ARGUMENT:: labelSet a link::Classes/FluidLabelSet:: to retrieve the predicted clusters. ARGUMENT:: action A function to run when the server responds METHOD:: predictPoint Given a trained object, return the cluster ID for a data point in a link::Classes/Buffer:: ARGUMENT:: buffer A link::Classes/Buffer:: containing a data point. ARGUMENT:: action A function to run when the server responds, taking the ID of the cluster as its argument. METHOD:: transform Given a trained object, return for each item of a provided link::Classes/FluidDataSet:: its distance to each cluster as an array, often reffered to as the cluster-distance space. ARGUMENT:: srcDataSet A link::Classes/FluidDataSet:: of data points to transform. ARGUMENT:: dstDataSet A link::Classes/FluidDataSet:: to contain the new cluster-distance space. ARGUMENT:: action A function to run when complete. METHOD:: fitTransform Run link::Classes/FluidKMeans#*fit:: and link::Classes/FluidKMeans#*transform:: in a single pass: i.e. train the model on the incoming link::Classes/FluidDataSet:: and then return its cluster-distance space in the destination link::Classes/FluidDataSet:: ARGUMENT:: srcDataSet A link::Classes/FluidDataSet:: containing the data to fit and transform. ARGUMENT:: dstDataSet A link::Classes/FluidDataSet:: to contain the new cluster-distance space. ARGUMENT:: action A function to run when complete. METHOD:: transformPoint Given a trained object, return the distance of the provided point to each cluster. Both points are handled as link::Classes/Buffer:: ARGUMENT:: sourceBuffer A link::Classes/Buffer:: containing a data point to query. ARGUMENT:: targetBuffer A link::Classes/Buffer:: containing a the distance of the source point to each cluster. ARGUMENT:: action A function to run when complete. METHOD:: getMeans Given a trained object, retrieve the means (centroids) of each cluster as a link::Classes/FluidDataSet:: ARGUMENT:: dataSet A link::Classes/FluidDataSet:: of clusers with a mean per column. ARGUMENT:: action A function to run when complete. METHOD:: setMeans Overwrites the means (centroids) of each cluster, and declare the object trained. ARGUMENT:: dataSet A link::Classes/FluidDataSet:: of clusers with a mean per column. ARGUMENT:: action A function to run when complete. METHOD:: clear Reset the object status to not fitted and untrained. ARGUMENT:: action A function to run when complete. EXAMPLES:: code:: ( //Make some clumped 2D points and place into a DataSet ~points = (4.collect{ 64.collect{(1.sum3rand) + [1,-1].choose}.clump(2) }).flatten(1) * 0.5; fork{ ~dataSet = FluidDataSet(s); d = Dictionary.with( *[\cols -> 2,\data -> Dictionary.newFrom( ~points.collect{|x, i| [i, x]}.flatten)]); s.sync; ~dataSet.load(d, {~dataSet.print}); } ) // Create a KMeans instance and a LabelSet for the cluster labels in the server ~clusters = FluidLabelSet(s); ~kmeans = FluidKMeans(s); // Fit into 4 clusters ( ~kmeans.fitPredict(~dataSet,~clusters,action: {|c| "Fitted.\n # Points in each cluster:".postln; c.do{|x,i| ("Cluster" + i + "->" + x.asInteger + "points").postln; } }); ) // Cols of kmeans should match DataSet, size is the number of clusters ~kmeans.cols; ~kmeans.size; ~kmeans.dump; // Retrieve labels of clustered points ( ~assignments = Array.new(128); fork{ 128.do{ |i| ~clusters.getLabel(i,{|clusterID| (i.asString+clusterID).postln; ~assignments.add(clusterID) }); s.sync; } } ) //or faster by sorting the IDs ~clusters.dump{|x|~assignments = x.at("data").atAll(x.at("data").keys.asArray.sort{|a,b|a.asInteger < b.asInteger}).flatten.postln;} //Visualise: we're hoping to see colours neatly mapped to quandrants... ( d = ((~points + 1) * 0.5).flatten(1).unlace; w = Window("scatter", Rect(128, 64, 200, 200)); ~colours = [Color.blue,Color.red,Color.green,Color.magenta]; w.drawFunc = { Pen.use { d[0].size.do{|i| var x = (d[0][i]*200); var y = (d[1][i]*200); var r = Rect(x,y,5,5); Pen.fillColor = ~colours[~assignments[i].asInteger]; Pen.fillOval(r); } } }; w.refresh; w.front; ) // single point transform on arbitrary value ~inbuf = Buffer.loadCollection(s,0.5.dup); ~kmeans.predictPoint(~inbuf,{|x|x.postln;}); :: subsection:: Accessing the means We can get and set the means for each cluster, their centroid. code:: // with the dataset and kmeans generated and trained in the code above ~centroids = FluidDataSet(s); ~kmeans.getMeans(~centroids, {~centroids.print}); // We can also set them to arbitrary values to seed the process ~centroids.load(Dictionary.newFrom([\cols, 2, \data, Dictionary.newFrom([\0, [0.5,0.5], \1, [-0.5,0.5], \2, [0.5,-0.5], \3, [-0.5,-0.5]])])); ~centroids.print ~kmeans.setMeans(~centroids, {~kmeans.predict(~dataSet,~clusters,{~clusters.dump{|x|var count = 0.dup(4); x["data"].keysValuesDo{|k,v|count[v[0].asInteger] = count[v[0].asInteger] + 1;};count.postln}})}); // We can further fit from the seeded means ~kmeans.fit(~dataSet) // then retreive the improved means ~kmeans.getMeans(~centroids, {~centroids.print}); //subtle in this case but still.. each quadrant is where we seeded it. :: subsection:: Cluster-distance Space We can get the euclidian distance of a given point to each cluster. This is often referred to as the cluster-distance space as it creates new dimensions for each given point, one distance per cluster. code:: // with the dataset and kmeans generated and trained in the code above b = Buffer.sendCollection(s,[0.5,0.5]) c = Buffer(s) // get the distance of our given point (b) to each cluster, thus giving us 4 dimensions in our cluster-distance space ~kmeans.transformPoint(b,c,{|x|x.query;x.getn(0,x.numFrames,{|y|y.postln})}) // we can also transform a full dataset ~srcDS = FluidDataSet(s) ~cdspace = FluidDataSet(s) // make a new dataset with 4 points ~srcDS.load(Dictionary.newFrom([\cols, 2, \data, Dictionary.newFrom([\pp, [0.5,0.5], \np, [-0.5,0.5], \pn, [0.5,-0.5], \nn, [-0.5,-0.5]])])); ~kmeans.transform(~srcDS, ~cdspace, {~cdspace.print}) :: subsection:: Queries in a Synth This is the equivalent of predictPoint, but wholly on the server code:: ( { var trig = Impulse.kr(5); var point = WhiteNoise.kr(1.dup); var inputPoint = LocalBuf(2); var outputPoint = LocalBuf(1); Poll.kr(trig, point, [\pointX,\pointY]); point.collect{ |p,i| BufWr.kr([p],inputPoint,i)}; ~kmeans.kr(trig,inputPoint,outputPoint); Poll.kr(trig,BufRd.kr(1,outputPoint,0,interpolation:0),\cluster); }.play; ) // to sonify the output, here are random values alternating quadrant, generated more quickly as the cursor moves rightwards ( { var trig = Impulse.kr(MouseX.kr(0,1).exprange(0.5,ControlRate.ir / 2)); var step = Stepper.kr(trig,max:3); var point = TRand.kr(-0.1, [0.1, 0.1], trig) + [step.mod(2).linlin(0,1,-0.6,0.6),step.div(2).linlin(0,1,-0.6,0.6)] ; var inputPoint = LocalBuf(2); var outputPoint = LocalBuf(1); point.collect{|p,i| BufWr.kr([p],inputPoint,i)}; ~kmeans.kr(trig,inputPoint,outputPoint); SinOsc.ar((BufRd.kr(1,outputPoint,0,interpolation:0) + 69).midicps,mul: 0.1); }.play; ) ::