diff --git a/release-packaging/HelpSource/Classes/FluidKMeans.schelp b/release-packaging/HelpSource/Classes/FluidKMeans.schelp
index 8c49e9a..58d895c 100644
--- a/release-packaging/HelpSource/Classes/FluidKMeans.schelp
+++ b/release-packaging/HelpSource/Classes/FluidKMeans.schelp
@@ -25,7 +25,7 @@ INSTANCEMETHODS::
 PRIVATE::k
 
 METHOD:: fit
-Identify code::k:: clusters in a link::Classes/FluidDataSet::
+Identify code::numClusters:: clusters in a link::Classes/FluidDataSet::. It will optimise until no improvement is possible, or up to code::maxIter::, whichever comes first. Subsequent calls will continue training from the stopping point with the same conditions.
 ARGUMENT:: dataSet
 A link::Classes/FluidDataSet:: of data points.
 ARGUMENT:: action
@@ -56,55 +56,51 @@ a link::Classes/Buffer:: containing a data point.
 ARGUMENT:: action
 A function to run when the server responds, taking the ID of the cluster as its argument.
 
-METHOD:: predict
-Report cluster assignments for previously unseen data.
-ARGUMENT:: dataSet
-A link::Classes/FluidDataSet:: of data points.
-ARGUMENT:: labelSet
-A link::Classes/FluidLabelSet:: to contain assignments.
+METHOD:: transform
+Given a trained object, return for each item of a provided DataSet its distance to each cluster as an array, often referred to as the cluster-distance space.
+ARGUMENT:: srcDataSet
+A link::Classes/FluidDataSet:: of data points to transform.
+ARGUMENT:: dstDataSet
+A link::Classes/FluidDataSet:: to contain the new cluster-distance space.
 ARGUMENT:: action
-A function to run when complete, taking an array of the counts for each category as its argument.
+A function to run when complete.
 
 METHOD:: fitTransform
-Run link::Classes/FluidKMeans#*fit:: and link::Classes/FluidKMeans#*predict:: in a single pass: i.e. train the model on the incoming link::Classes/FluidDataSet:: and then return the learned clustering to the passed link::Classes/FluidLabelSet::
+Run link::Classes/FluidKMeans#*fit:: and link::Classes/FluidKMeans#*transform:: in a single pass: i.e. train the model on the incoming link::Classes/FluidDataSet:: and then return its cluster-distance space in the destination link::Classes/FluidDataSet::
 ARGUMENT:: srcDataSet
 a link::Classes/FluidDataSet:: containing the data to fit and predict.
 ARGUMENT:: dstDataSet
-a link::Classes/FluidLabelSet:: to retrieve the predicted clusters.
+A link::Classes/FluidDataSet:: to contain the new cluster-distance space.
 ARGUMENT:: action
-A function to run when the server responds
+A function to run when complete.
 
 METHOD:: transformPoint
-Given a trained object, return the cluster ID for a data point in a link::Classes/Buffer::
+Given a trained object, return the distance of the provided point to each cluster. Both points are handled as link::Classes/Buffer::
 ARGUMENT:: sourceBuffer
-a link::Classes/Buffer:: containing a data point.
+a link::Classes/Buffer:: containing a data point to query.
 ARGUMENT:: targetBuffer
-a link::Classes/Buffer:: containing a data point.
-ARGUMENT:: action
-A function to run when the server responds, taking the ID of the cluster as its argument.
-
-METHOD:: transform
-Report cluster assignments for previously unseen data.
-ARGUMENT:: srcDataSet
-A link::Classes/FluidDataSet:: of data points.
-ARGUMENT:: dstDataSet
-A link::Classes/FluidLabelSet:: to contain assignments.
+a link::Classes/Buffer:: containing the distance of the source point to each cluster.
 ARGUMENT:: action
-A function to run when complete, taking an array of the counts for each category as its argument.
+A function to run when complete.
 
 METHOD:: getMeans
-Report cluster assignments for previously unseen data.
+Given a trained object, retrieve the means (centroids) of each cluster as a link::Classes/FluidDataSet::
 ARGUMENT:: dataSet
-A link::Classes/FluidDataSet:: of data points.
+A link::Classes/FluidDataSet:: of clusters with a mean per column.
 ARGUMENT:: action
-A function to run when complete, taking an array of the counts for each category as its argument.
+A function to run when complete.
 
 METHOD:: setMeans
-Report cluster assignments for previously unseen data.
+Overwrite the means (centroids) of each cluster, and declare the object trained.
 ARGUMENT:: dataSet
-A link::Classes/FluidDataSet:: of data points.
+A link::Classes/FluidDataSet:: of clusters with a mean per column.
+ARGUMENT:: action
+A function to run when complete.
+
+METHOD:: clear
+Reset the object status to not fitted and untrained.
 ARGUMENT:: action
-A function to run when complete, taking an array of the counts for each category as its argument.
+A function to run when complete.
 
 EXAMPLES::
 code::
@@ -192,30 +188,40 @@ subsection:: Accessing the means
 We can get and set the means for each cluster, their centroid.
 code::
+// with the dataset and kmeans generated and trained in the code above
 ~centroids = FluidDataSet(s);
 ~kmeans.getMeans(~centroids, {~centroids.print});
-
-
+// We can also set them to arbitrary values to seed the process
 ~centroids.load(Dictionary.newFrom([\cols, 2, \data, Dictionary.newFrom([\0, [0.5,0.5], \1, [-0.5,0.5], \2, [0.5,-0.5], \3, [-0.5,-0.5]])]));
 ~centroids.print
 ~kmeans.setMeans(~centroids, {~kmeans.predict(~dataSet,~clusters,{~clusters.dump{|x|var count = 0.dup(4); x["data"].keysValuesDo{|k,v|count[v[0].asInteger] = count[v[0].asInteger] + 1;};count.postln}})});
-~kmeans.clear
-~kmeans.predict(~dataSet,~clusters)
-
+// We can further fit from the seeded means
+~kmeans.fit(~dataSet)
+// then retrieve the improved means
+~kmeans.getMeans(~centroids, {~centroids.print});
+// subtle in this case, but each mean stays in the quadrant where we seeded it.
 ::
 subsection:: Cluster-distance Space
-You can get the euclidian distance of a given point to each cluster.
+We can get the Euclidean distance of a given point to each cluster. This is often referred to as the cluster-distance space, as it creates new dimensions for each given point, one distance per cluster.
 code::
+// with the dataset and kmeans generated and trained in the code above
 b = Buffer.sendCollection(s,[0.5,0.5])
 c = Buffer(s)
+// get the distance of our given point (b) to each cluster, thus giving us 4 dimensions in our cluster-distance space
 ~kmeans.transformPoint(b,c,{|x|x.query;x.getn(0,x.numFrames,{|y|y.postln})})
+// we can also transform a full dataset
+~srcDS = FluidDataSet(s)
+~cdspace = FluidDataSet(s)
+// make a new dataset with 4 points
+~srcDS.load(Dictionary.newFrom([\cols, 2, \data, Dictionary.newFrom([\pp, [0.5,0.5], \np, [-0.5,0.5], \pn, [0.5,-0.5], \nn, [-0.5,-0.5]])]));
+~kmeans.transform(~srcDS, ~cdspace, {~cdspace.print})
 ::
 subsection:: Queries in a Synth