Identify code::k:: clusters in a link::Classes/FluidDataSet::
Identify code::numClusters:: clusters in a link::Classes/FluidDataSet::. It will optimise until no improvement is possible, or up to code::maxIter::, whichever comes first. Subsequent calls will continue training from the stopping point with the same conditions.
ARGUMENT:: dataSet
A link::Classes/FluidDataSet:: of data points.
ARGUMENT:: action
@ -56,55 +56,51 @@ a link::Classes/Buffer:: containing a data point.
ARGUMENT:: action
A function to run when the server responds, taking the ID of the cluster as its argument.
METHOD:: predict
Report cluster assignments for previously unseen data.
ARGUMENT:: dataSet
A link::Classes/FluidDataSet:: of data points.
ARGUMENT:: labelSet
A link::Classes/FluidLabelSet:: to contain assignments.
METHOD:: transform
Given a trained object, return, for each item of a provided link::Classes/FluidDataSet::, its distance to each cluster as an array. The resulting space is often referred to as the cluster-distance space.
ARGUMENT:: srcDataSet
A link::Classes/FluidDataSet:: of data points to transform.
ARGUMENT:: dstDataSet
A link::Classes/FluidDataSet:: to contain the new cluster-distance space.
ARGUMENT:: action
A function to run when complete, taking an array of the counts for each category as its argument.
A function to run when complete.
METHOD:: fitTransform
Run link::Classes/FluidKMeans#*fit:: and link::Classes/FluidKMeans#*predict:: in a single pass: i.e. train the model on the incoming link::Classes/FluidDataSet:: and then return the learned clustering to the passed link::Classes/FluidLabelSet::
Run link::Classes/FluidKMeans#*fit:: and link::Classes/FluidKMeans#*transform:: in a single pass: i.e. train the model on the incoming link::Classes/FluidDataSet:: and then return its cluster-distance space in the destination link::Classes/FluidDataSet::.
ARGUMENT:: srcDataSet
a link::Classes/FluidDataSet:: containing the data to fit and predict.
ARGUMENT:: dstDataSet
a link::Classes/FluidLabelSet:: to retrieve the predicted clusters.
A link::Classes/FluidDataSet:: to contain the new cluster-distance space.
ARGUMENT:: action
A function to run when the server responds
A function to run when complete.
METHOD:: transformPoint
Given a trained object, return the cluster ID for a data point in a link::Classes/Buffer::
Given a trained object, return the distance of the provided point to each cluster. Both the query point and the resulting distances are handled as a link::Classes/Buffer::.
ARGUMENT:: sourceBuffer
a link::Classes/Buffer:: containing a data point.
A link::Classes/Buffer:: containing a data point to query.
ARGUMENT:: targetBuffer
a link::Classes/Buffer:: containing a data point.
ARGUMENT:: action
A function to run when the server responds, taking the ID of the cluster as its argument.
METHOD:: transform
Report cluster assignments for previously unseen data.
ARGUMENT:: srcDataSet
A link::Classes/FluidDataSet:: of data points.
ARGUMENT:: dstDataSet
A link::Classes/FluidLabelSet:: to contain assignments.
A link::Classes/Buffer:: to contain the distance of the source point to each cluster.
ARGUMENT:: action
A function to run when complete, taking an array of the counts for each category as its argument.
A function to run when complete.
METHOD:: getMeans
Report cluster assignments for previously unseen data.
Given a trained object, retrieve the mean (centroid) of each cluster as a link::Classes/FluidDataSet::.
ARGUMENT:: dataSet
A link::Classes/FluidDataSet:: of data points.
A link::Classes/FluidDataSet:: of clusters with a mean per column.
ARGUMENT:: action
A function to run when complete, taking an array of the counts for each category as its argument.
A function to run when complete.
METHOD:: setMeans
Report cluster assignments for previously unseen data.
Overwrite the mean (centroid) of each cluster and declare the object trained.
ARGUMENT:: dataSet
A link::Classes/FluidDataSet:: of data points.
A link::Classes/FluidDataSet:: of clusters with a mean per column.
ARGUMENT:: action
A function to run when complete.
METHOD:: clear
Reset the object status to not fitted and untrained.
ARGUMENT:: action
A function to run when complete, taking an array of the counts for each category as its argument.
A function to run when complete.
EXAMPLES::
code::
@ -192,30 +188,40 @@ subsection:: Accessing the means
We can get and set the mean (centroid) of each cluster.
code::
// with the dataset and kmeans generated and trained in the code above
~centroids = FluidDataSet(s);
~kmeans.getMeans(~centroids, {~centroids.print});
// We can also set them to arbitrary values to seed the process
// The effect is subtle in this case, but each cluster ends up in the quadrant where we seeded it.
::
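As a rough sketch of the seeding step mentioned above, the following loads four hand-picked centroids into code::~centroids:: and passes them to code::~kmeans::. It assumes a 2-dimensional dataset and a model created with 4 clusters. The seed values are purely illustrative, and the dictionary layout (a code::cols:: entry plus a code::data:: entry of identifier/point pairs) is an assumption about the format link::Classes/FluidDataSet:: accepts in code::load::, so check it against that class's help file.
code::
// Sketch only: the seed positions and the dictionary layout are assumptions.
// One seed per quadrant of a 2D space, for a model with 4 clusters.
~seeds = Dictionary.newFrom([
	\cols, 2,
	\data, Dictionary.newFrom([
		'0', [0.5, 0.5],
		'1', [-0.5, 0.5],
		'2', [0.5, -0.5],
		'3', [-0.5, -0.5]
	])
]);
// Load the seeds into the dataset, then overwrite the model's means once loading completes.
~centroids.load(~seeds, {
	~kmeans.setMeans(~centroids, { "centroids set".postln });
});
::
Calling code::fit:: afterwards should continue optimising from these seeded positions rather than from a fresh start, since subsequent calls continue training from the current state.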
subsection:: Cluster-distance Space
You can get the Euclidean distance of a given point to each cluster.
We can get the Euclidean distance of a given point to each cluster. This is often referred to as the cluster-distance space, as it creates new dimensions for each given point: one distance per cluster.
code::
// with the dataset and kmeans generated and trained in the code above
b = Buffer.sendCollection(s, [0.5, 0.5]); // the 2D point to query
c = Buffer(s); // will receive one distance per cluster
// get the distance of our given point (b) to each cluster, thus giving us 4 dimensions in our cluster-distance space