From ddce5473a8b00537f4f41ef980d9cb3338df8637 Mon Sep 17 00:00:00 2001
From: Owen Green <gungwho@gmail.com>
Date: Tue, 19 May 2020 17:00:18 +0100
Subject: [PATCH] Improving help files: Dataset, Labelset, KDTree and KMeans

---
 .../HelpSource/Classes/FluidDataSet.schelp    | 141 +++++++--------
 .../HelpSource/Classes/FluidKDTree.schelp     | 113 ++++++++----
 .../HelpSource/Classes/FluidKMeans.schelp     | 167 +++++++++++++-----
 .../HelpSource/Classes/FluidLabelSet.schelp   |  91 +++++-----
 4 files changed, 322 insertions(+), 190 deletions(-)

diff --git a/release-packaging/HelpSource/Classes/FluidDataSet.schelp b/release-packaging/HelpSource/Classes/FluidDataSet.schelp
index f09b433..0212061 100644
--- a/release-packaging/HelpSource/Classes/FluidDataSet.schelp
+++ b/release-packaging/HelpSource/Classes/FluidDataSet.schelp
@@ -4,137 +4,138 @@ categories:: UGens>FluidManipulation
 related:: Classes/FluidLabelSet, Classes/FluidKDTree, Classes/FluidKNN, Classes/FluidKMeans
 ​
 DESCRIPTION::
-A server-side container associating labels with multi-dimensional data. FluidDataSet is identified by its name, and multiple instances of the object with the same name point to the same instance on the server.
+A server-side container associating labels with multi-dimensional data. FluidDataSet is identified by its name.
+
+
 ​
 CLASSMETHODS::
 ​
-PRIVATE::kr
+PRIVATE:: asUGenInput
 
 METHOD:: new
-Create a new instance of the dataset, with the given name and dimensionality. If an instance already exists on the server, then the existing dimensionality takes precedence.
-​
+Create a new instance of the dataset, with the given name. If a dataset with this name already exists, an exception will be thrown (see link::Classes/FluidDataSet#*at:: to access an existing dataset).
+
 ARGUMENT:: server
 The link::Classes/Server:: on which to create the data set
-
 ARGUMENT:: name
 A symbol or string with the name of the dataset.
 ​
-ARGUMENT:: dims
-An integer number of dimensions
-​
 returns:: The new instance
 
+METHOD:: at
+Retrieves a cached instance of a FluidDataSet with the given name, or returns nil if no such object exists.
+
+ARGUMENT:: server
+The server associated with this dataset instance
+ARGUMENT:: id
+The name of the dataset to retrieve from the cache
+
+
 INSTANCEMETHODS::
 ​
-PRIVATE::init,id
+PRIVATE:: init,id,cache
 
-METHOD:: synth
-The internal synth the object uses to communicate with the server
-​
-returns:: A link::Classes/Synth::
-​
-METHOD:: server
-The server instance the object uses
-​
-returns:: A link::Classes/Server::
-​​​
+METHOD:: addPoint
+Add a new point to the data set. The dimensionality of the dataset is governed by the size of the first point added.
+Will report an error if the label already exists, or if the size of the data does not match the dimensionality of the dataset.
+ARGUMENT:: label
+A symbol or string with the label for the new point
+ARGUMENT:: buffer
+A link::Classes/Buffer:: with the new data point
+ARGUMENT:: action
+A function to run when the point has been added
+​​
 METHOD:: updatePoint
 Update an existing label's data. Will report an error if the label doesn't exist, or if the size of the data does not match the given dimensionality of the dataset.
-​
 ARGUMENT:: label
 symbol or string with the label
-​
 ARGUMENT:: buffer
 A link::Classes/Buffer:: containing the updated data
-​
 ARGUMENT:: action
 A function to run when the server has updated
-​​​
 METHOD:: size
 Report the number of items currently in the data set
 ​
-ARGUMENT:: action
-A function to run when the server responds, whose argument is the data set size
-​​
-METHOD:: addPoint
-Add a new point to the data set. Will report an error if the label already exists, or if the size of the data does not match the given dimensionality of the dataset.
-​
+METHOD:: getPoint
+Retrieve a point from the data set into a link::Classes/Buffer::. Will report an error if the label or buffer doesn't exist.
 ARGUMENT:: label
-A symbol or string with the label for the new point
-​
+symbol or string with the label to retrieve
 ARGUMENT:: buffer
-A link::Classes/Buffer:: with the new data point
-​
-ARGUMENT:: action
-A function to run when the point has been added
-​​
-METHOD:: write
-Write the data set to disk as a JSON file. Will not overwrite existing files
-​
-ARGUMENT:: filename
-Absolute path for the new file
-​
+link::Classes/Buffer:: to fill
 ARGUMENT:: action
-A function to run when the file has been written
-​​
-METHOD:: asString
-​
-returns:: The name of the data set as a string
-​
+function to run when the point has been retrieved
+
+
 METHOD:: deletePoint
 Remove a point from the data set. Will report an error if the label doesn't exist.
-​
 ARGUMENT:: label
 symbol or string with the label to remove
-​
 ARGUMENT:: action
 Function to run when the point has been deleted
 ​​
 METHOD:: clear
 Empty the data set
-​
 ARGUMENT:: action
 Function to run when the data set has been emptied
 ​
-METHOD:: getPoint
-Retreive a point from the data set into a link::Classes/Buffer::. Will report an error if the label or buffer doesn't exist
-​
-ARGUMENT:: label
-symbol or string with the label to retreive
-​
-ARGUMENT:: buffer
-link::Classes/Buffer:: to fill
-​
+METHOD:: free
+Destroy the object on the server
+
+METHOD:: cols
+Report the dimensionality of the data set. If action is nil, will default to posting result.
 ARGUMENT:: action
-function to run when the point has been retreived
-​​
+A function to run when the server responds, whose argument is the data set dimensionality. By default, the method will print the response to the post window.
+
+METHOD:: size
+Report the number of points in the data set. If action is nil, will default to posting result.
+ARGUMENT:: action
+A function to run when the server responds, whose argument is the data set size. By default, the method will print the response to the post window.
+
+
 METHOD:: read
 Read a data set from a JSON file on disk
-​
 ARGUMENT:: filename
 The absolute path of the JSON file to read
-​
 ARGUMENT:: action
 A function to run when the file has been read
+​
+METHOD:: write
+Write the data set to disk as a JSON file.
+ARGUMENT:: filename
+Absolute path for the new file
+ARGUMENT:: action
+A function to run when the file has been written
 ​​
-METHOD:: cols
-Report the dimensionality of the data set
+METHOD:: asString
+Responds with the name of the data set as a pretty(ish) string
+
+METHOD:: asSymbol
+Responds with the name of the data set as a symbol
+
+METHOD:: synth
+The internal synth the object uses to communicate with the server
+​
+returns:: A link::Classes/Synth::
+​
+METHOD:: server
+The server instance the object uses
 ​
+returns:: A link::Classes/Server::
+
 EXAMPLES::
 
 CODE::
-
+(
 // Make a one-dimensional data set called 'simple1data'
-~ds = FluidDataSet.new(s,\simple1data,1)
+~ds = FluidDataSet.new(s,\simple1data);
 // Make a buffer to use for adding points
-~point = Buffer.alloc(s,1,1)
+~point = Buffer.alloc(s,1,1);
 //Add 10 points, using the index as a label.
-(
 Routine{
     10.do{|i|
         ~point.set(0,i);
-        s.sync;
-        ~ds.addPoint(i.asString,~point,{("addPoint"+i).postln})
+        ~ds.addPoint(i.asString,~point,{("addPoint"+i).postln});
+        s.sync;
     }
 }.play
 )
diff --git a/release-packaging/HelpSource/Classes/FluidKDTree.schelp b/release-packaging/HelpSource/Classes/FluidKDTree.schelp
index 1476d77..21dbc79 100644
--- a/release-packaging/HelpSource/Classes/FluidKDTree.schelp
+++ b/release-packaging/HelpSource/Classes/FluidKDTree.schelp
@@ -4,20 +4,41 @@ categories:: FluidManipulation
 related:: Classes/FluidDataSet
 
 DESCRIPTION::
-A server-side K-Dimensional tree for efficient neighbourhood searches of multi-dimensional data. See https://scikit-learn.org/stable/modules/neighbors.html#nearest-neighbor-algorithms for more on KD Trees
+A server-side K-Dimensional tree for efficient neighbourhood searches of multi-dimensional data.
+
+See https://scikit-learn.org/stable/modules/neighbors.html#nearest-neighbor-algorithms for more on KD Trees
 
 CLASSMETHODS::
 
+METHOD:: new
+Make a new KDTree model for the given server
+ARGUMENT:: server
+The server on which to make the model
+
 INSTANCEMETHODS::
 
-METHOD:: read
-Set the object's state from a JSON file
+METHOD:: fit
+Build the tree by scanning the points of a LINK::Classes/FluidDataSet::
 
-ARGUMENT:: filename
-The location of a JSON file on disk
+ARGUMENT:: dataset
+The LINK::Classes/FluidDataSet:: of interest. This can either be a data set object itself, or the name of one.
 
 ARGUMENT:: action
-function to run when the data is loaded
+A function to run when indexing is complete
+
+
+METHOD:: kNearest
+Returns the IDs of the CODE::k:: points nearest to the one passed
+
+ARGUMENT:: buffer
+A LINK::Classes/Buffer:: containing a data point to match against. The number of frames in the buffer must match the dimensionality of the LINK::Classes/FluidDataSet:: the tree was fitted to.
+
+ARGUMENT:: k
+The number of neighbours to return
+
+
+ARGUMENT:: action
+A function that will run when the query returns, whose argument is an array of point IDs from the tree's LINK::Classes/FluidDataSet::
 
 METHOD:: kNearestDist
 Get the distances of the K nearest neighbours to a point
@@ -31,16 +52,22 @@ The number of neighbours to search
 ARGUMENT:: action
 A function that will run when the query returns, whose argument is an array of distances
 
-returns:: nothing, but could return an array if you like
+METHOD:: cols
+Get the dimensionality of the data that the tree is indexed against
 
-METHOD:: fit
-Build the tree by scanning the points of a LINK::Classes/FluidDataSet::
+ARGUMENT:: action
+A function that runs when the query returns, whose argument is the dimensionality
 
-ARGUMENT:: dataset
-The LINK::Classes/FluidDataSet:: of interest. This can either be a data set object itself, or the name of one.
+
+METHOD:: read
+Set the object's state from a JSON file
+
+ARGUMENT:: filename
+The location of a JSON file on disk
 
 ARGUMENT:: action
-A function to run when indexing is complete
+A function to run when the data is loaded
+
 
 METHOD:: write
 Write the index of the tree to disk. Currently this will not overwrite extant files.
@@ -51,28 +78,50 @@ The path of a JSON file to write
 ARGUMENT:: action
 A function to run when writing is complete
 
-METHOD:: kNearest
-Returns the IDs of the CODE::k:: points nearest to the one passed
-
-ARGUMENT:: buffer
-A LINK::Classes/Buffer:: containing a data point to match against. The number of frames in the buffer must match the dimensionality of the LINK::Classes/FluidDataSet:: the tree was fitted to.
-
-ARGUMENT:: k
-The number of neighbours to return
-
-ARGUMENT:: action
-A function that will run when the query returns, whose argument is an array of point IDs from the tree's LINK::Classes/FluidDataSet::
-
-returns:: Nothing, but could return an array of IDs if you like
-
-METHOD:: cols
-Get the dimensionality of the data that the tree is indexed against
-
-ARGUMENT:: action
-A function that runs when the query returns, whose argument is the dimensionality
 
 EXAMPLES::
 
 code::
-(some example code)
+//Make some 2D points and place into a dataset
+(
+~points = 100.collect{ [ 1.0.linrand,1.0.linrand] };
+~dataset= FluidDataSet(s,\kdtree_help_rand2d);
+// ~dataset.free; // evaluate separately to release the dataset when finished
+~tmpbuf = Buffer.alloc(s,2);
+fork{
+	~dataset.ready.wait;
+	~points.do{|x,i|
+		(""++(i+1)++"/100").postln;
+		~tmpbuf.setn(0,x);
+		~dataset.addPoint(i,~tmpbuf);
+		s.sync
+	}
+}
+)
+
+
+
+
+
+//Make a new tree, and fit it to the dataset
+(
+fork{
+	~tree = FluidKDTree(s);
+	~tree.ready.wait;
+	s.sync;
+	~tree.fit(~dataset);
+}
+)
+
+//Dims of tree should match dataset
+~tree.cols
+
+//Return labels of k nearest points to new data
+(
+~tmpbuf.setn(0,[ 1.0.linrand,1.0.linrand ]);
+~tree.kNearest(~tmpbuf,5, { |a| a.postln });
+)
+
+//or the distances
+~tree.kNearestDist(~tmpbuf,5, { |a| a.postln });
 ::
diff --git a/release-packaging/HelpSource/Classes/FluidKMeans.schelp b/release-packaging/HelpSource/Classes/FluidKMeans.schelp
index fa27fd3..bd2384e 100644
--- a/release-packaging/HelpSource/Classes/FluidKMeans.schelp
+++ b/release-packaging/HelpSource/Classes/FluidKMeans.schelp
@@ -10,87 +10,168 @@ https://scikit-learn.org/stable/tutorial/statistical_inference/unsupervised_lear
 
 CLASSMETHODS::
 
+METHOD:: new
+Construct a new K Means model on the passed server
+ARGUMENT:: server
+The server on which to construct the model. If nil, Server.default is used.
+
 INSTANCEMETHODS::
 
 PRIVATE::k
 
-METHOD:: predictPoint
-Given a trained object, return the cluster ID for a data point in a link::Classes/Buffer::
-
-ARGUMENT:: buffer
-a link::Classes/Buffer:: containing a data point
-
-ARGUMENT:: action
-A function to run when the server responds, taking the ID of the cluser as its argument
-
 METHOD:: fit
 Identify code::k:: clusters in a link::Classes/FluidDataSet::
-
 ARGUMENT:: dataset
 A link::Classes/FluidDataSet:: of data points
-
 ARGUMENT:: k
 The number of clusters to identify in the data set
-
 ARGUMENT:: maxIter
 Maximum number of iterations to use partitioning the data
-
 ARGUMENT:: buffer
 Seed centroids for clusters WARNING:: Not yet implemented ::
-
 ARGUMENT:: action
 A function to run when fitting is complete, taking as its argument an array with the number of data points for each cluster
 
-METHOD:: write
-write learned clusters to disk as a JSON file. Will not overwrite existing files
-
-ARGUMENT:: filename
-Absolute path for file
-
-ARGUMENT:: action
-A function to run when the file is written
-
-METHOD:: read
-Read a learned clustering of a data set from a JSON file
-
-ARGUMENT:: filename
-Absolute path of the JSON file
-
+METHOD:: predict
+Given a trained object, assign each data point in a dataset to a cluster, writing the cluster IDs to a label set.
+ARGUMENT:: dataset
+a link::Classes/FluidDataSet:: containing the data to predict
+ARGUMENT:: labelset
+a link::Classes/FluidLabelSet:: to receive the predicted clusters
 ARGUMENT:: action
-Function to run when the file has been read
-
-METHOD:: getClusters
-Fill a link::Classes/FluidLabelSet:: with the assignments for each point in the passed link::Classes/FluidDataSet:: that was used to train this instance
+A function to run when the server responds
 
+METHOD:: fitPredict
+Run link::Classes/FluidKMeans#-fit:: and link::Classes/FluidKMeans#-predict:: in a single pass, i.e. train the model on the incoming link::Classes/FluidDataSet:: and then return the learned clustering to the passed link::Classes/FluidLabelSet::.
 ARGUMENT:: dataset
-The link::Classes/FluidDataSet:: used to train this instance
-
+a link::Classes/FluidDataSet:: containing the data to fit and predict
 ARGUMENT:: labelset
-A link::Classes/FluidLabelSet:: to fill with assignments
+a link::Classes/FluidLabelSet:: to receive the predicted clusters
+ARGUMENT:: k
+The number of clusters to identify in the data set
+ARGUMENT:: maxIter
+Maximum number of iterations to use partitioning the data
+ARGUMENT:: action
+A function to run when the server responds
 
+METHOD:: predictPoint
+Given a trained object, return the cluster ID for a data point in a link::Classes/Buffer::
+ARGUMENT:: buffer
+a link::Classes/Buffer:: containing a data point
 ARGUMENT:: action
-A function to run when the operation is complete
+A function to run when the server responds, taking the ID of the cluster as its argument
+
+
 
 METHOD:: cols
 Retreive the dimentionality of the dataset this instance is trained on
-
 ARGUMENT:: action
 A function to run when the server responds, taking the dimensionality as its argument
 
 METHOD:: predict
 Report cluster assignments for previously unseen data
-
 ARGUMENT:: dataset
 A link::Classes/FluidDataSet:: of data points
-
 ARGUMENT:: labelset
 A link::Classes/FluidLabelSet:: to contain assigments
-
 ARGUMENT:: action
 A function to run when complete, taking an array of the counts for each catgegory as its argument
 
-EXAMPLES::
 
+
+METHOD:: write
+Write learned clusters to disk as a JSON file. Will not overwrite existing files.
+ARGUMENT:: filename
+Absolute path for file
+ARGUMENT:: action
+A function to run when the file is written
+
+METHOD:: read
+Read a learned clustering of a data set from a JSON file
+ARGUMENT:: filename
+Absolute path of the JSON file
+ARGUMENT:: action
+Function to run when the file has been read
+
+
+EXAMPLES::
+
 code::
-(some example code)
+
+//A dataset for our points, a labelset for cluster labels
+(
+~dataset = FluidDataSet(s,\kmeans_help_rand2d);
+
+~clusters = FluidLabelSet(s,\kmeans_help_clusters);
+)
+
+//Make some clumped 2D points and place into a dataset
+(
+~points = (4.collect{64.collect{(1.sum3rand) + [1,-1].choose}.clump(2)}).flatten(1) * 0.5;
+~dataset.clear;
+~tmpbuf = Buffer.alloc(s,2);
+fork{
+    s.sync;
+    ~points.do{|x,i|
+        (""++(i+1)++"/128").postln;
+        ~tmpbuf.setn(0,x);
+        ~dataset.addPoint(i,~tmpbuf);
+        s.sync
+    }
+}
+)
+
+//Make a new k means model, fit it to the dataset and return the discovered clusters to a labelset
+(
+fork{
+	~clusters.clear;
+	~kmeans = FluidKMeans(s);
+	s.sync;
+	~kmeans.fitPredict(~dataset,~clusters, 4,action: {|c|
+		"Fitted.\n # Points in each cluster:".postln;
+		c.do{|x,i|
+			("Cluster" + i + "->" + x.asInteger + "points").postln;
+		}
+	});
+}
+)
+
+//Dims of kmeans should match dataset
+~kmeans.cols
+
+//Return labels of clustered points
+(
+~assignments = Array.new(128);
+fork{
+	128.do{ |i|
+		~clusters.getLabel(i,{|clusterID|
+			(i.asString+clusterID).postln;
+			~assignments.add(clusterID)
+		});
+		s.sync;
+	}
+}
+)
+
+//Visualise: we're hoping to see colours neatly mapped to quadrants...
+(
+d = ((~points + 1) * 0.5).flatten(1).unlace;
+// d = [20.collect{1.0.rand}, 20.collect{1.0.rand}];
+w = Window("scatter", Rect(128, 64, 200, 200));
+~colours = [Color.blue,Color.red,Color.green,Color.magenta];
+w.drawFunc = {
+	Pen.use {
+		d[0].size.do{|i|
+			var x = (d[0][i]*200);
+			var y = (d[1][i]*200);
+			var r = Rect(x,y,5,5);
+			Pen.fillColor = ~colours[~assignments[i].asInteger];
+			Pen.fillOval(r);
+		}
+	}
+};
+w.refresh;
+w.front;
+)
+
 ::
diff --git a/release-packaging/HelpSource/Classes/FluidLabelSet.schelp b/release-packaging/HelpSource/Classes/FluidLabelSet.schelp
index 4bd10f6..77d812c 100644
--- a/release-packaging/HelpSource/Classes/FluidLabelSet.schelp
+++ b/release-packaging/HelpSource/Classes/FluidLabelSet.schelp
@@ -12,84 +12,85 @@ CLASSMETHODS::
 PRIVATE:: kr
 
 METHOD:: new
-Make a new instance of a label set, uniquely identified by its name. Multiple instances to of this class with the same name refer to the same server-side entity.
-
+Make a new instance of a label set, uniquely identified by its name. Creating an instance with a name already in use will throw an exception. Use link::Classes/FluidLabelSet#*at:: to access the existing instance, or free it first.
 ARGUMENT:: server
 The link::Classes/Server:: on which to create the label set
-
 ARGUMENT:: name
 symbol or string with the label set's name
 
+METHOD:: at
+Retrieve a label set from the cache
+ARGUMENT:: server
+The link::Classes/Server:: associated with the label set
+ARGUMENT:: id
+symbol or string with the label set's name
+
+
 INSTANCEMETHODS::
 
-PRIVATE:: init, id, server, synth
+PRIVATE:: init, id
+
+METHOD:: addLabel
+Add a label to the label set
+ARGUMENT:: id
+symbol or string with the ID for this label
+ARGUMENT:: label
+symbol or string with the label to add
+ARGUMENT:: action
+function to run when the operation completes
+
+METHOD:: updateLabel
+Change a label in the label set
+ARGUMENT:: id
+symbol or string with the ID for this label
+ARGUMENT:: label
+symbol or string with the new label
+ARGUMENT:: action
+function to run when the operation completes
 
 METHOD:: getLabel
 Retreive the label associated with an ID. Will report an error if the ID isn't present in the set
-
 ARGUMENT:: id
 symbol or string with the ID to retreive.
-
 ARGUMENT:: action
 A function to run when the server responds, with the label as its argument
 
+METHOD:: deleteLabel
+Remove an ID-label pair from the label set
+ARGUMENT:: id
+symbol or string with the ID to remove
+ARGUMENT:: action
+A function to run when the label has been removed
+
+METHOD:: clear
+Empty the label set
+ARGUMENT:: action
+Function to run when the operation completes
+
+METHOD:: size
+Report the number of items in the label set
+ARGUMENT:: action
+A function to run when the server responds, taking the size as its argument
+
 METHOD:: cols
 Returns the dimensionality of the link::Classes/FluidDataSet:: associated with this label set
-
 ARGUMENT:: action
 A function to run when the server responds, with the dimensionality as its argument
 
 METHOD:: write
-Write this label set to disk as a JSON file. Will not overwrite existing files.
-
+Write this label set to disk as a JSON file.
 ARGUMENT:: filename
 Absolute path of file to write
-
 ARGUMENT:: action
 A function to run when the file is written
 
 METHOD:: read
 Read a label set from a JSON file on disk
-
 ARGUMENT:: filename
 Absolute path of the file to read
-
 ARGUMENT:: action
 A function to run when the file is read
 
-METHOD:: deleteLabel
-Remove a id-label pair from the label set
-
-ARGUMENT:: id
-symbol or string with the ID to remove
-
-ARGUMENT:: action
-A function to run when the label has been removed
-
-METHOD:: size
-Report the num er of items in the label set
-
-ARGUMENT:: action
-A function to run when the server responds, taking the size as its argument
-
-METHOD:: addLabel
-Add a label to the label set
-
-ARGUMENT:: id
-symbol or string with the ID for this label
-
-ARGUMENT:: label
-symbol or string with the label to add
-
-ARGUMENT:: action
-function to run when the operation completes
-
-METHOD:: clear
-Empty the label set
-
-ARGUMENT:: action
-Function to run whrn the action completes
-
 EXAMPLES::
 
 code::