Improving help files: Dataset, Labelset, KDTree and KMeans

6 years ago · ddce5473a8
parent 00a238dae7
commit ddce5473a8
4 changed files with 322 additions and 190 deletions
--- a/release-packaging/HelpSource/Classes/FluidDataSet.schelp
+++ b/release-packaging/HelpSource/Classes/FluidDataSet.schelp
@ -4,137 +4,138 @@ categories:: UGens>FluidManipulation
 related:: Classes/FluidLabelSet, Classes/FluidKDTree, Classes/FluidKNN, Classes/FluidKMeans
 DESCRIPTION::
-A server-side container associating labels with multi-dimensional data. FluidDataSet is identified by its name, and multiple instances of the object with the same name point to the same instance on the server.
+A server-side container associating labels with multi-dimensional data. FluidDataSet is identified by its name.
 CLASSMETHODS::
-PRIVATE::kr
+PRIVATE:: asUGenInput
 METHOD:: new
-Create a new instance of the dataset, with the given name and dimensionality. If an instance already exists on the server, then the existing dimensionality takes precedence.
+Create a new instance of the dataset, with the given name. If a Dataset with this name already exists, an exception will be thrown (see link::Classes/FluidDataSet#*at:: to access an extant Dataset).
-
+
 ARGUMENT:: server
 The link::Classes/Server:: on which to create the data set
 ARGUMENT:: name
 A symbol or string with the name of the dataset.
 ARGUMENT:: dims
 An integer number of dimensions
 returns:: The new instance
 METHOD:: at
 Retrieves a cached instance of a FluidDataSet with the given name, or returns nil if no such object exists.
 ARGUMENT:: server
 The server associated with this dataset instance
 ARGUMENT:: id
 The name of the Dataset to retrieve from the cache
 INSTANCEMETHODS::
-PRIVATE::init,id
+PRIVATE:: init,id,cache
-METHOD:: synth
+METHOD:: addPoint
-The internal synth the object uses to communicate with the server
+Add a new point to the data set. The dimensionality of the dataset is governed by the size of the first point added.
-
+Will report an error if the label already exists, or if the size of the data does not match the dimensionality of the dataset.
-returns:: A link::Classes/Synth::
+ARGUMENT:: label
-
+A symbol or string with the label for the new point
-METHOD:: server
+ARGUMENT:: buffer
-The server instance the object uses
+A link::Classes/Buffer:: with the new data point
-
+ARGUMENT:: action
-returns:: A link::Classes/Server::
+A function to run when the point has been added
-
+
 METHOD:: updatePoint
 Update an existing label's data. Will report an error if the label doesn't exist, or if the size of the data does not match the given dimensionality of the dataset.
 ARGUMENT:: label
 symbol or string with the label
 ARGUMENT:: buffer
 A link::Classes/Buffer:: containing the updated data
 ARGUMENT:: action
 A function to run when the server has updated
 METHOD:: size
 Report the number of items currently in the data set
-ARGUMENT:: action
+METHOD:: getPoint
-A function to run when the server responds, whose argument is the data set size
+Retrieve a point from the data set into a link::Classes/Buffer::. Will report an error if the label or buffer doesn't exist.
 METHOD:: addPoint
 Add a new point to the data set. Will report an error if the label already exists, or if the size of the data does not match the given dimensionality of the dataset.
 ARGUMENT:: label
-A symbol or string with the label for the new point
+symbol or string with the label to retrieve
 ARGUMENT:: buffer
-A link::Classes/Buffer:: with the new data point
+link::Classes/Buffer:: to fill
 ARGUMENT:: action
 A function to run when the point has been added
 METHOD:: write
 Write the data set to disk as a JSON file. Will not overwrite existing files
 ARGUMENT:: filename
 Absolute path for the new file
 ARGUMENT:: action
-A function to run when the file has been written
+function to run when the point has been retrieved
-
+
-METHOD:: asString
+
 returns:: The name of the data set as a string
 METHOD:: deletePoint
 Remove a point from the data set. Will report an error if the label doesn't exist.
 ARGUMENT:: label
 symbol or string with the label to remove
 ARGUMENT:: action
 Function to run when the point has been deleted
 METHOD:: clear
 Empty the data set
 ARGUMENT:: action
 Function to run when the data set has been emptied
-METHOD:: getPoint
+METHOD:: free
-Retreive a point from the data set into a link::Classes/Buffer::. Will report an error if the label or buffer doesn't exist
+Destroy the object on the server
-
+
-ARGUMENT:: label
+METHOD:: cols
-symbol or string with the label to retreive
+Report the dimensionality of the data set. If action is nil, will default to posting result.
 ARGUMENT:: buffer
 link::Classes/Buffer:: to fill
 ARGUMENT:: action
-function to run when the point has been retreived
+A function to run when the server responds, whose argument is the data set dimensionality. By default, the method will print the response to the post window.
-
+
 METHOD:: size
 Report the number of points in the data set. If action is nil, will default to posting result.
 ARGUMENT:: action
 A function to run when the server responds, whose argument is the data set size. By default, the method will print the response to the post window.
 METHOD:: read
 Read a data set from a JSON file on disk
 ARGUMENT:: filename
 The absolute path of the JSON file to read
 ARGUMENT:: action
 A function to run when the file has been read
 METHOD:: write
 Write the data set to disk as a JSON file.
 ARGUMENT:: filename
 Absolute path for the new file
 ARGUMENT:: action
 A function to run when the file has been written
-METHOD:: cols
+METHOD:: asString
-Report the dimensionality of the data set
+Responds with the name of the data set as a pretty(ish) string
 METHOD:: asSymbol
 Responds with the name of the data set as a symbol
 METHOD:: synth
 The internal synth the object uses to communicate with the server
 returns:: A link::Classes/Synth::
 METHOD:: server
 The server instance the object uses
 returns:: A link::Classes/Server::
 EXAMPLES::
 CODE::
-
+(
 // Make a one-dimensional data set called 'simple1data'
 ~ds = FluidDataSet.new(s,\simple1data,1);
 // Make a buffer to use for adding points
 ~point = Buffer.alloc(s,1,1);
 //Add 10 points, using the index as a label.
 (
 Routine{
    10.do{|i|
        ~point.set(0,i);
-        s.sync;
+        ~ds.addPoint(i.asString,~point,{("addPoint"+i).postln});
-        ~ds.addPoint(i.asString,~point,{("addPoint"+i).postln})
+		s.sync;
    }
 }.play
 )
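 // A minimal sketch continuing the example above: read one of the stored points back.
 // Assumes the routine above has finished; ~out is a hypothetical name, not from the help file.
 ~out = Buffer.alloc(s,1,1);
 (
 // getPoint fills ~out with the data stored under the label "3", then runs the action
 ~ds.getPoint("3",~out,{
     ~out.getn(0,1,{|v| v.postln});
 });
 )
 // Report the number of points; with no action the result is posted
 ~ds.size;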
--- a/release-packaging/HelpSource/Classes/FluidKDTree.schelp
+++ b/release-packaging/HelpSource/Classes/FluidKDTree.schelp
@ -4,20 +4,41 @@ categories:: FluidManipulation
 related:: Classes/FluidDataSet
 DESCRIPTION::
-A server-side K-Dimensional tree for efficient neighbourhood searches of multi-dimensional data. See https://scikit-learn.org/stable/modules/neighbors.html#nearest-neighbor-algorithms for more on KD Trees
+A server-side K-Dimensional tree for efficient neighbourhood searches of multi-dimensional data.
 See https://scikit-learn.org/stable/modules/neighbors.html#nearest-neighbor-algorithms for more on KD Trees
 CLASSMETHODS::
 METHOD:: new
 Make a new KDTree model for the given server
 ARGUMENT:: server
 The server on which to make the model
 INSTANCEMETHODS::
-METHOD:: read
+METHOD:: fit
-Set the object's state from a JSON file
+Build the tree by scanning the points of a LINK::Classes/FluidDataSet::
-ARGUMENT:: filename
+ARGUMENT:: dataset
-The location of a JSON file on disk
+The LINK::Classes/FluidDataSet:: of interest. This can either be a data set object itself, or the name of one.
 ARGUMENT:: action
-function to run when the data is loaded
+A function to run when indexing is complete
 METHOD:: kNearest
 Returns the IDs of the CODE::k:: points nearest to the one passed
 ARGUMENT:: buffer
 A LINK::Classes/Buffer:: containing a data point to match against. The number of frames in the buffer must match the dimensionality of the LINK::Classes/FluidDataSet:: the tree was fitted to.
 ARGUMENT:: k
 The number of neighbours to return
 ARGUMENT:: action
 A function that will run when the query returns, whose argument is an array of point IDs from the tree's LINK::Classes/FluidDataSet::
 METHOD:: kNearestDist
 Get the distances of the K nearest neighbours to a point
@ -31,16 +52,22 @@ The number of neighbours to search
 ARGUMENT:: action
 A function that will run when the query returns, whose argument is an array of distances
-returns:: nothing, but could return an array if you like
+METHOD:: cols
 Get the dimensionality of the data that the tree is indexed against
-METHOD:: fit
+ARGUMENT:: action
-Build the tree by scanning the points of a LINK::Classes/FluidDataSet::
+A function that runs when the query returns, whose argument is the dimensionality
-ARGUMENT:: dataset
+
-The LINK::Classes/FluidDataSet:: of interest. This can either be a data set object itself, or the name of one.
+METHOD:: read
 Set the object's state from a JSON file
 ARGUMENT:: filename
 The location of a JSON file on disk
 ARGUMENT:: action
-A function to run when indexing is complete
+function to run when the data is loaded
 METHOD:: write
 Write the index of the tree to disk. Currently this will not overwrite extant files.
@ -51,28 +78,50 @@ The path of a JSON file to write
 ARGUMENT:: action
 A function to run when writing is complete
 METHOD:: kNearest
 Returns the IDs of the CODE::k:: points nearest to the one passed
 ARGUMENT:: buffer
 A LINK::Classes/Buffer:: containing a data point to match against. The number of frames in the buffer must match the dimensionality of the LINK::Classes/FluidDataSet:: the tree was fitted to.
 ARGUMENT:: k
 The number of neighbours to return
 ARGUMENT:: action
 A function that will run when the query returns, whose argument is an array of point IDs from the tree's LINK::Classes/FluidDataSet::
 returns:: Nothing, but could return an array of IDs if you like
 METHOD:: cols
 Get the dimensionality of the data that the tree is indexed against
 ARGUMENT:: action
 A function that runs when the query returns, whose argument is the dimensionality
 EXAMPLES::
 code::
-(some example code)
+//Make some 2D points and place into a dataset
 (
 ~points = 100.collect{ [ 1.0.linrand,1.0.linrand] };
 ~dataset= FluidDataSet(s,\kdtree_help_rand2d);
 ~dataset.free;
 ~tmpbuf = Buffer.alloc(s,2) ;
 fork{
 	~dataset.ready.wait;
 	~points.do{|x,i|
 		(""++(i+1)++"/100").postln;
 		~tmpbuf.setn(0,x);
 		~dataset.addPoint(i,~tmpbuf);
 		s.sync
 	}
 }
 )
 //Make a new tree, and fit it to the dataset
 (
 fork{
 	~tree = FluidKDTree(s);
 	~tree.ready.wait;
 	s.sync;
 	~tree.fit(~dataset);
 }
 )
 //Dims of tree should match dataset
 ~tree.cols
 //Return labels of k nearest points to new data
 (
 ~tmpbuf.setn(0,[ 1.0.linrand,1.0.linrand ]);
 ~tree.kNearest(~tmpbuf,5, { |a| a.postln });
 )
 //or the distances
 ~tree.kNearestDist(~tmpbuf,5, { |a| a.postln });
 ::
--- a/release-packaging/HelpSource/Classes/FluidKMeans.schelp
+++ b/release-packaging/HelpSource/Classes/FluidKMeans.schelp
@ -10,87 +10,168 @@ https://scikit-learn.org/stable/tutorial/statistical_inference/unsupervised_lear
 CLASSMETHODS::
 METHOD:: new
 Construct a new K Means model on the passed server
 ARGUMENT:: server
 If nil will use Server.default
 INSTANCEMETHODS::
 PRIVATE::k
 METHOD:: predictPoint
 Given a trained object, return the cluster ID for a data point in a link::Classes/Buffer::
 ARGUMENT:: buffer
 a link::Classes/Buffer:: containing a data point
 ARGUMENT:: action
 A function to run when the server responds, taking the ID of the cluster as its argument
 METHOD:: fit
 Identify code::k:: clusters in a link::Classes/FluidDataSet::
 ARGUMENT:: dataset
 A link::Classes/FluidDataSet:: of data points
 ARGUMENT:: k
 The number of clusters to identify in the data set
 ARGUMENT:: maxIter
 Maximum number of iterations to use partitioning the data
 ARGUMENT:: buffer
 Seed centroids for clusters WARNING:: Not yet implemented ::
 ARGUMENT:: action
 A function to run when fitting is complete, taking as its argument an array with the number of data points for each cluster
-METHOD:: write
+METHOD:: predict
-write learned clusters to disk as a JSON file. Will not overwrite existing files
+Given a trained object, return the cluster ID for each data point in a dataset to a label set.
-
+ARGUMENT:: dataset
-ARGUMENT:: filename
+a link::Classes/FluidDataSet:: containing the data to predict
-Absolute path for file
+ARGUMENT:: labelset
-
+a link::Classes/FluidLabelSet:: to receive the predicted clusters
 ARGUMENT:: action
 A function to run when the file is written
 METHOD:: read
 Read a learned clustering of a data set from a JSON file
 ARGUMENT:: filename
 Absolute path of the JSON file
 ARGUMENT:: action
-Function to run when the file has been read
+A function to run when the server responds
 METHOD:: getClusters
 Fill a link::Classes/FluidLabelSet:: with the assignments for each point in the passed link::Classes/FluidDataSet:: that was used to train this instance
 METHOD:: fitPredict
 Run link::Classes/FluidKMeans#-fit:: and link::Classes/FluidKMeans#-predict:: in a single pass: i.e. train the model on the incoming link::Classes/FluidDataSet:: and then return the learned clustering to the passed link::Classes/FluidLabelSet::
 ARGUMENT:: dataset
-The link::Classes/FluidDataSet:: used to train this instance
+a link::Classes/FluidDataSet:: containing the data to fit and predict
 ARGUMENT:: labelset
-A link::Classes/FluidLabelSet:: to fill with assignments
+a link::Classes/FluidLabelSet:: to receive the predicted clusters
 ARGUMENT:: k
 The number of clusters to identify in the data set
 ARGUMENT:: maxIter
 Maximum number of iterations to use partitioning the data
 ARGUMENT:: action
 A function to run when the server responds
 METHOD:: predictPoint
 Given a trained object, return the cluster ID for a data point in a link::Classes/Buffer::
 ARGUMENT:: buffer
 a link::Classes/Buffer:: containing a data point
 ARGUMENT:: action
-A function to run when the operation is complete
+A function to run when the server responds, taking the ID of the cluster as its argument
 METHOD:: cols
 Retrieve the dimensionality of the dataset this instance is trained on
 ARGUMENT:: action
 A function to run when the server responds, taking the dimensionality as its argument
 METHOD:: predict
 Report cluster assignments for previously unseen data
 ARGUMENT:: dataset
 A link::Classes/FluidDataSet:: of data points
 ARGUMENT:: labelset
 A link::Classes/FluidLabelSet:: to contain assignments
 ARGUMENT:: action
 A function to run when complete, taking an array of the counts for each category as its argument
 EXAMPLES::
 METHOD:: write
 Write learned clusters to disk as a JSON file. Will not overwrite existing files.
 ARGUMENT:: filename
 Absolute path for file
 ARGUMENT:: action
 A function to run when the file is written
 METHOD:: read
 Read a learned clustering of a data set from a JSON file
 ARGUMENT:: filename
 Absolute path of the JSON file
 ARGUMENT:: action
 Function to run when the file has been read
 EXAMPLES::
 Server.default.options.outDevice = "Built-in Output"
 code::
-(some example code)
+
 //A dataset for our points, a labelset for cluster labels
 (
 ~dataset= FluidDataSet(s,\kdtree_help_rand2d);
 ~clusters = FluidLabelSet(s,\kmeans_help_clusters);
 )
 //Make some clumped 2D points and place into a dataset
 (
 ~points = (4.collect{64.collect{(1.sum3rand) + [1,-1].choose}.clump(2)}).flatten(1) * 0.5;
 ~dataset.clear;
 ~tmpbuf = Buffer.alloc(s,2);
 fork{
    s.sync;
    ~points.do{|x,i|
        (""++(i+1)++"/128").postln;
        ~tmpbuf.setn(0,x);
        ~dataset.addPoint(i,~tmpbuf);
        s.sync
    }
 }
 )
 //Make a new k means model, fit it to the dataset and return the discovered clusters to a labelset
 (
 fork{
 	~clusters.clear;
 	~kmeans = FluidKMeans(s);
    s.sync;
 	~kmeans.fitPredict(~dataset,~clusters, 4,action: {|c|
 		"Fitted.\n # Points in each cluster:".postln;
 		c.do{|x,i|
 			("Cluster" + i + "->" + x.asInteger + "points").postln;
 		}
 	});
 }
 )
 //Dims of kmeans should match dataset
 ~kmeans.cols
 //Return labels of clustered points
 (
 ~assignments = Array.new(128);
 fork{
 	128.do{ |i|
 		~clusters.getLabel(i,{|clusterID|
 			(i.asString+clusterID).postln;
 			~assignments.add(clusterID)
 		});
 		s.sync;
 	}
 }
 )
 //Visualise: we're hoping to see colours neatly mapped to quadrants...
 (
 d = ((~points + 1) * 0.5).flatten(1).unlace;
 // d = [20.collect{1.0.rand}, 20.collect{1.0.rand}];
 w = Window("scatter", Rect(128, 64, 200, 200));
 ~colours = [Color.blue,Color.red,Color.green,Color.magenta];
 w.drawFunc = {
 	Pen.use {
 		d[0].size.do{|i|
 			var x = (d[0][i]*200);
 			var y = (d[1][i]*200);
 			var r = Rect(x,y,5,5);
 			Pen.fillColor = ~colours[~assignments[i].asInteger];
 			Pen.fillOval(r);
 		}
 	}
 };
 w.refresh;
 w.front;
 )
 ::
--- a/release-packaging/HelpSource/Classes/FluidLabelSet.schelp
+++ b/release-packaging/HelpSource/Classes/FluidLabelSet.schelp
@ -12,84 +12,85 @@ CLASSMETHODS::
 PRIVATE:: kr
 METHOD:: new
-Make a new instance of a label set, uniquely identified by its name. Multiple instances to of this class with the same name refer to the same server-side entity.
+Make a new instance of a label set, uniquely identified by its name. Creating an instance with a name already in use will throw an exception. Use link::Classes/FluidLabelSet#*at:: or free the existing instance.
 ARGUMENT:: server
 The link::Classes/Server:: on which to create the label set
 ARGUMENT:: name
 symbol or string with the label set's name
 METHOD:: at
 Retrieve a label set from the cache
 ARGUMENT:: server
 The link::Classes/Server:: on which to create the label set
 ARGUMENT:: id
 symbol or string with the label set's name
 INSTANCEMETHODS::
-PRIVATE:: init, id, server, synth
+PRIVATE:: init, id
 METHOD:: addLabel
 Add a label to the label set
 ARGUMENT:: id
 symbol or string with the ID for this label
 ARGUMENT:: label
 symbol or string with the label to add
 ARGUMENT:: action
 function to run when the operation completes
 METHOD:: updateLabel
 Change a label in the label set
 ARGUMENT:: id
 symbol or string with the ID for this label
 ARGUMENT:: label
 symbol or string with the label to add
 ARGUMENT:: action
 function to run when the operation completes
 METHOD:: getLabel
 Retrieve the label associated with an ID. Will report an error if the ID isn't present in the set.
 ARGUMENT:: id
 symbol or string with the ID to retrieve.
 ARGUMENT:: action
 A function to run when the server responds, with the label as its argument
 METHOD:: deleteLabel
 Remove an id-label pair from the label set
 ARGUMENT:: id
 symbol or string with the ID to remove
 ARGUMENT:: action
 A function to run when the label has been removed
 METHOD:: clear
 Empty the label set
 ARGUMENT:: action
 Function to run when the action completes
 METHOD:: size
 Report the number of items in the label set
 ARGUMENT:: action
 A function to run when the server responds, taking the size as its argument
 METHOD:: cols
 Returns the dimensionality of the link::Classes/FluidDataSet:: associated with this label set
 ARGUMENT:: action
 A function to run when the server responds, with the dimensionality as its argument
 METHOD:: write
-Write this label set to disk as a JSON file. Will not overwrite existing files.
+Write this label set to disk as a JSON file.
 ARGUMENT:: filename
 Absolute path of file to write
 ARGUMENT:: action
 A function to run when the file is written
 METHOD:: read
 Read a label set from a JSON file on disk
 ARGUMENT:: filename
 Absolute path of the file to read
 ARGUMENT:: action
 A function to run when the file is read
 METHOD:: deleteLabel
 Remove an id-label pair from the label set
 ARGUMENT:: id
 symbol or string with the ID to remove
 ARGUMENT:: action
 A function to run when the label has been removed
 METHOD:: size
 Report the number of items in the label set
 ARGUMENT:: action
 A function to run when the server responds, taking the size as its argument
 METHOD:: addLabel
 Add a label to the label set
 ARGUMENT:: id
 symbol or string with the ID for this label
 ARGUMENT:: label
 symbol or string with the label to add
 ARGUMENT:: action
 function to run when the operation completes
 METHOD:: clear
 Empty the label set
 ARGUMENT:: action
 Function to run when the action completes
 EXAMPLES::
 code::
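 // A minimal sketch using only the methods documented above. The name
 // \labelset_help_sketch and the ids and labels are made up for illustration.
 ~labels = FluidLabelSet(s,\labelset_help_sketch);
 (
 fork{
     s.sync;
     // Associate two ids with labels
     ~labels.addLabel("0","kick",{"added 0".postln});
     s.sync;
     ~labels.addLabel("1","snare",{"added 1".postln});
     s.sync;
     // Retrieve the label stored under id "1"
     ~labels.getLabel("1",{|label| label.postln});
     s.sync;
     // Report how many id-label pairs the set holds
     ~labels.size({|n| n.postln});
 }
 )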