title:: The Fluid Corpus Manipulation Data Tools
summary:: Tools for organising, exploring and querying corpora
categories:: Libraries>FluidDecomposition,Guides>FluCoMa
related:: Guides/FluCoMa, Guides/FluidDecomposition, Classes/FluidDataSet, Classes/FluidLabelSet

The suite of Fluid Corpus Manipulation data tools offers facilities for building, exploring, transforming and playing with corpora. The tools are built around two container classes, link::Classes/FluidDataSet:: and link::Classes/FluidLabelSet::, which provide a way to build up and store collections of labelled data, and a suite of objects that act on these containers.

The design and interface of many of these objects are heavily based on the Python library link::https://scikit-learn.org/stable/##scikit-learn::, a mature and well-developed machine learning toolkit that is comparatively quick to get going with. As our documentation continues to develop, we will also lean quite heavily on scikit-learn's!

section:: Containers
Map id labels to data points, or to other labels

link::Classes/FluidDataSet::

link::Classes/FluidLabelSet::

section:: DataSet Filtering
Select and filter items from a FluidDataSet by building queries

link::Classes/FluidDataSetQuery::

section:: Data Structure
Perform nearest neighbour searches

link::Classes/FluidKDTree::

section:: Data Conditioning
Pre-process data

link::Classes/FluidNormalize::

link::Classes/FluidStandardize::

link::Classes/FluidRobustScale::

section:: Dimension Reduction
Compress data to fewer dimensions for visualisation, efficiency or preprocessing

link::Classes/FluidPCA::

link::Classes/FluidMDS::

section:: Supervised Learning
Train supervised learning models using either K nearest neighbours or a simple neural network

subsection:: Classification
Map input data points to categories

link::Classes/FluidKNNClassifier::

link::Classes/FluidMLPClassifier::

subsection:: Regression
Map input data points to continuous output

link::Classes/FluidKNNRegressor::

link::Classes/FluidMLPRegressor::