title:: The Fluid Corpus Manipulation Data Tools
summary:: Tools for organising, exploring and querying corpora
categories:: Libraries>FluidDecomposition,Guides>FluCoMa
related:: Guides/FluCoMa, Guides/FluidDecomposition, Classes/FluidDataSet, Classes/FluidLabelSet

The suite of Fluid Corpus Manipulation data tools offers facilities for building, exploring, transforming and playing with corpora. The tools are built around two container classes, link::Classes/FluidDataSet:: and link::Classes/FluidLabelSet::, which provide a way to build up and store collections of labelled data, and a suite of objects that act on these containers.

The design and interface of many of these objects are heavily based on the Python library link::https://scikit-learn.org/stable/##scikit-learn::, a mature and well-developed machine learning toolkit that is comparatively quick to get started with. As our documentation continues to develop, we will also lean quite heavily on scikit-learn's.

section:: Containers

Map identifiers to data points, or to labels

link::Classes/FluidDataSet::

link::Classes/FluidLabelSet::


section:: DataSet Filtering

Select and filter items from a FluidDataSet by building queries

link::Classes/FluidDataSetQuery::

section:: Data Structure

Perform nearest neighbour searches

link::Classes/FluidKDTree::

section:: Data Conditioning

Pre-process data

link::Classes/FluidNormalize::

link::Classes/FluidStandardize::

link::Classes/FluidRobustScale::

section:: Dimension Reduction

Compress data to fewer dimensions for visualisation, efficiency or preprocessing

link::Classes/FluidPCA::

link::Classes/FluidMDS::

section:: Supervised Learning

Train supervised learning models using either k-nearest neighbours or a simple neural network

subsection:: Classification

Map input data points to categories

link::Classes/FluidKNNClassifier::

link::Classes/FluidMLPClassifier::

subsection:: Regression

Map input data points to continuous output

link::Classes/FluidKNNRegressor::

link::Classes/FluidMLPRegressor::