Public Documentation

Documentation for Severo.jl's public interface.

See the Internals section of the manual for internal package docs covering all submodules.

Public Interface

Severo.agreement — Method

agreement(rng::AbstractRNG, X::AbstractMatrix, Y::Union{AbstractMatrix, LinearEmbedding}; k::Int64)

A metric for quantifying how much a transformation/factorization distorts the geometry of the original dataset. The greater the agreement, the less distortion of geometry there is.

This is calculated by performing dimensionality reduction on the original and transformed datasets and measuring the similarity between the k nearest neighbors of each cell in both. Similarity is quantified with the Jaccard index, and the final metric is the average across all cells.

Arguments:

- `rng`: random number generator used for k-NN
- `X`: low dimensional embedding for reference dataset
- `Y`: low dimensional embedding for transformed dataset
- `k`: number of neighbours to find (default=15)

Return values: The agreement score
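
The calculation described above can be sketched in plain Julia. This is a minimal illustration and not Severo's implementation: it assumes one row per cell, uses a brute-force exact neighbour search instead of the randomized k-NN that `rng` is used for, and the names `knn_sets` and `agreement_sketch` are hypothetical.

```julia
# Brute-force k nearest neighbours for the rows of X (assumed: one row per cell).
# Returns one Set of neighbour indices per cell, excluding the cell itself.
function knn_sets(X::AbstractMatrix, k::Integer)
    n = size(X, 1)
    map(1:n) do i
        d = Float64[sum(abs2, X[i, :] .- X[j, :]) for j in 1:n]
        d[i] = Inf                       # never select the cell itself
        Set(partialsortperm(d, 1:k))     # indices of the k smallest distances
    end
end

# Mean Jaccard index between the neighbour sets of each cell in X and in Y.
function agreement_sketch(X::AbstractMatrix, Y::AbstractMatrix; k::Integer=15)
    nx, ny = knn_sets(X, k), knn_sets(Y, k)
    sum(length(a ∩ b) / length(a ∪ b) for (a, b) in zip(nx, ny)) / length(nx)
end
```

Comparing an embedding with itself gives a score of 1, and the score drops as the neighbourhoods of the two embeddings diverge.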

Severo.agreement — Method

agreement(rng::AbstractRNG, X::Union{AbstractMatrix, LinearEmbedding}, Ys::Union{AbstractMatrix, LinearEmbedding}...; k::Int64)

A metric for quantifying how much a transformation/factorization distorts the geometry of the original dataset. The greater the agreement, the less distortion of geometry there is.

This is calculated by performing dimensionality reduction on the original and transformed datasets and measuring the similarity between the k nearest neighbors of each cell in both. Similarity is quantified with the Jaccard index, and the final metric is the average across all cells.

Arguments:

- `rng`: random number generator used for k-NN
- `X`: low dimensional embedding for reference dataset
- `Ys`: low dimensional embedding for transformed datasets
- `k`: number of neighbours to find (default=15)

Return values: A tuple of agreement scores for each transformed dataset

Severo.agreement — Method

agreement(X::Union{AbstractMatrix, LinearEmbedding}, Ys::Union{AbstractMatrix, LinearEmbedding}...; k::Int64)

A metric for quantifying how much a transformation/factorization distorts the geometry of the original dataset. The greater the agreement, the less distortion of geometry there is.

This is calculated by performing dimensionality reduction on the original and transformed datasets and measuring the similarity between the k nearest neighbors of each cell in both. Similarity is quantified with the Jaccard index, and the final metric is the average across all cells.

Arguments:

- `X`: low dimensional embedding for reference dataset
- `Ys`: low dimensional embedding for transformed dataset(s)
- `k`: number of neighbours to find (default=15)

Return values: The agreement score

Severo.alignment — Method

alignment(X::AbstractMatrix, datasets::AbstractVector{T}...; k::Union{Nothing,Int64}=nothing) where T

Calculates the alignment score as defined by Butler 2018 [doi: 10.1038/nbt.4096]. It's a quantitative metric for the alignment of datasets and calculated as follows:

1. Randomly downsample the datasets to have the same number of cells as the smallest dataset
2. Construct a nearest-neighbor graph based on the cells’ embedding in some low dimensional space `X`.
3. For every cell, calculate how many of its k nearest-neighbors belong to the same dataset and average this over all cells.
4. Normalize by the expected number of same-dataset cells and scale the result to range from 0 to 1.

If the datasets are well aligned, we would expect each cell's nearest neighbors to be evenly shared across all datasets.

Arguments:

- `X`: low dimensional embedding of the aligned datasets
- `datasets`: the split into datasets
- `k`: number of neighbours to find. By default: 1% of the total number of cells, with a minimum of 10 and a maximum of the total number of samples drawn

Return values: The alignment score
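
Steps 3 and 4 can be sketched as follows. This is an illustration under stated assumptions, not the package's code: it assumes the normalization from Butler 2018 is `1 - (x̄ - k/N) / (k - k/N)`, where `x̄` is the average number of same-dataset neighbours and `N` the number of datasets, and it takes the neighbour lists as given (downsampling and the neighbour search are left out). The name `alignment_sketch` is hypothetical.

```julia
# Alignment score from precomputed neighbour lists.
# neighbours[i]: indices of the k nearest neighbours of cell i
# labels[i]:     dataset that cell i belongs to
function alignment_sketch(neighbours::AbstractVector{<:AbstractVector{<:Integer}},
                          labels::AbstractVector{<:Integer}, k::Integer)
    N = length(unique(labels))           # number of datasets
    # average number of neighbours that share the cell's own dataset
    xbar = sum(count(j -> labels[j] == labels[i], nb)
               for (i, nb) in enumerate(neighbours)) / length(neighbours)
    expected = k / N                     # expected value under perfect mixing
    1 - (xbar - expected) / (k - expected)
end
```

When every neighbour comes from the cell's own dataset the score is 0. When the neighbours are shared evenly across datasets the score is 1.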

Severo.alignment — Method

alignment(rng::AbstractRNG, X::AbstractMatrix, datasets::AbstractVector{T}...; k::Union{Nothing,Int64}=nothing) where T

Calculates the alignment score as defined by Butler 2018 [doi: 10.1038/nbt.4096]. It's a quantitative metric for the alignment of datasets and calculated as follows:

1. Randomly downsample the datasets to have the same number of cells as the smallest dataset
2. Construct a nearest-neighbor graph based on the cells’ embedding in some low dimensional space `X`.
3. For every cell, calculate how many of its k nearest-neighbors belong to the same dataset and average this over all cells.
4. Normalize by the expected number of same-dataset cells and scale the result to range from 0 to 1.

If the datasets are well aligned, we would expect each cell's nearest neighbors to be evenly shared across all datasets.

Arguments:

- `rng`: random number generator used by downsampling and k-NN
- `X`: low dimensional embedding
- `datasets`: the split into datasets
- `k`: number of neighbours to find. By default: 1% of the total number of cells, with a minimum of 10 and a maximum of the total number of samples drawn

Return values: The alignment score

Severo.cluster — Method

cluster(SNN::NeighbourGraph; algorithm=:louvain, resolution=0.8, nstarts=1, niterations=10) where T

Cluster cells based on a neighbourhood graph.

Arguments:

- `SNN`: shared neighbours graph
- `algorithm`: clustering algorithm to use (louvain)
- `resolution`: parameters above 1 will lead to larger communities whereas below 1 lead to smaller ones
- `nstarts`: number of random starts
- `niterations`: maximum number of iterations per random start
- `group_singletons`: group singletons into nearest cluster, if false keeps singletons

Return values:

cluster assignment per cell

Severo.cluster — Method

cluster(rng::AbstractRNG, SNN::NeighbourGraph; algorithm=:louvain, resolution=0.8, nstarts=1, niterations=10) where T

Cluster cells based on a neighbourhood graph.

Arguments:

- `rng`: random number generator
- `SNN`: shared neighbours graph
- `algorithm`: clustering algorithm to use (louvain)
- `resolution`: parameters above 1 will lead to larger communities whereas below 1 lead to smaller ones
- `nstarts`: number of random starts
- `niterations`: maximum number of iterations per random start
- `group_singletons`: group singletons into nearest cluster, if false keeps singletons

Return values:

cluster assignment per cell

Severo.convert_counts — Method

convert_counts(X::AbstractMatrix, features::AbstractVector, barcodes::AbstractVector; unique_features::Bool=true)

Convert a count matrix and labels into its labeled representation

Arguments:

- `X`: a count matrix (features x barcodes)
- `features`: list of feature names
- `barcodes`: list of barcodes
- `unique_features`: should feature names be made unique (default: true)

Return values:

A labeled sparse matrix containing the counts

Severo.convert_counts — Method

convert_counts(X::AbstractMatrix)

Convert a count matrix into its labeled representation by generating unique labels

Arguments:

- `X`: a count matrix (features x barcodes)

Return values:

A labeled sparse matrix containing the counts

Severo.filter_cells — Method

filter_cells(A::NamedCountMatrix; min_features=0, min_feature_count=0, min_umi=0)

Filter a labeled count matrix, removing cells for which the metrics fall below the given thresholds

Arguments:

- `A`: the count matrix
- `min_features`: include cells where at least this many features are detected
- `min_feature_count`: threshold on the count for which a feature is marked "detected"
- `min_umi`: include cells where the total of umi counts is at least this value

Return value:

The filtered, labeled matrix with cells removed
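
The thresholds can be illustrated on an unlabeled matrix. This is a sketch, not Severo's implementation: it assumes a features x barcodes layout and that a feature counts as detected when its count is strictly greater than `min_feature_count`. The name `filter_cells_sketch` is hypothetical.

```julia
# Keep the cells (columns) of a features x barcodes count matrix that pass the thresholds.
function filter_cells_sketch(A::AbstractMatrix; min_features=0, min_feature_count=0, min_umi=0)
    # number of detected features per cell (assumption: detected means count > min_feature_count)
    detected = vec(sum(A .> min_feature_count, dims=1))
    # total UMI count per cell
    umis = vec(sum(A, dims=1))
    keep = (detected .>= min_features) .& (umis .>= min_umi)
    A[:, keep]
end
```

For example, `filter_cells_sketch(A; min_features=200, min_umi=500)` would drop every cell with fewer than 200 detected features or fewer than 500 total counts.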

Severo.filter_counts — Method

filter_counts(A::NamedCountMatrix; min_cells=0, min_features=0, min_feature_count=0, min_umi=0)

Filter a labeled count matrix, removing cells and features for which the metrics fall below the given threshold

First cells are removed using filter_cells and then features using filter_features. This order can be important!

Arguments:

- `A`: the count matrix
- `min_cells`: include features detected in at least this many cells
- `min_features`: include cells where at least this many features are detected
- `min_feature_count`: threshold on the count for which a feature is marked "detected"
- `min_umi`: include cells where the total of umi counts is at least this value

Return value:

The filtered, labeled matrix with cells and features removed

Severo.filter_features — Method

filter_features(A::NamedCountMatrix; min_cells=0)

Filter a count matrix, removing features for which the metrics fall below the given thresholds

Arguments:

- `A`: the count matrix
- `min_cells`: include features detected in at least this many cells

Return value:

The filtered matrix with features removed

Severo.filter_rank_markers — Method

filter_rank_markers(de::DataFrame; pval_thresh::Real=1e-2, ngenes::Integer=typemax(Int64))

Filters and ranks a list of markers (differentially expressed genes).

Arguments:

- `de`: list of markers returned by `find_markers`
- `pval_thresh`: only keep markers with pval < pval_thresh
- `ngenes`: the number of highest-ranked markers to keep
- `rankby_abs`: rank based on absolute value of the scores

Return values:

A DataFrame containing a ranked list of filtered markers.

Severo.find_markers — Method

find_markers(X::Union{NamedCountMatrix, NamedDataMatrix}, idents::NamedVector{<:Integer};
    method=:wilcoxon, selection::Union{Nothing, NamedArray{Bool, 2}, AbstractArray{Bool,2}}=nothing, log::Bool=false, kw...)

Finds markers (differentially expressed genes) for each of the classes in a dataset.

Arguments:

- `X`: count or data matrix
- `idents`: class identity for each cell
- `method`: which test to use; supported are `wilcoxon` and `t`
- `selection`: a selection of features and groups that should be considered
- `log`: the data is in log-scale (default = false)
- `kw...`: additional parameters passed down to the method

Return values:

A DataFrame containing a list of putative markers with associated statistics (p-values and scores) and log fold-changes.

Severo.find_variable_features — Function

find_variable_features(counts::NamedCountMatrix, nfeatures=2000; method=:vst, kw...)

Identification of highly variable features: find features that exhibit high cell-to-cell variation in the dataset (i.e., they are highly expressed in some cells and lowly expressed in others).

Arguments:

- `counts`: count matrix
- `nfeatures`: the number of top-ranking features to return
- `method`: how to choose top variable features
- `kw`: additional keyword arguments to pass along to the method

Methods:

- `:vst`: fits a line to the log(mean) - log(variance) relationship, then standardizes the feature values using the observed mean and expected variance. Finally, feature variance is calculated using the standardized values.
    - `loess_span`: span parameter for loess regression when fitting the mean-variance relationship
- `:dispersion`: selects the genes with the highest dispersion values
- `:meanvarplot`: calculates the feature mean and dispersion, then bins the features by mean into `num_bins` bins. Finally, returns the z-scores for dispersion within each bin.
    - `num_bins`: total number of bins to use
    - `binning_method`: specifies how the bins should be computed. Available: `:width` for equal width and `:frequency` for equal frequency binning

Return value:

The nfeatures top-ranked features
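
To make the simplest of these methods concrete, here is a sketch of dispersion-based selection on a plain features x cells matrix. It assumes dispersion is defined as the variance-to-mean ratio of each feature, which may differ from what Severo computes internally. The name `top_dispersion_sketch` is hypothetical.

```julia
using Statistics

# Indices of the nfeatures rows with the highest variance-to-mean ratio.
function top_dispersion_sketch(counts::AbstractMatrix, nfeatures::Integer)
    mu = vec(mean(counts, dims=2))       # mean per feature
    v = vec(var(counts, dims=2))         # sample variance per feature
    # features that are never expressed get a dispersion of zero
    disp = [m > 0 ? s / m : 0.0 for (m, s) in zip(mu, v)]
    n = min(nfeatures, length(disp))
    collect(partialsortperm(disp, 1:n; rev=true))
end
```

A feature that is expressed strongly in a few cells and absent elsewhere ranks above a feature with the same mean spread evenly across cells.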

Severo.jaccard_index — Method

jaccard_index(X::NamedArray{T,2}; prune::Real=1/15) where T

Compute a graph with edges defined by the Jaccard index. The Jaccard index measures similarity between nearest neighbour sets and is defined as the size of the intersection divided by the size of the union, with 0 indicating no overlap and 1 indicating full overlap.

Arguments:

- `X`: a nearest neighbour graph
- `prune`: cutoff for the Jaccard index, edges with values below this cutoff are removed from the resulting graph

Return values:

A shared nearest neighbours graph represented by a sparse matrix. Weights of the edges indicate similarity of the neighbourhoods of the cells as computed with the Jaccard index.
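
A small sketch of the computation on a dense boolean neighbour matrix may help. It is not the package's implementation: it assumes that column `j` of the input marks the neighbours of cell `j`, loops over all pairs of cells for clarity rather than speed, and uses the hypothetical name `jaccard_sketch`.

```julia
using SparseArrays

# nn[i, j] == true means that cell i is one of the neighbours of cell j.
function jaccard_sketch(nn::AbstractMatrix{Bool}; prune::Real=1/15)
    n = size(nn, 2)
    snn = spzeros(Float64, n, n)
    for i in 1:n, j in 1:n
        shared = count(nn[:, i] .& nn[:, j])   # size of the intersection
        total = count(nn[:, i] .| nn[:, j])    # size of the union
        w = total == 0 ? 0.0 : shared / total
        # edges below the cutoff are left out of the graph
        if w > 0 && w >= prune
            snn[i, j] = w
        end
    end
    snn
end
```

Raising `prune` removes weak edges and makes the resulting graph sparser.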

Severo.jaccard_index — Method

jaccard_index(X::NamedArray{T,2}, k::Int64; prune::Real=1/15) where T

Compute a graph with edges defined by the Jaccard index. The Jaccard index measures similarity between nearest neighbour sets and is defined as the size of the intersection divided by the size of the union, with 0 indicating no overlap and 1 indicating full overlap.

Arguments:

- `X`: a nearest neighbour graph
- `k`: maximum number of neighbours
- `prune`: cutoff for the Jaccard index, edges with values below this cutoff are removed from the resulting graph

Return values:

A shared nearest neighbours graph represented by a sparse matrix. Weights of the edges indicate similarity of the neighbourhoods of the cells as computed with the Jaccard index.

Severo.mean_var — Method

Computes the variance to mean ratio (VMR) in non-log space.

Severo.nearest_neighbours — Method

nearest_neighbours(em::LinearEmbedding, k::Int64;
     dims=:, metric::SemiMetric=Euclidean(), include_self::Bool=true, ntables::Int64=2*size(X,2)) where T

Compute a k-nearest neighbours graph based on a linear embedding

Arguments:

- `em`: embedding containing the transformed coordinates for each cell
- `k`: number of nearest neighbours to find
- `dims`: which dimensions to use
- `metric`: distance metric to use
- `include_self`: include the cell in its k-nearest neighbours
- `ntables`: number of tables to use in knn algorithm: controls the precision (higher is more accurate)

Return values:

A k-nearest neighbours graph represented by a sparse matrix. Each column corresponds to a cell, and its k nearest neighbours are stored along the rows of that column.

Severo.nearest_neighbours — Method

nearest_neighbours(rng::AbstractRNG, em::LinearEmbedding, k::Int64;
     dims=:, metric::SemiMetric=Euclidean(), include_self::Bool=true, ntables::Int64=2*size(X,2)) where T

Compute a k-nearest neighbours graph based on a linear embedding

Arguments:

- `rng`: random number generator
- `em`: embedding containing the transformed coordinates for each cell
- `k`: number of nearest neighbours to find
- `dims`: which dimensions to use
- `metric`: distance metric to use
- `include_self`: include the cell in its k-nearest neighbours
- `ntables`: number of tables to use in knn algorithm: controls the precision (higher is more accurate)

Return values:

A k-nearest neighbours graph represented by a sparse matrix. Each column corresponds to a cell, and its k nearest neighbours are stored along the rows of that column.

Severo.nearest_neighbours — Method

nearest_neighbours(X::NamedArray{T,2}, k::Int64;
    dims=:, metric::SemiMetric=Euclidean(), include_self::Bool=true, ntables::Int64=2*size(X,2)) where T

Compute a k-nearest neighbours graph based on coordinates for each cell.

Arguments:

- `X`: a labelled matrix with coordinates for each cell
- `k`: number of nearest neighbours to find
- `dims`: which dimensions to use
- `metric`: distance metric to use
- `include_self`: include the cell in its k-nearest neighbours
- `ntables`: number of tables to use in knn algorithm: controls the precision (higher is more accurate)

Return values:

A k-nearest neighbours graph represented by a sparse matrix. Each column corresponds to a cell, and its k nearest neighbours are stored along the rows of that column.

Severo.nearest_neighbours — Method

nearest_neighbours(rng::AbstractRNG, X::NamedArray{T,2}, k::Int64;
    dims=:, metric::SemiMetric=Euclidean(), include_self::Bool=true, ntables::Int64=2*size(X,2)) where T

Compute a k-nearest neighbours graph based on coordinates for each cell.

Arguments:

- `rng`: random number generator
- `X`: a labelled matrix with coordinates for each cell
- `k`: number of nearest neighbours to find
- `dims`: which dimensions to use
- `metric`: distance metric to use
- `include_self`: include the cell in its k-nearest neighbours
- `ntables`: number of tables to use in knn algorithm: controls the precision (higher is more accurate)

Return values:

A k-nearest neighbours graph represented by a sparse matrix. Each column corresponds to a cell, and its k nearest neighbours are stored along the rows of that column.

Severo.normalize_cells — Method

normalize_cells(X::NamedCountMatrix; method=:lognormalize, scale_factor=1.0)

Normalize count data with different methods:

- `lognormalize`: feature counts are divided by the total count per cell, scaled by `scale_factor` and then log1p transformed.
- `relativecounts`: feature counts are divided by the total count per cell and scaled by `scale_factor`.

Arguments:

- `X`: the labelled count matrix to normalize
- `method`: normalization method to apply
- `scale_factor`: the scaling factor
- `dtype`: datatype to be used for the output

Return values:

A labelled data matrix
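
The two methods can be written out for an unlabeled features x cells matrix as below. This is a sketch assuming cells are stored as columns, and the name `normalize_sketch` is hypothetical. A cell with a total count of zero would produce `NaN` here, which a real implementation would need to handle.

```julia
# Normalize each cell (column) by its total count.
function normalize_sketch(X::AbstractMatrix; method::Symbol=:lognormalize, scale_factor::Real=1.0)
    totals = sum(X, dims=1)                    # total count per cell
    rel = X ./ totals .* scale_factor          # relative counts, scaled
    if method == :relativecounts
        return rel
    elseif method == :lognormalize
        return log1p.(rel)                     # log(1 + x), elementwise
    else
        throw(ArgumentError("unknown method: $method"))
    end
end
```

With `scale_factor=1e4` the relative counts read as counts per ten thousand, a common choice in single-cell workflows.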

Severo.plot_elbow — Method

plot_elbow(em::LinearEmbedding)

Plots the standard deviations of the principal components for easy identification of an elbow in the graph.

Arguments:

- `em`: a linear embedding

Return value:

A plot object

Severo.plot_embedding — Method

plot_embedding(em::LinearEmbedding)

Plots the output of a dimensional reduction technique on a 2D scatter plot where each point is a cell and it's positioned based on the cell embeddings determined by the reduction technique.

Arguments:

- `em`: a linear embedding

Return value:

A plot object

Severo.plot_highest_expressed — Method

plot_highest_expressed(X::NamedCountMatrix, n::Int64; dropfeatures::Union{Nothing, AbstractArray}=nothing)

Plot the features with the highest average expression across all cells, along with their expression in each individual cell.

Arguments:

- `X`: the count matrix
- `n`: the number of the most expressed features to show
- `dropfeatures`: array with names, indices or bits indicating features to drop when plotting

Return value:

A plot object

Severo.plot_loadings — Method

plot_loadings(em::LinearEmbedding; dims::AbstractVector{<:Integer}=1:6, nfeatures::Integer=10)

Visualize top genes associated with reduction components

Arguments:

- `em`: a linear embedding
- `dims`: which components to display
- `nfeatures`: number of genes to display

Return value:

A plot object

Severo.prefilter_markers — Function

prefilter_markers(X::Union{NamedCountMatrix, NamedDataMatrix}, idents::NamedVector{<:Integer};
        logfc_threshold::Real=0.0, min_pct::Real=0.0, min_diff_pct::Real=-Inf, only_pos::Bool=false, log::Bool=false)

Filter features for each of the classes in a dataset.

Arguments:

- `X`: count or data matrix
- `idents`: class identity for each cell
- `logfc_threshold`: limit testing to features which show, on average, at least X-fold difference (log-scale) between the two groups of cells
- `min_pct`: only test features that are detected in a minimum fraction of `min_pct` cells in either of the two populations
- `min_diff_pct`: only test features that show a minimum difference in the fraction of detection between the two groups
- `only_pos`: only return features with positive log fold-change
- `log`: the data is in log-scale (default = false)

Return values:

Selection matrix for each feature and class

Severo.purity — Method

purity(clusters::IntegerArray, classes::IntegerArray)

Calculates purity between clusters and external clustering (true clusters/classes).

Arguments:

- `clusters`: clustering for which to calculate purity
- `classes`: clustering/classes with which to compare

Return values:

Purity score in the range [0, 1], with a score of 1 representing a pure/accurate clustering
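
For reference, the usual textbook definition of purity assigns each cluster to its most frequent class and reports the fraction of cells that match. The sketch below implements that definition. Whether Severo computes it in exactly this way is an assumption, and the name `purity_sketch` is hypothetical.

```julia
# Fraction of cells that belong to the majority class of their cluster.
function purity_sketch(clusters::AbstractVector{<:Integer}, classes::AbstractVector{<:Integer})
    length(clusters) == length(classes) ||
        throw(ArgumentError("clusters and classes must have the same length"))
    total = 0
    for c in unique(clusters)
        members = classes[clusters .== c]      # classes of the cells in cluster c
        # size of the largest class within this cluster
        total += maximum(count(==(k), members) for k in unique(members))
    end
    total / length(clusters)
end
```

Note that purity rewards over-clustering: putting every cell in its own cluster yields a score of 1, so it is best read together with the number of clusters.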

Severo.read_10X — Method

read_10X(dirname::AbstractString; unique_features=true)

Read count matrix from 10X genomics

Arguments:

- `dirname`: path to directory containing `matrix.mtx`, `genes.tsv` (or `features.tsv`), and `barcodes.tsv` from 10X
- `unique_features`: should feature names be made unique (default: true)

Return values:

A labeled sparse matrix containing the counts

Severo.read_10X_h5 — Method

read_10X_h5(fname::AbstractString; dataset::AbstractString="/mm10", unique_features=true)

Read count matrix from 10X CellRanger hdf5 file.

Arguments:

- `fname`: path to hdf5 file
- `dataset`: name of dataset to load (default: "/mm10")
- `unique_features`: should feature names be made unique (default: true)

Return values:

A labeled sparse matrix containing the counts

Severo.read_csv — Method

read_csv(fname::AbstractString; unique_features=true)

Read count matrix from CSV

Arguments:

- `fname`: path to csv file
- `unique_features`: should feature names be made unique (default: true)

Return values:

A labeled sparse matrix containing the counts

Severo.read_data — Method

read_data(path::AbstractString; kw...)

Tries to identify and read a count matrix in any of the supported formats

Arguments:

- `path`: path to the input
- `kw`: additional keyword arguments are passed on

Return values:

A labeled sparse matrix containing the counts

Severo.read_dge — Method

read_dge(fname::AbstractString; unique_features=true)

Read count matrix from digital gene expression (DGE) files

Arguments:

- `fname`: path to dge file
- `unique_features`: should feature names be made unique (default: true)

Return values:

A labeled sparse matrix containing the counts

Severo.read_h5 — Method

read_h5(fname::AbstractString; dataset::AbstractString="/mm10", unique_features=true)

Read count matrix from hdf5 file.

Arguments:

- `fname`: path to hdf5 file
- `dataset`: name of dataset to load (default: "counts")
- `unique_features`: should feature names be made unique (default: true)

Return values:

A labeled sparse matrix containing the counts

Severo.read_h5ad — Method

read_h5ad(fname::AbstractString, dataset::String="/mm10"; unique_features=true)

Read count matrix from hdf5 file as created by AnnData.py. https://anndata.readthedocs.io/en/latest/fileformat-prose.html

Arguments:

- `fname`: path to hdf5 file
- `unique_features`: should feature names be made unique (default: true)

Return values:

A labeled sparse matrix containing the counts

Severo.read_loom — Method

read_loom(fname::AbstractString; barcode_names::AbstractString="CellID", feature_names::AbstractString="Gene", unique_names::Bool=true, blocksize::Tuple{Int,Int}=(100,100))

Read count matrix from loom format

Arguments:

- `fname`: path to loom file
- `barcode_names`: key where the observation/cell names are stored
- `feature_names`: key where the variable/feature names are stored
- `unique_names`: should feature and barcode names be made unique (default: true)
- `blocksize`: blocksize to use when reading the matrix (tradeoff between memory and speed)

Return values:

A labeled sparse matrix containing the counts

Severo.scale_features — Method

scale_features(X::NamedArray{T, 2, SparseMatrixCSC{T, Int64}} ; scale_max=Inf, dtype::Type{<:AbstractFloat})

Scale and center a count/data matrix along the cells such that each feature is standardized

Arguments:

- `X`: the labelled count/data matrix to scale
- `scale_max`: maximum value of the scaled data

Return values:

A centered matrix
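
The standardization can be sketched for a dense features x cells matrix as follows. This is an illustration and not the package's implementation: it assumes features are rows, uses the sample standard deviation, and clips only from above at `scale_max`. The name `scale_features_sketch` is hypothetical. A feature with zero variance would give `NaN` in this sketch.

```julia
using Statistics

# Center and scale every feature (row) across cells, then clip large values.
function scale_features_sketch(X::AbstractMatrix; scale_max::Real=Inf)
    mu = mean(X, dims=2)                 # mean of each feature
    sd = std(X, dims=2)                  # sample standard deviation of each feature
    Z = (X .- mu) ./ sd
    min.(Z, scale_max)                   # cap values above scale_max
end
```

After scaling, every feature has mean 0 and standard deviation 1 across cells (before clipping), so highly expressed features no longer dominate downstream steps such as PCA.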

Severo.shared_nearest_neighbours — Method

shared_nearest_neighbours(em::LinearEmbedding, k::Int64; dims=:, metric::SemiMetric=Euclidean(), include_self::Bool=true, ntables::Int64=2*size(X,2))

Compute a k-nearest neighbours graph based on an embedding of cells and its Jaccard index.
The Jaccard index measures similarity between nearest neighbour sets and is defined as the size of the intersection divided by the size of the union, with 0 indicating no overlap and 1 indicating full overlap.

Arguments:

- `em`: embedding containing the transformed coordinates for each cell
- `k`: number of nearest neighbours to find
- `dims`: which dimensions to use
- `include_self`: include the cell in its k-nearest neighbours
- `ntables`: number of tables to use in knn algorithm: controls the precision (higher is more accurate)
- `prune`: cutoff for the Jaccard index, edges with values below this cutoff are removed from the resulting graph

Return values:

A shared nearest neighbours graph represented by a sparse matrix. Weights of the edges indicate similarity of the neighbourhoods of the cells as computed with the Jaccard index.

Severo.shared_nearest_neighbours — Method

shared_nearest_neighbours(rng::AbstractRNG, em::LinearEmbedding, k::Int64; dims=:, metric::SemiMetric=Euclidean(), include_self::Bool=true, ntables::Int64=2*size(X,2))

Compute a k-nearest neighbours graph based on an embedding of cells and its Jaccard index.
The Jaccard index measures similarity between nearest neighbour sets and is defined as the size of the intersection divided by the size of the union, with 0 indicating no overlap and 1 indicating full overlap.

Arguments:

- `rng`: random number generator
- `em`: embedding containing the transformed coordinates for each cell
- `k`: number of nearest neighbours to find
- `dims`: which dimensions to use
- `include_self`: include the cell in its k-nearest neighbours
- `ntables`: number of tables to use in knn algorithm: controls the precision (higher is more accurate)
- `prune`: cutoff for the Jaccard index, edges with values below this cutoff are removed from the resulting graph

Return values:

A shared nearest neighbours graph represented by a sparse matrix. Weights of the edges indicate similarity of the neighbourhoods of the cells as computed with the Jaccard index.

Severo.shared_nearest_neighbours — Method

shared_nearest_neighbours(X::NamedArray{T,2}, k::Int64;
    dims=:, metric::SemiMetric=Euclidean(), include_self::Bool=true, ntables::Int64=2*size(X,2)) where T

Compute a k-nearest neighbours graph based on coordinates for each cell and its Jaccard index.
The Jaccard index measures similarity between nearest neighbour sets and is defined as the size of the intersection divided by the size of the union, with 0 indicating no overlap and 1 indicating full overlap.

Arguments:

- `X`: a labelled matrix with coordinates for each cell
- `k`: number of nearest neighbours to find
- `dims`: which dimensions to use
- `include_self`: include the cell in its k-nearest neighbours
- `ntables`: number of tables to use in knn algorithm: controls the precision (higher is more accurate)
- `prune`: cutoff for the Jaccard index, edges with values below this cutoff are removed from the resulting graph

Return values:

A shared nearest neighbours graph represented by a sparse matrix. Weights of the edges indicate similarity of the neighbourhoods of the cells as computed with the Jaccard index.

Severo.shared_nearest_neighbours — Method

shared_nearest_neighbours(rng::AbstractRNG, X::NamedArray{T,2}, k::Int64; dims=:, metric::SemiMetric=Euclidean(), include_self::Bool=true, ntables::Int64=2*size(X,2)) where T

Compute a k-nearest neighbours graph based on coordinates for each cell and its Jaccard index.
The Jaccard index measures similarity between nearest neighbour sets and is defined as the size of the intersection divided by the size of the union, with 0 indicating no overlap and 1 indicating full overlap.

Arguments:

- `rng`: random number generator
- `X`: a labelled matrix with coordinates for each cell
- `k`: number of nearest neighbours to find
- `dims`: which dimensions to use
- `include_self`: include the cell in its k-nearest neighbours
- `ntables`: number of tables to use in knn algorithm: controls the precision (higher is more accurate)
- `prune`: cutoff for the Jaccard index, edges with values below this cutoff are removed from the resulting graph

Return values:

A shared nearest neighbours graph represented by a sparse matrix. Weights of the edges indicate similarity of the neighbourhoods of the cells as computed with the Jaccard index.

Severo.umap — Function

umap(em::LinearEmbedding, ncomponents::Int64=2; dims=:, metric=:cosine, nneighbours::Int=30, min_dist::Real=.3, nepochs::Int=300, kw...) where T

Performs a Uniform Manifold Approximation and Projection (UMAP) dimensional reduction on the coordinates in the linear embedding.

For a more in depth discussion of the mathematics underlying UMAP, see the ArXiv paper: [https://arxiv.org/abs/1802.03426]

Arguments:

- `em`: embedding containing the transformed coordinates for each cell
- `ncomponents`: the dimensionality of the embedding
- `dims`: which dimensions to use
- `metric`: distance metric to use
- `nneighbours`: the number of neighboring points used in local approximations of manifold structure.
- `min_dist`: controls how tightly the embedding is allowed to compress points together.
- `nepochs`: number of training epochs to be used while optimizing the low dimensional embedding
- `kw`: additional parameters for the umap algorithm. See `UMAP.umap`

Return values:

A low-dimensional embedding of the cells

Severo.umap — Function

umap(X::NamedMatrix, ncomponents::Int64=2; dims=:, metric=:cosine, nneighbours::Int=30, min_dist::Real=.3, nepochs::Int=300, kw...) where T

Performs a Uniform Manifold Approximation and Projection (UMAP) dimensional reduction on the coordinates.

For a more in depth discussion of the mathematics underlying UMAP, see the ArXiv paper: [https://arxiv.org/abs/1802.03426]

Arguments:

- `X`: a labelled matrix with coordinates for each cell
- `ncomponents`: the dimensionality of the embedding
- `dims`: which dimensions to use
- `metric`: distance metric to use
- `nneighbours`: the number of neighboring points used in local approximations of manifold structure.
- `min_dist`: controls how tightly the embedding is allowed to compress points together.
- `nepochs`: number of training epochs to be used while optimizing the low dimensional embedding
- `kw`: additional parameters for the umap algorithm. See `UMAP.umap`

Return values:

A low-dimensional embedding of the cells

Severo.umap — Method

umap(X::AbstractMatrix, ncomponents::Int64=2; dims=:, metric=:cosine, nneighbours::Int=30, min_dist::Real=.3, nepochs::Int=300, kw...) where T

Performs a Uniform Manifold Approximation and Projection (UMAP) dimensional reduction on the coordinates.

For a more in depth discussion of the mathematics underlying UMAP, see the ArXiv paper: [https://arxiv.org/abs/1802.03426]

Arguments:

- `X`: an unlabelled matrix with coordinates for each cell
- `ncomponents`: the dimensionality of the embedding
- `dims`: which dimensions to use
- `metric`: distance metric to use
- `nneighbours`: the number of neighboring points used in local approximations of manifold structure.
- `min_dist`: controls how tightly the embedding is allowed to compress points together.
- `nepochs`: number of training epochs to be used while optimizing the low dimensional embedding
- `kw`: additional parameters for the umap algorithm. See `UMAP.umap`

Return values:

A low-dimensional embedding of the cells