Public Documentation
Documentation for Severo.jl's public interface.
See the Internals section of the manual for internal package docs covering all submodules.
Index
Severo.agreement
Severo.alignment
Severo.cluster
Severo.convert_counts
Severo.filter_cells
Severo.filter_counts
Severo.filter_features
Severo.filter_rank_markers
Severo.find_markers
Severo.find_variable_features
Severo.jaccard_index
Severo.mean_var
Severo.nearest_neighbours
Severo.normalize_cells
Severo.plot_elbow
Severo.plot_embedding
Severo.plot_highest_expressed
Severo.plot_loadings
Severo.prefilter_markers
Severo.purity
Severo.read_10X
Severo.read_10X_h5
Severo.read_csv
Severo.read_data
Severo.read_dge
Severo.read_h5
Severo.read_h5ad
Severo.read_loom
Severo.scale_features
Severo.shared_nearest_neighbours
Severo.umap
Public Interface
Severo.agreement — Method
agreement(rng::AbstractRNG, X::AbstractMatrix, Y::Union{AbstractMatrix, LinearEmbedding}; k::Int64)
A metric for quantifying how much a transformation/factorization distorts the geometry of the original dataset. The greater the agreement, the less distortion of geometry there is.
This is calculated by performing dimensionality reduction on the original and transformed datasets and measuring the similarity between the k nearest neighbors of each cell in the two datasets. The Jaccard index quantifies this similarity, and the final metric is its average across all cells.
Arguments:
- `rng`: random number generator used for k-NN
- `X`: low-dimensional embedding of the reference dataset
- `Y`: low-dimensional embedding of the transformed dataset
- `k`: number of neighbours to find (default=15)
Return values: The agreement score
Severo.agreement — Method
agreement(rng::AbstractRNG, X::Union{AbstractMatrix, LinearEmbedding}, Ys::Union{AbstractMatrix, LinearEmbedding}...; k::Int64)
A metric for quantifying how much a transformation/factorization distorts the geometry of the original dataset. The greater the agreement, the less distortion of geometry there is.
This is calculated by performing dimensionality reduction on the original and transformed datasets and measuring the similarity between the k nearest neighbors of each cell in the two datasets. The Jaccard index quantifies this similarity, and the final metric is its average across all cells.
Arguments:
- `rng`: random number generator used for k-NN
- `X`: low-dimensional embedding of the reference dataset
- `Ys`: low-dimensional embeddings of the transformed datasets
- `k`: number of neighbours to find (default=15)
Return values: A tuple of agreement scores for each transformed dataset
Severo.agreement — Method
agreement(X::Union{AbstractMatrix, LinearEmbedding}, Ys::Union{AbstractMatrix, LinearEmbedding}...; k::Int64)
A metric for quantifying how much a transformation/factorization distorts the geometry of the original dataset. The greater the agreement, the less distortion of geometry there is.
This is calculated by performing dimensionality reduction on the original and transformed datasets and measuring the similarity between the k nearest neighbors of each cell in the two datasets. The Jaccard index quantifies this similarity, and the final metric is its average across all cells.
Arguments:
- `X`: low-dimensional embedding of the reference dataset
- `Ys...`: low-dimensional embedding(s) of the transformed dataset(s)
- `k`: number of neighbours to find (default=15)
Return values: The agreement score
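The computation described for `agreement` can be illustrated without Severo. The following is a minimal sketch, not the package's implementation: it uses exact brute-force neighbour search instead of an approximate k-NN, assumes cells are stored as columns, and the names `knn_sets` and `agreement_sketch` are hypothetical.

```julia
# Exact k-nearest-neighbour sets by brute force. Cells are stored as columns,
# which is an assumption of this sketch.
function knn_sets(X::AbstractMatrix, k::Integer)
    n = size(X, 2)
    map(1:n) do i
        d = Float64[sum(abs2, X[:, i] .- X[:, j]) for j in 1:n]
        d[i] = Inf                      # a cell is not its own neighbour here
        Set(partialsortperm(d, 1:k))
    end
end

# Mean Jaccard index between the neighbour sets of corresponding cells.
function agreement_sketch(X::AbstractMatrix, Y::AbstractMatrix; k::Integer=15)
    jac = [length(intersect(a, b)) / length(union(a, b))
           for (a, b) in zip(knn_sets(X, k), knn_sets(Y, k))]
    sum(jac) / length(jac)
end

X = [0.0 1.0 2.0 10.0 11.0 12.0]        # six cells on a line (1 x 6)
same = agreement_sketch(X, 2 .* X; k=2)                        # rescaling keeps every neighbourhood
shuffled = agreement_sketch(X, X[:, [1, 4, 2, 5, 3, 6]]; k=2)  # neighbourhoods change
```

A transformation that preserves every neighbourhood scores 1, and the score drops as neighbourhoods are rearranged.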
Severo.alignment — Method
alignment(X::AbstractMatrix, datasets::AbstractVector{T}...; k::Union{Nothing,Int64}=nothing) where T
Calculates the alignment score as defined by Butler 2018 [doi: 10.1038/nbt.4096]. It is a quantitative metric for the alignment of datasets, calculated as follows:
1. Randomly downsample the datasets to have the same number of cells as the smallest dataset
2. Construct a nearest-neighbor graph based on the cells’ embedding in some low dimensional space `X`.
3. For every cell, calculate how many of its k nearest-neighbors belong to the same dataset and average this over all cells.
4. We then normalize by the expected number of same dataset cells and scale to range from 0 to 1.
If the datasets are well-aligned, we would expect each cell's nearest neighbors to be evenly shared across all datasets.
Arguments:
- `X`: low dimensional embedding of the aligned datasets
- `datasets`: the split into datasets
- `k`: number of neighbours to find. By default: 1% of the total number of cells, with a minimum of 10 and a maximum of the total number of samples drawn
Return values: The alignment score
Severo.alignment — Method
alignment(rng::AbstractRNG, X::AbstractMatrix, datasets::AbstractVector{T}...; k::Union{Nothing,Int64}=nothing) where T
Calculates the alignment score as defined by Butler 2018 [doi: 10.1038/nbt.4096]. It is a quantitative metric for the alignment of datasets, calculated as follows:
1. Randomly downsample the datasets to have the same number of cells as the smallest dataset
2. Construct a nearest-neighbor graph based on the cells’ embedding in some low dimensional space `X`.
3. For every cell, calculate how many of its k nearest-neighbors belong to the same dataset and average this over all cells.
4. We then normalize by the expected number of same dataset cells and scale to range from 0 to 1.
If the datasets are well-aligned, we would expect each cell's nearest neighbors to be evenly shared across all datasets.
Arguments:
- `rng`: random number generator used by downsampling and k-NN
- `X`: low dimensional embedding
- `datasets`: the split into datasets
- `k`: number of neighbours to find. By default: 1% of the total number of cells, with a minimum of 10 and a maximum of the total number of samples drawn
Return values: The alignment score
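Steps 2 to 4 of the alignment score can be sketched as follows. This is an illustration under stated assumptions rather than Severo's code: the datasets are assumed to be already downsampled to equal size (step 1 is skipped), cells are stored as columns, neighbours are found by brute force, and the normalisation `1 - (x̄ - k/N) / (k - k/N)` (with `x̄` the mean number of same-dataset neighbours and `N` the number of datasets) is assumed from the cited paper. The name `alignment_sketch` is hypothetical, and the result is not clamped.

```julia
# labels[i] is the dataset that cell i belongs to.
function alignment_sketch(X::AbstractMatrix, labels::AbstractVector{<:Integer}, k::Integer)
    n = size(X, 2)
    ndatasets = length(unique(labels))
    same = 0
    for i in 1:n
        d = Float64[sum(abs2, X[:, i] .- X[:, j]) for j in 1:n]
        d[i] = Inf                                   # exclude the cell itself
        nbrs = partialsortperm(d, 1:k)
        same += count(j -> labels[j] == labels[i], nbrs)
    end
    xbar = same / n                                  # mean same-dataset neighbours per cell
    expected = k / ndatasets                         # value expected under perfect mixing
    1 - (xbar - expected) / (k - expected)
end

# Two datasets far apart: every neighbour comes from the same dataset.
separated = alignment_sketch([0.0 1.0 2.0 100.0 101.0 102.0], [1, 1, 1, 2, 2, 2], 2)
# Two datasets side by side: each cell has one neighbour from each dataset.
mixed = alignment_sketch([0.0 1.0 2.0 3.0], [1, 1, 2, 2], 2)
```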
Severo.cluster — Method
cluster(SNN::NeighbourGraph; algorithm=:louvain, resolution=0.8, nstarts=1, niterations=10) where T
Cluster cells based on a neighbourhood graph.
Arguments:
- `SNN`: shared neighbours graph
- `algorithm`: clustering algorithm to use (louvain)
- `resolution`: values above 1 lead to a larger number of communities, whereas values below 1 lead to a smaller number
- `nstarts`: number of random starts
- `niterations`: maximum number of iterations per random start
- `group_singletons`: group singletons into nearest cluster, if false keeps singletons
Return values:
cluster assignment per cell
Severo.cluster — Method
cluster(rng::AbstractRNG, SNN::NeighbourGraph; algorithm=:louvain, resolution=0.8, nstarts=1, niterations=10) where T
Cluster cells based on a neighbourhood graph.
Arguments:
- `rng`: random number generator
- `SNN`: shared neighbours graph
- `algorithm`: clustering algorithm to use (louvain)
- `resolution`: values above 1 lead to a larger number of communities, whereas values below 1 lead to a smaller number
- `nstarts`: number of random starts
- `niterations`: maximum number of iterations per random start
- `group_singletons`: group singletons into nearest cluster, if false keeps singletons
Return values:
cluster assignment per cell
Severo.convert_counts — Method
convert_counts(X::AbstractMatrix, features::AbstractVector, barcodes::AbstractVector; unique_features::Bool=true)
Convert a count matrix and labels into its labeled representation
Arguments:
- `X`: a count matrix (features x barcodes)
- `features`: list of feature names
- `barcodes`: list of barcodes
- `unique_features`: should feature names be made unique (default: true)
Return values:
A labeled sparse matrix containing the counts
Severo.convert_counts — Method
convert_counts(X::AbstractMatrix)
Convert a count matrix into its labeled representation by generating unique labels
Arguments:
- `X`: a count matrix (features x barcodes)
Return values:
A labeled sparse matrix containing the counts
Severo.filter_cells — Method
filter_cells(A::NamedCountMatrix; min_features=0, min_feature_count=0, min_umi=0)
Filter a labeled count matrix, removing cells for which the metrics fall below the given thresholds
Arguments:
- `A`: the count matrix
- `min_features`: include cells where at least this many features are detected
- `min_feature_count`: threshold on the count for which a feature is marked "detected"
- `min_umi`: include cells where the total of UMI counts is at least this value
Return value:
The filtered, labeled matrix with cells removed
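The thresholds can be illustrated on a plain, unlabeled matrix. This is a sketch only: it assumes a features x cells layout (as stated for `convert_counts`), treats a feature as detected when its count is strictly greater than `min_feature_count` (the exact comparison Severo uses is not stated here), and the name `filter_cells_sketch` is hypothetical.

```julia
function filter_cells_sketch(counts::AbstractMatrix; min_features=0, min_feature_count=0, min_umi=0)
    detected = vec(sum(counts .> min_feature_count, dims=1))  # detected features per cell
    umi = vec(sum(counts, dims=1))                            # total count per cell
    keep = (detected .>= min_features) .& (umi .>= min_umi)
    counts[:, keep]
end

counts = [1 0 5;
          0 0 2;
          3 0 0]                       # 3 features x 3 cells, the second cell is empty
nonempty = filter_cells_sketch(counts; min_features=1)
deep = filter_cells_sketch(counts; min_umi=5)
```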
Severo.filter_counts — Method
filter_counts(A::NamedCountMatrix; min_cells=0, min_features=0, min_feature_count=0, min_umi=0)
Filter a labeled count matrix, removing cells and features for which the metrics fall below the given thresholds.
First cells are removed using filter_cells and then features using filter_features. This order can be important!
Arguments:
- `A`: the count matrix
- `min_cells`: include features detected in at least this many cells
- `min_features`: include cells where at least this many features are detected
- `min_feature_count`: threshold on the count for which a feature is marked "detected"
- `min_umi`: include cells where the total of UMI counts is at least this value
Return value:
The filtered, labeled matrix with cells and features removed
Severo.filter_features — Method
filter_features(A::NamedCountMatrix; min_cells=0)
Filter a count matrix, removing features for which the metrics fall below the given thresholds
Arguments:
- `A`: the count matrix
- `min_cells`: include features detected in at least this many cells
Return value:
The filtered matrix with features removed
Severo.filter_rank_markers — Method
filter_rank_markers(de::DataFrame; pval_thresh::Real=1e-2, ngenes::Integer=typemax(Int64))
Filters and ranks a list of markers (differentially expressed genes).
Arguments:
- `de`: list of markers returned by `find_markers`
- `pval_thresh`: only keep markers with pval < pval_thresh
- `ngenes`: the number of highest-ranked markers to keep
- `rankby_abs`: rank based on absolute value of the scores
Return values:
A DataFrame containing a ranked list of filtered markers.
Severo.find_markers — Method
find_markers(X::Union{NamedCountMatrix, NamedDataMatrix}, idents::NamedVector{<:Integer}; method=:wilcoxon, selection::Union{Nothing, NamedArray{Bool, 2}, AbstractArray{Bool,2}}=nothing, log::Bool=false, kw...)
Finds markers (differentially expressed genes) for each of the classes in a dataset.
Arguments:
- `X`: count or data matrix
- `idents`: class identity for each cell
- `method`: which test to use; supported are: [wilcoxon, t]
- `selection`: a selection of features and groups that should be considered
- `log`: the data is in log-scale (default = false)
- `kw...`: additional parameters passed down to the method
Return values:
A DataFrame containing a list of putative markers with associated statistics (p-values and scores) and log fold-changes.
Severo.find_variable_features — Function
find_variable_features(counts::NamedCountMatrix, nfeatures=2000; method=:vst, kw...)
Identification of highly variable features: find features that exhibit high cell-to-cell variation in the dataset (i.e., they are highly expressed in some cells and lowly expressed in others).
Arguments:
- `counts`: count matrix
- `nfeatures`: the number of top-ranking features to return
- `method`: how to choose top variable features
- `kw`: additional keyword arguments to pass along to the method
Methods:
- `:vst`: fits a line to the log(mean) - log(variance) relationship, then standardizes the feature values using the observed mean and expected variance. Finally, feature variance is calculated using the standardized values.
  - `loess_span`: span parameter for loess regression when fitting the mean-variance relationship
- `dispersion`: selects the genes with the highest dispersion values
- `meanvarplot`: calculates the feature mean and dispersion, then bins the means into `num_bins` bins. Finally, returns the z-scores for dispersion within each bin.
  - `num_bins`: total number of bins to use
  - `binning_method`: specifies how the bins should be computed. Available: `:width` for equal-width and `:frequency` for equal-frequency binning
Return value:
The `nfeatures` top-ranked features
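As an illustration of dispersion-based selection, the sketch below ranks features by their variance-to-mean ratio, which is one common definition of dispersion. Whether Severo's `dispersion` method uses exactly this definition is an assumption, and the function name `top_dispersion` is hypothetical. Features are stored as rows.

```julia
using Statistics

# Return the row indices of the `nfeatures` features with the highest
# variance-to-mean ratio; features with zero mean get a dispersion of 0.
function top_dispersion(counts::AbstractMatrix, nfeatures::Integer)
    mu = vec(mean(counts, dims=2))
    v = vec(var(counts, dims=2))
    disp = [m > 0 ? s / m : 0.0 for (m, s) in zip(mu, v)]
    collect(partialsortperm(disp, 1:min(nfeatures, length(disp)); rev=true))
end

counts = [1 1 1 1;      # constant across cells: no variation
          0 0 0 8;      # expressed in a single cell: highly variable
          2 3 2 3]      # mild variation
top2 = top_dispersion(counts, 2)
```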
Severo.jaccard_index — Method
jaccard_index(X::NamedArray{T,2}; prune::Real=1/15) where T
Compute a graph with edges defined by the Jaccard index. The Jaccard index measures similarity between nearest neighbour sets and is defined as the size of the intersection divided by the size of the union. A value of 0 indicates no overlap and a value of 1 indicates full overlap.
Arguments:
- `X`: a nearest neighbour graph
- `prune`: cutoff for the Jaccard index, edges with values below this cutoff are removed from the resulting graph
Return values:
A shared nearest neighbours graph represented by a sparse matrix. Weights of the edges indicate similarity of the neighbourhoods of the cells as computed with the Jaccard index.
Severo.jaccard_index — Method
jaccard_index(X::NamedArray{T,2}, k::Int64; prune::Real=1/15) where T
Compute a graph with edges defined by the Jaccard index. The Jaccard index measures similarity between nearest neighbour sets and is defined as the size of the intersection divided by the size of the union. A value of 0 indicates no overlap and a value of 1 indicates full overlap.
Arguments:
- `X`: a nearest neighbour graph
- `k`: maximum number of neighbours
- `prune`: cutoff for the Jaccard index, edges with values below this cutoff are removed from the resulting graph
Return values:
A shared nearest neighbours graph represented by a sparse matrix. Weights of the edges indicate similarity of the neighbourhoods of the cells as computed with the Jaccard index.
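The definition above, including the effect of `prune`, can be sketched on a small dense Boolean neighbour matrix. This is an illustration only: Severo returns a sparse matrix, the dense quadratic loop here is chosen for readability, and the name `jaccard_graph` is hypothetical.

```julia
# nn[i, j] is true when cell i is among the neighbours of cell j (one column per cell).
function jaccard_graph(nn::AbstractMatrix{Bool}; prune::Real=1/15)
    n = size(nn, 2)
    W = zeros(n, n)
    for i in 1:n, j in 1:n
        inter = count(nn[:, i] .& nn[:, j])      # size of the intersection
        uni = count(nn[:, i] .| nn[:, j])        # size of the union
        w = uni == 0 ? 0.0 : inter / uni
        W[i, j] = w < prune ? 0.0 : w            # drop edges below the cutoff
    end
    W
end

# Neighbour sets: cell 1 -> {1, 2}, cell 2 -> {1, 2, 3}, cell 3 -> {3}
nn = Bool[1 1 0;
          1 1 0;
          0 1 1]
W = jaccard_graph(nn; prune=0.5)
```

Cells 1 and 2 share two of three neighbours (weight 2/3, kept), while cells 2 and 3 share one of three (weight 1/3, pruned at a cutoff of 0.5).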
Severo.mean_var — Method
Variance-to-mean ratio (VMR) in non-log space
Severo.nearest_neighbours — Method
nearest_neighbours(em::LinearEmbedding, k::Int64; dims=:, metric::SemiMetric=Euclidean(), include_self::Bool=true, ntables::Int64=2*size(X,2)) where T
Compute a k-nearest neighbours graph based on a linear embedding
Arguments:
- `em`: embedding containing the transformed coordinates for each cell
- `k`: number of nearest neighbours to find
- `dims`: which dimensions to use
- `metric`: distance metric to use
- `include_self`: include the cell in its k-nearest neighbours
- `ntables`: number of tables to use in knn algorithm: controls the precision (higher is more accurate)
Return values:
A k-nearest neighbours graph represented by a sparse matrix. Each column corresponds to a cell, and the rows of that column mark its k nearest neighbours.
Severo.nearest_neighbours — Method
nearest_neighbours(rng::AbstractRNG, em::LinearEmbedding, k::Int64; dims=:, metric::SemiMetric=Euclidean(), include_self::Bool=true, ntables::Int64=2*size(X,2)) where T
Compute a k-nearest neighbours graph based on a linear embedding
Arguments:
- `rng`: random number generator
- `em`: embedding containing the transformed coordinates for each cell
- `k`: number of nearest neighbours to find
- `dims`: which dimensions to use
- `metric`: distance metric to use
- `include_self`: include the cell in its k-nearest neighbours
- `ntables`: number of tables to use in knn algorithm: controls the precision (higher is more accurate)
Return values:
A k-nearest neighbours graph represented by a sparse matrix. Each column corresponds to a cell, and the rows of that column mark its k nearest neighbours.
Severo.nearest_neighbours — Method
nearest_neighbours(X::NamedArray{T,2}, k::Int64; dims=:, metric::SemiMetric=Euclidean(), include_self::Bool=true, ntables::Int64=2*size(X,2)) where T
Compute a k-nearest neighbours graph based on coordinates for each cell.
Arguments:
- `X`: a labelled matrix with coordinates for each cell
- `k`: number of nearest neighbours to find
- `dims`: which dimensions to use
- `metric`: distance metric to use
- `include_self`: include the cell in its k-nearest neighbours
- `ntables`: number of tables to use in knn algorithm: controls the precision (higher is more accurate)
Return values:
A k-nearest neighbours graph represented by a sparse matrix. Each column corresponds to a cell, and the rows of that column mark its k nearest neighbours.
Severo.nearest_neighbours — Method
nearest_neighbours(rng::AbstractRNG, X::NamedArray{T,2}, k::Int64; dims=:, metric::SemiMetric=Euclidean(), include_self::Bool=true, ntables::Int64=2*size(X,2)) where T
Compute a k-nearest neighbours graph based on coordinates for each cell.
Arguments:
- `rng`: random number generator
- `X`: a labelled matrix with coordinates for each cell
- `k`: number of nearest neighbours to find
- `dims`: which dimensions to use
- `metric`: distance metric to use
- `include_self`: include the cell in its k-nearest neighbours
- `ntables`: number of tables to use in knn algorithm: controls the precision (higher is more accurate)
Return values:
A k-nearest neighbours graph represented by a sparse matrix. Each column corresponds to a cell, and the rows of that column mark its k nearest neighbours.
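The shape of the returned graph can be illustrated with an exact brute-force search. This sketch is not Severo's algorithm: the `ntables` parameter suggests an approximate, table-based method, whereas the code below compares all pairs using squared Euclidean distance. Storing cells as columns of the input and the name `knn_graph` are assumptions of the example.

```julia
using SparseArrays

# G[i, j] is true when cell i is one of the k nearest neighbours of cell j.
function knn_graph(X::AbstractMatrix, k::Integer; include_self::Bool=true)
    n = size(X, 2)
    I = Int[]
    J = Int[]
    for j in 1:n
        d = Float64[sum(abs2, X[:, i] .- X[:, j]) for i in 1:n]
        include_self || (d[j] = Inf)      # optionally exclude the cell itself
        for i in partialsortperm(d, 1:k)
            push!(I, i)
            push!(J, j)
        end
    end
    sparse(I, J, fill(true, length(I)), n, n)
end

X = [0.0 1.0 5.0 6.0]                     # four cells on a line (1 x 4)
G = knn_graph(X, 2)                       # the cell counts as its own neighbour
G_noself = knn_graph(X, 2; include_self=false)
```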
Severo.normalize_cells — Method
normalize_cells(X::NamedCountMatrix; method=:lognormalize, scale_factor=1.0)
Normalize count data with different methods:
- `lognormalize`: feature counts are divided by the total count per cell, scaled by `scale_factor` and then log1p transformed.
- `relativecounts`: feature counts are divided by the total count per cell and scaled by `scale_factor`.
Arguments:
- `X`: the labelled count matrix to normalize
- `method`: normalization method to apply
- `scale_factor`: the scaling factor
- `dtype`: datatype to be used for the output
Return values:
A labelled data matrix
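The `lognormalize` transformation can be written out on a plain matrix as a minimal sketch. It assumes a features x cells layout and that no cell has a total count of zero (such a cell would produce NaN here). The name `lognormalize_sketch` is hypothetical.

```julia
function lognormalize_sketch(counts::AbstractMatrix; scale_factor::Real=1.0)
    totals = sum(counts, dims=1)                  # total count per cell (1 x ncells)
    log1p.(counts ./ totals .* scale_factor)      # relative counts, scaled, then log(1 + x)
end

counts = [1 0;
          3 4]                                    # 2 features x 2 cells
normalized = lognormalize_sketch(counts)
```

Dropping the `log1p` call gives the `relativecounts` variant described above.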
Severo.plot_elbow — Method
plot_elbow(em::LinearEmbedding)
Plots the standard deviations of the principal components for easy identification of an elbow in the graph.
Arguments:
- `em`: a linear embedding
Return value:
A plot object
Severo.plot_embedding — Method
plot_embedding(em::LinearEmbedding)
Plots the output of a dimensionality reduction technique on a 2D scatter plot where each point is a cell, positioned according to the cell embeddings determined by the reduction technique.
Arguments:
- `em`: a linear embedding
Return value:
A plot object
Severo.plot_highest_expressed — Method
plot_highly_expressed_genes(X::NamedCountMatrix, n::Int64; dropfeatures::Union{Nothing, AbstractArray}=nothing)
Plot the features with the highest average expression across all cells, along with their expression in each individual cell.
Arguments:
- `X`: the count matrix
- `n`: the number of the most expressed features to show
- `dropfeatures`: array with names, indices or bits indicating features to drop when plotting
Return value:
A plot object
Severo.plot_loadings — Method
plot_loadings(em::LinearEmbedding; dims::AbstractVector{<:Integer}=1:6, nfeatures::Integer=10)
Visualize top genes associated with reduction components
Arguments:
- `em`: a linear embedding
- `dims`: which components to display
- `nfeatures`: number of genes to display
Return value:
A plot object
Severo.prefilter_markers — Function
prefilter_markers(X::Union{NamedCountMatrix, NamedDataMatrix}, idents::NamedVector{<:Integer}; logfc_threshold::Real=0.0, min_pct::Real=0.0, min_diff_pct::Real=-Inf, only_pos::Bool=false, log::Bool=false)
Filter features for each of the classes in a dataset.
Arguments:
- `X`: count or data matrix
- `idents`: class identity for each cell
- `logfc_threshold`: limit testing to features which show, on average, at least X-fold difference (log-scale) between the two groups of cells
- `min_pct`: only test features that are detected in a minimum fraction of `min_pct` cells in either of the two populations
- `min_diff_pct`: only test features that show a minimum difference in the fraction of detection between the two groups
- `only_pos`: only return features with positive log fold-change
- `log`: the data is in log-scale (default = false)
Return values:
Selection matrix for each feature and class
Severo.purity — Method
purity(clusters::IntegerArray, classes::IntegerArray)
Calculates purity between clusters and external clustering (true clusters/classes).
Arguments:
- `clusters`: clustering for which to calculate purity
- `classes`: clustering/classes with which to compare
Return values:
Purity score in the range [0, 1], with a score of 1 representing a pure/accurate clustering
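Purity is commonly defined as the fraction of cells that belong to the majority class of their cluster. The sketch below implements that standard definition. It is assumed, not confirmed, that Severo computes the same quantity, and the name `purity_sketch` is hypothetical.

```julia
function purity_sketch(clusters::AbstractVector{<:Integer}, classes::AbstractVector{<:Integer})
    length(clusters) == length(classes) || throw(ArgumentError("length mismatch"))
    total = 0
    for c in unique(clusters)
        members = classes[clusters .== c]                           # classes of the cells in cluster c
        total += maximum(count(==(k), members) for k in unique(members))  # size of the majority class
    end
    total / length(clusters)
end

clusters = [1, 1, 1, 2, 2, 2]
classes  = [1, 1, 2, 2, 2, 2]      # one cell of class 2 ended up in cluster 1
score = purity_sketch(clusters, classes)
```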
Severo.read_10X — Method
read_10X(dirname::AbstractString; unique_features=true)
Read count matrix from 10X genomics
Arguments:
- `dirname`: path to directory containing matrix.mtx, genes.tsv (or features.tsv), and barcodes.tsv from 10X
- `unique_features`: should feature names be made unique (default: true)
Return values:
A labeled sparse matrix containing the counts
Severo.read_10X_h5 — Method
read_10X_h5(fname::AbstractString; dataset::AbstractString="/mm10", unique_features=true)
Read count matrix from 10X CellRanger hdf5 file.
Arguments:
- `fname`: path to hdf5 file
- `dataset`: name of dataset to load (default: "mm10")
- `unique_features`: should feature names be made unique (default: true)
Return values:
A labeled sparse matrix containing the counts
Severo.read_csv — Method
read_csv(fname::AbstractString; unique_features=true)
Read count matrix from CSV
Arguments:
- `fname`: path to csv file
- `unique_features`: should feature names be made unique (default: true)
Return values:
A labeled sparse matrix containing the counts
Severo.read_data — Method
read_data(path::AbstractString; kw...)
Tries to identify and read a count matrix in any of the supported formats
Arguments:
- `path`: path to the data to read
- `kw`: additional keyword arguments are passed on
Return values:
A labeled sparse matrix containing the counts
Severo.read_dge — Method
read_dge(fname::AbstractString; unique_features=true)
Read count matrix from digital gene expression (DGE) files
Arguments:
- `fname`: path to dge file
- `unique_features`: should feature names be made unique (default: true)
Return values:
A labeled sparse matrix containing the counts
Severo.read_h5 — Method
read_h5(fname::AbstractString; dataset::AbstractString="/mm10", unique_features=true)
Read count matrix from hdf5 file.
Arguments:
- `fname`: path to hdf5 file
- `dataset`: name of dataset to load (default: "counts")
- `unique_features`: should feature names be made unique (default: true)
Return values:
A labeled sparse matrix containing the counts
Severo.read_h5ad — Method
read_h5ad(fname::AbstractString, dataset::String="/mm10"; unique_features=true)
Read count matrix from hdf5 file as created by AnnData.py. https://anndata.readthedocs.io/en/latest/fileformat-prose.html
Arguments:
- `fname`: path to hdf5 file
- `unique_features`: should feature names be made unique (default: true)
Return values:
A labeled sparse matrix containing the counts
Severo.read_loom — Method
read_loom(fname::AbstractString; barcode_names::AbstractString="CellID", feature_names::AbstractString="Gene", unique_names::Bool=true, blocksize::Tuple{Int,Int}=(100,100))
Read count matrix from loom format
Arguments:
- `fname`: path to loom file
- `barcode_names`: key where the observation/cell names are stored
- `feature_names`: key where the variable/feature names are stored
- `unique_names`: should feature and barcode names be made unique (default: true)
- `blocksize`: blocksize to use when reading the matrix (tradeoff between memory and speed)
Return values:
A labeled sparse matrix containing the counts
Severo.scale_features — Method
scale_features(X::NamedArray{T, 2, SparseMatrixCSC{T, Int64}} ; scale_max=Inf, dtype::Type{<:AbstractFloat})
Scale and center a count/data matrix along the cells such that each feature is standardized
Arguments:
- `X`: the labelled count/data matrix to scale
- `scale_max`: maximum value of the scaled data
Return values:
A centered matrix
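Standardizing each feature across cells can be sketched on a dense matrix as follows. The assumptions are a features x cells layout, clipping only from above at `scale_max`, and leaving zero-variance features at 0 after centering. How Severo handles those details is not stated here, and the name `scale_features_sketch` is hypothetical.

```julia
using Statistics

function scale_features_sketch(X::AbstractMatrix; scale_max::Real=Inf)
    mu = mean(X, dims=2)                                      # per-feature mean
    sd = map(s -> s == 0 ? one(s) : s, std(X, dims=2))        # avoid dividing by zero
    Z = (X .- mu) ./ sd                                       # z-score per feature
    min.(Z, scale_max)                                        # clip large values
end

X = [1.0 2.0 3.0;
     2.0 2.0 2.0]                                             # second feature is constant
Z = scale_features_sketch(X)
Zclipped = scale_features_sketch(X; scale_max=0.5)
```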
Severo.shared_nearest_neighbours — Method
shared_nearest_neighbours(em::LinearEmbedding, k::Int64; dims=:, metric::SemiMetric=Euclidean(), include_self::Bool=true, ntables::Int64=2*size(X,2))
Compute a k-nearest neighbours graph based on an embedding of cells and its Jaccard index.
The Jaccard index measures similarity between nearest neighbour sets and is defined as the size of the intersection divided by the size of the union. A value of 0 indicates no overlap and a value of 1 indicates full overlap.
Arguments:
- `em`: embedding containing the transformed coordinates for each cell
- `k`: number of nearest neighbours to find
- `dims`: which dimensions to use
- `include_self`: include the cell in its k-nearest neighbours
- `ntables`: number of tables to use in knn algorithm: controls the precision (higher is more accurate)
- `prune`: cutoff for the Jaccard index, edges with values below this cutoff are removed from the resulting graph
Return values:
A shared nearest neighbours graph represented by a sparse matrix. Weights of the edges indicate similarity of the neighbourhoods of the cells as computed with the Jaccard index.
Severo.shared_nearest_neighbours — Method
shared_nearest_neighbours(rng::AbstractRNG, em::LinearEmbedding, k::Int64; dims=:, metric::SemiMetric=Euclidean(), include_self::Bool=true, ntables::Int64=2*size(X,2))
Compute a k-nearest neighbours graph based on an embedding of cells and its Jaccard index.
The Jaccard index measures similarity between nearest neighbour sets and is defined as the size of the intersection divided by the size of the union. A value of 0 indicates no overlap and a value of 1 indicates full overlap.
Arguments:
- `rng`: random number generator
- `em`: embedding containing the transformed coordinates for each cell
- `k`: number of nearest neighbours to find
- `dims`: which dimensions to use
- `include_self`: include the cell in its k-nearest neighbours
- `ntables`: number of tables to use in knn algorithm: controls the precision (higher is more accurate)
- `prune`: cutoff for the Jaccard index, edges with values below this cutoff are removed from the resulting graph
Return values:
A shared nearest neighbours graph represented by a sparse matrix. Weights of the edges indicate similarity of the neighbourhoods of the cells as computed with the Jaccard index.
Severo.shared_nearest_neighbours — Method
shared_nearest_neighbours(X::NamedArray{T,2}, k::Int64; dims=:, metric::SemiMetric=Euclidean(), include_self::Bool=true, ntables::Int64=2*size(X,2)) where T
Compute a k-nearest neighbours graph based on coordinates for each cell and its Jaccard index.
The Jaccard index measures similarity between nearest neighbour sets and is defined as the size of the intersection divided by the size of the union. A value of 0 indicates no overlap and a value of 1 indicates full overlap.
Arguments:
- `X`: a labelled matrix with coordinates for each cell
- `k`: number of nearest neighbours to find
- `dims`: which dimensions to use
- `include_self`: include the cell in its k-nearest neighbours
- `ntables`: number of tables to use in knn algorithm: controls the precision (higher is more accurate)
- `prune`: cutoff for the Jaccard index, edges with values below this cutoff are removed from the resulting graph
Return values:
A shared nearest neighbours graph represented by a sparse matrix. Weights of the edges indicate similarity of the neighbourhoods of the cells as computed with the Jaccard index.
Severo.shared_nearest_neighbours — Method
shared_nearest_neighbours(rng::AbstractRNG, X::NamedArray{T,2}, k::Int64; dims=:, metric::SemiMetric=Euclidean(), include_self::Bool=true, ntables::Int64=2*size(X,2)) where T
Compute a k-nearest neighbours graph based on coordinates for each cell and its Jaccard index.
The Jaccard index measures similarity between nearest neighbour sets and is defined as the size of the intersection divided by the size of the union. A value of 0 indicates no overlap and a value of 1 indicates full overlap.
Arguments:
- `rng`: random number generator
- `X`: a labelled matrix with coordinates for each cell
- `k`: number of nearest neighbours to find
- `dims`: which dimensions to use
- `include_self`: include the cell in its k-nearest neighbours
- `ntables`: number of tables to use in knn algorithm: controls the precision (higher is more accurate)
- `prune`: cutoff for the Jaccard index, edges with values below this cutoff are removed from the resulting graph
Return values:
A shared nearest neighbours graph represented by a sparse matrix. Weights of the edges indicate similarity of the neighbourhoods of the cells as computed with the Jaccard index.
Severo.umap — Function
umap(em::LinearEmbedding, ncomponents::Int64=2; dims=:, metric=:cosine, nneighbours::Int=30, min_dist::Real=.3, nepochs::Int=300, kw...) where T
Performs a Uniform Manifold Approximation and Projection (UMAP) dimensionality reduction on the coordinates in the linear embedding.
For a more in-depth discussion of the mathematics underlying UMAP, see the arXiv paper: https://arxiv.org/abs/1802.03426
Arguments:
- `em`: embedding containing the transformed coordinates for each cell
- `ncomponents`: the dimensionality of the embedding
- `dims`: which dimensions to use
- `metric`: distance metric to use
- `nneighbours`: the number of neighboring points used in local approximations of manifold structure.
- `min_dist`: controls how tightly the embedding is allowed to compress points together.
- `nepochs`: number of training epochs to be used while optimizing the low-dimensional embedding
- `kw`: additional parameters for the umap algorithm. See `UMAP.umap`
Return values:
A low-dimensional embedding of the cells
Severo.umap — Function
umap(X::NamedMatrix, ncomponents::Int64=2; dims=:, metric=:cosine, nneighbours::Int=30, min_dist::Real=.3, nepochs::Int=300, kw...) where T
Performs a Uniform Manifold Approximation and Projection (UMAP) dimensionality reduction on the coordinates.
For a more in-depth discussion of the mathematics underlying UMAP, see the arXiv paper: https://arxiv.org/abs/1802.03426
Arguments:
- `X`: a labelled matrix with coordinates for each cell
- `ncomponents`: the dimensionality of the embedding
- `dims`: which dimensions to use
- `metric`: distance metric to use
- `nneighbours`: the number of neighboring points used in local approximations of manifold structure.
- `min_dist`: controls how tightly the embedding is allowed to compress points together.
- `nepochs`: number of training epochs to be used while optimizing the low-dimensional embedding
- `kw`: additional parameters for the umap algorithm. See `UMAP.umap`
Return values:
A low-dimensional embedding of the cells
Severo.umap — Method
umap(X::AbstractMatrix, ncomponents::Int64=2; dims=:, metric=:cosine, nneighbours::Int=30, min_dist::Real=.3, nepochs::Int=300, kw...) where T
Performs a Uniform Manifold Approximation and Projection (UMAP) dimensionality reduction on the coordinates.
For a more in-depth discussion of the mathematics underlying UMAP, see the arXiv paper: https://arxiv.org/abs/1802.03426
Arguments:
- `X`: an unlabelled matrix with coordinates for each cell
- `ncomponents`: the dimensionality of the embedding
- `dims`: which dimensions to use
- `metric`: distance metric to use
- `nneighbours`: the number of neighboring points used in local approximations of manifold structure.
- `min_dist`: controls how tightly the embedding is allowed to compress points together.
- `nepochs`: number of training epochs to be used while optimizing the low-dimensional embedding
- `kw`: additional parameters for the umap algorithm. See `UMAP.umap`
Return values:
A low-dimensional embedding of the cells