To group tracks with similar properties, one can in principle perform
any clustering method of interest on a *feature matrix* of
quantification metrics for each track in the dataset. The package comes
with three convenience functions –
`getFeatureMatrix()`

,`trackFeatureMap()`

, and
`clusterTracks()`

– to easily compute several metrics on all
tracks at once, visualize them in 2 dimensions, and to cluster tracks
accordingly. This tutorial shows how to use these functions to explore
heterogeneity in a track dataset.

First load the package:

The package contains a dataset of B and T cells in a mouse cervical
lymph node, and neutrophils responding to an *S. aureus*
infection in a mouse ear; all are imaged using two-photon microscopy.
While the original data contained 3D coordinates, we’ll use the 2D
projection on the XY plane (see the vignettes on quality control methods and preprocessing of the package datasets for
details).

The T-cell dataset consists of 199 tracks of individual cells in a tracks object:

```
str( TCells, list.len = 2 )
#> List of 199
#> $ 1 : num [1:39, 1:3] 48 72 96 120 144 168 192 216 240 264 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:3] "t" "x" "y"
#> $ 3 : num [1:19, 1:3] 24 48 72 96 120 144 168 192 216 240 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:3] "t" "x" "y"
#> [list output truncated]
#> - attr(*, "class")= chr "tracks"
```

Each element in this list is a track from a single cell, consisting of a matrix with \((x,y)\) coordinates and the corresponding measurement timepoints:

```
head( TCells[[1]] )
#> t x y
#> [1,] 48 90.8534 65.3943
#> [2,] 72 89.5923 64.9042
#> [3,] 96 88.6958 67.1125
#> [4,] 120 87.3437 68.2392
#> [5,] 144 86.2740 67.9236
#> [6,] 168 84.0549 68.2502
```

Similarly, we will also use the `BCells`

and
`Neutrophils`

data:

```
str( BCells, list.len = 2 )
#> List of 74
#> $ 130 : num [1:24, 1:3] 48 72 96 120 144 168 192 216 240 264 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:3] "t" "x" "y"
#> $ 210 : num [1:39, 1:3] 48 72 96 120 144 168 192 216 240 264 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:3] "t" "x" "y"
#> [list output truncated]
#> - attr(*, "class")= chr "tracks"
str( Neutrophils, list.len = 2 )
#> List of 411
#> $ 21 : num [1:7, 1:3] 48 72 96 120 144 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:3] "t" "x" "y"
#> $ 22 : num [1:8, 1:3] 48 72 96 120 144 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:3] "t" "x" "y"
#> [list output truncated]
#> - attr(*, "class")= chr "tracks"
```

Since there are quite many cells, we’ll sample just some of the tracks for the legibility of the plots in this tutorial – but everything we will do could also be done on the complete datasets.

```
# Take a sample
set.seed(1234)
TCells <- TCells[ sample( names(TCells), 30 ) ]
BCells <- BCells[ sample( names(BCells), 30 ) ]
Neutrophils <- Neutrophils[ sample( names(Neutrophils), 30 ) ]
```

Combine them in a single dataset, where labels also indicate celltype:

```
T2 <- TCells
names(T2) <- paste0( "T", names(T2) )
tlab <- rep( "T", length(T2) )
B2 <- BCells
names(B2) <- paste0( "B", names(B2) )
blab <- rep( "B", length(B2) )
N2 <- Neutrophils
names(N2) <- paste0( "N", names(Neutrophils) )
nlab <- rep( "N", length( N2) )
all.tracks <- c( T2, B2, N2 )
real.celltype <- c( tlab, blab, nlab )
```

Using the function `getFeatureMatrix()`

, we can quickly
apply several quantification measures to all data at once (see
`?TrackMeasures`

for an overview of measures we can
compute):

```
m <- getFeatureMatrix( all.tracks,
c(speed, meanTurningAngle,
outreachRatio, squareDisplacement) )
# We get a matrix with a row per track and one column for each metric:
head(m)
#> [,1] [,2] [,3] [,4]
#> T38 0.12164986 0.6895097 0.5000639 1271.0287
#> T115 0.12411080 0.8124751 0.4840314 3161.6797
#> T7954 0.13206831 1.5715280 0.3300338 109.4300
#> T5695 0.22796888 1.1175668 0.2823076 458.9368
#> T6386 0.07619656 1.8086072 0.4217940 137.8384
#> T7581 0.14522408 1.1691051 0.5149288 1558.9763
```

We can use this matrix to explore relationships between different metrics. For example, we can check the relationship between speed and mean turning angle:

When using more than two metrics at once to quantify track
properties, it becomes hard to visualize which tracks are similar to
each other. Like with single-cell data, dimensionality reduction methods
can help visualize high-dimensional track feature datasets. The function
`trackFeatureMap()`

is a wrapper method that helps to quickly
visualize data using three popular methods: principal component analysis
(PCA), multidimensional scaling (MDS), and uniform manifold approximate
and projection (UMAP). The function `trackFeatureMap()`

can
be used for a quick visualization of data, or return the coordinates in
the new axis system if the argument
`return.mapping=TRUE`

.

Use `trackFeatureMap()`

to perform a principal component
analysis (PCA) based on the measures “speed”, “meanTurningAngle”,
“squareDisplacement”, “maxDisplacement”, and “outreachRatio” using the
optional `labels`

argument to color points by their real
celltype and `return.mapping=TRUE`

to also return the data
rather than just the plot:

```
pca <- trackFeatureMap( all.tracks,
c(speed,meanTurningAngle,squareDisplacement,
maxDisplacement,outreachRatio ), method = "PCA",
labels = real.celltype, return.mapping = TRUE )
```

Note that the B cells and Neutrophils are relatively well-separated from each other in this plot, but the T cells are hard to distinguish from neutrophils based on these features.

We can then inspect the data stored in `pca`

. This
reveals, for example, that the first principal component is correlated
with speed:

```
pc1 <- pca[,1]
pc2 <- pca[,2]
track.speed <- sapply( all.tracks, speed )
cor.test( pc1, track.speed )
#>
#> Pearson's product-moment correlation
#>
#> data: pc1 and track.speed
#> t = 4.5449, df = 88, p-value = 1.743e-05
#> alternative hypothesis: true correlation is not equal to 0
#> 95 percent confidence interval:
#> 0.2516387 0.5898408
#> sample estimates:
#> cor
#> 0.4360085
```

See `?prcomp`

for details on how principal components are
computed, and for further arguments that can be passed on to
`trackFeatureMap()`

.

Another popular method for visualization is multidimensional scaling
(MDS), which is also supported by `trackFeatureMap()`

:

```
trackFeatureMap( all.tracks,
c(speed,meanTurningAngle,squareDisplacement,maxDisplacement,
outreachRatio ), method = "MDS",
labels = real.celltype )
```

Internally, `trackFeatureMap()`

computes a distance matrix
using `dist()`

and then applies MDS using
`cmdscale()`

. See the documentation of `cmdscale`

for details and further arguments that can be passed on via
`trackFeatureMap()`

.

Again, we find that the B cells and Neutrophils are separated in this plot, while T cells mix with neutrophils.

Uniform manifold approximate and projection (UMAP) is another
powerful method to explore structure in high-dimensional datasets.
`trackFeatureMap()`

supports visualization of tracks in a
UMAP. Note that this option requires the `uwot`

package.
Please install this package first using
`install.packages("uwot")`

.

To go beyond visualizing similar and dissimilar tracks using multiple
track features, `clusterTracks()`

supports the explicit
grouping of tracks into clusters using two common methods: hierarchical
and k-means clustering.

Hierarchical clustering is performed by calling `hclust()`

on a distance matrix computed via `dist()`

on the feature
matrix:

```
clusterTracks( all.tracks,
c(speed,meanTurningAngle,squareDisplacement,maxDisplacement,
outreachRatio ), method = "hclust", labels = real.celltype )
```

See methods `dist()`

and `hclust()`

for
details.

Secondly, `clusterTracks()`

also supports k-means
clustering of tracks. Note that this requires an extra argument
`centers`

that is passed on to the `kmeans()`

function and specifies the number of clusters to make. In this case,
let’s use three clusters because we have three celltypes:

```
clusterTracks( all.tracks,
c(speed,meanTurningAngle,squareDisplacement,maxDisplacement,
outreachRatio ), method = "kmeans",
labels = real.celltype, centers = 3 )
```

In these plots, we see the value of each feature in the feature matrix plotted for the different clusters, whereas the color indicates the “real” celltype the track came from.