Normalize gene expression levels using Transcripts Per Million (TPM). The GTF files used for this process can be downloaded from these links:
gtf_human_file
gtf_mouse_file
Remove genes that are not found in the GTF annotation file.
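As a rough illustration of the TPM step and the GTF-based gene filter, the sketch below uses plain NumPy. The function name `tpm_normalize`, the `gene_lengths_bp` dictionary (gene lengths assumed to have been parsed from the GTF file beforehand), and the toy data are hypothetical. The sketch drops genes without a GTF entry before normalizing, because TPM needs a length for every gene; the actual pipeline may order these two steps differently.

```python
import numpy as np

def tpm_normalize(counts, gene_names, gene_lengths_bp):
    """Return (tpm, kept_gene_names) for a (n_spots, n_genes) count matrix.

    gene_lengths_bp is a hypothetical dict: gene name -> length in base pairs,
    assumed to come from the GTF annotation.
    """
    counts = np.asarray(counts, dtype=float)
    # Keep only genes that have an entry in the annotation.
    keep = np.array([g in gene_lengths_bp for g in gene_names])
    kept_names = [g for g, k in zip(gene_names, keep) if k]
    lengths_kb = np.array([gene_lengths_bp[g] for g in kept_names]) / 1e3
    # Reads per kilobase, then rescale each spot so its values sum to one million.
    rate = counts[:, keep] / lengths_kb
    totals = rate.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # leave all-zero spots at zero instead of dividing by 0
    return rate / totals * 1e6, kept_names

# Toy example: gene "X" is missing from the annotation and is removed.
counts = np.array([[10, 20, 5], [0, 0, 0]])
genes = ["A", "B", "X"]
lengths = {"A": 1000, "B": 2000}
tpm, kept = tpm_normalize(counts, genes, lengths)
```

Each non-empty spot sums to one million after normalization, which makes spots with different sequencing depth comparable.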
Log-transform the data as log2(x + 1) using the scanpy.pp.log1p() function (which takes the natural logarithm unless base=2 is passed).
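To make the transform concrete: `scanpy.pp.log1p(adata, base=2)` computes log(1 + x) divided by log(2). The NumPy sketch below reproduces that arithmetic on a toy vector; the helper name `log2_1p` and the input values are illustrative assumptions, not part of the pipeline.

```python
import numpy as np

def log2_1p(x):
    """log2(1 + x), written as a natural log1p rescaled to base 2."""
    return np.log1p(np.asarray(x, dtype=float)) / np.log(2.0)

tpm_values = np.array([0.0, 1.0, 3.0, 1023.0])
logged = log2_1p(tpm_values)
```

Adding 1 before taking the logarithm keeps zero expression at zero instead of mapping it to minus infinity.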
Apply an adaptive median filter to every slide in the collection.
The maximum window size is a 3-hop neighborhood, where the number of hops is the number of concentric rings in the graph that are considered when computing the median.
This is the adaptive median filter proposed in the SEPAL paper.
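The sketch below shows one plausible reading of a graph-based adaptive median filter; it is not the SEPAL implementation. It assumes (loudly) that only zero entries are treated as missing, that the neighborhood grows one hop at a time, and that the first non-zero median found within 3 hops replaces the zero. The function names and the dense adjacency matrix are hypothetical.

```python
import numpy as np

def n_hop_neighbors(adj, node, hops):
    """Indices of all nodes within `hops` edges of `node`, including itself."""
    seen = {node}
    frontier = {node}
    for _ in range(hops):
        reached = set()
        for u in frontier:
            reached.update(np.flatnonzero(adj[u]).tolist())
        frontier = reached - seen  # the next concentric ring
        seen |= frontier
    return sorted(seen)

def adaptive_median_fill(values, adj, max_hops=3):
    """ASSUMPTION: zeros are missing values; grow the window until the median is > 0."""
    values = np.asarray(values, dtype=float)
    out = values.copy()
    for node in np.flatnonzero(values == 0):
        for hops in range(1, max_hops + 1):
            # Medians are always taken over the original, unfiltered values.
            med = np.median(values[n_hop_neighbors(adj, node, hops)])
            if med > 0:
                out[node] = med
                break
    return out

# Toy graph: five nodes connected in a line (0-1-2-3-4).
adj = np.eye(5, k=1) + np.eye(5, k=-1)
filled_a = adaptive_median_fill([3, 0, 5, 7, 9], adj)
filled_b = adaptive_median_fill([0, 0, 6, 6, 6], adj)
```

In the second toy vector a 1-hop window is not enough for either zero, so the window expands until enough non-zero neighbors are included. Entries whose median is still zero at 3 hops are left unchanged.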
Compute Moran's I for each gene in each slide using the squidpy.gr.spatial_autocorr() function. Then average each gene's Moran's I across all slides.
Filter by Moran's I, keeping the 128 or 32 genes with the highest average Moran's I, depending on the dataset.
For a detailed list of the number of genes in each dataset, please refer to Database Metadata.
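The gene-selection logic can be sketched with the textbook Moran's I formula in NumPy. This is an illustration only: the pipeline itself uses squidpy, whose weighting and normalization details may differ, and the names `morans_i` and `top_k_by_mean_moran`, the score of 0 for constant genes, and the toy slides are assumptions.

```python
import numpy as np

def morans_i(x, w):
    """Textbook Moran's I of values x under a spatial weight matrix w."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    denom = (z ** 2).sum()
    if denom == 0:
        return 0.0  # ASSUMPTION: constant genes get a score of 0
    return float(len(x) / w.sum() * (z @ w @ z) / denom)

def top_k_by_mean_moran(slides, k):
    """slides: list of (expr, w) pairs, expr shaped (n_spots, n_genes).

    Returns the indices of the k genes with the highest slide-averaged
    Moran's I, plus the averaged scores for all genes.
    """
    scores = np.array([
        [morans_i(expr[:, g], w) for g in range(expr.shape[1])]
        for expr, w in slides
    ])
    mean_scores = scores.mean(axis=0)
    order = np.argsort(mean_scores)[::-1][:k]
    return order, mean_scores

# Toy slide: four spots in a line; gene 0 is spatially clustered, gene 1 alternates.
w = np.eye(4, k=1) + np.eye(4, k=-1)
expr = np.array([[1, 1], [1, 5], [5, 1], [5, 5]], dtype=float)
selected, mean_scores = top_k_by_mean_moran([(expr, w), (expr, w)], k=1)
```

The clustered gene scores higher than the alternating one, so it is the gene kept when k is 1.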
Perform ComBat batch correction using the pycombat() function.
Compute the deviation of each gene's expression from that gene's mean expression.
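A minimal sketch of the deviation step, assuming the mean is taken per gene over all spots in the matrix passed in (the set of spots the actual pipeline averages over is not stated here). The function name `gene_deviations` and the toy matrix are hypothetical.

```python
import numpy as np

def gene_deviations(expr):
    """Subtract each gene's mean (column mean) from a (n_spots, n_genes) matrix."""
    expr = np.asarray(expr, dtype=float)
    gene_means = expr.mean(axis=0)
    return expr - gene_means, gene_means

expr = np.array([[1.0, 2.0], [3.0, 6.0]])
deltas, gene_means = gene_deviations(expr)
```

Returning the means alongside the deviations makes it possible to recover the original expression by adding them back.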