Dataset Filtering
- Define the minimum and maximum total counts per cell for each dataset. Then filter out observations whose total counts fall outside the range [cell min counts, cell max counts].
- Compute the expression fraction for each gene: for each slide in the collection, compute the fraction of observations that express the gene, then take the minimum across all slides.
- Compute the global expression fraction for each gene. This is like the expression fraction, but instead of computing it per slide and taking the minimum, we compute it once over the whole collection.
- Filter out genes using the following criteria:
  - Exclude genes that are not expressed in at least the minimum expression fraction of observations on each slide.
  - Exclude genes that are not expressed in at least the minimum global expression fraction of observations in the entire collection.
  - Exclude genes with counts outside the range [gene min counts, gene max counts].
- Remove cells with zero counts in all genes, if any exist.
- Compute quality control metrics using the scanpy.pp.calculate_qc_metrics() function.
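
The filtering steps above can be sketched in plain NumPy on a dense observations-by-genes count matrix. This is a minimal illustration, not the actual implementation: the function name `filter_collection`, its parameter names, and the toy data are hypothetical. It also assumes a single pair of cell thresholds for the whole collection, inclusive bounds, that a gene counts as "expressed" when its count is greater than zero, and that gene counts are summed over all observations remaining after the cell filter. The final quality control step is left out so the sketch does not depend on scanpy.

```python
import numpy as np


def filter_collection(counts, slide_ids,
                      cell_min_counts, cell_max_counts,
                      gene_min_counts, gene_max_counts,
                      min_expr_frac, min_global_expr_frac):
    """Hypothetical helper: apply the cell and gene filters to a dense matrix."""
    counts = np.asarray(counts)
    slide_ids = np.asarray(slide_ids)

    # Keep observations whose total counts lie within the allowed range.
    totals = counts.sum(axis=1)
    keep_obs = (totals >= cell_min_counts) & (totals <= cell_max_counts)
    counts, slide_ids = counts[keep_obs], slide_ids[keep_obs]

    # Expression fraction: per-slide fraction of expressing observations,
    # then the minimum across slides (assumes at least one observation remains).
    expressed = counts > 0
    per_slide = np.stack([expressed[slide_ids == s].mean(axis=0)
                          for s in np.unique(slide_ids)])
    expr_frac = per_slide.min(axis=0)

    # Global expression fraction: computed once over the whole collection.
    global_frac = expressed.mean(axis=0)

    # Gene filter: all three criteria must hold for a gene to be kept.
    gene_totals = counts.sum(axis=0)
    keep_genes = ((expr_frac >= min_expr_frac)
                  & (global_frac >= min_global_expr_frac)
                  & (gene_totals >= gene_min_counts)
                  & (gene_totals <= gene_max_counts))
    counts = counts[:, keep_genes]

    # Remove observations left with zero counts in all remaining genes.
    nonzero = counts.sum(axis=1) > 0
    return counts[nonzero], slide_ids[nonzero], keep_genes


# Toy collection: two slides ("A" and "B"), three genes.
toy_counts = [
    [5, 0, 1],     # slide A
    [3, 2, 0],     # slide A
    [0, 3, 0],     # slide A, only expresses the second gene
    [0, 0, 0],     # slide A, below cell_min_counts
    [4, 0, 2],     # slide B
    [2, 0, 3],     # slide B
    [50, 1, 50],   # slide B, above cell_max_counts
]
toy_slides = ["A", "A", "A", "A", "B", "B", "B"]

filtered, slides, keep_genes = filter_collection(
    toy_counts, toy_slides,
    cell_min_counts=1, cell_max_counts=100,
    gene_min_counts=1, gene_max_counts=1000,
    min_expr_frac=0.3, min_global_expr_frac=0.5,
)
```

In this toy example the second gene is not expressed on slide B, so its minimum per-slide fraction is zero and it is dropped. The observation that expressed only that gene is then left with zero counts and is removed by the last step. With an AnnData object, the quality control metrics would then be computed on the filtered data with scanpy.pp.calculate_qc_metrics().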