scRNA-seq
Raw data processing
cellranger: 10x Genomics' pipeline for processing raw single-cell RNA-seq
data generated on the Chromium system. It is easy to use but requires
substantial computing resources and a Linux environment.
Input: FASTQ files and a Cell Ranger reference transcriptome (genome index)
cellranger count/aggr
# cellranger count for transcriptome data processing
cellranger count --id=SampleA_GEX \
--transcriptome=/path/to/reference/transcriptomes/Mouse_GEX_2020/refdata-gex-mm10-2020-A \
--fastqs=/path/to/raw_data/SampleA \
--sample=SampleA \
--expect-cells=10000
# cellranger aggr aggregates the outputs of multiple cellranger count runs into a single dataset
cellranger aggr --id=Aggregate_GEX \
--csv=/path/to/aggregate_info/Aggregate_GEX.csv
Aggregate_GEX.csv example
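The aggregation CSV has one row per `cellranger count` run. Below is a minimal sketch of generating such a file. It assumes the two-column layout used by recent Cell Ranger releases (`sample_id`, `molecule_h5`; older releases name the first column `library_id`). The sample names and paths are hypothetical placeholders, and `build_aggr_csv` is an illustrative helper, not part of Cell Ranger.

```python
import csv
import io


def build_aggr_csv(samples, id_column="sample_id"):
    """Return the text of an aggregation CSV.

    `samples` is a list of (sample name, path to molecule_info.h5) pairs.
    `id_column` is an assumption: use "library_id" for older Cell Ranger releases.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([id_column, "molecule_h5"])
    for name, molecule_h5 in samples:
        writer.writerow([name, molecule_h5])
    return buffer.getvalue()


# Hypothetical samples; replace with the real cellranger count output paths.
samples = [
    ("SampleA", "/path/to/SampleA_GEX/outs/molecule_info.h5"),
    ("SampleB", "/path/to/SampleB_GEX/outs/molecule_info.h5"),
]
aggr_csv_text = build_aggr_csv(samples)
print(aggr_csv_text)
```

Writing `aggr_csv_text` to `Aggregate_GEX.csv` would give a file that can be passed to `--csv`.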
Output
Aggregated Feature-Barcode Matrices
matrix.mtx.gz
features.tsv.gz (genes.tsv in Cell Ranger v2 and earlier, which wrote uncompressed files)
barcodes.tsv.gz
Read Cellranger output
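In practice the matrix is usually loaded with an existing reader such as Seurat's `Read10X` or scanpy's `read_10x_mtx`. The sketch below is only meant to show how the three output files fit together, using the Python standard library. The function name `read_10x_triplet`, the dictionary-based sparse representation, and the demo gene IDs and barcodes are all assumptions made for illustration.

```python
import gzip
import tempfile
from pathlib import Path


def _open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_10x_triplet(matrix_path, features_path, barcodes_path):
    """Return (counts, genes, barcodes).

    counts maps (gene_index, cell_index) -> count, with 0-based indices.
    genes holds the first column of the features file (the feature ID).
    """
    with _open_text(features_path) as handle:
        genes = [line.rstrip("\n").split("\t")[0] for line in handle if line.strip()]
    with _open_text(barcodes_path) as handle:
        barcodes = [line.strip() for line in handle if line.strip()]

    counts = {}
    with _open_text(matrix_path) as handle:
        # Skip the Matrix Market banner and comment lines, which start with '%'.
        data_lines = (line for line in handle if line.strip() and not line.startswith("%"))
        # The first data line gives: number of genes, number of cells, number of entries.
        n_genes, n_cells, _n_entries = map(int, next(data_lines).split())
        if (n_genes, n_cells) != (len(genes), len(barcodes)):
            raise ValueError("matrix dimensions do not match features/barcodes")
        for line in data_lines:
            gene_idx, cell_idx, value = line.split()
            # Matrix Market indices are 1-based; rows are genes, columns are cells.
            counts[(int(gene_idx) - 1, int(cell_idx) - 1)] = int(value)
    return counts, genes, barcodes


# Tiny made-up dataset: 2 genes x 3 cells.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "matrix.mtx").write_text(
        "%%MatrixMarket matrix coordinate integer general\n"
        "2 3 3\n"
        "1 1 5\n"
        "2 1 1\n"
        "1 3 2\n"
    )
    (tmp / "features.tsv").write_text(
        "GENE_ID_1\tGeneA\tGene Expression\n"
        "GENE_ID_2\tGeneB\tGene Expression\n"
    )
    (tmp / "barcodes.tsv").write_text("AAAC-1\nAAAG-1\nAAAT-1\n")
    counts, genes, barcodes = read_10x_triplet(
        tmp / "matrix.mtx", tmp / "features.tsv", tmp / "barcodes.tsv"
    )

print(len(genes), "genes x", len(barcodes), "cells;", len(counts), "non-zero entries")
```

Note that the matrix is stored genes x cells, which is the transpose of the cells x genes layout that Scrublet expects.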
Doublet removal (by scrublet)
Scrublet runs in a Python environment.
The input file is the count matrix exported from the Seurat object as CSV, with cells as rows and genes as columns.
# If needed, install the Python packages first
# Create and activate a new conda environment
conda create -n bioinfo_env python=3.8 -y
conda activate bioinfo_env
# Install pip if not already installed
conda install pip -y
# Install necessary packages with pip
pip install numpy pandas scanpy scrublet
# run python
python
import numpy as np
import pandas as pd
import scanpy as sc
import scrublet as scr
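def run_scrublet(input_file, output_file, expected_doublet_rate=0.1):
    # Read the count matrix: rows are cells (cell IDs in the first column), columns are genes
    df = pd.read_csv(input_file, header=0, index_col=0)
    adata = sc.AnnData(df)
    # Run Scrublet with the given expected_doublet_rate;
    # this adds doublet_score and predicted_doublet to adata.obs
    sc.external.pp.scrublet(adata, expected_doublet_rate=expected_doublet_rate)
    # Save the per-cell results (adata.obs) to CSV
    adata.obs.to_csv(output_file)
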
input_file = 'input.count.csv'
output_file = 'output.scr.csv'
# Adjust the expected_doublet_rate parameter
expected_doublet_rate = 0.1
run_scrublet(input_file, output_file, expected_doublet_rate)
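When adjusting `expected_doublet_rate`, one common starting point is to scale it with the number of recovered cells. The sketch below assumes a linear rule of thumb of roughly 0.8% doublets per 1,000 recovered cells, which is often quoted for 10x Chromium runs. Both the constant and the helper name `estimate_doublet_rate` are assumptions, so check the user guide for the chemistry actually used.

```python
def estimate_doublet_rate(n_cells_recovered, rate_per_1000_cells=0.008):
    """Rough expected doublet rate from the number of recovered cells.

    rate_per_1000_cells is an assumed rule-of-thumb value, not a measured one.
    """
    if n_cells_recovered < 0:
        raise ValueError("n_cells_recovered must be non-negative")
    return rate_per_1000_cells * n_cells_recovered / 1000.0


# Print the estimate for a few sample sizes.
for n_cells in (1000, 5000, 10000):
    print(n_cells, round(estimate_doublet_rate(n_cells), 3))
```

The result could then be passed as `expected_doublet_rate` to `run_scrublet` in place of the fixed 0.1.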
Add the Scrublet output to the Seurat object
library(Seurat)
library(dplyr)  # provides %>% and select()
## Import the Scrublet result
df = read.csv('path/output.scr.csv', row.names = 1)
# If an earlier step replaced '-' with '_' in the cell IDs,
# use the following code to restore them so they match the Seurat object.
rownames(df) = gsub(pattern = '_', replacement = '-', rownames(df))
## Add the doublet information to the Seurat object
obj.srt[['doublet_score']] = df[rownames(obj.srt@meta.data), ]$doublet_score
obj.srt[['predicted_doublet']] = df[rownames(obj.srt@meta.data), ]$predicted_doublet
obj.srt %>% saveRDS("path/saved.obj.rds")
# Check the number of predicted doublets in the data
obj.srt@meta.data %>% select(predicted_doublet) %>% table()
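The same check can be done on the Scrublet CSV before it is loaded into R. This is a minimal sketch using only the Python standard library. It assumes the file has the `doublet_score` and `predicted_doublet` columns written by the Python step above, with booleans stored as `True`/`False`. The helper name `summarize_doublets` and the demo rows are made up for illustration.

```python
import csv
import tempfile
from collections import Counter
from pathlib import Path


def summarize_doublets(csv_path):
    """Return (number of predicted doublets, total number of cells)."""
    with open(csv_path, newline="") as handle:
        reader = csv.DictReader(handle)
        # Treat the text "True" (any capitalisation) as a predicted doublet.
        tally = Counter(
            row["predicted_doublet"].strip().lower() == "true" for row in reader
        )
    return tally[True], sum(tally.values())


# Made-up results for three cells, in the layout pandas writes for adata.obs.
demo_text = (
    ",doublet_score,predicted_doublet\n"
    "AAAC-1,0.05,False\n"
    "AAAG-1,0.62,True\n"
    "AAAT-1,0.11,False\n"
)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "output.scr.csv"
    demo_path.write_text(demo_text)
    n_doublets, n_total = summarize_doublets(demo_path)

print(f"{n_doublets} predicted doublets out of {n_total} cells")
```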