Back to Main Page

Workflow

Preprocessing by cellranger (count, vdj, aggr)

  • cellranger count
  • cellranger vdj
  • cellranger aggr
# Sample : Control
# CellRanger count for Transcriptome 
cellranger count --id=<SampleID_GEX> \
                 --transcriptome=<Path_to_Mouse_GEX_Reference> \
                 --fastqs=<Path_to_GEX_Raw_Data> \
                 --sample=<Sample_Name_GEX> \
                 --expect-cells=10000

# CellRanger vdj for VDJ information  
cellranger vdj --id=<SampleID_VDJ> \
               --reference=<Path_to_Mouse_VDJ_Reference> \
               --fastqs=<Path_to_VDJ_Raw_Data> \
               --sample=<Sample_Name_VDJ>

  
# Sample : Treated
# CellRanger count for Transcriptome 
cellranger count --id=<SampleID_GEX_2> \
                 --transcriptome=<Path_to_Mouse_GEX_Reference> \
                 --fastqs=<Path_to_GEX_Raw_Data_2> \
                 --sample=<Sample_Name_GEX_2> \
                 --expect-cells=10000

# CellRanger vdj for VDJ information  
cellranger vdj --id=<SampleID_VDJ_2> \
               --reference=<Path_to_Mouse_VDJ_Reference> \
               --fastqs=<Path_to_VDJ_Raw_Data_2> \
               --sample=<Sample_Name_VDJ_2>

# CellRanger aggr for combining all samples
cellranger aggr --id=<Aggregation_ID> \
                --csv=<Path_to_Aggregation_CSV>

Cell Ranger 5’ VDJ Algorithm Overview link


VDJ information

  • Filtering 1 TRA and 1 TRB clonotype
# Import cellranger aggr output 
vdj <- read.csv('path_to_filtered_contig_annotations.csv')
clono <- read.csv('path_to_clonotypes.csv')

# Removal of duplicates and merging by clonotype_id
vdj <- vdj %>% distinct(barcode, .keep_all = TRUE)
rownames(vdj) <- vdj$barcode
vdj.clono <- inner_join(vdj, clono, by=c('raw_clonotype_id'='clonotype_id'))

# Counting TRA and TRB from cdr3s_aa  
vdj.clono$TRBcount <- stringr::str_count(vdj.clono$cdr3s_aa, "TRB")
vdj.clono$TRAcount <- stringr::str_count(vdj.clono$cdr3s_aa, "TRA")
vdj.clono <- vdj.clono %>% arrange(desc(frequency))
rownames(vdj.clono) <- vdj.clono$barcode

# filtering 1 alpha chain and  1 beta chain clonotype
vdj.clono <- vdj.clono %>% filter(TRBcount == 1 & TRAcount == 1)

# save vdj information
vdj.clono %>% write.csv('path_to_save_vdj_clono.csv')


Basic Clonotype Analysis

Immune repertoire analysis

Analysis using R packages (scRepertoire, immunarch)

1. Diversity Index

  • Shannon Entropy
  • Simpson

2. Repertoire analysis

  • TRAV/D/J/C distribution
  • TRBV/D/J/C distribution
  • CDR3nt length distribution
  • Clustering of clonotype repertoire

3. Expansion study

  • Clonotype expansion (hyper,high,medium,small,rare)
  • Top 100 expanded clonotypes with TRAV:TRBV info


Further Analysis with scRNA-seq data

Merging vdj information with scRNA-seq (if available)
Suggested analyses below

Target cell selection

  • Cells with target features (e.g. cytotoxic cells, effector T cells, NKT, etc).
  • Cells associated with expanded clonotypes (hyper, high, medium, small, rare).
  • Cells with clonotypes displaying treatment-specific response.

Correlation of clonotype frequency with related data

  • Correlation of clonotype frequency with target gene expression profiles.
  • Correlation with clinical outcomes and phenotypic data. (if available)

Comparative Analysis

  • Comparison between different patient groups (e.g., responders vs non-responders to therapy, different disease stages) and associated clonotypes.
  • Clonotype repertoires to identify conserved features of the immune response.

Statistical Modeling

  • Application of statistical models and computational algorithms for the analysis.
  • Evaluation of the robustness and reproducibility of findings.



Sample work

DGKi TCR-seq analysis

Example Plots

Clonotype Distribution

# data_melted : Clonotype frequency information 
# Draw plot
ggplot(data_melted, aes(x = reorder(Clonotype, -value), y = value, 
                        fill = variable)) + 
  geom_col(color = "grey3", size=0.2) + 
  scale_fill_manual(values = c("salmon", "darkorange"),
                    labels = c("Sample A", "Sample B")) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, size=6),
        legend.title = element_blank(),
        plot.title = element_text(hjust = 0.5, size=20)) +
  labs(x = "Top 100 Clonotype", y = "Total number of clonotype",
      title = "Clonotype distribution by sample") 



Clonotype Abundance in each sample

# Bubble Plot 
# input data format
# column1 : name of the factor
# column2 : frequency of the factor

bubble_plot = function(input.data, title="Abundant Clonotypes"){
  colnames(input.data)[2] = "Freq"
  
  packing <- circleProgressiveLayout(input.data$Freq, sizetype='area')
  
  input.data <- cbind(input.data, packing)
  colnames(input.data)[1] ='clonotype'
  dat.gg <- circleLayoutVertices(packing, npoints=50)
  
  ggplot() + 
    # Make the bubbles
    geom_polygon(data = dat.gg, aes(x, y, group = id, 
                                    fill=as.factor(id)), 
                 colour = "black", alpha = 0.6) +
    theme_void() + 
    theme(legend.position="none",
          plot.title = element_text(hjust = 0.5, size=30)) +
    coord_equal() +
    # Add text in the center of each bubble + control its size
    geom_text(data = input.data, aes(x, y, size=Freq, label = clonotype)) +
    scale_size_continuous(range = c(0.5,4)) +
    ggtitle(title) +
    scale_fill_viridis_d(option = "D", begin = 0.0, end = 1.0, direction = 1) 
}

Sample A

bubble_plot(input.data = data[,c(1,2)], title = "Abundant Clonotypes in A")

Sample B

bubble_plot(input.data = data[,c(1,3)], title = "Abundant Clonotypes in B")




Gene Expression Profiling across Clonotypes and Samples


Heatmap (Cytotoxic Genes)

# Draw heatmap 
expression_matrix[rownames(cell_metadata),] %>% 
  pheatmap::pheatmap(show_rownames = F,
                     annotation_row = cell_metadata,
                     cluster_rows = T)



More example plots will be updated.