General Workflow Overview

This page outlines the general workflows I use to analyze sequencing data from various platforms. I follow these guidelines for most projects and add further analyses depending on each project's specific goals and discussions.


Contents

  1. scRNA-seq pipeline
  2. RNA-seq pipeline
  3. ChIP-seq/Cut&Run/ATAC-seq pipeline
  4. Machine Learning application
  5. Public DB
  6. Frequently used Report formats

scRNA-seq pipeline

  • FASTQ file QC: FastQC, MultiQC
  • Count matrix generation for 10x Genomics datasets: Cell Ranger
  • Other formats: Drop-seq Tools, Salmon, STARsolo
  • Demultiplexing: UMI-tools
  • Doublet removal: Scrublet, DoubletFinder
  • Cell filtering: custom QC thresholds
  • Normalization and scaling: Seurat (SCTransform), log normalization
  • Cell cycle phase scoring: Seurat
  • Integration and clustering: Seurat (CCA), Harmony, LIGER
  • Marker identification: Seurat, custom ML models (XGBoost, Random Forest), DESeq2, edgeR
  • Visualization: Seurat functions, ggplot2, plotly, Shiny, heatmaps, custom scripts
  • Inter-cluster differential expression: MAST, DESeq2
  • Gene expression imputation: MAGIC, scImpute
  • Trajectory inference: Monocle3, destiny, Slingshot
  • Pathway analysis: GSEA, scGSEA, clusterProfiler
  • Network analysis: WGCNA, SCENIC
  • Cell-cell interaction: CellChat
  • RNA velocity: velocyto
  • scTCR/BCR-seq: VDJtools, immunarch, scRepertoire
  • Tumor cell CNV inference: inferCNV
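The cell filtering and log-normalization steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Seurat's implementation: the data layout (a dict mapping each cell barcode to its raw counts), the human "MT-" mitochondrial prefix, and the thresholds `min_genes` and `max_mito_frac` are hypothetical placeholders for project-specific values. The normalization follows the idea behind Seurat's LogNormalize: divide each cell's counts by its total, multiply by a scale factor (10,000 by default in Seurat), and take the natural log of (1 + x).

```python
import math

# Hypothetical toy layout: cell barcode -> raw UMI counts, one entry per gene,
# aligned with a shared list of gene symbols.


def filter_cells(counts, gene_names, min_genes=200, max_mito_frac=0.2):
    """Keep cells passing QC thresholds (hypothetical defaults, tune per project).

    Mitochondrial genes are assumed to carry the human "MT-" prefix.
    """
    mito_idx = [i for i, g in enumerate(gene_names) if g.startswith("MT-")]
    kept = {}
    for cell, vec in counts.items():
        total = sum(vec)
        if total == 0:
            continue  # empty barcodes cannot be normalized
        n_genes = sum(1 for c in vec if c > 0)  # genes detected in this cell
        mito_frac = sum(vec[i] for i in mito_idx) / total
        if n_genes >= min_genes and mito_frac <= max_mito_frac:
            kept[cell] = vec
    return kept


def log_normalize(counts, scale_factor=10_000):
    """LogNormalize-style transform: ln(1 + count / cell_total * scale_factor)."""
    normalized = {}
    for cell, vec in counts.items():
        total = sum(vec)  # assumes total > 0, e.g. after filter_cells()
        normalized[cell] = [math.log1p(c / total * scale_factor) for c in vec]
    return normalized


# Toy usage: 3 cells x 4 genes (values are made up for illustration).
genes = ["CD3E", "MS4A1", "LYZ", "MT-CO1"]
cells = {"A": [10, 0, 5, 1], "B": [0, 0, 0, 9], "C": [3, 3, 3, 0]}
kept = filter_cells(cells, genes, min_genes=2)  # toy threshold for 4 genes
norm = log_normalize(kept)
```

In the toy data, cell "B" is dropped because it detects only one gene, all of which is mitochondrial.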

RNA-seq pipeline

  • FASTQ file QC: FastQC reports
  • Alignment and quantification (count matrix): STAR, Salmon
  • Normalization and scaling (if necessary)
  • TPM and FPKM calculation
  • PCA plots
  • Correlation analysis of selected gene sets
  • Differentially expressed gene (DEG) analysis
  • Pathway analysis: GSEA, clusterProfiler
  • K-means clustering of DEGs
  • Visualization
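TPM and FPKM both correct raw counts for gene length and sequencing depth, but in a different order. A minimal sketch, assuming gene lengths in base pairs (for example, effective lengths reported by Salmon) and using the sum of the supplied counts as the library size. That is a simplification, since FPKM is conventionally defined on total mapped fragments.

```python
def fpkm(counts, lengths_bp):
    """Fragments per kilobase of transcript per million mapped fragments."""
    # Library size in millions; assumption: sum of the supplied counts.
    per_million = sum(counts) / 1e6
    return [c / (length / 1e3) / per_million for c, length in zip(counts, lengths_bp)]


def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale the sum to 1e6."""
    rates = [c / (length / 1e3) for c, length in zip(counts, lengths_bp)]  # reads per kb
    per_million = sum(rates) / 1e6
    return [r / per_million for r in rates]
```

Because TPM normalizes for length first, the TPM values of every sample sum to one million, which makes them easier to compare across samples than FPKM. Equivalently, each TPM equals the gene's FPKM divided by the sample's FPKM total, times 10^6.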

ChIP-seq/Cut&Run/ATAC-seq pipeline

  • FASTQ file QC: FastQC reports
  • Mapping: Bowtie 2
  • Peak calling: MACS2, SICER2
  • bigWig visualization: deepTools, IGV, UCSC Genome Browser
  • Custom analysis with BED files: BEDTools, GenomicRanges, rtracklayer, ChIPseeker
  • Differential peak analysis: DESeq2, edgeR
  • TSS enrichment: deepTools (computeMatrix)
  • Motif discovery and analysis: HOMER
  • Functional enrichment of genomic regions: GREAT
  • Reference tutorial for evaluating differential enrichment (DiffBind):

    https://hbctraining.github.io/Intro-to-ChIPseq/lessons/08_diffbind_differential_peaks.html
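Custom BED-file analyses of the kind usually run with BEDTools or GenomicRanges often boil down to interval overlap queries. A minimal sketch of one such question, "which peaks fall within a window around a TSS?", in plain Python. The window size, the 0-based TSS coordinates, and the helper names are hypothetical. The nested loop is O(peaks x TSSs); real tools use sorted or indexed intervals instead.

```python
def read_bed(lines):
    """Parse BED lines into (chrom, start, end) tuples. BED is 0-based, half-open."""
    intervals = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment, and header lines
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals


def overlaps(a, b):
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def peaks_near_tss(peaks, tss_sites, window=1000):
    """Return peaks overlapping [tss - window, tss + window) for any TSS.

    tss_sites: list of (chrom, 0-based position); window is a hypothetical default.
    """
    windows = [(chrom, max(0, pos - window), pos + window) for chrom, pos in tss_sites]
    return [p for p in peaks if any(overlaps(p, w) for w in windows)]
```

For genome-scale data, `bedtools intersect` or `GenomicRanges::findOverlaps` answers the same question efficiently.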

Machine Learning application

Identification of variable features

  • k-means clustering
  • Elastic Net
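k-means, listed above and also used in the RNA-seq pipeline to cluster DEGs, groups expression profiles around k centroids. A minimal sketch of Lloyd's algorithm in plain Python, assuming each point is one gene's expression vector across samples. How cluster membership is then used to select features is project-specific and not shown. The deterministic initialization (first k points) is a simplification for reproducibility.

```python
def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means. points: list of vectors (e.g. one expression profile per gene).

    Initial centroids are the first k points so results are deterministic;
    production tools typically use k-means++ seeding and several restarts.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: _sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```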

Sampling data

  • Bootstrap
  • DBSCAN (Density-Based Spatial Clustering of Applications with Noise) sampling
  • Random Forest based sampling
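The bootstrap listed above resamples the data with replacement to estimate the variability of a statistic. A minimal percentile-bootstrap sketch using only the Python standard library. The statistic (mean), number of resamples, alpha, and seed are illustrative choices, and the DBSCAN and Random Forest based sampling strategies are not shown.

```python
import random
import statistics


def bootstrap_ci(values, stat=statistics.mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat` over `values`."""
    rng = random.Random(seed)  # fixed seed makes the interval reproducible
    n = len(values)
    # Resample n values with replacement, recompute the statistic, and sort.
    boot_stats = sorted(stat(rng.choices(values, k=n)) for _ in range(n_boot))
    lo_idx = int(n_boot * alpha / 2)  # e.g. the 2.5th percentile for alpha = 0.05
    hi_idx = n_boot - 1 - lo_idx      # symmetric upper percentile
    return boot_stats[lo_idx], boot_stats[hi_idx]
```

A percentile interval is the simplest bootstrap variant; bias-corrected intervals may be preferable for skewed statistics.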

Public DB

  • TCGA
  • CCLE
  • GEO

Frequently used Report formats

  • Markdown (HTML)
  • R Markdown (HTML)
  • PowerPoint slides (presentations)
  • Excel dashboards
  • Word documents (SOPs, methods, and guideline documents)
  • Google Docs
  • GitHub Pages (open data and results)