General Workflow Overview
This page outlines the general workflows I use to analyze sequencing data from various platforms. I follow these guidelines for most of my work and add further analyses depending on the goals of each project.
List
- scRNA-seq pipeline
- RNA-seq pipeline
- ChIP-seq/Cut&Run/ATAC-seq pipeline
- Machine Learning application
- Public DB
scRNA-seq pipeline
- FASTQ file QC: FastQC, MultiQC
- 10x Genomics datasets (Cell Ranger): count matrix
- Other formats: Drop-seq Tools, Salmon, STARsolo
- Demultiplexing (UMI-tools)
- Doublet removal: Scrublet, DoubletFinder
- Filtering by custom parameter
- Normalization and scaling: Seurat (SCTransform), log normalization
- Cell cycle phase : Seurat
- Integration and clustering: Seurat (CCA, Harmony), LIGER
- Finding markers: Seurat, custom ML algorithms (XGBoost, Random Forest), DESeq2, edgeR
- Visualization: Seurat functions, ggplot2, plotly, Shiny, heatmaps, custom scripts
- Inter-cluster differential expression : MAST, DESeq2
- Imputation of gene expression: MAGIC, scImpute
- Trajectories : Monocle3, destiny, slingshot
- Pathway analysis: GSEA, scGSEA, clusterProfiler
- Network analysis: WGCNA, SCENIC
- Cell-cell interaction: CellChat
- RNA velocity: velocyto
- scTCR/BCR-seq : VDJtools, immunarch, scRepertoire
- Tumor cell CNV : InferCNV
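The log normalization step listed above can be illustrated with a minimal sketch in plain Python. It assumes the common convention of scaling each cell's counts to a fixed total (10,000 here) and then applying log1p, which is how Seurat's LogNormalize method is documented to work. The function name and the toy matrix are hypothetical, and real datasets would use a sparse matrix rather than nested lists.

```python
import math

def log_normalize(counts, scale_factor=10_000):
    """Library-size normalize then log1p-transform a cells x genes matrix.

    counts: list of per-cell lists of raw counts (hypothetical toy format).
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            # An empty cell stays all zeros instead of dividing by zero.
            normalized.append([0.0] * len(cell))
            continue
        normalized.append(
            [math.log1p(c / total * scale_factor) for c in cell]
        )
    return normalized

# Toy example: three cells, three genes (made-up numbers).
toy_counts = [[1, 0, 3], [0, 0, 0], [10, 10, 20]]
normalized = log_normalize(toy_counts)
```

Zero counts stay at zero after the transform, so the sparsity of the matrix is preserved.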
RNA-seq pipeline
- FASTQ file QC: FastQC reports
- STAR, Salmon: count matrix
- Normalization and scaling (if necessary)
- TPM, FPKM calculation
- PCA plots
- Correlation analysis of selected geneset
- DEG analysis
- Pathway analysis: GSEA, clusterProfiler
- K-means clustering of DEGs
- Visualization
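As an illustration of the TPM and FPKM step above, here is a minimal sketch using the standard definitions: FPKM divides each count by gene length in kilobases and by library size in millions, while TPM first computes a per-kilobase rate and then rescales so the rates sum to one million. The function names and toy numbers are hypothetical. Note that Salmon already reports TPM in its own output, so a manual calculation like this mainly applies to count matrices from aligners such as STAR.

```python
def fpkm(counts, lengths_bp):
    """Fragments per kilobase of transcript per million mapped fragments."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c * 1e9 / (length * total) for c, length in zip(counts, lengths_bp)]

def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then rescale to 1e6."""
    rates = [c / (length / 1e3) for c, length in zip(counts, lengths_bp)]
    total_rate = sum(rates)
    if total_rate == 0:
        return [0.0] * len(counts)
    return [r / total_rate * 1e6 for r in rates]

# Toy example: three genes in one sample (made-up numbers).
toy_counts = [100, 200, 700]
toy_lengths = [1000, 2000, 7000]
toy_fpkm = fpkm(toy_counts, toy_lengths)
toy_tpm = tpm(toy_counts, toy_lengths)
```

TPM values always sum to one million within a sample, which makes them easier to compare across samples than FPKM.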
ChIP-seq/Cut&Run/ATAC-seq pipeline
- FASTQ file QC: FastQC reports
- Mapping: bowtie2
- Peak calling: MACS2, SICER2
- bigWig file visualization: deepTools, IGV, UCSC Genome Browser
- Custom analysis with BED files: BEDTools, GenomicRanges, rtracklayer, ChIPseeker
- Differential peak analysis: DESeq2, edgeR
- TSS enrichment: deepTools (computeMatrix)
- Motif discovery and peak annotation: HOMER, GREAT
- Tools for evaluating differential enrichment: https://hbctraining.github.io/Intro-to-ChIPseq/lessons/08_diffbind_differential_peaks.html
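The custom BED-file analysis mentioned in the list above often comes down to interval overlap tests. The sketch below shows the core logic in plain Python, assuming BED-style coordinates (0-based, half-open), so two intervals that merely touch do not overlap. The function names and toy intervals are hypothetical. In practice BEDTools or GenomicRanges would do this far more efficiently on real peak sets.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) intervals share at least one base.

    Coordinates are assumed to be BED-style: 0-based, half-open.
    """
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def peaks_overlapping(peaks, regions):
    """Return the peaks that overlap any region (simple O(n*m) scan)."""
    return [p for p in peaks if any(overlaps(p, r) for r in regions)]

# Toy example (made-up coordinates).
toy_peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
toy_regions = [("chr1", 150, 300), ("chr2", 200, 400)]
hits = peaks_overlapping(toy_peaks, toy_regions)
```

The chr2 peak ends exactly where the chr2 region starts, so under half-open coordinates it is not reported as a hit.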
Machine Learning application
Identification of variable features
- k-means clustering
- Elastic Net
Sampling data
- Bootstrap
- DBSCAN (Density-Based Spatial Clustering of Applications with Noise) sampling
- Random Forest-based sampling
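Of the sampling approaches above, the bootstrap is the simplest to sketch: resample the data with replacement many times and look at the spread of a statistic across resamples. The example below is a minimal illustration using only the Python standard library. The function names, the choice of the mean as the statistic, and the toy values are all assumptions made for the example.

```python
import random
import statistics

def bootstrap_means(values, n_resamples=1000, seed=0):
    """Mean of each bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    return [
        statistics.fmean(rng.choices(values, k=n))
        for _ in range(n_resamples)
    ]

def percentile_interval(samples, alpha=0.05):
    """Simple percentile interval from the bootstrap distribution."""
    ordered = sorted(samples)
    last = len(ordered) - 1
    lo_idx = min(last, max(0, int(alpha / 2 * len(ordered))))
    hi_idx = min(last, max(0, int((1 - alpha / 2) * len(ordered)) - 1))
    return ordered[lo_idx], ordered[hi_idx]

# Toy example: made-up expression values for one gene.
toy_values = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.9]
means = bootstrap_means(toy_values)
ci_low, ci_high = percentile_interval(means)
```

The percentile interval is the most basic bootstrap interval. Other variants exist and may be preferable for skewed statistics.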
Public DB
- TCGA
- CCLE
- GEO
Frequently used report formats
- Markdown (HTML)
- R Markdown (HTML)
- PPT slides (presentation)
- Excel dashboard
- Word format (for SOPs, method/guideline documents)
- Google Docs
- GitHub Pages (for open data/results)
Useful links
Single-cell RNA-seq:
- Integration: Seurat
- Integration: Harmony
- Integration: Liger
- InferCNV
RNA-seq:
- RNA-Seq Analysis in R using Rsubread
ChIP-seq:
- Differential Peak calling using DiffBind