ChIP-seq

Applicable to ATAC-seq, Cut&Run

Back to Main Page


Processing peak files

Calling peaks by MACS2

Input file format : Bam file

By folder

nano macs2_callpeaks.sh

#!/bin/bash

# Common part 
common_input="../*_H3K27.bam"
output_dir="macs2_out"

# sample peak calling 
for sample in H3K27me3
do
    macs2 callpeak \
        -t ${sample}.merged.bam \
        -c ${common_input} \
        -f BAMPE \
        -g mm \
        -n ${sample} \
        -B \
        -q 0.05 \
        --outdir ${output_dir}
done


chmod +x macs2_callpeaks.sh
./macs2_callpeaks.sh

By individual sample

nano run_peak_calling.sh

#!/bin/bash

# BAM file
input_bam=$1

# Control input  
common_input="../input.bam" # My input 

# Output directory 
output_dir="macs2_out"

# sample name 
sample_name=$(basename $input_bam .merged.bam)

# MACS2 run
macs2 callpeak \
    -t $input_bam \
    -c $common_input \
    -f BAMPE \
    -g mm \
    -n $sample_name \
    -B \
    -q 0.05 \
    --outdir $output_dir
chmod +x run_peak_calling.sh

./run_peak_calling.sh DMSO_H3K27me3.merged.bam
./run_peak_calling.sh Treat1_H3K27me3.merged.bam
./run_peak_calling.sh Treat2_H3K27me3.merged.bam


Broad Peak option

Broad Peak for Extended Regions:
Focuses on large, spread-out areas suited for markers such as H3K27me3 or H3K36me3.

#!/bin/bash

# BAM file
input_bam=$1

# Control input  
common_input="../input.bam" # My input 

# Output directory 
output_dir="macs2_out"

# sample name 
sample_name=$(basename $input_bam .merged.bam)

# MACS2 run for broad peaks
macs2 callpeak \
    -t $input_bam \
    -c $common_input \
    -f BAMPE \
    -g mm \
    -n $sample_name \
    -B \
    --broad \
    -q 0.05 \
    --broad-cutoff 0.1 \
    --outdir $output_dir


# Run
./run_broadpeak_calling.sh input.bam

–broad-cutoff 0.1 (FDR 10%)

Peak filtering

Narrowpeak column information

Filtering and Sorting

Filtering by signalValue if necessary.

Sorting bed:
First order by chromosome names using -k1,1 and then by start positions numerically with -k2,2n.
This sorts entries by chromosome and then by position within each chromosome.

Bed Coverage calculation from reference peak

The adv/disadv of processing bam files using a standardized peak set (union peaks) as a reference

  • Standardizes peak across different samples

  • Reduced Bias

  • Efficiency in the downstream analysis

  • Risk of overgeneralization (e.g. overlooking sample/condition-specific peaks like peak length)

Calculation

#!/bin/bash

# Assign BAM file
bamFile=$1

# Assign reference Peak File
unionPeakFile=$2

# Run bedtools coverage
bedtools coverage -a "$unionPeakFile" -b "$bamFile" > "${bamFile%.bam}.cov.bed"

echo "Coverage calculation for $bamFile completed."


./bamcoverage_single.sh path/to/your.bam path/to/unionpeak.bed

Read files in R

library(data.table)

# Getting files with .cov.bed 
cov_files <- list.files(pattern = "\\.cov\\.bed$")

# Import and save 
cov_data_list <- lapply(cov_files, function(file) {
  fread(file)
})

# assign names 
names(cov_data_list) <- cov_files

Coverage bed file example

##     chrom start   end num_reads coverageFraction
##    <char> <num> <num>     <num>            <num>
## 1:   chr1   100   200        25             1.00
## 2:   chr1   500   600        30             0.95
## 3:   chr2   200   300         5             0.20
## 4:   chr2   800   900        20             1.00
## 5:   chr3   150   250        15             0.85



Modify coverage bed files based on the experimental purpose in the downstream analysis.



Peak analysis


This part will be updated.