ChIP-seq
Applicable to ATAC-seq, Cut&Run
Processing peak
files
Calling peaks by MACS2
Input file format : Bam file
By folder
nano macs2_callpeaks.sh
#!/bin/bash
# Common part
common_input="../*_H3K27.bam"
output_dir="macs2_out"
# sample peak calling
for sample in H3K27me3
do
macs2 callpeak \
-t ${sample}.merged.bam \
-c ${common_input} \
-f BAMPE \
-g mm \
-n ${sample} \
-B \
-q 0.05 \
--outdir ${output_dir}
done
chmod +x macs2_callpeaks.sh
./macs2_callpeaks.sh
By individual sample
nano run_peak_calling.sh
#!/bin/bash
# BAM file
input_bam=$1
# Control input
common_input="../input.bam" # My input
# Output directory
output_dir="macs2_out"
# sample name
sample_name=$(basename $input_bam .merged.bam)
# MACS2 run
macs2 callpeak \
-t $input_bam \
-c $common_input \
-f BAMPE \
-g mm \
-n $sample_name \
-B \
-q 0.05 \
--outdir $output_dir
chmod +x run_peak_calling.sh
./run_peak_calling.sh DMSO_H3K27me3.merged.bam
./run_peak_calling.sh Treat1_H3K27me3.merged.bam
./run_peak_calling.sh Treat2_H3K27me3.merged.bam
Broad Peak option
Broad Peak for Extended Regions:
Focuses on large, spread-out areas suited for markers such as H3K27me3
or H3K36me3.
#!/bin/bash
# BAM file
input_bam=$1
# Control input
common_input="../input.bam" # My input
# Output directory
output_dir="macs2_out"
# sample name
sample_name=$(basename $input_bam .merged.bam)
# MACS2 run for broad peaks
macs2 callpeak \
-t $input_bam \
-c $common_input \
-f BAMPE \
-g mm \
-n $sample_name \
-B \
--broad \
-q 0.05 \
--broad-cutoff 0.1 \
--outdir $output_dir
# Run
./run_broadpeak_calling.sh input.bam
–broad-cutoff 0.1 (FDR 10%)
Peak filtering
Narrowpeak column information
Filtering and Sorting
Filtering by signalValue if necessary.
Sorting bed:
First order by chromosome names using -k1,1 and then by start positions
numerically with -k2,2n.
This sorts entries by chromosome and then by position within each
chromosome.
Bed Coverage calculation from reference peak
The adv/disadv of processing bam files using a standardized peak set (union peaks) as a reference
Standardizes peak across different samples
Reduced Bias
Efficiency in the downstream analysis
Risk of overgeneralization (e.g. overlooking sample/condition-specific peaks like peak length)
Calculation
#!/bin/bash
# Assign BAM file
bamFile=$1
# Assign reference Peak File
unionPeakFile=$2
# Run bedtools coverage
bedtools coverage -a "$unionPeakFile" -b "$bamFile" > "${bamFile%.bam}.cov.bed"
echo "Coverage calculation for $bamFile completed."
./bamcoverage_single.sh path/to/your.bam path/to/unionpeak.bed
Read files in R
Coverage bed file example
## chrom start end num_reads coverageFraction
## <char> <num> <num> <num> <num>
## 1: chr1 100 200 25 1.00
## 2: chr1 500 600 30 0.95
## 3: chr2 200 300 5 0.20
## 4: chr2 800 900 20 1.00
## 5: chr3 150 250 15 0.85
Modify coverage bed files based on the experimental purpose
in the downstream analysis.
Peak
analysis
This part will be updated.