Genomic data processing
Applicable to ChIP-seq, ATAC-seq, Cut&Run
- Peak Calling
- Normalization
- Peak analysis in R
Peak Calling by MACS2
Peak Calling of individual bam file
Input files:
- sample.bam
- control.bam
Script
nano run_peak_calling.sh
#!/bin/bash
# Check if required input BAM files are provided
if [ -z "$1" ] || [ -z "$2" ]; then
echo "Usage: $0 <input_bam> <control_bam>"
exit 1
fi
# BAM file (sample)
input_bam=$1
# Control input BAM file (background)
common_input=$2
# Output directory
output_dir="macs2_out"
# Ensure the output directory exists
mkdir -p "$output_dir"
# Extract sample name from the input BAM file
sample_name=$(basename "$input_bam")
# Run MACS2 peak calling
macs2 callpeak \
-t "$input_bam" \
-c "$common_input" \
-f BAMPE \
-g mm \
-n "$sample_name" \
-B \
-q 0.05 \
--outdir "$output_dir"
# Check exit status
if [ $? -ne 0 ]; then
echo "Error: MACS2 peak calling failed."
exit 1
fi
echo "MACS2 peak calling completed successfully. Results are in
Run (Example)
chmod +x run_peak_calling.sh
bash run_peak_calling.sh /path/to/sample.bam /path/to/control.bam
./run_peak_calling.sh DMSO_H3K27me3.merged.bam /path/to/control.bam
./run_peak_calling.sh Treat1_H3K27me3.merged.bam /path/to/control.bam
./run_peak_calling.sh Treat2_H3K27me3.merged.bam /path/to/control.bam
Peak Calling putting a folder as input
Script
nano macs2_callpeaks.sh
#!/bin/bash
# Check if the input folder is provided
if [ -z "$1" ]; then
echo "Usage: $0 <input_folder>"
exit 1
fi
# Input folder containing BAM files
input_folder=$1
# Control input BAM files (glob pattern inside the folder)
common_input="${input_folder}/*_H3K27.bam"
# Output directory
output_dir="macs2_out"
# Ensure the output directory exists
mkdir -p "$output_dir"
# Sample peak calling
for sample_bam in "${input_folder}"/*.merged.bam; do
sample=$(basename "$sample_bam" .merged.bam)
macs2 callpeak \
-t "$sample_bam" \
-c "$common_input" \
-f BAMPE \
-g mm \
-n "$sample" \
-B \
-q 0.05 \
--outdir "$output_dir"
done
echo "MACS2 peak calling completed. Results are in '$output_dir'."
Broad Peak option
Broad Peak for Extended Regions:
Focuses on large, spread-out areas suited for markers such as H3K27me3
or H3K36me3.
Script
–broad-cutoff 0.1 (FDR 10%)
#!/bin/bash
# BAM file
input_bam=$1
# Control input
common_input="../input.bam" # My input
# Output directory
output_dir="macs2_out"
# sample name
sample_name=$(basename $input_bam .merged.bam)
# MACS2 run for broad peaks
macs2 callpeak \
-t $input_bam \
-c $common_input \
-f BAMPE \
-g mm \
-n $sample_name \
-B \
--broad \
-q 0.05 \
--broad-cutoff 0.1 \
--outdir $output_dir
Peak filtering
Narrowpeak column information
- Check signalValue column
Filtering and Sorting
- Filtering by signalValue if necessary.
- Sorting bed:
- First order by chromosome names using -k1,1 and then by start
positions numerically with -k2,2n.
- This sorts entries by chromosome and then by position within each
chromosome.
- First order by chromosome names using -k1,1 and then by start
positions numerically with -k2,2n.
Normalization of peaks
Bed Coverage calculation from reference peak (unionpeak) information
The adv/disadv of processing bam files using a standardized peak set (union peaks) as a reference
- Standardizes peak across different samples
- Reduced Bias
- Efficiency in the downstream analysis
- Risk of overgeneralization (e.g. overlooking sample/condition-specific peaks like peak length)
bedtools coverage
#!/bin/bash
# Assign BAM file
bamFile=$1
# Assign reference Peak File
unionPeakFile=$2
# Run bedtools coverage
bedtools coverage -a "$unionPeakFile" -b "$bamFile" > "${bamFile%.bam}.cov.bed"
echo "Coverage calculation for $bamFile completed."
./bamcoverage_single.sh path/to/your.bam path/to/unionpeak.bed
Peak analysis in R
Read files in R
Coverage bed file example
## chrom start end num_reads coverageFraction
## <char> <num> <num> <num> <num>
## 1: chr1 100 200 25 1.00
## 2: chr1 500 600 30 0.95
## 3: chr2 200 300 5 0.20
## 4: chr2 800 900 20 1.00
## 5: chr3 150 250 15 0.85
Modify coverage bed files based on the purpose in the
downstream analysis.