1.交集,计算reads或peak分布

#gene.bed为标准的6列bed文件,其名字位于输出结果的第10列。bam文件所代表的bed的名字在输出结果的第4列
#-f 只对-a/-abam后的文件有效,所以intersectBed的两个输入文件位置不可对掉
#因为是RNA-Seq,所以使用-split参数
#-g group -c opcols -o ops
intersectBed -abam mapped.bam -b gene.bed -bed -wb -f 0.5 -split | cut -f 4,16 | sort -k16 | \
groupBy -i - -g 2 -c 1 -o collapse >gene.reads.groupBy
#如果intersectBed的两个文件都用 sort -k1,1 -k2,2n 排序过,可以加上-sorted 参数
intersectBed -abam mapped.bam -b gene.bed -bed -wb -f 0.5 -split -sorted | cut -f 4,16 | sort -k16 | \
groupBy -i - -g 2 -c 1 -o collapse >gene.reads.groupBy

2.覆盖度,区间内的read计数或区间的RPKM

#coverageBed的输出为在原有bed文件的每行后面增加四列,
#第一列是与-b后的文件每个区间重叠的-abam文件的特征的个数(reads数)。 6+1
#第二列是-b后的文件每个区间覆盖度不为0的碱基数目。                    6+2
#第三列是-b文件每个区间的长度                                         6+3
#第四列是第二列除以第三列                                             6+4
coverageBed -split -abam mapped.bam -b gene.bed | cut -f 4,7,9 | awk 'BEGIN{OFS="\t";FS="\t"}{print $1,10^9*$2/$3/total_reads)}' >file
coverageBed -counts -split -abam mapped.bam -b gene.bed | cut -f 4,7,9 >file
  1. Test the computational mode of intersectBed

When you use intersecBed only with -a and -b, it will output the overlapped regions for bins in the first bed file. But what would happen if the bins in the first bed file can match to multiple bins in the second file? intersectBed will compare each bin in the first bed to all bins in the second bed nd output overlapped regions for each pair if they have.

#Test the output and the compotational mode of intersectBed
#Given two bed files
$ cat test1.bed
chr1	0	1000	1_a
chr1	500	2000	1_b
chr2	3000	4000	1_c
$ cat test2.bed
chr1	0	800	2_a
chr1	500	1000	2_b
chr2	3800	4000	2_c
chr2	3500	4000	2_d
chr2	3000	4000	2_e
$ intersectBed -a test1.bed -b test2.bed
chr1	0	800	1_a #1_a is compared with 2_a and 2_b and get two overlapped regions
chr1	500	1000	1_a
chr1	500	800	1_b
chr1	500	1000	1_b
chr2	3800	4000	1_c #1_c is compared with 2_c, 2_d and 2_e.
chr2	3500	4000	1_c
chr2	3000	4000	1_c