Tips for bedtools usage

1. Intersection: computing the distribution of reads or peaks

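#gene.bed is a standard 6-column BED file. With -bed, each read from the BAM file is written as a 12-column BED record,
#so the read name is in column 4 of the output and the gene name in column 16 (12+4), which is what cut -f 4,16 below relies on.
#(The gene name would be in column 10 only if the read records had 6 columns.)
#-f applies only to the file given after -a/-abam, so the two input files of intersectBed cannot be swapped.
#The data are RNA-Seq, so -split is used to treat spliced reads block by block.
#groupBy options: -g grouping column(s), -c column(s) to operate on, -o operation(s)
#After cut only two columns remain, so the gene name is sorted as column 2.
intersectBed -abam mapped.bam -b gene.bed -bed -wb -f 0.5 -split | cut -f 4,16 | sort -k2,2 | \
groupBy -i - -g 2 -c 1 -o collapse >gene.reads.groupBy
#If both inputs of intersectBed are sorted by chromosome and start position (sort -k1,1 -k2,2n for a BED file), the -sorted option can be added
intersectBed -abam mapped.bam -b gene.bed -bed -wb -f 0.5 -split -sorted | cut -f 4,16 | sort -k2,2 | \
groupBy -i - -g 2 -c 1 -o collapse >gene.reads.groupBy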
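To make the final groupBy step concrete, here is a minimal sketch that mimics `groupBy -g 2 -c 1 -o collapse` with plain sort and awk on a made-up two-column table. The read and gene names are hypothetical, and awk is only a stand-in so the example runs without bedtools.

```shell
# Toy output of `cut -f 4,16`: read name <TAB> gene name (hypothetical values)
pairs=$(printf 'read1\tgeneB\nread2\tgeneA\nread3\tgeneB\nread4\tgeneA\n')
tab=$(printf '\t')

# Sort by the gene column (then by read name), and collapse the read names
# of each gene into one comma-separated list. Like groupBy, this relies on
# the input being sorted by the grouping column.
collapsed=$(printf '%s\n' "$pairs" | LC_ALL=C sort -t "$tab" -k2,2 -k1,1 | awk '
BEGIN { FS = OFS = "\t" }
$2 != gene { if (NR > 1) print gene, reads; gene = $2; reads = $1; next }
{ reads = reads "," $1 }
END { if (NR > 0) print gene, reads }')

printf '%s\n' "$collapsed"
```

Each output line holds one gene followed by all reads assigned to it, which is the shape of gene.reads.groupBy.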

2. Coverage: read counts or RPKM per interval

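#coverageBed appends four columns to each line of the original BED file:
#column 1: number of features (reads) from the -abam file that overlap each interval of the -b file.  6+1
#column 2: number of bases in each -b interval with non-zero coverage.                               6+2
#column 3: length of each -b interval                                                                6+3
#column 4: column 2 divided by column 3                                                              6+4
#Note: the -abam reads / -b intervals order used here follows older bedtools versions; since v2.24.0 coverage is reported for the -a file, so the two inputs trade places.
#total_reads is the total number of mapped reads and must be set in the shell beforehand
coverageBed -split -abam mapped.bam -b gene.bed | cut -f 4,7,9 | awk -v total_reads="$total_reads" 'BEGIN{OFS="\t";FS="\t"}{print $1,10^9*$2/$3/total_reads}' >file
#With -counts only the read count is appended (column 7), so there is no column 9 to cut
coverageBed -counts -split -abam mapped.bam -b gene.bed | cut -f 4,7 >file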
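The awk step in the RPKM command computes 10^9 × (reads in the interval) / (interval length) / (total mapped reads). A minimal sketch on a single made-up row follows; the gene name, read count, interval length, and the total of 10 million mapped reads are all hypothetical values chosen for easy arithmetic.

```shell
# Hypothetical total number of mapped reads in the library
total_reads=10000000

# One toy row in the shape produced by `cut -f 4,7,9`:
# gene name <TAB> read count <TAB> interval length in bp
rpkm=$(printf 'geneA\t50\t2000\n' | awk -v total_reads="$total_reads" '
BEGIN { FS = OFS = "\t" }
{ print $1, 10^9 * $2 / $3 / total_reads }')

printf '%s\n' "$rpkm"
```

Here 50 reads over 2000 bp in a library of 10 million reads give 10^9 × 50 / 2000 / 10^7 = 2.5.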
3. Test the computational mode of intersectBed

When intersectBed is run with only -a and -b, it outputs the overlapping regions for the intervals in the first BED file. But what happens if an interval in the first file overlaps several intervals in the second file? intersectBed compares each interval in the first file with every interval in the second file and outputs one overlapping region for each pair that overlaps.

#Test the output and the computational mode of intersectBed
#Given two bed files
$ cat test1.bed
chr1	0	1000	1_a
chr1	500	2000	1_b
chr2	3000	4000	1_c
$ cat test2.bed
chr1	0	800	2_a
chr1	500	1000	2_b
chr2	3800	4000	2_c
chr2	3500	4000	2_d
chr2	3000	4000	2_e
$ intersectBed -a test1.bed -b test2.bed
chr1	0	800	1_a #1_a is compared with 2_a and 2_b, giving two overlapping regions
chr1	500	1000	1_a
chr1	500	800	1_b
chr1	500	1000	1_b
chr2	3800	4000	1_c #1_c is compared with 2_c, 2_d and 2_e.
chr2	3500	4000	1_c
chr2	3000	4000	1_c
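The pairing behaviour shown above can be mimicked without bedtools. The sketch below is a naive all-pairs comparison in awk, not how intersectBed is implemented internally; it only reproduces the same result for the two test files by reporting max(start) and min(end) for every overlapping pair. The temporary directory and variable names are assumptions made for the example.

```shell
# Recreate the two test files (tab-separated BED4) in a scratch directory
tmpdir=$(mktemp -d)
printf 'chr1\t0\t1000\t1_a\nchr1\t500\t2000\t1_b\nchr2\t3000\t4000\t1_c\n' > "$tmpdir/test1.bed"
printf 'chr1\t0\t800\t2_a\nchr1\t500\t1000\t2_b\nchr2\t3800\t4000\t2_c\nchr2\t3500\t4000\t2_d\nchr2\t3000\t4000\t2_e\n' > "$tmpdir/test2.bed"

# Load test2.bed first, then compare every test1.bed interval
# against every stored test2.bed interval.
overlaps=$(awk '
BEGIN { FS = OFS = "\t" }
NR == FNR { chrom[FNR] = $1; bstart[FNR] = $2 + 0; bstop[FNR] = $3 + 0; n = FNR; next }
{
    for (i = 1; i <= n; i++) {
        if ($1 != chrom[i]) continue
        s = ($2 > bstart[i]) ? $2 + 0 : bstart[i]   # larger of the two starts
        e = ($3 < bstop[i]) ? $3 + 0 : bstop[i]     # smaller of the two ends
        if (s < e) print $1, s, e, $4               # keep the name from test1.bed
    }
}' "$tmpdir/test2.bed" "$tmpdir/test1.bed")

printf '%s\n' "$overlaps"
rm -r "$tmpdir"
```

Each interval from test1.bed yields one line per overlapping interval in test2.bed, in the same order as the intersectBed listing.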
CHENTONG
Copyright notice: this is an original article by the author; please credit the source when reposting.