A record of useful commands for bioinformatics analysis

This post collects the commands and scripts I use day to day in bioinformatics analysis, with a focus on next-generation sequencing.

File format conversion

##### BED to GTF or GFF

I usually use this online tool, bedToGtf.

There are also several other ways to do this, none of which I have tested.

  • Scripture
java -Xmx1000m -jar <path_to_scripture>/scripture_alpha2.jar -task toGFF -cufflinks -in your_file.bed -source SCRIPTURE -out your_file.gtf
  • bedToGenePred
bedToGenePred input.bed stdout | genePredToGtf file stdin output.gtf
  • one-line script
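awk '{print $1"\t"$7"\t"$8"\t"($2+1)"\t"$3"\t"$5"\t"$6"\t"$9"\t"(substr($0, index($0,$10)))}' foo.bed > bed.gtf # assumes source, feature, frame and attributes sit in columns 7, 8, 9 and 10+ (the layout gtf2bed writes); $2+1 turns the 0-based BED start into a 1-based GTF start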
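The column shuffle in the one-liner is easier to follow when written out. The following is a minimal Python sketch of the same rearrangement, assuming the input BED carries the GTF source, feature, frame and attributes in columns 7 to 10, as the awk command does. The function name `bed_to_gtf_line` and the example record are made up for illustration.

```python
def bed_to_gtf_line(bed_line):
    """Rearrange one 10-column BED line into a 9-column GTF line.

    Assumed input columns (same assumption as the awk one-liner):
    chrom, start, end, name, score, strand, source, feature, frame, attributes
    """
    # Split into at most 10 fields so the attributes column stays intact.
    fields = bed_line.rstrip("\n").split("\t", 9)
    if len(fields) < 10:
        raise ValueError("expected at least 10 tab-separated columns")
    chrom, start, end, _name, score, strand, source, feature, frame, attributes = fields
    # BED starts are 0-based, GTF starts are 1-based; the end is unchanged.
    gtf_start = str(int(start) + 1)
    return "\t".join(
        [chrom, source, feature, gtf_start, end, score, strand, frame, attributes]
    )


# Hypothetical record, for illustration only.
example = 'chr1\t99\t200\tgeneA\t.\t+\ttest\texon\t.\tgene_id "geneA"; transcript_id "txA";'
print(bed_to_gtf_line(example))
```

A plain BED6 or BED12 file does not have these extra columns, so neither this sketch nor the awk command applies to it; the bedToGenePred route is the one to try for BED12.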
##### GTF to BED
# "< foo.gtf" feeds the file to the command on standard input
$ gtf2bed < foo.gtf | sort-bed - > foo.bed
$ perl gtf2bed.pl input.gtf > output.bed
  • If you want the 5' UTR, CDS, 3' UTR, start codon and stop codon as separate records, I recommend my own scripts, parseGTF.py and sortGTF.py. See parseGTF for detailed usage.
sortGTF.py gtf-file >sorted.gtf-file
parseGTF.py sorted.gtf-file chrom-sizes-file >gtf.bed
gtfToGenePred mm9.gtf stdout | genePredToBed stdin mm9.bed12
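Whichever tool is used, the core of a GTF to BED conversion is the coordinate shift: GTF intervals are 1-based and inclusive, BED intervals are 0-based and half-open, so only the start changes. Below is a minimal Python sketch of that step for a single record, producing BED6. It is not how gtf2bed, gtf2bed.pl or parseGTF.py are implemented; the function name `gtf_to_bed6` and the choice of `gene_id` as the BED name are assumptions for illustration.

```python
import re


def gtf_to_bed6(gtf_line, name_key="gene_id"):
    """Convert one 9-column GTF line into a BED6 line."""
    fields = gtf_line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated GTF columns")
    chrom, _source, _feature, start, end, score, strand, _frame, attributes = fields
    # Pull the chosen attribute (e.g. gene_id "X";) out of column 9.
    match = re.search(rf'{re.escape(name_key)} "([^"]+)"', attributes)
    name = match.group(1) if match else "."
    # 1-based inclusive start -> 0-based start; the end stays as it is.
    bed_start = str(int(start) - 1)
    return "\t".join([chrom, bed_start, end, name, score, strand])


# Hypothetical record, for illustration only.
record = 'chr1\ttest\texon\t100\t200\t.\t+\t.\tgene_id "geneA"; transcript_id "txA";'
print(gtf_to_bed6(record))
print(gtf_to_bed6(record, name_key="transcript_id"))
```

This yields one BED line per GTF line. Grouping exons into one BED12 line per transcript is what the gtfToGenePred and genePredToBed pipeline above does.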

Sequence extraction

  • Extract FASTA sequences for the transcripts in a GTF file
gffread -w output.fa -g genome_assembly.fa refgene.gtf
seqExtract.3.py -i genome.fa -b "bed1,bed2..."
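To show what interval-based extraction involves, here is a minimal Python sketch that loads a small FASTA into memory and slices out a BED-style interval, reverse-complementing it on the minus strand. It is an illustration under simplifying assumptions (the whole genome fits in memory, one interval at a time, no splicing of exons), not a description of how gffread or seqExtract.3.py work. The helper names `read_fasta` and `extract_interval` and the toy sequence are made up.

```python
import io

# Translation table used for the minus-strand complement.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(handle):
    """Read a FASTA stream into a dict of {sequence name: sequence}."""
    sequences = {}
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                sequences[name] = "".join(chunks)
            # Keep only the first word of the header as the name.
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        sequences[name] = "".join(chunks)
    return sequences


def extract_interval(sequences, chrom, start, end, strand="+"):
    """Return the sequence for a 0-based, half-open (BED-style) interval."""
    seq = sequences[chrom][start:end]
    if strand == "-":
        # Reverse complement for features on the minus strand.
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq


# Toy genome, for illustration only.
genome = read_fasta(io.StringIO(">chr1 toy\nAACCG\nGTTTA\n"))
print(extract_interval(genome, "chr1", 1, 5))
print(extract_interval(genome, "chr1", 1, 5, "-"))
```

Because BED coordinates are already 0-based and half-open, they map directly onto a Python slice. GTF coordinates would need the start reduced by one first.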
CHENTONG
Copyright notice: this is an original article by the author; please credit the source when reposting.