Books are worth less than thought; meaning cannot be put into words.

This post collects the commands and scripts I use daily in bioinformatics analysis, with a focus on next-generation sequencing.

#####BED to GTF or GFF

I usually use this online tool, bedToGtf.

There are also several other ways to do this, none of which I have tested.

java -Xmx1000m -jar <path_to_scripture>/scripture_alpha2.jar -task toGFF -cufflinks -in your_file.bed -source SCRIPTURE -out your_file.gtf

bedToGenePred input.bed stdout | genePredToGtf file stdin output.gtf

awk '{print $1"\t"$7"\t"$8"\t"($2+1)"\t"$3"\t"$5"\t"$6"\t"$9"\t"(substr($0, index($0,$10)))}' foo.bed > bed.gtf # one easy command to generate gtf
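To make the awk one-liner easier to follow, here is a small sketch that runs it on a single made-up record. It assumes the input is not a plain 6-column BED file but the 10-column layout the command implies: chrom, start (0-based), end, id, score, strand, source, feature, frame, attributes. The function name `bed_to_gtf` and the sample record are hypothetical.

```shell
# Wrap the awk program from above in a function (hypothetical name).
bed_to_gtf() {
    awk '{print $1"\t"$7"\t"$8"\t"($2+1)"\t"$3"\t"$5"\t"$6"\t"$9"\t"(substr($0, index($0,$10)))}' "$@"
}

# One made-up record in the assumed 10-column layout (tab-separated).
demo_gtf=$(printf 'chr1\t99\t200\tg1\t.\t+\tsrc\texon\t.\tgene_id "g1"; transcript_id "t1";\n' | bed_to_gtf)

# Show the resulting GTF line.
printf '%s\n' "$demo_gtf"
```

The start coordinate moves from 99 to 100 because BED starts are 0-based while GTF starts are 1-based. The attributes column is recovered with `index($0,$10)`, which finds the first place the text of field 10 occurs in the line, so the command can misfire if that text also appears in an earlier column.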

# GTF to BED: "< foo.gtf" feeds the file to the command on standard input
$ gtf2bed < foo.gtf | sort-bed - > foo.bed

$ perl gtf2bed.pl input.gtf > output.bed
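If neither tool is at hand, the core of the GTF to BED conversion is a coordinate shift, which the sketch below illustrates. It is a minimal stand-in, not a reimplementation of either `gtf2bed` command: the function name `gtf_to_bed6` is hypothetical, it writes only six columns, and using the feature type as the BED name is my own assumption.

```shell
# Minimal GTF -> BED6 sketch (hypothetical helper, not the real gtf2bed).
# GTF: 1-based, end-inclusive. BED: 0-based start, end-exclusive.
gtf_to_bed6() {
    awk 'BEGIN { FS = OFS = "\t" }
         /^#/ || NF < 9 { next }               # skip comments and short lines
         { print $1, $4 - 1, $5, $3, $6, $7 }  # chrom, start-1, end, feature, score, strand
        ' "$@"
}

# One made-up GTF record preceded by a comment line.
demo_bed=$(printf '#comment\nchr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1";\n' | gtf_to_bed6)

# Show the resulting BED line.
printf '%s\n' "$demo_bed"
```

Only the start is decremented; the end stays the same because a 1-based inclusive end and a 0-based exclusive end are the same number.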

If you want the 5' UTR, CDS, 3' UTR, start codon, and stop codon as separate features, I recommend my own scripts, parseGTF.py and sortGTF.py. See parseGTF for detailed usage.

sortGTF.py gtf-file >sorted.gtf-file
parseGTF.py sorted.gtf-file chrom-sizes-file >gtf.bed

gtfToGenePred mm9.gtf stdout | genePredToBed stdin mm9.bed12

gffread -w output.fa -g genome_assembly.fa refgene.gtf # write the spliced exon sequence of each transcript to output.fa

seqExtract.3.py -i genome.fa -b "bed1,bed2..."
