How to tell which library type to use (fr-firststrand or fr-secondstrand)

First of all, as a bioinformatician, you should ask the data producer (e.g. whoever prepared the RNA-seq library) which protocol they used to generate the data. The TopHat manual lists the common strand-specific protocols:

Library Type	Examples	Description
fr-unstranded	`Standard Illumina`	Reads from the left-most end of the fragment (in transcript coordinates) map to the transcript strand, and the right-most end maps to the opposite strand.
fr-firststrand	`dUTP, NSR, NNSR`	Same as above except we enforce the rule that the right-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during first strand synthesis is sequenced.
fr-secondstrand	`Ligation, Standard SOLiD`	Same as above except we enforce the rule that the left-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during second strand synthesis is sequenced.
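As a quick reference, the table above can be turned into a small lookup. This is only a sketch: the function name `library_type_for` is made up for illustration, and the protocol names are simply the examples listed in the table.

```shell
# Hypothetical helper: map a protocol name from the table to a TopHat library type.
library_type_for() {
    case "$1" in
        dUTP|NSR|NNSR)              echo fr-firststrand ;;
        Ligation|"Standard SOLiD")  echo fr-secondstrand ;;
        "Standard Illumina")        echo fr-unstranded ;;
        *)                          echo unknown; return 1 ;;
    esac
}

library_type_for dUTP          # prints fr-firststrand
library_type_for "Ligation"    # prints fr-secondstrand
```

The result is the value you would hand to TopHat's `--library-type` option, e.g. `tophat --library-type "$(library_type_for dUTP)" genome_index reads_1.fastq reads_2.fastq`, where the index and read file names are placeholders.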

If you don't know the library type, you can still figure it out yourself. The TopHat FAQ page provides one solution (http://tophat.cbcb.umd.edu/faq.html#library_type). A simpler approach than running 1M reads first is to pick a few reads, BLAT them against the genome, and infer the library type from the mapping result.

In general, the read from the left-most end of the fragment (always read from 5´ to 3´) maps to the transcript strand, and (for paired-end sequencing) the read from the right-most end maps to the opposite strand. See the arrow directions in the schema below. This is because the sequencer always reads from 5´ to 3´.
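To make the orientation concrete, here is a toy sketch of where the two mates of a pair come from. The fragment sequence and the read length are invented for illustration, and `rev` is assumed to be available (it ships with util-linux and macOS).

```shell
# Reverse-complement DNA sequences read from stdin, one per line.
revcomp() {
    rev | tr 'ACGTacgt' 'TGCAtgca'
}

fragment=ATGGCGTTCAAGCTGA   # hypothetical fragment, written 5'->3' on the transcript strand
read_len=6                  # hypothetical read length

# The left-most read is simply the start of the fragment (transcript strand).
read1=$(printf '%s\n' "$fragment" | cut -c "1-$read_len")
# The right-most read is sequenced 5'->3' from the other end, on the opposite strand.
read2=$(printf '%s\n' "$fragment" | revcomp | cut -c "1-$read_len")

echo "$read1"   # ATGGCG
echo "$read2"   # TCAGCT
```

Aligned back to the genome, the first read matches the transcript strand directly, while the second only matches after being reverse-complemented, which is why the two arrows point towards each other.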

Summary of library type protocols (for Tophat/Bowtie)

Which strand the sequenced fragment is synthesized from, however, depends on the strand-specific protocol. Thanks to the illustration from Zhao Zhang (see below), we can see that, for example, the dUTP method sequences only the strand produced by first-strand synthesis (the second strand is degraded because of the dUTP incorporated into it), so the /2 read has the same orientation as the original RNA strand.

Strand-specific library protocols (Credit: Zhao Zhang)

Taking a real example, first extract a few reads (in FASTA format) from the paired-end FASTQ files with commands like:

$ zcat ~/nearline/rnaseq/BU/Jul2012/Sample_3576_H_01.R1.fastq.gz | sed 's/@//g;s/ /_/g' | awk '{if(NR%4==1)print ">"$0;if(NR%4==2) print $0;}' | head

$ zcat ~/nearline/rnaseq/BU/Jul2012/Sample_3576_H_01.R2.fastq.gz | sed 's/@//g;s/ /_/g' | awk '{if(NR%4==1)print ">"$0;if(NR%4==2) print $0;}' | head
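If you want both mates side by side, ready to paste into BLAT, something along these lines may help. It is a sketch only: `pairs_to_fasta` is a made-up name, it expects uncompressed FASTQ files with the reads in the same order in both files, and it assumes the read name is everything before the first space in the header.

```shell
# Hypothetical helper: print the first N read pairs as FASTA, tagging mates /1 and /2.
# Usage: pairs_to_fasta R1.fastq R2.fastq [N]    (N defaults to 5)
pairs_to_fasta() {
    paste "$1" "$2" | head -n "$(( 4 * ${3:-5} ))" | awk -F '\t' '
        NR % 4 == 1 {                     # header line: take the name up to the first space
            split($1, parts, " ")
            id = substr(parts[1], 2)      # drop the leading "@"
            sub(/\/[12]$/, "", id)        # drop an old-style /1 or /2 suffix if present
        }
        NR % 4 == 2 {                     # sequence line: column 1 is R1, column 2 is R2
            printf ">%s/1\n%s\n>%s/2\n%s\n", id, $1, id, $2
        }
    '
}
```

In bash, gzipped input can be fed in through process substitution, e.g. `pairs_to_fasta <(zcat R1.fastq.gz) <(zcat R2.fastq.gz) 5`, with the file names standing in for your own.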

Then BLAT them in the UCSC Genome Browser.

Below is a screenshot of the top hits for one pair of reads. They map to exons of the OS9 gene (the left one is /1 and the right one is /2, in opposite directions). Here /1 maps in the transcript direction and /2 in the opposite direction, which means the library can only be fr-secondstrand or fr-unstranded (it cannot be fr-firststrand).

Looking further at other reads in the file, we can find examples like these:

where /2 maps to the transcript strand and /1 to the opposite strand. Combined with the observation above, we can conclude that this is an fr-unstranded library.
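The same reasoning can be written down as a small tally once you have noted, for each BLATed pair, the strand of the /1 hit and the strand of the gene it falls in. This is a rough sketch: the three-column input format, the function name `infer_library_type`, and the 90%/10% cut-offs are all assumptions made for illustration, not anything TopHat prescribes.

```shell
# Hypothetical helper: guess the library type from tab-separated lines of
#   pair_id <TAB> strand_of_read1_hit <TAB> strand_of_gene      (strands are + or -)
infer_library_type() {
    awk -F '\t' '
        NF < 3   { next }           # skip blank or incomplete lines
        $2 == $3 { sense++ }        # /1 on the transcript strand
        $2 != $3 { antisense++ }    # /1 on the opposite strand
        END {
            total = sense + antisense
            if (total == 0) { print "undetermined"; exit }
            frac = sense / total
            if (frac >= 0.9)      print "fr-secondstrand"   # /1 (almost) always sense
            else if (frac <= 0.1) print "fr-firststrand"    # /1 (almost) always antisense
            else                  print "fr-unstranded"     # a mix of both
        }
    ' "$@"
}

# Two pairs behaving like the two examples above: one sense, one antisense.
printf 'pairA\t+\t+\npairB\t+\t-\n' | infer_library_type   # prints fr-unstranded
```

With only a handful of reads the call is of course tentative, so it is worth checking more pairs before settling on a library type.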


October 29, 2012


CHENTONG
