Tutorial for smRNA-Seq analysis
This tutorial summarizes the workflow for quantifying miRNA expression from smRNA-Seq data.
File format for smRNA-Seq
Because sequenced reads in smRNA-Seq are highly redundant, the data are usually saved in a read-tab-count file (one read sequence, a tab, and the number of times that read was observed per line), as indicated below. The data set used can be downloaded from here.
Convert read-tab-count to a FASTA file for quantifier.pl in miRDeep2
Since quantifier.pl requires a FASTA file in a specific format, one can use collapsemiRNAreads.py from my GitHub to do the conversion.
FASTA format:
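As a rough illustration of the conversion step, here is a minimal Python sketch. It is not the code of collapsemiRNAreads.py. The function name `read_tab_count_to_fasta`, the header layout `>{prefix}_{index}_x{count}`, and the three-letter prefix `hel` are assumptions modeled on the collapsed-read headers that miRDeep2 tools commonly use; check them against the format your version of quantifier.pl expects. The two example reads and their counts are made up.

```python
def read_tab_count_to_fasta(lines, prefix="seq"):
    """Convert 'sequence<TAB>count' lines into FASTA text.

    Each header is written as >{prefix}_{index}_x{count}. This layout and
    the three-letter prefix are assumptions, not taken from
    collapsemiRNAreads.py.
    """
    if len(prefix) != 3:
        raise ValueError("prefix is assumed to be exactly three characters")
    records = []
    index = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        seq, count = line.split()[:2]
        index += 1
        records.append(f">{prefix}_{index}_x{int(count)}\n{seq.upper()}")
    return "\n".join(records) + ("\n" if records else "")


# Hypothetical input: two reads with made-up counts.
example = ["TGAGGTAGTAGGTTGTATAGTT\t1520", "TAGCTTATCAGACTGATGTTGA\t873"]
fasta_text = read_tab_count_to_fasta(example, prefix="hel")
print(fasta_text, end="")
```

Keeping the count in the header lets the quantification step weight each unique sequence by its abundance without storing the redundant reads.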
Clip adaptor
Once you have the reads file, check whether the adaptor has been removed. Normally, if all reads have the same length, we can assume that no adaptor removal has been performed. Unfortunately, researchers often do not provide the adaptor sequences. One can recover the adaptor sequence by performing a multiple sequence alignment (MSA) of the first tens to hundreds of reads and selecting the common sequence at the end of the reads as the adaptor. T-Coffee is a great online tool for performing MSA.
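The same-length check can be sketched in a few lines of Python. The function names, the read length of 30, and the sequences (including `EXAMPLE_ADAPTOR`) are illustrative placeholders and are not taken from the tutorial's data set.

```python
from collections import Counter


def read_length_profile(reads):
    """Count how many reads have each length."""
    return Counter(len(read) for read in reads)


def looks_unclipped(reads):
    """True when every read has the same length, which suggests raw
    fixed-length sequencer output that still carries adaptor bases."""
    return len(read_length_profile(reads)) == 1


# Placeholder adaptor and inserts, for illustration only.
EXAMPLE_ADAPTOR = "TCGTATGCCGTCTTCTGCTTG"
inserts = [
    "TGAGGTAGTAGGTTGTATAGTT",
    "TAGCTTATCAGACTGATGTTGA",
    "AACCCGTAGATCCGAACTTGTG",
]
READ_LENGTH = 30

# Simulate raw reads: insert followed by adaptor, cut at a fixed length.
raw_reads = [(ins + EXAMPLE_ADAPTOR)[:READ_LENGTH] for ins in inserts]
# Simulate clipped reads: inserts of differing lengths.
clipped_reads = [inserts[0], inserts[1][:21], inserts[2][:20]]

print(read_length_profile(raw_reads), looks_unclipped(raw_reads))
print(read_length_profile(clipped_reads), looks_unclipped(clipped_reads))
```

A single read length is only a hint: a data set that was clipped and then size-selected to one length would also pass this check, so it is worth inspecting a few reads by eye as well.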
Once you have the adaptor sequence, you can clip it with fastx_clipper from the FASTX-Toolkit.
Here I used a Mega result as an example. The colorful letters on the left show the MSA result of 25 reads, and the 11 letters in the black box were selected as the adaptor sequence. The adaptor sequence should normally be longer than 10 nt. The text on the right shows the adaptor-clipping result, connected to the alignment by thin lines.
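To make the effect of clipping concrete, here is a simplified Python sketch of what an adaptor clipper does to each read. It is not how fastx_clipper is implemented: it uses exact matching of the first `seed_length` adaptor bases only, and the function name, the default of 11 (chosen to mirror the 11-letter adaptor in the example above), and the sequences are assumptions for illustration.

```python
def clip_adaptor(read, adaptor, seed_length=11):
    """Remove the adaptor and everything after it from a read.

    The first `seed_length` bases of the adaptor are searched for as an
    exact match. Reads without a match are returned unchanged.
    """
    seed = adaptor[:seed_length]
    pos = read.find(seed)
    return read if pos == -1 else read[:pos]


# Placeholder adaptor and reads, for illustration only.
EXAMPLE_ADAPTOR = "TCGTATGCCGTCTTCTGCTTG"
insert = "TGAGGTAGTAGGTTGTATAGTT"
read_with_adaptor = insert + EXAMPLE_ADAPTOR[:12]  # read ends inside the adaptor
read_without_adaptor = "ACGTACGTACGTACGTACGT"

print(clip_adaptor(read_with_adaptor, EXAMPLE_ADAPTOR))
print(clip_adaptor(read_without_adaptor, EXAMPLE_ADAPTOR))
```

A short seed risks matching inside genuine miRNA sequence, which is one reason to pick an adaptor sequence longer than 10 nt.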
Quantify miRNA expression
Use the following command to quantify miRNA expression, then check the results in the file miRNAs_expressed_all_samples_hela.csv.
Trim reads and quantify miRNA expression again
The 3’ ends of canonical miRNAs are often subject to untemplated additions, especially the 3’ ends of mirtron-3p species. It is therefore sometimes useful to trim reads from the 3’ end one nucleotide at a time and repeat the mapping after each trim.
Here I constructed a workflow to simplify the process. All one needs is the main program quantifier.sh and the three programs it depends on, including trimFasta.py and quantifier.modified.pl, a modified version of quantifier.pl.
The principle is as follows. First, map all reads to the miRNA precursors. Second, save the mapped reads. Third, extract the unmapped reads and trim the last nucleotide from the 3’ end. Fourth, map the trimmed reads again and add those with no more than 20 mapping loci (this number can be changed as needed) to the saved mapped reads. Fifth, repeat the trimming and mapping until all reads are shorter than a given length or the cycle index exceeds a given number. Sixth, map all of the saved mapped reads.
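The loop described above can be sketched in Python as follows. This is a toy model of the idea, not the code of quantifier.sh: `count_loci` is a hypothetical stand-in that counts exact substring matches instead of calling a real aligner, and the parameter names and defaults (`min_length`, `max_cycles`, `max_loci`) as well as the sequences are assumptions. Trimmed reads that hit more than `max_loci` loci are dropped here, since trimming them further could only increase the number of hits.

```python
def count_loci(read, precursors):
    """Toy stand-in for an aligner: count exact occurrences of `read`
    across all precursor sequences."""
    total = 0
    for seq in precursors.values():
        start = seq.find(read)
        while start != -1:
            total += 1
            start = seq.find(read, start + 1)
    return total


def trim_and_map(reads, precursors, min_length=18, max_cycles=5, max_loci=20):
    """Return (original_read, mapped_form) pairs after iterative 3' trimming."""
    saved, pending = [], []
    # Steps 1 and 2: map everything once and save the reads that map.
    for read in reads:
        target = saved if count_loci(read, precursors) > 0 else pending
        target.append((read, read))
    # Steps 3 to 5: trim one 3' nucleotide from unmapped reads and remap.
    cycle = 0
    while pending and cycle < max_cycles:
        cycle += 1
        next_pending = []
        for original, current in pending:
            trimmed = current[:-1]
            if len(trimmed) < min_length:
                continue  # too short to be informative
            loci = count_loci(trimmed, precursors)
            if loci == 0:
                next_pending.append((original, trimmed))
            elif loci <= max_loci:
                saved.append((original, trimmed))
            # reads with more than max_loci hits are dropped
        pending = next_pending
    return saved


# Made-up precursor and reads, for illustration only.
mature = "TGAGGTAGTAGGTTGTATAGTT"
precursors = {"toy-pre-1": "GGC" + mature + "GGAGGGTC"}
reads = [mature, mature + "AA", "C" * 20]

result = trim_and_map(reads, precursors)
for original, mapped_form in result:
    print(original, "->", mapped_form)
```

The pairs returned by the sketch correspond to the saved mapped reads, which the sixth step would then pass to a final mapping and quantification run.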