UCSC usages

This post lists how to use the UCSC Table Browser to get various kinds of gene annotation information.

Get gene annotation file in GTF or bed format

Open the UCSC Table Browser, fill in the settings shown in the screenshots below, and click “get output”.

(Screenshots: gene annotation file in GTF format; gene annotation file in BED format; sub-gene annotation file in BED format.)
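You can also save refGene with the output format “all fields from selected table”. The exon-level (“sub-gene”) intervals can then be rebuilt locally. The sketch below is a minimal version. It assumes the tab-separated genePred column order of refGene (bin, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds, ...) and a header line starting with “#”. The function name refgene_to_exon_bed is hypothetical.

```shell
# Hypothetical helper: print one BED6 line per exon from a refGene
# "all fields" table-browser download (genePred columns, tab-separated).
refgene_to_exon_bed() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^#/ { next }                       # skip the "#bin name ..." header
    {
      split($10, starts, ",")           # exonStarts, e.g. "100,300,"
      split($11, ends, ",")             # exonEnds,   e.g. "200,500,"
      for (i = 1; i <= $9; i++) {       # $9 = exonCount
        # number exons in transcript order (reversed on the minus strand)
        num = ($4 == "-") ? $9 - i + 1 : i
        print $3, starts[i], ends[i], $2 "_exon" num, 0, $4
      }
    }' "$1"
}
```

Usage: `refgene_to_exon_bed refGene.txt > refGene.exons.bed`. Coordinates stay 0-based, half-open, as stored in the table.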

Get chromosome size file or genome size file
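# -N (--skip-column-names) keeps the "chrom size" header line out of the file
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -N -e \
	"select chrom, size from hg19.chromInfo" > hg19.chrom.sizes
# or, with the UCSC fetchChromSizes script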
fetchChromSizes mm9 > mm9.chrom.sizes
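A “genome size” is the sum of the second column of a chromosome size file. The sketch below assumes the two-column chrom/size layout produced above. The helper name genome_size is hypothetical.

```shell
# Hypothetical helper: sum column 2 of a chrom.sizes file (chrom<TAB>size).
# Lines whose size field is not a plain integer (e.g. a "chrom size" header)
# are skipped; %.0f avoids integer overflow on genome-scale totals.
genome_size() {
  awk '$2 ~ /^[0-9]+$/ { total += $2 } END { printf "%.0f\n", total }' "$1"
}
```

Usage: `genome_size mm9.chrom.sizes`.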
Get the type of each transcript
  • For RefSeq genes
    • Change the table from refGene to refSeqStatus in the settings used above.
    • Click “describe table schema” to see the description of each table. The allowed values are listed below for reference.

–Status

'Unknown', 'Reviewed', 'Validated', 'Provisional', 'Predicted', 'Inferred'

–Molecular type

'DNA', 'RNA', 'ds-RNA', 'ds-mRNA', 'ds-rRNA', 'mRNA', 'ms-DNA', 'ms-RNA', 'rRNA', 'scRNA',
'snRNA', 'snoRNA', 'ss-DNA', 'ss-RNA', 'ss-snoRNA', 'tRNA', 'cRNA', 'ss-cRNA', 'ds-cRNA', 'ms-rRNA'
  • For Ensembl genes
    • Use the ensemblGenes track and the ensemblSource table.
    • Click “describe table schema” to see the description of each table. Some example rows are listed below for reference.

name                 source
ENSMUST00000160944   transcribed_unprocessed_pseudogene
ENSMUST00000082908   snRNA
ENSMUST00000162897   processed_transcript
ENSMUST00000159265   processed_transcript
ENSMUST00000070533   protein_coding
ENSMUST00000161581   antisense
ENSMUST00000157765   snRNA
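After saving refSeqStatus or ensemblSource as a tab-separated file, you can count how many transcripts fall into each category. The sketch below is a minimal version. It assumes the category is in column 2: status in refSeqStatus, and source in ensemblSource as in the rows above. The helper name count_by_column is hypothetical.

```shell
# Hypothetical helper: count rows per value of one column in a
# tab-separated table, skipping "#" header lines.
# Prints value<TAB>count, most frequent first (ties sorted by name).
count_by_column() {
  tab=$(printf '\t')
  awk -F'\t' -v col="$2" '!/^#/ { n[$col]++ }
    END { for (k in n) print k "\t" n[k] }' "$1" |
    sort -t "$tab" -k2,2nr -k1,1
}
```

Usage: `count_by_column ensemblSource.txt 2` or `count_by_column refSeqStatus.txt 2`.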

Get pseudogene annotation in GTF format

Open the UCSC Table Browser, choose “All Tables” for group and “pseudoYale60” for table, and set the output format to GTF.

Get rRNA annotation in GTF format
  • Select “All Tables” from the group drop-down list
  • Select the “rmsk” table from the table drop-down list
  • Choose “GTF” as the output format
  • Type a filename in “output file” so your browser downloads the result
  • Click “create” next to filter
  • Next to “repClass,” type rRNA
  • Next to “free-form query”, select “OR” and type repClass = "tRNA" (this also keeps tRNA records; skip this step if you only want rRNA)
  • Click “submit” on that page, then click “get output” on the main page
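If you already have the full rmsk table (output format “all fields from selected table”), you can make the same rRNA/tRNA selection locally. The sketch below is a minimal version. It assumes rmsk's column order: genoName in column 6, genoStart 7, genoEnd 8, strand 10, repName 11, repClass 12, repFamily 13. The helper name rmsk_class_to_bed is hypothetical.

```shell
# Hypothetical helper: keep rmsk rows whose repClass is in a
# comma-separated list and print them as BED6 (name = repName).
rmsk_class_to_bed() {
  awk -v classes="$2" 'BEGIN {
      FS = OFS = "\t"
      n = split(classes, c, ",")
      for (i = 1; i <= n; i++) keep[c[i]] = 1   # set of wanted classes
    }
    !/^#/ && ($12 in keep) { print $6, $7, $8, $11, 0, $10 }' "$1"
}
```

Usage: `rmsk_class_to_bed rmsk.txt rRNA,tRNA > rRNA_tRNA.bed`.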
Get repeat elements and their class and family information
  • In the UCSC Table Browser, choose Mammal > Mouse > mm9 > Variation and Repeats > RepeatMasker > rmsk, with output format “selected fields from primary and related tables”

(Screenshot: RepeatMasker file in GTF format.)
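With repClass and repFamily available, a quick summary shows how many elements and bases each class/family pair covers. The sketch below is a minimal version. It assumes a hypothetical field selection with these tab-separated columns: genoName, genoStart, genoEnd, repName, repClass, repFamily. Adjust the column numbers to match the fields you ticked. The helper name repeat_summary is hypothetical. Overlapping elements are counted twice, so the result is a sum of element lengths, not merged coverage.

```shell
# Hypothetical helper: per repClass/repFamily, count elements and sum their
# lengths (genoEnd - genoStart). Assumed columns: 1 genoName, 2 genoStart,
# 3 genoEnd, 4 repName, 5 repClass, 6 repFamily.
# Output: repClass<TAB>repFamily<TAB>count<TAB>bases, largest first.
repeat_summary() {
  tab=$(printf '\t')
  awk 'BEGIN { FS = OFS = "\t" }
    !/^#/ {
      key = $5 OFS $6
      n[key]++
      bp[key] += $3 - $2
    }
    END { for (k in n) print k, n[k], bp[k] }' "$1" |
    sort -t "$tab" -k4,4nr
}
```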

Get and use phastCons conservation scores
1) Get phastCons scores from UCSC:

There are various phastCons score files, depending on which genomes are included in the multiple alignment, e.g. http://hgdownload.cse.ucsc.edu/goldenPath/mm9/phastCons30way/. You can download them over FTP:

wget ftp://hgdownload.cse.ucsc.edu/goldenPath/mm9/phastCons30way/vertebrate/chr*

2) Convert wig to bigWig:

Each per-chromosome score file (wig format) can be converted into a bigWig file. 64-bit Linux binaries of the UCSC tools are available from: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/

The fetchChromSizes script supplies the chromosome lengths that wigToBigWig needs:

fetchChromSizes mm9 > mm9.chrom.sizes

Run wigToBigWig on each chromosome file, e.g.:

wigToBigWig -clip chr1.data mm9.chrom.sizes chr1.bw
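To convert every chromosome rather than only chr1, wrap the command in a loop. The sketch below is a minimal version. It assumes the downloaded files are named chr*.data.gz or chr*.data (the exact names on the server may differ) and that wigToBigWig is on your PATH. The helper name wig_dir_to_bigwig is hypothetical.

```shell
# Hypothetical helper: in the current directory, decompress any chr*.data.gz
# and convert each chr*.data wig file to chr*.bw with wigToBigWig.
wig_dir_to_bigwig() {
  sizes=$1
  for f in chr*.data.gz; do
    [ -e "$f" ] && gunzip -f "$f"      # only if gzipped files are present
  done
  for f in chr*.data; do
    [ -e "$f" ] || continue            # no match: the glob stays literal
    wigToBigWig -clip "$f" "$sizes" "${f%.data}.bw"
  done
}
```

Usage: `wig_dir_to_bigwig mm9.chrom.sizes`, run inside the download directory.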

3) Calculate mean phastCons score for the coordinates of interest:

Another UCSC tool, bigWigSummary, is available from the same link. Given a bigWig file, a genomic region (chrom, start, end) and a number of data points, it reports a summary (the mean by default, or e.g. the max via -type) of the values in that region. In our case these values are phastCons scores.
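For a quick check of a single small region, you can also average the values directly from a fixedStep wig file. The sketch below is a minimal alternative, not bigWigSummary itself. It assumes fixedStep records with span 1, and it takes the query region in BED-style 0-based, half-open coordinates. The helper name wig_region_mean is hypothetical. For many regions, the bigWig tools are much faster.

```shell
# Hypothetical helper: mean of fixedStep wig values over chrom:start-end.
# start/end are 0-based half-open (BED style); wig "start=" is 1-based,
# so 1-based position p is inside the region when start < p <= end.
# Assumes span=1; prints NA when no value falls in the region.
wig_region_mean() {
  awk -v c="$2" -v s="$3" -v e="$4" '
    BEGIN { s += 0; e += 0 }           # force numeric comparison
    /^track/ { next }
    /^fixedStep/ {
      chrom = ""; pos = 0; step = 1
      for (i = 2; i <= NF; i++) {
        split($i, kv, "=")
        if (kv[1] == "chrom") chrom = kv[2]
        else if (kv[1] == "start") pos = kv[2] + 0
        else if (kv[1] == "step") step = kv[2] + 0
      }
      next
    }
    {
      if (chrom == c && pos > s && pos <= e) { sum += $1; n++ }
      pos += step                      # position of the next value
    }
    END { if (n > 0) printf "%.3f\n", sum / n; else print "NA" }
  ' "$1"
}
```

Usage: `wig_region_mean chr1.data chr1 3000000 3001000`.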

CHENTONG
Copyright notice: this is an original article by the blogger; please credit the source when reposting.