UCSC usages

This page describes how to use the UCSC Table Browser to get various kinds of gene annotation information.

Get gene annotation file in GTF or bed format

Open the UCSC Table Browser, fill in the settings shown in the screenshots below, and click “get output”.

[Screenshots: gene annotation file in GTF format; gene annotation file in BED format; sub-gene annotation file in BED format]
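GTF coordinates are 1-based and inclusive, while BED starts are 0-based. If you already have the GTF output and also need BED, the bash sketch below converts GTF exon lines to BED6. It assumes a tab-separated GTF whose column 9 contains a transcript_id "..." attribute. The function name gtf_exons_to_bed is hypothetical.

```shell
#!/usr/bin/env bash
# Convert exon lines of a Table Browser GTF file into BED6
# (chrom, start, end, transcript_id, score 0, strand).
# GTF start is 1-based inclusive; BED start is 0-based, so subtract 1.
# The end coordinate is the same in both conventions.
gtf_exons_to_bed() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
    $3 == "exon" {
      id = "."
      # skip the 15-character prefix  transcript_id "  and the closing quote
      if (match($9, /transcript_id "[^"]*"/))
        id = substr($9, RSTART + 15, RLENGTH - 16)
      print $1, $4 - 1, $5, id, 0, $7
    }' "$1"
}
```

Usage: `gtf_exons_to_bed refGene.gtf > refGene.exons.bed`. Asking the Table Browser for BED output directly avoids the conversion altogether.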

Get chromosome size file or genome size file
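# -N (--skip-column-names) keeps the "chrom size" header row out of the file
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -N -e \
	"select chrom, size from hg19.chromInfo" > hg19.chrom.sizes
# or, with the UCSC fetchChromSizes script
fetchChromSizes mm9 > mm9.chrom.sizes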
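To get a single genome size from the chromosome size file, sum column 2. Below is a minimal bash sketch. It assumes a tab-separated chrom<TAB>size file. If a header line is present, it adds 0 because its size field is not numeric. The optional "primary only" rule treats any name containing "_" (e.g. chr1_random, chrUn_*) as non-primary. That rule is an assumption, not something UCSC defines. genome_size is a hypothetical name.

```shell
#!/usr/bin/env bash
# Total genome length from a chrom.sizes file.
# Pass "yes" as the second argument to skip sequences whose name contains "_"
# (random, unplaced and haplotype contigs under the assumed naming rule).
genome_size() {
  local sizes=$1 primary_only=${2:-no}
  awk -F'\t' -v p="$primary_only" '
    p == "yes" && index($1, "_") > 0 { next }
    { total += $2 }
    END { printf "%.0f\n", total }   # %.0f avoids 32-bit %d overflow in some awks
  ' "$sizes"
}
```

Usage: `genome_size hg19.chrom.sizes` for all sequences, or `genome_size hg19.chrom.sizes yes` for primary chromosomes only.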
Get the type of each transcript
  • For RefSeq genes
    • Change the table from refGene to refSeqStatus in the settings above.
    • Click “describe table schema” to see the description of each field. The possible values are listed below for reference.

–Status

'Unknown', 'Reviewed', 'Validated', 'Provisional', 'Predicted', 'Inferred'

–Molecular type

'DNA', 'RNA', 'ds-RNA', 'ds-mRNA', 'ds-rRNA', 'mRNA', 'ms-DNA', 'ms-RNA', 'rRNA', 'scRNA',
'snRNA', 'snoRNA', 'ss-DNA', 'ss-RNA', 'ss-snoRNA', 'tRNA', 'cRNA', 'ss-cRNA', 'ds-cRNA', 'ms-rRNA'
  • For Ensembl genes
    • Use the ensemblGenes track and the ensemblSource table.
    • Click “describe table schema” to see the description of each field. Example rows are listed below for reference.

name                  source
ENSMUST00000160944    transcribed_unprocessed_pseudogene
ENSMUST00000082908    snRNA
ENSMUST00000162897    processed_transcript
ENSMUST00000159265    processed_transcript
ENSMUST00000070533    protein_coding
ENSMUST00000161581    antisense
ENSMUST00000157765    snRNA
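Once one of these tables is saved from the Table Browser, it can be queried on the command line. The bash sketch below assumes the saved file is tab-separated and starts with a "#" header line. It also assumes that column 1 is the transcript ID and column 2 is its type: "source" in ensemblSource, and (assumed) "status" in refSeqStatus. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# ids_of_type TABLE TYPE: print the IDs (column 1) whose type (column 2) equals TYPE.
ids_of_type() {
  awk -F'\t' -v t="$2" '!/^#/ && $2 == t { print $1 }' "$1"
}

# count_types TABLE: print "count<TAB>type", most frequent type first.
count_types() {
  awk -F'\t' '!/^#/ { n[$2]++ } END { for (k in n) print n[k] "\t" k }' "$1" |
    LC_ALL=C sort -t $'\t' -k1,1nr -k2,2
}
```

Usage: `ids_of_type ensemblSource.txt protein_coding > protein_coding.ids` gives a transcript list that can be used to filter a GTF. `count_types ensemblSource.txt` gives an overview of the biotypes.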

Get pseudogene annotation in GTF format

Open the UCSC Table Browser and choose “All Tables” for group and “pseudoYale60” for table.

Get rRNA annotation in GTF format
  • Select “All Tables” from the group drop-down list
  • Select the “rmsk” table from the table drop-down list
  • Choose “GTF” as the output format
  • Type a filename in “output file” so your browser downloads the result
  • Click “create” next to filter
  • Next to “repClass,” type rRNA
  • Next to “free-form query”, select “OR” and type repClass = "tRNA" (with straight quotes)
  • Click submit on that page, then get output on the main page
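The same rRNA/tRNA filter can also be applied offline to an rmsk table saved with “all fields from selected table”. The bash sketch below assumes the file is tab-separated and starts with a "#" header line. Columns are looked up by name (repClass, genoName, genoStart, genoEnd, repName, strand), so it does not matter whether a bin column is present. UCSC tables already store 0-based starts, so the coordinates go into BED unchanged. rmsk_to_rrna_bed is a hypothetical name.

```shell
#!/usr/bin/env bash
# Keep rows whose repClass is rRNA or tRNA and write them as BED6
# (chrom, start, end, repName, score 0, strand).
rmsk_to_rrna_bed() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
    # header: drop the leading "#" and map column names to positions
    NR == 1 { sub(/^#/, ""); for (i = 1; i <= NF; i++) c[$i] = i; next }
    $(c["repClass"]) == "rRNA" || $(c["repClass"]) == "tRNA" {
      print $(c["genoName"]), $(c["genoStart"]), $(c["genoEnd"]),
            $(c["repName"]), 0, $(c["strand"])
    }' "$1"
}
```

Usage: `rmsk_to_rrna_bed rmsk.txt > rRNA_tRNA.bed`.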
Get repeat elements and their class and family information
  • UCSC Table Browser: clade Mammal > genome Mouse > assembly mm9 > group Variation and Repeats > track RepeatMasker > table rmsk > output format “selected fields from primary and related tables”

[Screenshot: RepeatMasker file in GTF format]
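To summarize the class and family information, count the rows per repClass/repFamily pair in the saved rmsk output. This bash sketch assumes a tab-separated file with a "#" header line, and that repClass and repFamily were among the selected fields. Columns are found by name. repeat_class_counts is a hypothetical name.

```shell
#!/usr/bin/env bash
# Print "repClass<TAB>repFamily<TAB>count", sorted bytewise for stable output.
repeat_class_counts() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
    # header: drop the leading "#" and map column names to positions
    NR == 1 { sub(/^#/, ""); for (i = 1; i <= NF; i++) c[$i] = i; next }
    { n[$(c["repClass"]) OFS $(c["repFamily"])]++ }
    END { for (k in n) print k, n[k] }' "$1" | LC_ALL=C sort
}
```

Usage: `repeat_class_counts rmsk_selected.txt`.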

Get and use PhastCons conservation scores
1) Get PhastCons scores from UCSC:

There are several phastCons score files, depending on which genomes are included in the multiple alignment, e.g. http://hgdownload.cse.ucsc.edu/goldenPath/mm9/phastCons30way/. They can be downloaded by FTP:

wget 'ftp://hgdownload.cse.ucsc.edu/goldenPath/mm9/phastCons30way/vertebrate/chr*'

2) Convert wig to bigWig:

Each per-chromosome score file (wig format) can be converted into a bigWig file. Prebuilt 64-bit Linux binaries of the UCSC tools are available from: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/

wigToBigWig needs a file of chromosome lengths, which the fetchChromSizes script can generate:

fetchChromSizes mm9 > mm9.chrom.sizes

Run wigToBigWig on each chromosome file, e.g.:

wigToBigWig -clip chr1.data mm9.chrom.sizes chr1.bw
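To process every chromosome rather than only chr1, the command can be wrapped in a loop. This bash sketch assumes the downloads were decompressed to chr*.data files in the current directory, as in the example. The WIG_TO_BIGWIG variable and the function name are hypothetical; the variable is only there so the binary's path can be given if wigToBigWig is not on PATH.

```shell
#!/usr/bin/env bash
# Run wigToBigWig -clip on every chr*.data file, writing chrN.bw next to it.
convert_all_to_bigwig() {
  local sizes=$1 tool=${WIG_TO_BIGWIG:-wigToBigWig} f
  for f in chr*.data; do
    [ -e "$f" ] || continue                 # the pattern matched no files
    "$tool" -clip "$f" "$sizes" "${f%.data}.bw" || return 1
  done
}
```

Usage: `convert_all_to_bigwig mm9.chrom.sizes`. The loop stops at the first failing file, so a broken download is not silently skipped.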

3) Calculate mean phastCons score for the coordinates of interest:

Another UCSC utility, bigWigSummary, is available from the same link. Given a genomic region, it reports the mean, maximum, or another summary of the values stored in the bigWig file, in this case phastCons scores.
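For a list of regions, bigWigSummary can be called once per BED line. This bash sketch assumes per-chromosome bigWigs named <chrom>.bw in one directory, as produced by the wigToBigWig step. It uses bigWigSummary's -type=mean option with a single data point, so the whole interval is summarized. An empty result, "n/a", or a failed call is reported as NA. The BIGWIG_SUMMARY variable and the function name are hypothetical.

```shell
#!/usr/bin/env bash
# Print chrom, start, end and mean phastCons score for each BED region.
mean_phastcons() {
  local bwdir=$1 bed=$2 tool=${BIGWIG_SUMMARY:-bigWigSummary}
  local chrom start end rest mean
  while IFS=$'\t' read -r chrom start end rest || [ -n "$chrom" ]; do
    case $chrom in ''|'#'*|track*|browser*) continue ;; esac   # skip header lines
    # 1 data point = one summary value for the whole interval;
    # </dev/null stops the tool from reading the BED file on stdin
    mean=$("$tool" -type=mean "$bwdir/$chrom.bw" "$chrom" "$start" "$end" 1 \
             </dev/null 2>/dev/null) || mean=NA
    case $mean in ''|n/a) mean=NA ;; esac
    printf '%s\t%s\t%s\t%s\n' "$chrom" "$start" "$end" "$mean"
  done < "$bed"
}
```

Usage: `mean_phastcons . regions.bed > regions.phastcons.txt`. This makes one call per region. For many regions against a single genome-wide bigWig, UCSC's bigWigAverageOverBed (`bigWigAverageOverBed in.bw in.bed out.tab`) does the averaging in one pass, but it needs a unique name in BED column 4.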

CHENTONG
Copyright notice: this is an original article by the author. Please credit the source when reposting.