UCSC usages

This post lists how to use the UCSC Table Browser to retrieve various kinds of gene annotation.

Get gene annotation file in GTF or BED format

Open the UCSC Table Browser, fill in the following settings, and click get output.

[Figures: Table Browser settings for (1) gene annotation file in GTF format, (2) gene annotation file in BED format, (3) sub-gene annotation file in BED format]

Get chromosome size file or genome size file

Query the UCSC public MySQL server:

mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -e \
	"select chrom, size from hg19.chromInfo" > hg19.chromosome.size

Or use the fetchChromSizes script:

fetchChromSizes mm9 > mm9.chrom.sizes

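In batch mode the mysql client prints a column header row (chrom, size) unless -N (--skip-column-names) is given, so the file written by the mysql command starts with a header line while the fetchChromSizes output does not. Below is a minimal sketch of deriving a total genome size from either file. genome_size is a hypothetical helper name, and the input is assumed to be tab-separated with the chromosome length in column 2.

```shell
# genome_size: hypothetical helper, not part of the UCSC tools.
# $1: two-column, tab-separated chromosome size file (name, length).
genome_size() {
    # Only lines whose second column is a plain integer are summed,
    # which skips the "chrom size" header row written by mysql.
    # %.0f avoids integer overflow in awk implementations with 32-bit %d.
    awk -F'\t' '$2 ~ /^[0-9]+$/ { total += $2 } END { printf "%.0f\n", total }' "$1"
}
```

Usage: genome_size mm9.chrom.sizes prints the summed length of all sequences in the file, including random and unplaced contigs if they are listed.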
Get the type of each transcript
  • For RefSeq genes
    • Change the table from refGene to refSeqStatus in the settings shown above.
    • Click describe table schema to see the description of each field. The possible values are listed here for reference.

–Status

'Unknown', 'Reviewed', 'Validated', 'Provisional', 'Predicted', 'Inferred'

–Molecular type

'DNA', 'RNA', 'ds-RNA', 'ds-mRNA', 'ds-rRNA', 'mRNA', 'ms-DNA', 'ms-RNA', 'rRNA', 'scRNA',
'snRNA', 'snoRNA', 'ss-DNA', 'ss-RNA', 'ss-snoRNA', 'tRNA', 'cRNA', 'ss-cRNA', 'ds-cRNA', 'ms-rRNA'
  • For Ensembl genes
    • Choose the Ensembl Genes track and the ensemblSource table.
    • Click describe table schema to see the description of each field. Example rows are listed here for reference.

name                  source
ENSMUST00000160944    transcribed_unprocessed_pseudogene
ENSMUST00000082908    snRNA
ENSMUST00000162897    processed_transcript
ENSMUST00000159265    processed_transcript
ENSMUST00000070533    protein_coding
ENSMUST00000161581    antisense
ENSMUST00000157765    snRNA
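
Once the ensemblSource table has been saved to a file, the transcript types can be tallied on the command line. The following is a rough sketch. count_transcript_types is a hypothetical helper name, and the input is assumed to be the tab-separated Table Browser export with the transcript name in column 1 and the type in column 2.

```shell
# count_transcript_types: hypothetical helper, not a UCSC tool.
# $1: tab-separated export of ensemblSource (name, source).
count_transcript_types() {
    # Skip "#"-prefixed header lines and a bare "name" header row,
    # count each value of column 2, then sort by count (descending)
    # and by type name to make the order deterministic.
    awk -F'\t' '!/^#/ && NF >= 2 && $1 != "name" { n[$2]++ }
        END { for (t in n) printf "%s\t%d\n", t, n[t] }' "$1" |
        sort -t "$(printf '\t')" -k2,2nr -k1,1
}
```

Each output line holds a transcript type and the number of transcripts carrying it.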

Get pseudogene annotation in GTF format

Open the UCSC Table Browser, choose All Tables for group and pseudoYale60 for table, and set the output format to GTF.

Get rRNA annotation in GTF format
  • Select “All Tables” from the group drop-down list
  • Select the “rmsk” table from the table drop-down list
  • Choose “GTF” as the output format
  • Type a filename in “output file” so your browser downloads the result
  • Click “create” next to filter
  • Next to “repClass,” type rRNA
  • Next to free-form query, select “OR” and type repClass = "tRNA" (with straight quotes; this also keeps tRNA records, so omit it if only rRNA is wanted)
  • Click submit on that page, then get output on the main page
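
The same repClass filter can also be applied offline to an rmsk table that was exported with all fields. The following is a sketch only. filter_rep_class is a hypothetical helper name, and the input is assumed to be a tab-separated file whose first line is a header naming the columns (optionally prefixed with "#"), one of which is repClass. The output keeps the original table columns and is not GTF.

```shell
# filter_rep_class: hypothetical helper, not a UCSC tool.
# $1: tab-separated rmsk export whose first line names the columns
# $2: comma-separated repClass values to keep, e.g. "rRNA,tRNA"
filter_rep_class() {
    awk -F'\t' -v keep="$2" '
        BEGIN {
            n = split(keep, k, ",")
            for (i = 1; i <= n; i++) want[k[i]] = 1
        }
        NR == 1 {
            # Locate the repClass column by name instead of hard-coding it.
            for (i = 1; i <= NF; i++) {
                name = $i
                sub(/^#/, "", name)
                if (name == "repClass") col = i
            }
            print
            next
        }
        # Print only rows whose repClass is in the requested set.
        col && (($col) in want)
    ' "$1"
}
```

Usage: filter_rep_class rmsk.txt "rRNA,tRNA" > rmsk.rRNA_tRNA.txt, where rmsk.txt is a placeholder file name.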
Get repeat elements and their class and family information
  • In the UCSC Table Browser, choose clade: Mammal, genome: Mouse, assembly: mm9, group: Variation and Repeats, track: RepeatMasker, table: rmsk, output format: selected fields from primary and related tables.

[Figure: RepeatMasker file in GTF format]

Get and use phastCons conservation scores

1) Get phastCons scores from UCSC:

There are several sets of phastCons scores, depending on which genomes are included in the multiple alignment (for example, http://hgdownload.cse.ucsc.edu/goldenPath/mm9/phastCons30way/ ). You can download them by FTP:

wget ftp://hgdownload.cse.ucsc.edu/goldenPath/mm9/phastCons30way/vertebrate/chr*

2) Convert wig to bigWig:

Each per-chromosome score file (wig format) can be converted into a bigWig file. The 64-bit Linux binaries are available from: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/

The fetchChromSizes script is needed to get the chromosome lengths that wigToBigWig requires:

fetchChromSizes mm9 > mm9.chrom.sizes

Run wigToBigWig on each chromosome file, e.g.:

wigToBigWig -clip chr1.data mm9.chrom.sizes chr1.bw

3) Calculate mean phastCons score for the coordinates of interest:

Another program, bigWigSummary, can be downloaded from the same UCSC directory linked above. It takes a bigWig file and a genomic interval (chromosome, start, end) and reports a summary statistic, such as the mean or max, of the values in that interval, in our case phastCons scores.
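
A possible way to run this over many regions is sketched below. mean_phastcons is a hypothetical helper name. It assumes bigWigSummary is on the PATH, that the regions are in a tab-separated BED file (chrom, start, end in columns 1 to 3), and that bigWigSummary either fails or prints nothing when a region has no data, in which case NA is reported.

```shell
# mean_phastcons: hypothetical helper wrapping bigWigSummary.
# $1: bigWig file (e.g. chr1.bw), $2: BED file of regions.
mean_phastcons() {
    while IFS="$(printf '\t')" read -r chrom start end rest; do
        # Skip blank lines and comment/track/browser lines.
        case "$chrom" in ''|'#'*|track*|browser*) continue ;; esac
        # Requesting 1 data point yields a single mean for the interval.
        score=$(bigWigSummary -type=mean "$1" "$chrom" "$start" "$end" 1 \
                    2>/dev/null </dev/null) || score=NA
        printf '%s\t%s\t%s\t%s\n' "$chrom" "$start" "$end" "${score:-NA}"
    done < "$2"
}
```

Usage: mean_phastcons chr1.bw regions.bed > regions.phastCons.txt, where regions.bed is a placeholder file name. Replacing -type=mean with -type=max reports the maximum instead.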
