Notes for "computing for data analysis" [Coursera course]

  1. colClass.

Specifying this option instead of using default can make ‘read.table’ run MUCH faster, often twice as faster. In order to use this option, you have to know the class of each column in your data frame. If all of the columns are ‘numberic’, for example, then you can just set colClass="numeric". A quick and dirty way to figure out classes for each column is the following:

initial <- read.table("data",nrows=100)
classes <- sapply(initial,class)
tabAll <- read.table("data",colClass=classes)

  1. The ... represents argument list when put in the end. All additional parameters will be saved to ... and passed to inner functions.
    The ... argument is also necessary when the number of arguments passed to the functions can not be known in advance like:
args(paste)
function (..., sep=" ", collapse=NULL)

Once catch with ... is that any arguments that appear after ... on the argument list must be named explicitly and cannot be partially matched.

3.split function

Divide into Groups and Reassemble

Description:

‘split’ divides the data in the vector ‘x’ into the groups defined
by ‘f’. The replacement forms replace values corresponding to
such a division. ‘unsplit’ reverses the effect of ‘split’.

Usage:

split(x, f, drop = FALSE, …)
split(x, f, drop = FALSE, …) <- value
unsplit(value, f, drop = FALSE)

> x <- c(rnorm(10), runif(10), rnorm(10,1))
> f <- gl(3,10)
> split(x,f)
$`1`
 [1]  0.04568824  1.43881538  1.20726247  0.89607034 -1.82028884  0.41100988
 [7] -0.89489825 -1.91846889 -1.61372116  1.12882772

$`2`
 [1] 0.82255701 0.71847213 0.58078540 0.72514585 0.02594232 0.93178561
 [7] 0.85934484 0.10371824 0.99216828 0.45048527

$`3`
 [1]  1.1336430  1.4560381  0.5475995  1.9924938 -1.8635712  0.4090884
 [7]  2.0350326 -0.2822986 -0.1865551 -1.1014422

> a <- split(x,f)
> a$1
Error: unexpected numeric constant in "a$1"
> a[1]
$`1`
 [1]  0.04568824  1.43881538  1.20726247  0.89607034 -1.82028884  0.41100988
 [7] -0.89489825 -1.91846889 -1.61372116  1.12882772

> x
 [1]  0.04568824  1.43881538  1.20726247  0.89607034 -1.82028884  0.41100988
 [7] -0.89489825 -1.91846889 -1.61372116  1.12882772  0.82255701  0.71847213
[13]  0.58078540  0.72514585  0.02594232  0.93178561  0.85934484  0.10371824
[19]  0.99216828  0.45048527  1.13364297  1.45603806  0.54759947  1.99249377
[25] -1.86357122  0.40908842  2.03503258 -0.28229863 -0.18655514 -1.10144215
> f
 [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3
Levels: 1 2 3
#split by two factors
> f1 <- gl(2,5)
> f2 <- gl(5,2)
> f1
 [1] 1 1 1 1 1 2 2 2 2 2
Levels: 1 2
> f2
 [1] 1 1 2 2 3 3 4 4 5 5
Levels: 1 2 3 4 5
> interaction(f1,f2)
 [1] 1.1 1.1 1.2 1.2 1.3 2.3 2.4 2.4 2.5 2.5
Levels: 1.1 2.1 1.2 2.2 1.3 2.3 1.4 2.4 1.5 2.5
> x <- rnorm(10)
> str(split(x, list(f1,f2)))
List of 10
 $ 1.1: num [1:2] -0.855 -0.852
 $ 2.1: num(0) 
 $ 1.2: num [1:2] -0.629 -0.494
 $ 2.2: num(0) 
 $ 1.3: num -0.0677
 $ 2.3: num -0.296
 $ 1.4: num(0) 
 $ 2.4: num [1:2] 0.403 -0.791
 $ 1.5: num(0) 
 $ 2.5: num [1:2] -1.35 -1.01
> str(split(x, interaction(f1,f2)))
List of 10
 $ 1.1: num [1:2] -0.855 -0.852
 $ 2.1: num(0) 
 $ 1.2: num [1:2] -0.629 -0.494
 $ 2.2: num(0) 
 $ 1.3: num -0.0677
 $ 2.3: num -0.296
 $ 1.4: num(0) 
 $ 2.4: num [1:2] 0.403 -0.791
 $ 1.5: num(0) 
 $ 2.5: num [1:2] -1.35 -1.01

4.apply, sapply, tapply, mapply

apply

Apply Functions Over Array Margins

Description:

Returns a vector or array or list of values obtained by applying a
function to margins of an array or matrix.

Usage:

apply(X, MARGIN, FUN, …)

Arguments:

X: an array, including a matrix.

MARGIN: a vector giving the subscripts which the function will be
applied over. E.g., for a matrix ‘1’ indicates rows, ‘2’
indicates columns, ‘c(1, 2)’ indicates rows and columns.
Where ‘X’ has named dimnames, it can be a character vector
selecting dimension names.

> x <- c(rnorm(10), runif(10), rnorm(10,1))
> y <- matrix(x,nrow=3)
> y
           [,1]       [,2]       [,3]      [,4]       [,5]      [,6]      [,7]
[1,] 0.04568824  0.8960703 -0.8948982 1.1288277 0.58078540 0.9317856 0.9921683
[2,] 1.43881538 -1.8202888 -1.9184689 0.8225570 0.72514585 0.8593448 0.4504853
[3,] 1.20726247  0.4110099 -1.6137212 0.7184721 0.02594232 0.1037182 1.1336430
          [,8]       [,9]      [,10]
[1,] 1.4560381 -1.8635712 -0.2822986
[2,] 0.5475995  0.4090884 -0.1865551
[3,] 1.9924938  2.0350326 -1.1014422
> apply(y,1,mean)
[1] 0.2990596 0.1327723 0.4912411

tapply

Apply a Function Over a Ragged Array

Description:

Apply a function to each cell of a ragged array, that is to each
(non-empty) group of values given by a unique combination of the
levels of certain factors.

Usage:

tapply(X, INDEX, FUN = NULL, …, simplify = TRUE)

Arguments:

X: an atomic object, typically a vector.

INDEX: list of one or more factors, each of same length as ‘X’. The
elements are coerced to factors by ‘as.factor’.

> x <- c(rnorm(10), runif(10), rnorm(10,1))
> f <- gl(3,10)
> tapply(x,f,range)
$`1`
[1] -1.918469  1.438815

$`2`
[1] 0.02594232 0.99216828

$`3`
[1] -1.863571  2.035033

Apply a Function to Multiple List or Vector Arguments

Description:

‘mapply’ is a multivariate version of ‘sapply’. ‘mapply’ applies
‘FUN’ to the first elements of each … argument, the second
elements, the third elements, and so on. Arguments are recycled
if necessary.

Usage:

mapply(FUN, …, MoreArgs = NULL, SIMPLIFY = TRUE,
USE.NAMES = TRUE)

Arguments:

FUN: function to apply, found via ‘match.fun’.

…: arguments to vectorize over (vectors or lists of strictly
positive length, or all of zero length). See also ‘Details’.

MoreArgs: a list of other arguments to ‘FUN’.

> list(rep(1,4), rep(2,3), rep(3,2), rep(4,1))
[[1]]
[1] 1 1 1 1

[[2]]
[1] 2 2 2

[[3]]
[1] 3 3

[[4]]
[1] 4

> mapply(rep,1:4,4:1)
[[1]]
[1] 1 1 1 1

[[2]]
[1] 2 2 2

[[3]]
[1] 3 3

[[4]]
[1] 4

> noise <- function(n, mean, sd) {
+ rnorm(n, mean, sd)
+ }
> noise(5,1,2)
[1]  2.3318087 -0.5513379  2.1479612  3.6134264  2.3675103
> noise(1:5,1:5,2) #wrong usage
[1] 1.1778732 0.3632799 6.2087896 2.9261086 3.5094597
> mapply(noise,1:5,1:5,2)
[[1]]
[1] -0.1760846

[[2]]
[1] -0.7810509 -1.4659350

[[3]]
[1] 3.0684418 0.7133993 4.5190991

[[4]]
[1] 1.946463 3.025599 2.906996 5.319200

[[5]]
[1] 5.443579 5.280245 2.844177 4.538039 6.393585

> list(noise(1,1,2), noise(2,2,2), noise(3,3,2), noise(4,4,2), noise(5,5,2))
[[1]]
[1] 1.524288

[[2]]
[1] 1.547129 5.261838

[[3]]
[1] 3.150581 4.107338 2.906161

[[4]]
[1] 3.280658 2.605533 3.905065 5.884222

[[5]]
[1] 5.200394 5.541272 4.189657 5.596289 5.019267

5.interaction

Compute Factor Interactions

Description:

‘interaction’ computes a factor which represents the interaction
of the given factors. The result of ‘interaction’ is always
unordered.

Usage:

interaction(…, drop = FALSE, sep = “.”, lex.order = FALSE)

> f1 <- gl(2,5)
> f2 <- gl(5,2)
> f1
 [1] 1 1 1 1 1 2 2 2 2 2
Levels: 1 2
> f2
 [1] 1 1 2 2 3 3 4 4 5 5
Levels: 1 2 3 4 5
> interaction(f1,f2)
 [1] 1.1 1.1 1.2 1.2 1.3 2.3 2.4 2.4 2.5 2.5
Levels: 1.1 2.1 1.2 2.2 1.3 2.3 1.4 2.4 1.5 2.5

6.plot

#plot two types of points in a scatter plot
> x <- rnorm(100)
> str(x)
 num [1:100] 1.128 -1.144 0.523 1.179 -0.602 ...
> y <- x + rnorm(100)
> str(y)
 num [1:100] 1.814 -2.452 0.271 1.844 -1.471 ...
> g <- gl(2,50, labels=c("Male", "Female"))
> str(g)
 Factor w/ 2 levels "Male","Female": 1 1 1 1 1 1 1 1 1 1 ...
> plot(x, y, type="n")
> points(x[g=="Male"], y[g=="Male"], col="green")
> points(x[g=="Female"], y[g=="Female"], col="blue")
> 

CHENTONG
版权声明:本文为博主原创文章,转载请注明出处。
alipay.png WeChatPay.png

CHENTONG

CHENTONG
积微,月不胜日,时不胜月,岁不胜时。凡人好敖慢小事,大事至,然后兴之务之。如是,则常不胜夫敦比于小事者矣!何也?小事之至也数,其悬日也博,其为积也大。大事之至也希,其悬日也浅,其为积也小。故善日者王,善时者霸,补漏者危,大荒者亡!故,王者敬日,霸者敬时,仅存之国危而后戚之。亡国至亡而后知亡,至死而后知死,亡国之祸败,不可胜悔也。霸者之善著也,可以时托也。王者之功名,不可胜日志也。财物货宝以大为重,政教功名者反是,能积微者速成。诗曰:德如毛,民鲜能克举之。此之谓也。

生信宝典文章集锦

生信的作用越来越大,想学的人越来越多,不管是为了以后发展,还是为了解决眼下的问题。但生信学习不是一朝一夕就可以完成的事情,也许你可以很短时间学会一个交互式软件的操作,却不能看完程序教学视频后就直接写程序。也许你可以跟着一个测序分析流程完成操作,但不懂得背后的原理,不知道什么...… Continue reading

生信宝典文章集锦

Published on January 01, 2100

生信宝典文章集锦

Published on January 01, 2100