====== Normal distribution functions ======
^ Function ^ Purpose ^
| dnorm | Normal density |
| pnorm | Normal distribution function |
| qnorm | Normal quantile function |
| rnorm | Normal random variates |
Table 8-1. Discrete distributions
^ Discrete distribution ^ R name ^ Parameters ^
| Binomial | binom | size = number of trials (n); \\ prob = probability of success for one trial (p) |
| Geometric | geom | prob = probability of success for one trial (p) |
| Hypergeometric | hyper | m = number of white balls in urn; \\ n = number of black balls in urn; \\ k = number of balls drawn from urn |
| Negative binomial (NegBinomial) | nbinom | size = number of successful trials; \\ either prob = probability of successful trial or \\ mu = mean |
| Poisson | pois | lambda = mean |
Table 8-2. Continuous distributions
^ Continuous distribution ^ R name ^ Parameters ^
| Beta | beta | shape1; shape2 |
| Cauchy | cauchy | location; scale |
| Chi-squared (Chisquare) | chisq | df = degrees of freedom |
| Exponential | exp | rate |
| F | f | df1 and df2 = degrees of freedom |
| Gamma | gamma | shape; either rate or scale |
| Log-normal (Lognormal) | lnorm | meanlog = mean on logarithmic scale; \\ sdlog = standard deviation on logarithmic scale |
| Logistic | logis | location; scale |
| Normal | norm | mean; \\ sd = standard deviation |
| Student’s t (TDist) | t | df = degrees of freedom |
| Uniform | unif | min = lower limit; \\ max = upper limit |
| Weibull | weibull | shape; scale |
| Wilcoxon | wilcox | m = number of observations in first sample; \\ n = number of observations in second sample |
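The prefixes from the first table (d, p, q, r) combine with the R names in Tables 8-1 and 8-2 in the same way for every distribution. A brief sketch of that naming pattern follows; the parameter values are arbitrary illustrations, not taken from the text above.

```r
# Prefix (d, p, q, r) + R name = density, distribution, quantile, random functions
d <- dbinom(3, size=10, prob=0.5)   # P(X = 3) for a binomial with 10 trials
p <- ppois(2, lambda=4)             # P(X <= 2) for a Poisson with mean 4
q <- qexp(0.5, rate=0.1)            # median of an exponential with rate 0.1
r <- rgamma(5, shape=2, rate=0.1)   # five gamma random variates
```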
===== pnorm, qnorm =====
Normal distribution
$ f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{\frac{-(x-\mu)^2}{2\sigma^2}} $
Assume that the test scores of a college entrance exam fit a normal distribution with a mean of 72 and a standard deviation of 15.2. What percentage of students score 84 or more on the exam?
> pnorm(72, mean=72, sd=15.2, lower.tail=FALSE)
[1] 0.5
> pnorm(1.96)
[1] 0.9750021
> pnorm(1.96)-pnorm(-1.96)
[1] 0.9500042
> pnorm(c(1.96, -1.96))
[1] 0.9750021 0.0249979
> pnorm(84, mean=72, sd=15.2, lower.tail=FALSE)
[1] 0.2149176
> qnorm(.2149176, mean=72, sd=15.2, lower.tail=FALSE)
[1] 84
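The exam question can also be answered by standardizing the score first. The sketch below assumes the same mean (72) and standard deviation (15.2) and checks that the z-score route agrees with the direct call; the variable names are illustrative only.

```r
# Standardize the score, then use the standard normal distribution
z <- (84 - 72) / 15.2
p_z <- pnorm(z, lower.tail=FALSE)                            # upper tail of N(0, 1)
p_direct <- pnorm(84, mean=72, sd=15.2, lower.tail=FALSE)    # upper tail of N(72, 15.2)
all.equal(p_z, p_direct)                                     # the two approaches agree
p_complement <- 1 - pnorm(84, mean=72, sd=15.2)              # same tail via the complement
```

Roughly 21.5% of students score 84 or more, whichever form is used.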
===== rnorm =====
Random samples from a normal distribution
> set.seed(1024)
> rnorm(50)
[1] -0.778662882 -0.389476396 -2.033798329 -0.982373104 0.247890054
[6] -2.103864629 -0.381418049 2.074919838 1.027138407 0.473014228
[11] -1.879263193 -1.239189026 1.160418602 0.003671291 -0.095452066
[16] 1.795551228 -1.322138481 -0.276086413 -0.743976510 -1.070050125
[21] -0.349525474 0.805559661 1.605301660 1.447595754 -0.128302224
[26] -0.538926447 0.391586050 0.879217023 -0.824732092 0.732876423
[31] -0.664914510 0.360885549 1.011930957 -0.235916848 1.353589893
[36] -0.268632965 1.019877368 -0.279706500 -0.618146278 -0.499273059
[41] -0.153716777 1.220869694 -0.669570510 -1.209660342 1.024096655
[46] 0.603955311 -0.568653469 -0.891303117 -2.525145692 0.589357049
===== qt, pt =====
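$t = \frac{Z}{\sqrt{\frac{V}{m}}}$
where Z is a standard normal variable and V is an independent chi-squared variable with m degrees of freedom.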
> qt(c(0.025, 0.975), df=5)
[1] -2.570582 2.570582
> qt(c(0.025, 0.975), df=10)
[1] -2.228139 2.228139
> qt(c(0.025, 0.975), df=20)
[1] -2.085963 2.085963
> qt(c(0.025, 0.975), df=30)
[1] -2.042272 2.042272
> qt(c(0.025, 0.975), df=40)
[1] -2.021075 2.021075
> qt(c(0.025, 0.975), df=50)
[1] -2.008559 2.008559
. . . . . .
> qt(c(0.025, 0.975), df=50000)
[1] -1.960011 1.960011
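The sequence above suggests that the t critical values shrink toward the normal value 1.959964 as the degrees of freedom grow. A small sketch that tabulates this in one call, using the same df values as the transcript:

```r
# qt is vectorized over df, so all critical values come from a single call
dfs <- c(5, 10, 20, 30, 40, 50, 50000)
t_crit <- qt(0.975, df=dfs)            # upper 2.5% critical values
z_crit <- qnorm(0.975)                 # limiting normal critical value
data.frame(df=dfs, t_crit=t_crit, excess=t_crit - z_crit)
```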
====== Counting the Number of Combinations ======
A common problem in computing probabilities of discrete variables is counting combinations: the number of distinct subsets of size k that can be created from n items. The number is given by
$$\frac{n!}{k!\,(n-k)!}$$
But it’s much more convenient to use the choose function—especially as n and k grow larger:
> choose(5,3) # How many ways can we select 3 items from 5 items?
[1] 10
> choose(50,3) # How many ways can we select 3 items from 50 items?
[1] 19600
> choose(50,30) # How many ways can we select 30 items from 50 items?
[1] 4.712921e+13
These numbers are also known as **binomial coefficients**.
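As a quick check, the factorial formula and choose give the same count for small inputs. The sketch below also illustrates why choose is preferable for larger n: factorial values beyond 170! exceed the range of a double, while the binomial coefficient itself may still be representable.

```r
n <- 5; k <- 3
by_formula <- factorial(n) / (factorial(k) * factorial(n - k))
all.equal(by_formula, choose(n, k))    # both give 10

# choose avoids the huge intermediate factorials
big <- choose(200, 100)
is.finite(big)
```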
====== Generating Combinations ======
> combn(1:5,3)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 1 1 1 1 1 2 2 2 3
[2,] 2 2 2 3 3 4 3 3 4 4
[3,] 3 4 5 4 5 5 4 5 5 5
The function is not restricted to numbers. We can generate combinations of strings, too. Here are all combinations of five treatments taken three at a time:
> combn(c("T1","T2","T3","T4","T5"), 3)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] "T1" "T1" "T1" "T1" "T1" "T1" "T2" "T2" "T2" "T3"
[2,] "T2" "T2" "T2" "T3" "T3" "T4" "T3" "T3" "T4" "T4"
[3,] "T3" "T4" "T5" "T4" "T5" "T5" "T4" "T5" "T5" "T5"
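Each column of the combn result is one combination, so the number of columns should equal choose(n, k). A minimal sketch that checks this and also uses combn's FUN argument to summarize each combination instead of returning it:

```r
combos <- combn(1:5, 3)
ncol(combos) == choose(5, 3)        # one column per combination

# FUN is applied to each combination; here, the sum of its three members
sums <- combn(1:5, 3, FUN=sum)
```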
====== Generating Random Numbers ======
The simple case of generating **uniform random numbers** **between 0 and 1** is handled by the runif function:
> runif(1)
[1] 0.5119812
Generating a vector of 10 such values:
> runif(10)
[1] 0.03475948 0.88950680 0.90005434 0.95689496 0.55829493 0.18407604
[7] 0.87814788 0.71057726 0.11140864 0.66392239
> runif(1, min=-3, max=3) # One uniform variate between -3 and +3
[1] 2.954591
> rnorm(1) # One standard Normal variate
[1] 1.048491
> rnorm(1, mean=100, sd=15) # One Normal variate, mean 100 and SD 15
[1] 108.7300
> rbinom(1, size=10, prob=0.5) # One binomial variate
[1] 3
> rpois(1, lambda=10) # One Poisson variate
[1] 13
> rexp(1, rate=0.1) # One exponential variate
[1] 8.430267
> rgamma(1, shape=2, rate=0.1) # One gamma variate
[1] 20.47334
> rnorm(3, mean=c(-10,0,+10), sd=1) # Three Normal variates with means -10, 0, and +10, each with SD 1
[1] -11.195667 2.615493 10.294831
When more values are requested than there are means, R recycles the mean vector:
> rnorm(6, mean=c(-10,0,+10), sd=1)
[1] -11.74168122 0.56572232 11.88595452 -11.13726844
[5] 0.03274875 9.02216868
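One way to see the recycling pattern without random noise is to set sd=0, in which case each variate equals its mean. This is only an illustrative trick, not something the examples above rely on.

```r
# With sd=0 every variate equals its mean, so the recycled means are visible directly
recycled <- rnorm(6, mean=c(-10, 0, +10), sd=0)
recycled
```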
====== Generating Reproducible Random Numbers ======
When testing or debugging, you often want to **reproduce the same sequence of “random” numbers** every time your program executes. Call set.seed with a fixed value before generating them:
> set.seed(165) # Initialize the random number generator to a known state
> runif(10) # Generate ten random numbers
[1] 0.1159132 0.4498443 0.9955451 0.6106368 0.6159386 0.4261986 0.6664884
[8] 0.1680676 0.7878783 0.4421021
> set.seed(165) # Reinitialize to the same known state
> runif(10) # Generate the same ten "random" numbers
[1] 0.1159132 0.4498443 0.9955451 0.6106368 0.6159386 0.4261986 0.6664884
[8] 0.1680676 0.7878783 0.4421021
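The equality of the two sequences can be checked programmatically rather than by eye. A minimal sketch, reusing the seed value 165 from the transcript above:

```r
set.seed(165)
first <- runif(10)
set.seed(165)
second <- runif(10)
identical(first, second)    # same seed, same sequence

third <- runif(10)          # no reseeding: the generator simply continues
identical(second, third)    # so this draw differs from the previous one
```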
====== Generating a Random Sample ======
Suppose your World Series data contains a vector of years when the Series was played. You can select 10 years at random using sample:
> year <- c(1900:2016) # years in vector year
> world.series <- data.frame(year)
> sample(world.series$year, 10)
[1] 1906 1963 1966 1928 1905 1924 1961 1959 1927 1934
The items are randomly selected, so running sample again (usually) produces a different result:
> sample(world.series$year, 10)
[1] 1968 1947 1966 1916 1970 1961 1936 1913 1914 1958
Replacement in random sampling: sample draws without replacement by default; specify replace=TRUE to sample with replacement.
Calling set.seed before sample makes the selection reproducible:
> set.seed(121)
> sample(world.series$year, 10)
[1] 1906 1963 1966 1928 1905 1924 1961 1959 1927 1934
> set.seed(121)
> sample(world.series$year, 10)
[1] 1906 1963 1966 1928 1905 1924 1961 1959 1927 1934
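The difference between sampling with and without replacement shows up in whether values can repeat. The sketch below assumes the same 1900:2016 year vector; the sample sizes are chosen only to make the contrast certain.

```r
year <- 1900:2016                               # 117 distinct years

no_repl <- sample(year, 10)                     # default: without replacement
anyDuplicated(no_repl) == 0                     # never contains a repeated year

# Asking for more draws than there are years only works with replace=TRUE
with_repl <- sample(year, 200, replace=TRUE)
anyDuplicated(with_repl) > 0                    # repeats are unavoidable here
```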
====== Generating Random Sequences ======
The general form is sample(set, n, replace=TRUE), where set is the vector of possible values and n is the length of the sequence:
> sample(c("H","T"), 10, replace=TRUE)
[1] "H" "H" "H" "T" "T" "H" "T" "H" "H" "T"
> sample(c(FALSE,TRUE), 20, replace=TRUE)
[1] TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
[13] TRUE TRUE FALSE TRUE TRUE FALSE FALSE TRUE
> sample(c(FALSE,TRUE), 20, replace=TRUE, prob=c(0.2,0.8))
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
[13] TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE
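The prob argument in the last call weights the outcomes, so TRUE should appear about 80% of the time. A rough sketch that checks this on a longer sequence; the seed and the sequence length are arbitrary choices made here for repeatability.

```r
set.seed(42)    # arbitrary seed, only so the run can be repeated
flips <- sample(c(FALSE, TRUE), 10000, replace=TRUE, prob=c(0.2, 0.8))
prop_true <- mean(flips)    # proportion of TRUE values, expected to be near 0.8
prop_true
```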
====== Randomly Permuting a Vector ======
> sample(1:10)
[1] 5 8 7 4 3 9 2 6 1 10
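Calling sample with only a vector returns all of its elements in random order, that is, a permutation. A short sketch confirming that nothing is lost or repeated:

```r
perm <- sample(1:10)
identical(sort(perm), 1:10)       # same elements, different order

# Works for any vector, not just numbers
shuffled <- sample(c("a", "b", "c"))
```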
====== Calculating Probabilities for Continuous Distributions ======
^ Distribution ^ Distribution function: P(X ≤ x) ^
| Normal | pnorm(x, mean, sd) |
| Student’s t | pt(x, df) |
| Exponential | pexp(x, rate) |
| Gamma | pgamma(x, shape, rate) |
| Chi-squared (χ²) | pchisq(x, df) |
> pnorm(66, mean=70, sd=3)
[1] 0.09121122
> pnorm(73, mean=70, sd=3)
[1] 0.8413447
> b <- pnorm(-1)
> a <- pnorm(1)
> a-b
[1] 0.6826895
> b <- pnorm(-2)
> a <- pnorm(2)
> a-b
[1] 0.9544997
> a <- pnorm(3)
> b <- pnorm(-3)
> a-b
[1] 0.9973002
> b <- pnorm(-1.959964)
> a <- pnorm(1.959964)
> a-b
[1] 0.95
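The same subtraction gives interval probabilities for a non-standard normal. A small sketch using the mean 70 and SD 3 from the first two calls in this section; the variable names are illustrative.

```r
# P(66 < X <= 73) for X ~ Normal(mean=70, sd=3)
p_between <- pnorm(73, mean=70, sd=3) - pnorm(66, mean=70, sd=3)

# Upper-tail probability P(X > 73), without subtracting from 1 by hand
p_above <- pnorm(73, mean=70, sd=3, lower.tail=FALSE)
```

Here p_between comes out at about 0.75.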
====== Converting Probabilities to Quantiles ======
> qnorm(0.8413447, mean=70, sd=3)
[1] 73
> pnorm(73, mean=70, sd=3)
[1] 0.8413447
> qnorm(c(0.025, 0.975)) # The outer 5% lies beyond roughly ±2 SD, i.e. below about -2 or above about 2
[1] -1.959964 1.959964
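Since qnorm is the inverse of pnorm, converting values to probabilities and back should return the original values. A brief sketch of that round trip, plus the two-sided cutoffs for a chosen tail area alpha (a name introduced here for illustration):

```r
x <- c(66, 70, 73)
p <- pnorm(x, mean=70, sd=3)              # values -> probabilities
back <- qnorm(p, mean=70, sd=3)           # probabilities -> values
all.equal(back, x)

alpha <- 0.05
cutoffs <- qnorm(c(alpha/2, 1 - alpha/2)) # symmetric cutoffs leaving alpha in the tails
```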
====== Plotting a Density Function ======
> x <- seq(from=-3, to=+3, length.out=100)
> plot(x, dnorm(x))
{{dnorm_x.png|Normal distribution plot}}
> x <- seq(from=-3, to=+3, length.out=100)
> y <- dnorm(x)
> plot(x, y, main="Standard Normal Distribution", type='l',
+ ylab="Density", xlab="Quantile")
> abline(h=0)
{{standard_normal_distribution.png|standard_normal_distribution}}
> # The body of the polygon follows the density curve where 1 <= z <= 2
> region.x <- x[1 <= x & x <= 2]
> region.y <- y[1 <= x & x <= 2]
>
> # We add initial and final segments, which drop down to the x-axis (y = 0)
> region.x <- c(region.x[1], region.x, tail(region.x,1))
> region.y <- c( 0, region.y, 0)
> polygon(region.x, region.y, density=-1, col="red")
{{standard_normal_distribution_1-2.png|standard_normal_distribution}}
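The shaded region corresponds to the probability P(1 ≤ Z ≤ 2). As a sanity check, the sketch below compares a numerical integral of the density over that interval with the difference of two pnorm values; it does not draw anything.

```r
# Area under the standard normal density between 1 and 2
area <- integrate(dnorm, lower=1, upper=2)$value
by_cdf <- pnorm(2) - pnorm(1)
all.equal(area, by_cdf, tolerance=1e-6)   # both are about 0.136
```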