T7: Normal, T Distributions and tests

Normal distribution

We have talked about several distributions and tests so far in the lab. To see the help files for most of them, see ?Distributions

Remember that, as we discussed in lectures, we normally state that a variable is ~N(mean, VARIANCE), but these commands need the standard deviation instead. The standard deviation is the square root of the variance, so N(4, 2^2) has a variance of 4 and an sd of 2.

To see the help file for the normal distribution commands (dnorm, pnorm, qnorm and rnorm):

?Normal

To generate a random sample from a normal distribution:

sample.normal <- rnorm(n=100,mean=4,sd=2)

To calculate a z score from your sample/population, you can use R as a calculator.
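As a minimal sketch of "R as a calculator" here, the z score is the value minus the mean, divided by the standard deviation. The numbers and object names below (x, pop.mean, pop.sd, z.sample) are illustrative assumptions, not course data.

```r
# Illustrative values: one observation from a population assumed to be N(4, 2^2)
x <- 1.7
pop.mean <- 4
pop.sd <- 2

# z score: how many standard deviations x lies from the mean
z <- (x - pop.mean) / pop.sd
z

# z scores for every value in a sample, using the sample mean and sd
set.seed(1)
sample.normal <- rnorm(n = 100, mean = 4, sd = 2)
z.sample <- (sample.normal - mean(sample.normal)) / sd(sample.normal)
head(z.sample)
```

After standardising a whole sample this way, the z scores have mean 0 and standard deviation 1.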

To calculate the probability of being greater or less than a value in a given normal distribution (i.e. you can use this as an interactive z-table):

# probability of less than 1.7 in a normal distribution of N(4,2^2)
pnorm(1.7,mean=4,sd=2,lower.tail = TRUE)
[1] 0.1250719
# probability of greater than 1 in a normal distribution of N(4,2^2)
1 - pnorm(1,mean=4,sd=2,lower.tail = TRUE)
[1] 0.9331928
# or
pnorm(1,mean=4,sd=2,lower.tail = FALSE)
[1] 0.9331928
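The same commands link back to z scores and can be combined to get the probability of falling between two values. This is a small sketch using the N(4,2^2) example from above; the object names are my own.

```r
# P(X < 1.7) for X ~ N(4, 2^2), computed directly...
p.direct <- pnorm(1.7, mean = 4, sd = 2)

# ...and via the z score, using the standard normal (mean 0, sd 1 are the defaults)
p.via.z <- pnorm((1.7 - 4) / 2)

c(p.direct, p.via.z)

# Probability of a value between 1 and 1.7: subtract the two lower tails
p.between <- pnorm(1.7, mean = 4, sd = 2) - pnorm(1, mean = 4, sd = 2)
p.between
```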

To calculate the value for a given probability

# what value has 60% of the data below it?
qnorm(0.6,mean=4,sd=2,lower.tail = TRUE)
[1] 4.506694
# what value has 80% of the data above it?
qnorm(0.8,mean=4,sd=2,lower.tail = FALSE)
[1] 2.316758
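One way to check that you have the tail the right way round is to remember that qnorm undoes pnorm. A short sketch using the same N(4,2^2) distribution (object names are illustrative):

```r
# qnorm gives the value; pnorm of that value returns the original probability
q <- qnorm(0.6, mean = 4, sd = 2, lower.tail = TRUE)
pnorm(q, mean = 4, sd = 2, lower.tail = TRUE)

# Asking for 80% of the data ABOVE a value is the same as
# asking for 20% of the data BELOW it
q.upper <- qnorm(0.8, mean = 4, sd = 2, lower.tail = FALSE)
q.lower <- qnorm(0.2, mean = 4, sd = 2, lower.tail = TRUE)
c(q.upper, q.lower)
```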

Shapiro-Wilk test for normality

To test for normality:

First, have a look at the histogram! Here is the code for the Shapiro-Wilk test.

shapiro.test(HousesNY$Price)

    Shapiro-Wilk normality test

data:  HousesNY$Price
W = 0.96341, p-value = 0.1038
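As a rough guide to reading this output: the null hypothesis of the Shapiro-Wilk test is that the data come from a normal distribution, so a p-value above your significance level (0.1038 here, against the usual 0.05) means there is no evidence against normality. The sketch below runs the same steps on simulated data rather than the course data, so the object names and the numbers it prints are illustrative assumptions.

```r
set.seed(42)

# Simulated stand-ins, not the course data
normal.sample <- rnorm(n = 50, mean = 100, sd = 15)   # drawn from a normal
skewed.sample <- rexp(n = 50, rate = 1)               # drawn from a skewed distribution

# Look at the histogram first
hist(normal.sample)

# Run the test and store the result so the p-value can be pulled out
result.normal <- shapiro.test(normal.sample)
result.skewed <- shapiro.test(skewed.sample)

result.normal$p.value
result.skewed$p.value

# TRUE means "no evidence against normality at the 5% level"
result.normal$p.value > 0.05
```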

QQ-Norm plot

You can also make a QQ-Norm plot

We discussed the basic qqnorm command last week: qqnorm(variable). For example `qqnorm(malepirates$age)` makes a qq-norm plot of the age column in the data.frame I created earlier on male pirates. There is a nicer version inside the ggpubr package.

library(ggpubr)
ggqqplot(HousesNY$Price,col="blue")
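If you do not have ggpubr installed, a base-R version needs no extra packages. This is a minimal sketch on simulated data; simulated.age is a made-up stand-in, not the pirate data.

```r
set.seed(7)

# Simulated stand-in for a numeric column
simulated.age <- rnorm(n = 50, mean = 30, sd = 5)

# Draw the QQ-Norm plot; qqnorm also returns the plotted coordinates
qq <- qqnorm(simulated.age, main = "QQ-Norm plot of simulated ages")

# Add the reference line through the first and third quartiles
qqline(simulated.age, col = "blue")
```

Points that stay close to the line suggest the data are approximately normal.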

For help interpreting a QQ plot, see: https://www.learnbyexample.org/r-quantile-quantile-qq-plot-base-graph/

T-distribution

What even is this? See this nice resource: https://365datascience.com/tutorials/statistics-tutorials/students-t-distribution/

To see the help file for the t-distribution commands (dt, pt, qt and rt):

?TDist

To calculate a t-statistic from your sample/population, you can use R as a calculator.

To calculate the probability of being greater or less than a value in a given t-distribution (i.e. you can use this as an interactive t-table):

# probability of seeing less than 1.55 in a t-distribution
# with 20 degrees of freedom
pt(1.55,df=20,lower.tail = TRUE)
[1] 0.9315892

To calculate the value for a given probability

# what value is greater than 90% of the data in a t-distribution with df=25
qt(0.9,df=25,lower.tail = TRUE)
[1] 1.316345
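Two more uses of qt that may help, sketched with df=25 as above: the pair of critical values for a two-sided test at the 5% level, and how the t quantile shrinks towards the normal one as the degrees of freedom grow. The object names are my own.

```r
# Critical values for a two-sided test at the 5% level, df = 25
lower.crit <- qt(0.025, df = 25)
upper.crit <- qt(0.975, df = 25)
c(lower.crit, upper.crit)   # symmetric about zero

# The same 90% quantile for increasing degrees of freedom,
# compared with the standard normal value
t.quantiles <- qt(0.9, df = c(5, 25, 1000))
z.quantile <- qnorm(0.9)
t.quantiles
z.quantile
```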

To conduct a full t-test on some data:

# Conduct a two-sided one-sample t-test of the null hypothesis that the population mean is 100.
t.test(HousesNY$Price,mu=100,alternative="two.sided")

    One Sample t-test

data:  HousesNY$Price
t = 2.3954, df = 52, p-value = 0.02024
alternative hypothesis: true mean is not equal to 100
95 percent confidence interval:
 102.2125 125.0516
sample estimates:
mean of x 
 113.6321 
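To see where these numbers come from, here is a sketch that rebuilds the t-statistic, p-value and confidence interval by hand and compares them with t.test. It uses simulated prices as a stand-in (an assumption, not the HousesNY data), so its printed values will differ from the output above; the last line checks the reported p-value using only pt.

```r
set.seed(123)

# Simulated stand-in for a column of 53 prices
prices <- rnorm(n = 53, mean = 110, sd = 40)
mu0 <- 100                      # mean under the null hypothesis

n <- length(prices)
std.error <- sd(prices) / sqrt(n)
deg.free <- n - 1

# t-statistic: distance of the sample mean from mu0, in standard errors
t.stat <- (mean(prices) - mu0) / std.error

# Two-sided p-value: probability of a t at least this extreme in either tail
p.value <- 2 * pt(abs(t.stat), df = deg.free, lower.tail = FALSE)

# 95 percent confidence interval for the mean
t.crit <- qt(0.975, df = deg.free)
ci <- mean(prices) + c(-1, 1) * t.crit * std.error

# Compare with the built-in test
result <- t.test(prices, mu = mu0, alternative = "two.sided")
c(by.hand = t.stat, t.test = unname(result$statistic))
c(by.hand = p.value, t.test = result$p.value)

# The p-value reported above, from its t and df alone
p.reported <- 2 * pt(2.3954, df = 52, lower.tail = FALSE)
p.reported
```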

For more detail, see this tutorial on the one-sample t-test: http://www.sthda.com/english/wiki/one-sample-t-test-in-r

and this one on comparing two samples: http://www.sthda.com/english/wiki/unpaired-two-samples-t-test-in-r