This tutorial covers loads of summary statistics that might be useful.
Here’s the structure of EVERY command in R.
output_variable <- COMMAND(input_variable, other_options)
- COMMAND is the function you want to run, e.g. read.csv, the function used to read the data for CSV files.
- "input_variable" should be replaced with the case-sensitive R variable you want to apply it to. Look at your environment tab to see what you have saved.
- other_options includes (optional) additional options for your command.
- "output_variable" is the name of the variable you choose to assign your data to.
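As a concrete instance of this pattern, here is a minimal sketch using round(), a base R function chosen purely for illustration. The variable names my_number and my_rounded are hypothetical.

```r
# Pattern: output_variable <- COMMAND(input_variable, other_options)
my_number <- 3.14159   # hypothetical input variable

# round is the COMMAND, my_number is the input variable,
# and digits = 2 is an optional extra option
my_rounded <- round(my_number, digits = 2)

# Typing the name of the output variable prints its contents
my_rounded
```

Leaving out digits = 2 still works, because optional arguments fall back to their defaults.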
x <- 5 assigns the number 5 to a variable called x.
Here I will show a few examples for a non-spatial dataset on house prices.
First I load the data here
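The loading code itself is not reproduced here. As a hedged sketch of the general idea, the block below writes a tiny CSV to a temporary file and reads it back with read.csv. The values are made up for illustration and the name houses_demo is hypothetical. The real HousesNY data appears to be distributed with the Stat2Data package, which is noted in a comment as an assumption rather than as the method used in this tutorial.

```r
# Made-up stand-in values, NOT the real HousesNY data
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Price,Beds,Baths,Size,Lot",
             "100,2,1,1.0,0.5",
             "150,3,2,1.5,1.0",
             "200,4,2,2.0,1.5"),
           csv_path)

# read.csv is the COMMAND, csv_path is the input,
# and header = TRUE is an optional extra option
houses_demo <- read.csv(csv_path, header = TRUE)
houses_demo

# Assumption: the real data may be loadable with
# data("HousesNY", package = "Stat2Data")
```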
To have a look at the data there are many options. You can:
- Type its name (e.g. piratedataset) into the console or a code chunk.
- Use View(variable_name) to open it in a viewer (View is a base R command from the utils package, not the tidyverse).
- Use head(variable_name) to see the first 6 lines or so (good for quick checks).
- Use glimpse(variable_name) to get a nice summary (glimpse comes from dplyr, part of the tidyverse).
- Use names(variable_name) to get the column names.
DO NOT PUT View(dataname) into a code chunk (or remove it before you knit). It sometimes breaks RStudio.
For example
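A minimal sketch of the quick-look commands, run on a small made-up data frame. The name houses_demo is hypothetical and only mimics the HousesNY column names; the values are invented.

```r
# Made-up stand-in data with the same column names as HousesNY
houses_demo <- data.frame(
  Price = c(100, 150, 150, 200, 250, 300, 350, 400),
  Beds  = c(2, 3, 3, 3, 4, 4, 5, 5),
  Baths = c(1, 1, 2, 2, 2, 3, 3, 3),
  Size  = c(1.0, 1.2, 1.5, 1.8, 2.0, 2.4, 2.8, 3.0),
  Lot   = c(0.5, 0.5, 1.0, 1.0, 1.5, 1.5, 2.0, 2.0)
)

# head() returns the first 6 rows by default
first_rows <- head(houses_demo)
first_rows

# names() returns the column names as a character vector
column_names <- names(houses_demo)
column_names
```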
To see what the column names are, you can use the names(dataset) command. I use this in the console A LOT for copy/pasting names into my code/report.
## [1] "Price" "Beds" "Baths" "Size" "Lot"
Or the glimpse command:
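As a sketch, glimpse() can be called like this on a made-up data frame (houses_demo is hypothetical). The block assumes dplyr might not be installed, so it falls back to base R's str(), which prints similar information.

```r
# Made-up stand-in data
houses_demo <- data.frame(
  Price = c(100, 150, 200),
  Beds  = c(2, 3, 4),
  Size  = c(1.0, 1.5, 2.0)
)

if (requireNamespace("dplyr", quietly = TRUE)) {
  # One line per column: name, type, and the first few values
  dplyr::glimpse(houses_demo)
} else {
  # Base R fallback with similar information
  str(houses_demo)
}
```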
To find the number of rows and columns, nrow(dataset), ncol(dataset) and dim(dataset) are useful. You can also look at the environment tab, and some summaries include it.
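A short sketch of the base R size functions, assuming a made-up data frame called houses_demo (a hypothetical name):

```r
# Made-up stand-in data: 3 rows, 5 columns
houses_demo <- data.frame(
  Price = c(100, 150, 200),
  Beds  = c(2, 3, 4),
  Baths = c(1, 2, 2),
  Size  = c(1.0, 1.5, 2.0),
  Lot   = c(0.5, 1.0, 1.5)
)

n_rows <- nrow(houses_demo)   # number of rows
n_cols <- ncol(houses_demo)   # number of columns
dims   <- dim(houses_demo)    # both at once: rows first, then columns

n_rows
n_cols
dims
```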
To see what type of data R thinks you have, try the class command
or for a column
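A minimal sketch of class() on a whole table and on single columns. The data frame houses_demo and its Town column are hypothetical and exist only to show different types.

```r
# Made-up stand-in data with three different column types
houses_demo <- data.frame(
  Price = c(100.5, 150, 200),     # decimal numbers
  Beds  = c(2L, 3L, 4L),          # whole numbers stored as integers
  Town  = c("A", "B", "C"),       # text (hypothetical column)
  stringsAsFactors = FALSE
)

table_class <- class(houses_demo)        # class of the whole table
price_class <- class(houses_demo$Price)  # class of one column
beds_class  <- class(houses_demo$Beds)
town_class  <- class(houses_demo$Town)

table_class
price_class
beds_class
town_class
```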
Or you can do things manually, using the $ symbol to choose a column. All of these are for the Price column:
mean(HousesNY$Price)
median(HousesNY$Price)
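# mode() in R returns the storage type ("numeric"), not the most common value, so tabulate instead
as.numeric(names(which.max(table(HousesNY$Price))))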
sd(HousesNY$Price)
var(HousesNY$Price)
IQR(HousesNY$Price)
range(HousesNY$Price)
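To make the commands above easy to check by hand, here is a sketch on a tiny made-up vector (prices is hypothetical, not the real data). It also shows one caveat: base R's mode() reports how a vector is stored rather than its most common value, so the block defines stat_mode(), a hypothetical helper for the statistical mode.

```r
prices <- c(100, 150, 150, 200, 250)   # made-up values

price_mean   <- mean(prices)     # 170
price_median <- median(prices)   # 150
price_sd     <- sd(prices)       # square root of the variance
price_var    <- var(prices)      # 3250
price_iqr    <- IQR(prices)      # 50
price_range  <- range(prices)    # 100 250 (minimum, then maximum)

# Caveat: mode() describes the storage type, not the data
storage_type <- mode(prices)     # "numeric"

# Hypothetical helper: the most frequent value
# (if there is a tie, the smallest tied value is returned)
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[which.max(counts)])
}
price_mode <- stat_mode(prices)  # 150
```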
To look at the summaries there are a load of options. Choose your favourites:
- summary(dataset)
- skim(dataset) in the skimr package
- summarize(dataset) in the papeR package. This looks pretty powerful, I’m just learning it.
None are better or worse than others - simply choose what works for you in the moment.
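A minimal sketch of the base R option, summary(), on a made-up data frame (houses_demo is hypothetical). The skimr alternative is only mentioned in a comment, since that package may not be installed.

```r
# Made-up stand-in data
houses_demo <- data.frame(
  Price = c(100, 150, 150, 200, 250),
  Beds  = c(2, 3, 3, 4, 4)
)

# Summary of one column: minimum, quartiles, median, mean, maximum
price_summary <- summary(houses_demo$Price)
price_summary

# Summary of every column at once
summary(houses_demo)

# If skimr is installed, skimr::skim(houses_demo) gives a longer report
```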
What if you want to find more sophisticated statistics, e.g. the average price per size of house?
Here we use the group_by() and summarise() commands and save our answers to a new variable.
We are making use of the pipe symbol, %>%, which takes the answer from group_by and sends it directly to the summarise command.
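As a sketch of how this could look on house data, the block below groups a made-up data frame by number of bedrooms and averages the price in each group. It assumes dplyr is installed; houses_demo, price_by_beds and the summary column names are hypothetical.

```r
library(dplyr)

# Made-up stand-in data
houses_demo <- data.frame(
  Price = c(100, 120, 200, 220, 300),
  Beds  = c(2, 2, 3, 3, 4)
)

# group_by() splits the rows by Beds; the pipe passes the grouped table
# to summarise(), which returns one row per group
price_by_beds <- group_by(houses_demo, Beds) %>%
  summarise(mean_price = mean(Price),
            n_houses   = n())

price_by_beds
```

Naming the summary columns (mean_price, n_houses) is optional, but it makes them easier to refer to later.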
Here is some data on frost dates at weather stations (I’ll update this to use the house data later).
To summarise results by the type of weather station:
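# group_by() and summarise() come from dplyr (loaded with the tidyverse)
# Group by Type_Fake directly; writing by=Type_Fake would create a grouping column called "by"
frost.summary.type <- group_by(frost, Type_Fake) %>%
  summarise(mean(Latitude),
            max(Latitude),
            min(Dist_to_Coast))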
frost.summary.type
Here, my code is: