T5: Loading Data

Reading in and loading data

In-built datasets

R comes with many built-in datasets, and packages often add more. To load one, use the data() command. Typing data() on its own lists the datasets that are currently available.
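As a small sketch, the listing that data() produces can also be captured and inspected in code. The variable name available is only for illustration, and the example looks at the "datasets" package that ships with R.

```r
# List the datasets that ship with the built-in "datasets" package
available <- data(package = "datasets")$results

# Each row is one dataset: "Item" is its name and "Title" a short description
head(available[, c("Item", "Title")])

# Check whether iris is among them
"iris" %in% available[, "Item"]
```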

For example, this loads the iris dataset:

data("iris")

# glimpse() comes from the dplyr package, so load it first
library(dplyr)
glimpse(iris)
Rows: 150
Columns: 5
$ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.…
$ Sepal.Width  <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.…
$ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.…
$ Petal.Width  <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.…
$ Species      <fct> setosa, setosa, setosa, setosa, setosa, setosa, setosa, s…

If you want a dataset from a specific package, you can tell R which one:

data("pirates", package = "yarrr")
mean(pirates$parrots)
[1] 2.819

Every built-in dataset has a help file. Open it from the help menu or by putting a ? in front of the dataset’s name. DO THIS IN THE CONSOLE, NOT A CODE CHUNK.

?pirates


Loading data from Excel files

R can easily read in Microsoft Excel spreadsheets using the readxl package:

  1. Make sure the readxl package is loaded.
    E.g. is library(readxl) in your library code chunk?
    Have you run the code chunk?

  2. Place your Excel file in your project folder.
    E.g. here I placed Data_frostday.xlsx into my project folder. MAKE SURE YOU OPEN RSTUDIO USING YOUR LAB PROJECT!! If you are not sure what I mean, see “Projects: How do I know if I am running one?” and “Returning to your project”.

  3. Make a new code chunk and add the read_excel() command e.g.

    frost <- read_excel("Data_frostday.xlsx")

    Here the command is read_excel(). You are applying it to “Data_frostday.xlsx” (i.e. reading in the Excel file with that name), then assigning the result to a variable called frost. Because you are using your project, R knows to look inside your project folder to find the file.

If this works, there should be no errors and nothing prints on the screen when you run the code chunk.

When I ran it, frost appeared in the Environment tab, described as a table with 76 rows (observations/obs) and 8 columns (variables). In R, this type of table/spreadsheet is called a data.frame.

# Read in the Data_frostdata.xlsx file in my project folder and assign it to a variable called frost
library(readxl)
frost    <- read_excel("Data_frostdata.xlsx")
names(frost)
[1] "Station"             "State"               "Type_Fake"          
[4] "Avg_DOY_SpringFrost" "Latitude"            "Longitude"          
[7] "Elevation"           "Dist_to_Coast"      
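The same check can be done in code rather than by reading the Environment tab. The sketch below assumes the readxl package is installed and uses an example workbook that ships with it (located with readxl_example()) as a stand-in for your own file. The names example_path and example_data are hypothetical.

```r
library(readxl)

# Path to an example workbook bundled with readxl (a stand-in for your own file)
example_path <- readxl_example("datasets.xlsx")

# Read the first sheet and assign it to a variable
example_data <- read_excel(example_path)

dim(example_data)     # number of rows, then number of columns
names(example_data)   # column names
class(example_data)   # read_excel() returns a tibble, which is also a data.frame
```

If the numbers from dim() do not match what you expect from the spreadsheet, open the file in Excel and check for extra header rows or empty columns.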

Or you can put the full file path in the read_excel() command:

# Read in the same file using its full path
# The path below is a placeholder; replace it with the real location on your computer
library(readxl)
frost    <- read_excel("/path/to/your/project/Data_frostdata.xlsx")
names(frost)
[1] "Station"             "State"               "Type_Fake"          
[4] "Avg_DOY_SpringFrost" "Latitude"            "Longitude"          
[7] "Elevation"           "Dist_to_Coast"      


Troubleshooting

It says it can’t find the file:

  • Are you running the right project? E.g. does it say Lab 3 at the top of the screen?
  • Did you put the file into your Lab folder?
  • Did you spell it right and include the full .xlsx extension?
  • Did you use quote marks?

It says read_excel doesn’t exist:

  • Did you install the readxl package?
  • Did you load the readxl package? Go click the code chunk with the library command again!
  • Did you spell the command right? (R is case sensitive.)
  • Did you use () afterwards so R understands that it’s a command?
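Several of these checks can also be run from the console. This is a minimal sketch that assumes the file name from the frost example; my_file is a hypothetical variable, so swap in your own file name.

```r
# File name to look for (swap in your own)
my_file <- "Data_frostdata.xlsx"

getwd()                            # the folder R is currently looking in
file.exists(my_file)               # TRUE if the file is in that folder
list.files(pattern = "\\.xlsx$")   # every .xlsx file R can see there

# TRUE if readxl is installed; if FALSE, install it before calling library(readxl)
requireNamespace("readxl", quietly = TRUE)
```

If getwd() does not point at your lab folder, you are probably not running your project.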


Using the wizard: Sometimes you just can’t get it working. In those cases, try the import wizard:

  • Go to the File menu at the very top of the screen. Click Import Dataset, then From Excel. Use the wizard to find your file and get it looking correct. It will show you the code you need in the code preview.
  • Because we want to include this file in the markdown, rather than pressing OK, copy the code preview text and put it in your code chunk. DO NOT PUT THE View() LINE IN THERE, or every time you run it R will open a new tab with the data.


Reading in csv Files

.csv files are comma-separated text files; you can also open them in Microsoft Excel. In R, you don’t need any special package to read in a csv file.

  1. Place the csv file into your project folder

  2. Use the read.csv() command to read it into R. Assign the result to a variable, or it will just print onto the screen.

  3. Run the code chunk, then click on the variable name in the Environment quadrant to check that it read in correctly (especially make sure that column names have read in correctly)

For example, to read in a csv file on ozone and summarise it:

# Read in some data on ozone
ozone    <- read.csv("Data_Ozone.csv")

# Summarise each column, or click on its name in the Environment quadrant to view it
summary(ozone)
    LOCATION     SITE_NAME          SHORT_NAME           LATITUDE    
 Min.   :2001   Length:451         Length:451         Min.   :32.35  
 1st Qu.:2402   Class :character   Class :character   1st Qu.:34.15  
 Median :2844   Mode  :character   Mode  :character   Median :36.01  
 Mean   :2802                                         Mean   :36.14  
 3rd Qu.:3134                                         3rd Qu.:37.94  
 Max.   :3759                                         Max.   :41.85  
   LONGITUDE      OZONE_1000PPB    POPULATION_DENSITY
 Min.   :-124.2   Min.   : 3.457   Min.   :  0.0000  
 1st Qu.:-121.5   1st Qu.:23.617   1st Qu.:  0.5499  
 Median :-120.0   Median :28.304   Median : 14.6029  
 Mean   :-119.7   Mean   :30.347   Mean   : 34.9754  
 3rd Qu.:-118.0   3rd Qu.:35.254   3rd Qu.: 53.2731  
 Max.   :-114.6   Max.   :84.655   Max.   :406.6252  
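To try the three steps without a data file to hand, here is a self-contained sketch. It writes a tiny csv file to a temporary location and reads it back; the values and the names tmp_csv and small are made up purely for illustration.

```r
# Write a tiny csv file to a temporary location (made-up values)
tmp_csv <- tempfile(fileext = ".csv")
writeLines(c("site,ozone",
             "A,21.5",
             "B,30.2",
             "C,28.0"), tmp_csv)

# Read it in; by default the first line is used as the column names
small <- read.csv(tmp_csv)

names(small)   # check the column names read in correctly
str(small)     # check the type of each column
```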


Troubleshooting

It says it can’t find the file:

  • Are you running the right project? E.g. does it say Lab 2 at the top of the screen?
  • Did you put the file into your Lab 2 folder?
  • Did you spell it right and include the full .csv extension?
  • Did you use quote marks?


Using the wizard: Sometimes you just can’t get it working. In those cases, try the import wizard:

  • Go to the File menu at the very top of the screen. Click Import Dataset, then From Text (base). Use the wizard to find your file and get it looking correct. It will show you the code you need in the code preview.
  • Because we want to include this file in the markdown, rather than pressing OK, copy the code preview text and put it in your code chunk. DO NOT PUT THE View() LINE IN THERE, or every time you run it R will open a new tab with the data.

Reading in txt Files

Same as above, but use the read.table() command. You get a lot of options here, from telling R whether the file has headers/column names to changing the ‘delimiter’ (the character that separates the columns). See the help file (?read.table) and http://www.sthda.com/english/wiki/reading-data-from-txt-csv-files-r-base-functions for more.
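As a minimal sketch of the two options mentioned above, the example below writes a small tab-separated text file to a temporary location and reads it back with read.table(). The values and the names tmp_txt and tab_data are made up for illustration.

```r
# Write a small tab-separated text file to a temporary location (made-up values)
tmp_txt <- tempfile(fileext = ".txt")
writeLines(c("site\tozone",
             "A\t21.5",
             "B\t30.2"), tmp_txt)

# header = TRUE: the first line holds the column names
# sep = "\t":    the columns are separated by tabs (the delimiter)
tab_data <- read.table(tmp_txt, header = TRUE, sep = "\t")

names(tab_data)
str(tab_data)
```

Without header = TRUE, read.table() treats the first line as data and names the columns V1, V2 and so on.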