1 Reading in data


1.1 DON’T USE ‘FILE/IMPORT’ OR DRAG IN DATA

We often have to read in data from a file. Those of you who have used R before might have done this interactively, e.g. by going to File/Import data. The problem is that when you press ‘knit’, the computer doesn’t know how to reproduce your mouse clicks, so knitting will fail.

So we want to learn the code commands to put into a code chunk. Here are some common ways to do so (there are many others!).




1.2 REMEMBER TO USE PROJECTS

The advantage of using projects is that you don’t have to ‘tell’ the computer where to look for data. So for those of you who used R before, you don’t have to use commands like setwd().

INSTEAD PUT THE DATA FILES IN THE SAME FOLDER AS YOUR PROJECT .Rproj file.

R will automatically look there first without you needing to tell it any locations.

ADVANCED: If you want to be fancy, you can make a sub-folder and then put “./foldername/” in front of your file name, e.g. “./foldername/filename.csv”. Here, the . means look in the current folder, then go into the sub-folder called foldername, then look for your file.
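
As a rough sketch of the sub-folder idea, the snippet below builds a relative path with file.path(). The folder name data and the file name frost.csv are made up for illustration; swap in your own.

```r
# Hypothetical names: a sub-folder "data" inside the project, holding "frost.csv"
csv_path <- file.path(".", "data", "frost.csv")

# Show the relative path that was built
print(csv_path)

# TRUE only if the file really exists at that location
file.exists(csv_path)
```

If file.exists() returns FALSE, check the spelling of the folder and file names before trying to read the file.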




1.3 Reading in csv files

First, make sure your csv is in your lab folder and you are running your lab project.

CSV files are like basic spreadsheets without formatting; CSV stands for comma-separated values. The easiest way to read in a csv file is the read.csv() command. This is built into R, so you don’t need to install any packages.

mydata <- read.csv("FILENAME.csv")
  • The command is read.csv() - if you go to ?read.csv you can see lots of other options like skipping rows etc.

  • “FILENAME.csv” is whatever you have called your file. It’s case sensitive and you need the extension.

  • mydata is the variable name you wish to save it as. This can be anything you like (e.g. if I read in data on frost dates, I might call it frost).

Other commands that do the same job are read_csv() (from the readr package) and fread() (from the data.table package). These are faster and more flexible, but you would have to install the packages first, and they return their own kinds of data frame (a tibble and a data.table, respectively) rather than a basic data.frame.

1.3.1 I got an error

If you get an error, it usually means one of three things: you typed the filename wrongly (for example, you didn’t include the .csv), the file isn’t in exactly the same folder as your .Rproj file, or you’re not running your project, so the computer doesn’t know where to look.
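
If the cause isn’t obvious, a few base R commands can help narrow it down. This is only a sketch; FILENAME.csv is the same placeholder used above, not a real file.

```r
# Which folder is R currently looking in? With a project open,
# this should be the folder that holds your .Rproj file.
getwd()

# List the csv files R can see in that folder
list.files(pattern = "\\.csv$")

# Check one specific file name (placeholder; use your own)
file.exists("FILENAME.csv")
```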

1.3.2 My data doesn’t have column names

If your data doesn’t have column names, you can tell R so by setting header=FALSE, like this:

mydata <- read.csv("FILENAME.csv",header=FALSE)

Then you can rename your columns using the names() command.
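
Here is a minimal sketch of the renaming step. To keep it self-contained, it first writes a tiny header-less csv to a temporary file; the values and the column names station and frost_day are invented for illustration.

```r
# Write a small header-less csv to a temporary file (made-up values)
tmp <- tempfile(fileext = ".csv")
writeLines(c("A,101", "B,97", "C,110"), tmp)

# Read it in, telling R that the first row is data, not column names
mydata <- read.csv(tmp, header = FALSE)
names(mydata)   # R's default names: "V1" "V2"

# Replace the default names with more helpful ones
names(mydata) <- c("station", "frost_day")
mydata
```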




1.4 Reading in Excel files

First, make sure your xlsx or xls file is in your lab folder and you are running your lab project.

More complex spreadsheets are easily read in using the read_excel() command from the readxl package.

First, go to the Packages tab, click Install, and install the readxl package. Then add library(readxl) to your library code chunk and run it.

library(readxl)

Now you can read in Excel files using this command:

mydata2 <- read_excel("FILENAME.xlsx")
  • The command is read_excel() - if you go to ?read_excel you can see lots of other options like skipping rows etc.

  • “FILENAME.xlsx” is whatever you have called your file. It’s case sensitive and you need the extension.

  • mydata2 is the variable name you wish to save it as. This can be anything you like (e.g. if I read in data on frost dates, I might call it frost).

Other packages can also read Excel files (for example, read.xlsx() from the openxlsx package), but you would have to install them first. Note that read_table() (from the readr package) and fread() (from the data.table package) are for plain-text files such as .txt and .csv, not Excel workbooks.

If you get an error, it usually means one of three things: you typed the filename wrongly (for example, you didn’t include the .xlsx), the file isn’t in exactly the same folder as your .Rproj file, or you’re not running your project, so the computer doesn’t know where to look.
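
Excel workbooks often contain several sheets. As a hedged sketch, excel_sheets() lists the sheet names and the sheet argument of read_excel() picks one of them. The example uses datasets.xlsx, a small example workbook that ships with readxl (located via readxl_example()), so it should run without any of your own files.

```r
library(readxl)

# Path to an example workbook bundled with the readxl package
xlsx_path <- readxl_example("datasets.xlsx")

# List the sheet names in the workbook
excel_sheets(xlsx_path)

# Read one sheet by name; sheet can also be a number
mydata2 <- read_excel(xlsx_path, sheet = "mtcars")
head(mydata2)
```

For your own data, replace xlsx_path with your file name in quotes, e.g. “FILENAME.xlsx”.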




1.5 Loading built-in data

R also has many built-in datasets. To load them into R:

  1. In your library code chunk, load the library that contains the dataset

  2. Load the dataset using the ‘data’ command

  3. This will create a ‘data promise’ in your Environment tab. To look at or use the data, click on its NAME, or run any command. Here, I used the glimpse() command from the dplyr package, but you can do anything.

For example, to load the Palmer Penguins data:

library(palmerpenguins) # normally this would be in your library code chunk
library(dplyr)          # provides glimpse()
data("penguins")
glimpse(penguins)
## Rows: 344
## Columns: 8
## $ species           <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
## $ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
## $ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
## $ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
## $ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
## $ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
## $ sex               <fct> male, female, female, NA, female, male, female, male…
## $ year              <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…
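
Some datasets ship with R itself in the datasets package, which is loaded automatically, so step 1 can be skipped for them. A small sketch using mtcars, one of these built-in datasets:

```r
# mtcars lives in the 'datasets' package, which R loads at start-up
data("mtcars")

# Look at the first few rows and the overall structure
head(mtcars)
str(mtcars)
```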




1.6 Loading spatial data with st_read


Geospatial files like GeoPackages are easily read using the st_read() command from the sf package.

st_read() makes it easy to read spatial data. It automatically detects the file type and returns a spatial object. You can use it to read many kinds of vector data, including Shapefiles (.shp), GeoJSON (.geojson), GeoPackage (.gpkg), KML/KMZ files, and CSVs with WKT geometry.

If the coordinate reference system (CRS) is specified in the file, it will be loaded automatically with the data.

  • First, make sure your spatial file (often .gpkg) is saved in your lab folder and you are running your lab project.

  • Now go to the top of your script, add library(sf) to your library code chunk and run it. If there is an error, go to the packages tab and install the sf package by clicking on the Install button. Then try again.

library(sf)
  • Scroll back down to the place you want to read in your spatial file. You can read geospatial files using this command:
mydata <- st_read("FILENAME.gpkg")
  • To understand what you did:

    • st_read() is the command
    • “FILENAME.gpkg” is your actual file name. Include the extension and match the name exactly (case sensitive).
    • mydata is the variable name you want to use. You can choose anything helpful—for example, counties if the file contains county boundaries.

For example, this will read in your Lab 4 mystery data and save it as a variable called mystery1:

mystery1 <- st_read("Lab04_MysteryData1.gpkg")
## Reading layer `GEOG364_Lab4_MysteryData1' from data source 
##   `/Users/hgreatrex/Documents/GitHub/Teaching/STAT-462/Stat462-2025/Lab04_MysteryData1.gpkg' 
##   using driver `GPKG'
## Simple feature collection with 28 features and 0 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 1.381952 ymin: 38.66057 xmax: 4.23568 ymax: 40.01613
## Geodetic CRS:  WGS 84
  • You can view a summary of the data by typing its name. This includes:

    • the geometry type (e.g., POINT, LINESTRING, POLYGON)
    • the coordinate reference system (e.g., WGS84, EPSG:4326)
    • the bounding box (spatial extent)

mystery1
## Simple feature collection with 28 features and 0 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: 1.381952 ymin: 38.66057 xmax: 4.23568 ymax: 40.01613
## Geodetic CRS:  WGS 84
## First 10 features:
##                         geom
## 1  POINT (1.395875 38.68629)
## 2   POINT (1.56276 38.66057)
## 3  POINT (1.394164 38.87473)
## 4  POINT (1.381952 38.96174)
## 5    POINT (1.4977 38.99677)
## 6  POINT (3.023175 39.31032)
## 7  POINT (2.866836 39.44245)
## 8  POINT (3.054251 39.41818)
## 9  POINT (3.136457 39.40375)
## 10 POINT (2.389429 39.58377)
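
You can also pull out individual pieces of that summary with other sf functions. The sketch below does not use the lab data: to stay self-contained, it reads nc.shp, an example shapefile of North Carolina counties that ships with the sf package.

```r
library(sf)

# Example shapefile bundled with sf (not the lab data)
nc_path <- system.file("shape/nc.shp", package = "sf")
nc <- st_read(nc_path, quiet = TRUE)  # quiet = TRUE hides the reading message

# Geometry type of each feature; unique() collapses the repeats
unique(st_geometry_type(nc))

# Coordinate reference system stored in the file
st_crs(nc)

# Bounding box (spatial extent)
st_bbox(nc)
```

The same three calls should work on mystery1 or any other object returned by st_read().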


Getting Help

You can see the help file by typing ?st_read in the console. This shows extra options like filtering by layer or bounding box.

If you get an error:

  • Check that you typed the filename correctly, including .gpkg

  • Make sure the file is in the same folder as your .Rproj

  • Confirm that you’re running your project—otherwise R may not know where to look

