We often have to read in data from a file. Those of you who have used R before might have done this interactively, e.g. by going to File/Import data. The problem with this is that when you press ‘knit’ the computer doesn’t know how to reproduce your actions, so it will crash.
So we want to learn the code commands to put into a code chunk. Here are some common ways to do so (there are many others!).
The advantage of using projects is that you don’t have to ‘tell’ the computer where to look for data. So for those of you who used R before, you don’t have to use commands like setwd().
INSTEAD PUT THE DATA FILES IN THE SAME FOLDER AS YOUR PROJECT .Rproj file.
R will automatically look there first without you needing to tell it any locations.
ADVANCED: If you want to be fancy, you can make a sub-folder and then put “./foldername/” in front of your file name (“./foldername/filename..”). Here, the . means look in the current folder, then go into the sub-folder called foldername, then look for your file.
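As a rough sketch of the sub-folder idea, the code below builds a throwaway project folder so that it can run anywhere. The folder name `data` and the file name `frost.csv` are invented for illustration. In a real project you would only need the `read.csv()` line, because opening the .Rproj file already points R at the right folder.

```r
# Hypothetical set-up so the example is self-contained:
# make a sub-folder called "data" inside a temporary folder and save a small csv in it.
project_dir <- tempfile("lab_project")
dir.create(file.path(project_dir, "data"), recursive = TRUE)
write.csv(data.frame(day = 1:3, temp_c = c(2.5, -1.0, 0.3)),
          file.path(project_dir, "data", "frost.csv"),
          row.names = FALSE)

# Pretend the temporary folder is the project folder. RStudio does this for you
# when you open the .Rproj file, so you would not normally need setwd().
old_dir <- setwd(project_dir)

# "." = current folder, "data" = sub-folder, "frost.csv" = file name
frost <- read.csv("./data/frost.csv")
print(frost)

setwd(old_dir)  # put the working directory back where it was
```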
First, make sure your csv is in your lab folder and you are running your lab project.
CSV files are like basic spreadsheets without formatting; the name stands for comma-separated values. The easiest way to read in a csv file is using the read.csv() command. This is built into R, so you don’t need to install any packages.
mydata <- read.csv("FILENAME.csv")
The command is read.csv() - if you go to ?read.csv
you can see lots of other options like skipping rows etc.
“FILENAME.csv” is whatever you have called your
file. It’s case sensitive.
mydata is the variable name you wish to save it as.
This can be anything you like (e.g. if I read in data on frost dates, I
might call it frost).
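To illustrate the extra options mentioned above, here is a small sketch using the `skip` and `na.strings` arguments of `read.csv()`. The file contents are invented, and the file is written to a temporary location purely so the example can run on its own.

```r
# Write a small example file whose first line is a note rather than data
# (the contents are made up for illustration).
example_file <- tempfile(fileext = ".csv")
writeLines(c("Downloaded from a weather station",
             "station,min_temp",
             "A,-2.1",
             "B,missing",
             "C,0.4"),
           example_file)

# skip = 1 ignores the note on the first line;
# na.strings tells R which text should be treated as missing data (NA).
frost <- read.csv(example_file, skip = 1, na.strings = "missing")
print(frost)
```

Because "missing" is converted to NA, the min_temp column is read as numbers rather than text.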
Other valid commands that do the same thing are
read_csv() (from the readr package) and
fread() (from the data.table package). These are faster and
more adaptable, but you would have to download the packages, and they
read the data into their own formats (a tibble and a data.table) rather than a plain data frame.
If you get an error, it usually means one of three things: you have typed the filename wrongly (for example you didn’t include the .csv), the file isn’t in exactly the same folder as your .Rproj file, or you’re not running your project, so the computer doesn’t know where to look.
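If you are not sure which of these is the problem, a few base R commands can help you check. This is only a suggested checklist, and “FILENAME.csv” is a placeholder for your own file name.

```r
# Which folder is R currently looking in?
# If you are running your project, this should be your lab folder.
getwd()

# Which csv files can R see in that folder?
list.files(pattern = "\\.csv$")

# Is the file you want really there? TRUE means yes, FALSE means no.
# "FILENAME.csv" is a placeholder; swap in your own file name.
file.exists("FILENAME.csv")
```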
If your data doesn’t have column names, you can tell R so like this:
mydata <- read.csv("FILENAME.csv",header=FALSE)
Then you can rename your columns using the names()
command.
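As a short sketch of the renaming step, the example below reads a file with no header row and then assigns column names with names(). The file contents and the column names year, month and day are invented for illustration.

```r
# Make a small csv with no header row (contents invented for illustration)
example_file <- tempfile(fileext = ".csv")
writeLines(c("2021,April,12",
             "2022,May,3"),
           example_file)

# header = FALSE stops R from treating the first row of data as column names
frost <- read.csv(example_file, header = FALSE)
names(frost)   # R's default names: "V1" "V2" "V3"

# Replace the default names with something meaningful
names(frost) <- c("year", "month", "day")
print(frost)
```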
First, make sure your xlsx or xls file is in your lab folder and you are running your lab project.
More complex spreadsheets are easily read in using the
read_excel() command from the readxl
package.
First, go to the packages tab, click install and download/install the
readxl package. Then add readxl to your library code chunk
and run (e.g. library(readxl) )
library(readxl)
Now you can read in Excel files using this command:
mydata2 <- read_excel("FILENAME.xlsx")
The command is read_excel() - if you go to
?read_excel you can see lots of other options like skipping rows
etc.
“FILENAME.xlsx” is whatever you have called your file. It’s case sensitive and you need the extension.
mydata2 is the variable name you wish to save it as.
This can be anything you like (e.g. if I read in data on frost dates, I
might call it frost).
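Workbooks often contain more than one sheet. The sketch below uses the practice workbook that is installed with readxl (found via readxl_example()), so it does not need a file of your own. The variable names practice_file and cars are just illustrative choices.

```r
library(readxl)

# readxl_example() returns the path to a practice workbook that is
# installed with the package, so no file of your own is needed here.
practice_file <- readxl_example("datasets.xlsx")

# List the sheet names inside the workbook
excel_sheets(practice_file)

# Read one particular sheet by name using the sheet argument
cars <- read_excel(practice_file, sheet = "mtcars")
head(cars)
```

With your own data you would replace practice_file with your file name in quotes, e.g. "FILENAME.xlsx".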
Note that read_table() (from the readr package) and
fread() (from the data.table package) read delimited text files, not Excel workbooks.
To use them, you would first have to save your sheet as a text or csv file.
You would also have to download the packages, and they
read the data into their own formats rather than a plain data frame.
If you get an error, it usually means one of three things: you have typed the filename wrongly (for example you didn’t include the .xlsx), the file isn’t in exactly the same folder as your .Rproj file, or you’re not running your project, so the computer doesn’t know where to look.
R also has many built-in datasets. To load them into R:
In your library code chunk, load the library that contains the dataset.
Load the dataset using the ‘data’ command.
This will create a ‘data promise’ in your Environment tab. To look at or use the data, click on its NAME, or run any command. Here, I used the glimpse command, but you can do anything.
For example, to load the Palmer Penguins data:
library(palmerpenguins) #normally this would be at the top of the code chunk
library(dplyr) #provides the glimpse() command
data("penguins")
glimpse(penguins)
## Rows: 344
## Columns: 8
## $ species <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
## $ island <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
## $ bill_length_mm <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
## $ bill_depth_mm <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
## $ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
## $ body_mass_g <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
## $ sex <fct> male, female, female, NA, female, male, female, male…
## $ year <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…
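The same two steps work for the datasets that come with R itself, which live in a package called datasets that is loaded automatically. As a small sketch, the code below lists what is available and then loads airquality, which is just one arbitrary choice.

```r
# List the names of the datasets that come with R (the "datasets" package)
available <- data(package = "datasets")$results[, "Item"]
head(available)

# Load one of them by name and take a quick look at its structure
data("airquality")
str(airquality)
```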
Geospatial files like GeoPackages are easily read using the
st_read() command from the sf package.
st_read() makes it easy to read spatial data. It
automatically detects the file type and returns a spatial object. You
can use it to read many kinds of vector data, including Shapefiles
(.shp), GeoJSON (.geojson), GeoPackage (.gpkg), KML/KMZ files, and CSVs
with WKT geometry.
If the coordinate reference system (CRS) is specified in the file, it will be loaded automatically with the data.
First, make sure your spatial file (often .gpkg) is saved in your lab folder and you are running your lab project.
Now go to the top of your script, add library(sf) to
your library code chunk and run it. If there is an error, go to the
packages tab and install the sf package by clicking on
the Install button. Then try again.
library(sf)
mydata <- st_read("FILENAME.gpkg")
To understand what you did:
st_read() is the command.
mydata is the variable name you want to use. You can
choose anything helpful, for example counties if the file
contains county boundaries.
For example, this will read in your Lab 4 mystery data and save it as
a variable called mystery1:
mystery1 <- st_read("Lab04_MysteryData1.gpkg")
## Reading layer `GEOG364_Lab4_MysteryData1' from data source
## `/Users/hgreatrex/Documents/GitHub/Teaching/STAT-462/Stat462-2025/Lab04_MysteryData1.gpkg'
## using driver `GPKG'
## Simple feature collection with 28 features and 0 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 1.381952 ymin: 38.66057 xmax: 4.23568 ymax: 40.01613
## Geodetic CRS: WGS 84
You can view a summary of the data by typing its name. This includes:
the geometry type (e.g., POINT, LINESTRING, POLYGON)
the coordinate reference system (e.g., WGS84, EPSG:4326)
the bounding box (spatial extent)
mystery1
## Simple feature collection with 28 features and 0 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 1.381952 ymin: 38.66057 xmax: 4.23568 ymax: 40.01613
## Geodetic CRS: WGS 84
## First 10 features:
## geom
## 1 POINT (1.395875 38.68629)
## 2 POINT (1.56276 38.66057)
## 3 POINT (1.394164 38.87473)
## 4 POINT (1.381952 38.96174)
## 5 POINT (1.4977 38.99677)
## 6 POINT (3.023175 39.31032)
## 7 POINT (2.866836 39.44245)
## 8 POINT (3.054251 39.41818)
## 9 POINT (3.136457 39.40375)
## 10 POINT (2.389429 39.58377)
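If you only want one piece of that summary, sf has a separate command for each. The sketch below uses a practice GeoPackage that is installed with sf (North Carolina counties) rather than the lab data, so the values it prints will differ from the mystery data above.

```r
library(sf)

# Practice file installed with sf (North Carolina counties);
# for real work, swap in your own file name, e.g. "FILENAME.gpkg".
practice_file <- system.file("gpkg/nc.gpkg", package = "sf")

# quiet = TRUE hides the "Reading layer ..." message
nc <- st_read(practice_file, quiet = TRUE)

st_geometry_type(nc, by_geometry = FALSE)  # the kind of shapes stored
st_crs(nc)$input                           # a short name for the CRS
st_bbox(nc)                                # the bounding box (spatial extent)
nrow(nc)                                   # the number of features (rows)
```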
Getting Help
You can see the help file by typing ?st_read in the
console. This shows extra options like filtering by layer or bounding
box.
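As an example of the layer option, a GeoPackage can hold several layers in one file. This sketch lists the layers with st_layers() and then reads one by name. It again uses the practice file installed with sf, and the variable names are illustrative.

```r
library(sf)

# Practice file installed with sf; replace with your own file name for real work
practice_file <- system.file("gpkg/nc.gpkg", package = "sf")

# See which layers the file contains
layers <- st_layers(practice_file)
layers$name

# Read a specific layer by name (here, simply the first one listed)
nc <- st_read(practice_file, layer = layers$name[1], quiet = TRUE)
```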
If you get an error:
Check that you typed the filename correctly, including
.gpkg
Make sure the file is in the same folder as your
.Rproj
Confirm that you’re running your project—otherwise R may not know where to look