This tutorial covers reading in every type of file I can think of.


1. General pointers

Here’s the structure of EVERY command for reading data into R.

name_youre_saving_it_as  <- COMMAND( "InputFilename.Extension" , other_options)
  • COMMAND is the function used to read the data
    • e.g. read.csv for CSV files

  • "InputFilename.Extension" should be replaced with your file name.

  • other_options includes (optional) additional options for your command
    • e.g. nrows = 2 inside read.csv() reads only the first two rows of data

  • "name_youre_saving_it_as" is the name of the variable you choose to assign your data to.
    • e.g. x <- 5 assigns the number 5 to a variable called x

Using this format, here’s an example of how to read the first two rows of a .csv file called “myfrost.csv,” save it as the variable “froooostdata,” and summarize it:

froooostdata  <- read.csv( "myfrost.csv" , nrows = 2)
summary(froooostdata)
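To try this without a file of your own, here is a minimal sketch that first writes a small, made-up myfrost.csv into R's temporary folder (the station names and frost counts are invented for illustration) and then reads back only the first two rows:

```r
# Make a small, made-up CSV in R's temporary folder (values are invented)
frost_path <- file.path(tempdir(), "myfrost.csv")
write.csv(data.frame(station    = c("A", "B", "C", "D"),
                     frost_days = c(12, 30, 7, 21)),
          frost_path, row.names = FALSE)

# Read only the first two data rows (the header line is not counted)
froooostdata <- read.csv(frost_path, nrows = 2)
summary(froooostdata)
nrow(froooostdata)   # 2
```

With your own data you would simply write "myfrost.csv" instead of frost_path.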
Didn’t work? Here are a few common mistakes and issues.


Mistake 1

#Incorrect
summary("myfrost.csv")

This will summarize the WORD “myfrost.csv” as a character string. To summarize data, you must first load it into R and assign it to a variable using <-.


Mistake 2

#Incorrect
read.csv("myfrost.csv")
summary(myfrost)

The first line loads the data into R but doesn’t save it as a variable, so it just prints the data on the screen. The second line then tries to summarize a variable named myfrost that doesn’t exist, so R stops with an "object not found" error.
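For comparison, here is a minimal sketch of the corrected pattern. So that it runs anywhere, it first writes a tiny, made-up myfrost.csv into R's temporary folder; with your own data you would just use "myfrost.csv".

```r
# Hypothetical example file so the sketch runs anywhere (values are invented)
frost_path <- file.path(tempdir(), "myfrost.csv")
write.csv(data.frame(day = 1:3, min_temp = c(-1.5, 0.2, -3.0)),
          frost_path, row.names = FALSE)

# Correct: load the file AND save it as a variable with <-
myfrost <- read.csv(frost_path)

# Summarize the variable (no quote marks), not the file name
summary(myfrost)
```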




2. Organizing your data

You should be working inside an RStudio PROJECT. If so, either:

  • place your files in your main project folder, OR
  • create a sub-folder and put your data in there.

If your files are in the main folder, you do not need to use setwd() or add a folder path. Commands like these will work (st_read() also needs library(sf) loaded first):

mydata <- read.csv("filename.csv")
mydata <- st_read("filename.shp")
mydata <- read.table("filename.txt")

If you use a sub-folder, you need to add the location of the sub-folder to your command. Let’s imagine my folder is called Dbata (misspelled so you can see there’s nothing special about the name). I would adjust the commands to look like this:

mydata <- read.csv("./Dbata/filename.csv")
mydata <- st_read("./Dbata/filename.shp")
mydata <- read.table("./Dbata/filename.txt")

The “.” means “start in the current project folder”, then “/Dbata” means “look inside its sub-folder called Dbata”.
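Here is a minimal sketch of the sub-folder idea that you can run anywhere. It builds a stand-in project folder (the name myproject is hypothetical) inside R's temporary directory and uses setwd() only to mimic opening that project; inside a real RStudio Project you skip that step. It also shows base R's file.path(), which glues folder and file names together and produces the same "./Dbata/filename.csv" path.

```r
# Stand-in project folder inside R's temporary directory (hypothetical name)
proj <- file.path(tempdir(), "myproject")
dir.create(file.path(proj, "Dbata"), recursive = TRUE, showWarnings = FALSE)
write.csv(data.frame(id = 1:3, value = c(2.5, 3.1, 4.8)),
          file.path(proj, "Dbata", "filename.csv"), row.names = FALSE)

# Mimic opening the project (in a real project you do NOT need this)
old_wd <- setwd(proj)

# Both lines point at the same file: ./Dbata/filename.csv
mydata  <- read.csv("./Dbata/filename.csv")
mydata2 <- read.csv(file.path(".", "Dbata", "filename.csv"))

setwd(old_wd)   # put the working directory back
```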




3. Read in common data types

3.1. .csv files

 mydata  <- read.csv("filename.csv")

or

 library(tidyverse) 
 mydata  <- read_csv("filename.csv")


There are two ways of doing this:


read.csv() : This base-R command works without loading any packages at all. It is fine for small and medium-sized files.

mydata <- read.csv("filename.csv")

For example, if I had a dataset called trees in a subfolder called data, I would run

trees.data <- read.csv("./data/trees.csv")
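If you don't have a trees.csv yet, this sketch makes one from R's built-in trees data frame (31 rows) and then runs the command above. To avoid writing into your own project, it temporarily switches to R's temporary folder; that setwd() step is only there for the demo.

```r
# Work in R's temporary folder so nothing is written into your project
old_wd <- setwd(tempdir())

# R ships with a built-in data frame called trees; save it as ./data/trees.csv
dir.create("data", showWarnings = FALSE)
write.csv(trees, "./data/trees.csv", row.names = FALSE)

# The command from the text
trees.data <- read.csv("./data/trees.csv")
str(trees.data)   # 31 rows and 3 columns: Girth, Height, Volume

setwd(old_wd)     # put the working directory back
```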



read_csv() : This is the tidyverse version (it comes from the readr package). It is much faster for large datasets. You first need to add library(tidyverse) to the library code chunk at the top of your script, then run:

library(tidyverse) # put this in your library code chunk at the top
mydata <- read_csv("filename.csv")

For example,

trees.data <- read_csv("trees.csv")
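As a sketch, here is the same file read both ways, using R's built-in trees data written to a temporary CSV. It loads only readr (the tidyverse package that provides read_csv()) to keep things light; library(tidyverse) works the same way. One practical difference: read_csv() gives you a tibble, while read.csv() gives a plain data frame.

```r
library(readr)   # read_csv() comes from readr, which library(tidyverse) also loads

# Write the built-in trees data to a temporary CSV so the example is self-contained
trees_path <- file.path(tempdir(), "trees.csv")
write.csv(trees, trees_path, row.names = FALSE)

trees.base <- read.csv(trees_path)   # base R: returns a plain data.frame
trees.tbl  <- read_csv(trees_path)   # readr: returns a tibble

class(trees.base)   # "data.frame"
class(trees.tbl)    # includes "tbl_df" as well as "data.frame"
```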





3.2. .txt files

 mydata  <- read.table("filename.txt", header = TRUE)   # use header = FALSE if line 1 is data, not column names

or

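 library(tidyverse) 
 mydata  <- read_table("filename.txt")   # whitespace-separated columns
 # for tab-separated files use read_tsv("filename.txt")
 # note: read_lines("filename.txt") returns each line as text, not a table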
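Here is a minimal base-R sketch for a tab-separated text file. It writes a tiny, made-up filename.txt (the column names and values are invented) into R's temporary folder, then reads it with read.table() and with read.delim(), which is a shortcut that presets sep = "\t" and header = TRUE.

```r
# Make a small, made-up tab-separated text file (values are invented)
txt_path <- file.path(tempdir(), "filename.txt")
writeLines(c("site\tdepth\tfrozen",
             "A\t10\tTRUE",
             "B\t25\tFALSE"), txt_path)

# Say what separates the columns and that line 1 holds the column names
mydata <- read.table(txt_path, sep = "\t", header = TRUE)

# Same result with the tab-separated shortcut
mydata2 <- read.delim(txt_path)
```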






3.3. .xlsx and .xls files

 library(readxl) 
 mydata  <- read_excel("filename.xlsx")
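A sketch you can run straight away: readxl ships with example workbooks, and readxl_example() returns the path to one. Excel files often have several sheets; read_excel() reads the first one unless you name another with sheet =. The same command also reads older .xls files.

```r
library(readxl)

# Path to an example workbook that comes with readxl
path <- readxl_example("datasets.xlsx")

excel_sheets(path)   # lists the sheet names in the workbook

# Read the first sheet (the default), or pick one by name with sheet =
first_sheet <- read_excel(path)
cars_sheet  <- read_excel(path, sheet = "mtcars")
```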






3.4. .shp Shape Files

 library(sf) 
 mydata  <- st_read("filename.shp")

IMPORTANT! For this to work, there should be SEVERAL sub-files with the same name but different extensions in your folder (e.g. filename.shp, filename.shx, filename.dbf, filename.prj). You need to put ALL of them in the same folder; this single command then reads them in together.
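To see the companion files for yourself, here is a sketch using the example shapefile of North Carolina counties that ships with the sf package. It lists the files that share the name nc, then reads them all with one st_read() call (quiet = TRUE just hides the progress message).

```r
library(sf)

# sf comes with an example shapefile of North Carolina counties
shp_path <- system.file("shape/nc.shp", package = "sf")

# The companion files sit next to it: same name, different extensions
list.files(dirname(shp_path), pattern = "^nc\\.")

# One st_read() call on the .shp reads them all together
nc <- st_read(shp_path, quiet = TRUE)
nrow(nc)   # 100 counties
```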








3.5. .geojson files

 library(sf) 
 mydata  <- st_read("filename.geojson")

Unlike a shapefile, a GeoJSON file is a single file, so there are no companion files to keep together. st_read() works out the format from the file itself.
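st_read() is not limited to shapefiles. Assuming ".geo" here means GeoJSON, this sketch builds a small GeoJSON file from sf's example North Carolina data (converted to longitude/latitude with st_transform(), since GeoJSON normally uses that) and reads it back with the same st_read() command. The temporary file name is only for the demo.

```r
library(sf)

# Build a GeoJSON file from sf's example data so the sketch is self-contained
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
geo_path <- tempfile(fileext = ".geojson")
st_write(st_transform(nc, 4326), geo_path, quiet = TRUE)

# A GeoJSON is ONE file, so there are no companion files to keep together
mydata <- st_read(geo_path, quiet = TRUE)
```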





4. Quick reference

# FILE| .csv 
#______________________________________
 mydata  <- read.csv("filename.csv")
 # or
 library(tidyverse) 
 mydata  <- read_csv("filename.csv")

 
# FILE| .txt (tab separated)
#______________________________________
 mydata  <- read.table("filename.txt", sep = "\t", header = TRUE)


# FILE| .xls or xlsx
#______________________________________
 library(readxl) 
 mydata  <- read_excel("filename.xlsx")

 
# FILE| Shape-files.  
# There should be SEVERAL sub-files with the same name 
# but different extensions in your folder (e.g. filename.shp, filename.shx, filename.dbf, filename.prj)
# You need ALL of them in your folder, then this single command reads them in
#______________________________________
  library(sf) 
  mydata  <- st_read("filename.shp")