This tutorial covers reading in every type of file I can think of.
Here’s the structure of EVERY command that reads data into R:
name_youre_saving_it_as <- COMMAND("InputFilename.Extension", other_options)
COMMAND is the function used to read the data, e.g. read.csv for CSV files.
"InputFilename.Extension" should be replaced with your file name.
other_options includes (optional) additional options for your command.
name_youre_saving_it_as is the name of the variable you choose to assign your data to, e.g. x <- 5 assigns the number 5 to a variable called x.
Using this format, here’s an example of how to read the first two rows of a .csv file called “myfrost.csv”, save it as the variable “froooostdata”, and summarize it:
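A minimal sketch of that example, following the format described above. The demo file contents are invented here purely so the code runs on its own (it is written to a temporary folder); with a real myfrost.csv in your project folder you would only need the last two commands, pointed at "myfrost.csv". nrows is the read.csv option that limits how many data rows are read.

```r
# Invented demo data so the example is self-contained; replace with your own file.
demo_file <- file.path(tempdir(), "myfrost.csv")
write.csv(
  data.frame(station = c("A", "B", "C"), frost_days = c(12, 30, 7)),
  demo_file,
  row.names = FALSE
)

# name_youre_saving_it_as <- COMMAND("InputFilename.Extension", other_options)
froooostdata <- read.csv(demo_file, nrows = 2)  # nrows = 2 reads only the first two data rows

# Summarize the variable, not the file name
summary(froooostdata)
```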
Mistake 1
Calling summary("myfrost.csv") will summarize the WORD “myfrost.csv” as a character string, not the data inside the file. To summarize data, you must first load it into R and assign it to a variable using <-.
Mistake 2
The first line loads the data into R but doesn’t save it as a variable, so it only prints the data on the screen. The second line then attempts to summarize a non-existent variable named myfrost, which will stop with an “object not found” error.
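The two mistakes can be reproduced safely with something like the sketch below. The demo file and its contents are invented for illustration, and tryCatch() is used only so the script keeps running and shows the error message instead of stopping. It assumes a fresh session where no variable called myfrost exists yet.

```r
# Invented demo file in a temporary folder so the sketch runs anywhere.
demo_file <- file.path(tempdir(), "myfrost.csv")
write.csv(data.frame(frost_days = c(12, 30, 7)), demo_file, row.names = FALSE)

# Mistake 1: summarizing the file NAME, which is just a character string
mistake1 <- summary("myfrost.csv")
print(mistake1)   # describes a length-1 character vector, not your data

# Mistake 2: reading without assigning, then using a variable that was never created
read.csv(demo_file)                        # prints the data but saves nothing
mistake2 <- tryCatch(
  summary(myfrost),                        # myfrost does not exist yet
  error = function(e) conditionMessage(e)  # capture the error message instead of stopping
)
print(mistake2)

# Fix: assign with <- first, then summarize the variable
myfrost <- read.csv(demo_file)
summary(myfrost)
```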
You should be running a PROJECT. If so:
If your files are in the main project folder, you do not need to use setwd() or add a file path. Commands like these will work (st_read() also needs library(sf) loaded first):
mydata <- read.csv("filename.csv")
mydata <- st_read("filename.shp")
mydata <- read.table("filename.txt")
If you use a sub-folder, you need to add the location of the sub-folder to your command. Let’s imagine my folder is called Dbata (mis-spelled so you can see there’s nothing special about the name). I would adjust the commands to look like this:
mydata <- read.csv("./Dbata/filename.csv")
mydata <- st_read("./Dbata/filename.shp")
mydata <- read.table("./Dbata/filename.txt")
The “.” means “look inside the current project folder”, then the “/Dbata” means “look for a sub-folder called Dbata”.
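If a command cannot find your file, base R can show what it sees. This is a small sketch using getwd(), list.files() and file.exists(); Dbata and filename.csv are just the placeholder names used above, so swap in your own.

```r
# Where is R currently looking? Inside an open project this is the project folder.
here_now <- getwd()
print(here_now)

# What is in the main folder, and in the sub-folder?
print(list.files("."))
print(list.files("./Dbata"))   # character(0) if the sub-folder is missing or empty

# Does the exact path you are about to read exist? TRUE or FALSE.
found <- file.exists("./Dbata/filename.csv")
print(found)
```

If found is FALSE, check the spelling of the sub-folder and file name (including the extension) before changing anything else.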
There are two ways of reading in a .csv file:
read.csv(): If your file is small, then this command works without loading any packages at all. For example, a dataset called trees.csv in a sub-folder called data would be read by giving read.csv() the path "./data/trees.csv".
read_csv(): This is a more sophisticated way to read in csv data; it is much faster for large datasets. You first need to add library(tidyverse) to your library code chunk at the top of your script, then run:
library(tidyverse) # put this in your library code chunk at the top
mydata <- read_csv("filename.csv")
For example, the trees dataset in the data sub-folder would be read with read_csv("./data/trees.csv").
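Here is a hedged side-by-side sketch of the two approaches. The trees data is invented and written to a temporary "data" folder so the code is self-contained. The read_csv() part calls readr directly (readr is one of the packages that library(tidyverse) loads) and is skipped if readr is not installed.

```r
# Invented demo data in a temporary "data" sub-folder so the sketch is self-contained.
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)
trees_path <- file.path(data_dir, "trees.csv")
write.csv(
  data.frame(species = c("oak", "pine"), height_m = c(18.5, 22.1)),
  trees_path,
  row.names = FALSE
)

# Way 1: base R, no packages needed; returns a data.frame
trees_base <- read.csv(trees_path)
str(trees_base)

# Way 2: read_csv() from readr (loaded by library(tidyverse)); returns a tibble
if (requireNamespace("readr", quietly = TRUE)) {
  trees_tidy <- readr::read_csv(trees_path)
  print(trees_tidy)
}
```

Inside a project, with the file really sitting in a data sub-folder, the path would simply be "./data/trees.csv" in both commands.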
IMPORTANT! For a shapefile to read in, there should be MANY sub-files with the same name but different extensions in your folder (e.g. filename.shp, filename.shx, filename.dbf...). You need to put ALL of them in your folder, then a single st_read() command on the .shp file reads them all in.
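A quick way to check that the companion files are all present before calling st_read() is sketched below, using only base R. It assumes the three parts the shapefile format requires (.shp, .shx, .dbf); a .prj file holding the coordinate system usually travels with them too. filename.shp is the placeholder name used in this tutorial.

```r
shp_path <- "filename.shp"   # placeholder name; use your own file

# Build the names of the companion files that share the same base name
base_name <- tools::file_path_sans_ext(shp_path)
needed    <- paste0(base_name, c(".shp", ".shx", ".dbf"))

# Which of them are missing from the folder?
missing_parts <- needed[!file.exists(needed)]

if (length(missing_parts) == 0) {
  message("All required parts found; st_read(\"", shp_path, "\") should be able to open it.")
} else {
  message("Missing: ", paste(missing_parts, collapse = ", "))
}
```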
# FILE| .csv
#______________________________________
mydata <- read.csv("filename.csv")
# or
library(tidyverse)
mydata <- read_csv("filename.csv")
# FILE| .txt (tab separated)
#______________________________________
mydata <- read.table("filename.txt", sep="\t", header=TRUE)
# FILE| .xls or .xlsx
#______________________________________
library(readxl)
mydata <- read_excel("filename.xlsx")
# FILE| Shape-files.
# There should be MANY sub-files with the same name
# but different extensions in your folder (e.g. filename.shp, filename.shx, filename.dbf..)
# You need ALL of them in your folder, then this single command reads them in
#______________________________________
library(sf)
mydata <- st_read("filename.shp")
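The cheat sheet above could also be wrapped into one function that picks the reader from the file extension. This is only a sketch: read_any() is a hypothetical helper invented here, not part of R or any package, and it assumes .txt files are tab separated with a header row. The readxl and sf packages are only needed if you actually pass it an Excel file or a shapefile.

```r
# Hypothetical helper (not part of R or any package): choose a reader from the file extension.
read_any <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(
    ext,
    csv  = read.csv(path),
    txt  = read.table(path, sep = "\t", header = TRUE),  # assumes tab-separated with a header row
    xls  = ,                                             # falls through to the xlsx reader
    xlsx = readxl::read_excel(path),                     # needs the readxl package installed
    shp  = sf::st_read(path),                            # needs the sf package installed
    stop("No reader set up for files ending in .", ext)
  )
}

# Usage with an invented tab-separated demo file
demo_txt <- file.path(tempdir(), "filename.txt")
write.table(
  data.frame(id = 1:3, value = c(2.5, 3.1, 4.8)),
  demo_txt,
  sep = "\t", row.names = FALSE, quote = FALSE
)
mydata <- read_any(demo_txt)
summary(mydata)
```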