By the end of this week’s lab, you will be able to:
Assignment 2 is due by midnight next Wednesday. I PROVIDE HELP UNTIL THE END OF NEXT WEEK’S LAB (the final evening is for your own finishing up).
There is a TEAMS discussion for lab help. Remember to include a screenshot of the issue and a short description of the problem. Also, try googling the error first.
Every time you re-open RStudio, check that you are using your project file (does it say Lab 2 at the top?).
EVERY TIME YOU RE-OPEN RSTUDIO, YOU NEED TO RE-RUN ALL YOUR CODE CHUNKS. The easiest way to do this is to press the “Run All” button (see the Run menu at the top of your script).
If the labs are causing major problems, your computer hardware is struggling, or you have any other software issue, talk to Dr Greatrex. We can fix this, and there are other options for “online R” that you can use.
IF YOU ARE DOING THIS ON YOUR COMPUTER: First, go and look at your STAT-462 folder on your computer. Make sure that everything looks right (e.g. a single sub-folder for each lab containing your project file, your Rmd and your html, along with any datafiles/pics as needed). If so, congrats! If not, chat to Dr G.
As you can see, the default “friendly” text is back in your new lab script. This is a pain to remove each time, and we want to make formatting easy. So our first job in this lab is to make a template so that future reports are easier.
Work through Tutorial 5 (Report template) to create and save a template in your STAT-462 folder, and also save a copy inside your Lab 2 folder.
For the copy in your Lab 2 folder, update the title (and other details) to be about Lab 2.
In the library section of your lab report, add a new code chunk and use this code to load the following libraries. If any of them are not installed on your computer (or on the cloud), use Tutorial 2.3 (https://psu-spatial.github.io/stat462-2022/T1_R_Basics.html#23_Adding_a_new_package) to install them.
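library(tidyverse)  # core tidyverse packages, including dplyr and ggplot2
library(dplyr)      # data manipulation (already loaded by tidyverse)
library(ggpubr)     # publication-ready ggplot2 plots
library(skimr)      # quick summary statistics
library(ggplot2)    # plotting (already loaded by tidyverse)
library(plotly)     # interactive plots
library(ISLR)       # datasets from "An Introduction to Statistical Learning"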
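Tutorial 2.3 covers how to install packages. If you are not sure which of these are missing, the sketch below can help. It defines a hypothetical helper, `missing_packages()`, written only for this lab and not part of any package. The helper uses base R's `requireNamespace()` to list the packages that are not installed, so you only need to install those.

```r
# Hypothetical helper: return the packages in `pkgs` that are NOT installed.
# requireNamespace() returns TRUE/FALSE without attaching the package.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

lab_pkgs <- c("tidyverse", "dplyr", "ggpubr", "skimr", "ggplot2", "plotly", "ISLR")
to_install <- missing_packages(lab_pkgs)
to_install  # if this is not empty, run install.packages(to_install) in the console
```

Run `install.packages()` once in the console rather than inside your report. Otherwise packages would be reinstalled every time you knit.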
You should now be able to see how headings work. If you click the A symbol at the top right, you can also change the formatting of text (you can also do this in the plain source text; see Tutorial 4).
I also want you to be able to embed images or videos into your report.
Q1:
Follow Tutorial 4.7 (Images) to add an image of your choice to your report. (Note: non-English characters in file names, or very large photos, will break R.)
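You may want to check a picture before adding it. Here is a rough sketch of a hypothetical helper, `check_image_file()`, that flags the two problems mentioned above:

- non-ASCII characters anywhere in the file path;
- a large file size.

The 5 MB limit is an assumed threshold, not an official one, so adjust it as needed.

```r
# Hypothetical helper: flag non-ASCII characters in an image path and large file sizes.
# max_mb = 5 is an assumed limit, not an official R or knitr rule.
check_image_file <- function(path, max_mb = 5) {
  codes   <- utf8ToInt(enc2utf8(path))       # Unicode code points of the path
  name_ok <- !anyNA(codes) && all(codes <= 127L)
  size_mb <- file.size(path) / 1024^2        # NA if the file does not exist
  size_ok <- !is.na(size_mb) && size_mb <= max_mb
  c(name_ok = name_ok, size_ok = size_ok)
}

check_image_file("images/penguin.jpg")       # hypothetical file name
```

A result of `FALSE` for `size_ok` can also mean the file was not found, so check the path first.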
Q2:
To add a video from YouTube, you will need to install a new package called vembedr. Follow Tutorial 2.3 to install this package and add it to your library code chunk.
Now make a code chunk and use the embed_url() command to embed a video of your choice from YouTube, Vimeo, etc. in your report. If you have issues, see Teams.
Q3:
Below both, use a bullet-point list (see Tutorial 4, or Google) to explain why you chose your image and video.
TUTORIAL 8 AND LAB 1 MIGHT HELP YOU HERE
Q4:
In your code showcase section, answer these questions, making sure to use full sentences in your conclusions. If it helps to make your report easier to read, feel free to include the question text.
A sample of 36 obese rockhopper penguins in a zoo was put on a special diet for a year as part of a large nationwide programme. The average weight loss was 11 oz and the standard deviation of the weight loss was 19 oz. (Note that a positive weight loss implies reduced weight over time.)
Q4a:
Given our sample, calculate (by hand or in R) a 99% confidence interval for the true mean weight reduction of all zoo penguins in the entire programme (not just our zoo). Make sure to show your workings or R code. Ideally (but this is optional), include a mini diagram of the distribution you are using to calculate your confidence interval and mark any key values.
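The population standard deviation is unknown and is estimated by the sample standard deviation s. The usual textbook choice is therefore a t-interval: x̄ ± t(α/2, n - 1) × s / √n. Below is a minimal sketch of a hypothetical helper, `t_confint()`, that computes this interval from summary statistics using base R's `qt()`. The numbers in the example call are made up; plug in the values from the question yourself.

```r
# Hypothetical helper: t-based confidence interval from summary statistics.
# Assumes a simple random sample and an unknown population SD.
t_confint <- function(xbar, s, n, level = 0.99) {
  alpha  <- 1 - level
  t_crit <- qt(1 - alpha / 2, df = n - 1)  # critical value t(alpha/2, n - 1)
  se     <- s / sqrt(n)                     # standard error of the mean
  c(lower = xbar - t_crit * se, upper = xbar + t_crit * se)
}

# Made-up summary statistics, NOT the penguin numbers
t_confint(xbar = 4, s = 10, n = 25, level = 0.99)
```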
Q4b:
Based on the interval you calculated above, do you have sufficient evidence at the 99% confidence level (1% significance level) to believe that the weight-loss programme is working and the penguins are losing weight?
Q4c:
The average penguin actually weighs about 3 kg. Is this diet something you would recommend for meaningful weight loss?
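To compare the weight loss with body weight, both numbers need to be in the same units. Here is a small sketch, assuming the standard definition 1 oz = 28.349523125 g. Both helper names are hypothetical. The example uses 16 oz rather than the value from the question.

```r
# Assumes the international avoirdupois ounce: 1 oz = 28.349523125 g exactly
oz_to_kg <- function(oz) oz * 28.349523125 / 1000

# Express a weight change as a percentage of a reference body weight (both in kg)
pct_of_body_weight <- function(change_kg, body_kg) 100 * change_kg / body_kg

oz_to_kg(16)                          # 16 oz is 1 lb, i.e. about 0.4536 kg
pct_of_body_weight(oz_to_kg(16), 3)   # 1 lb as a percentage of a 3 kg penguin
```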
Q5:
Tests are being carried out on a new drug designed to relieve the symptoms of colds, specifically its effect on the number of hours people can sleep. The new drug is given in tablet form one evening to a random sample of 16 people who have colds. The number of hours they sleep may be assumed to be Normally distributed and is recorded below.
There is also a very large control group of people who have colds but are not given the drug. Their mean sleep time is 6.6 hours.
Hours slept by people given the new drug (sorted from lowest to highest): 2.3, 5.0, 6.0, 6.4, 6.7, 6.7, 6.9, 7.2, 7.2, 7.4, 7.6, 7.8, 7.9, 8.1, 8.1, 9.7
You can enter the sleep data into R using this code.
# The c command sticks things together
sleep <- c(8.1,6.7,2.3,7.2,8.1,9.7,6.0,7.4,6.4,6.9,5.0,7.8,6.7,7.2,7.6,7.9)
Q5a:
By hand, carry out a hypothesis test at the 1% significance level of whether the drug has any impact on the length of time people sleep. You can use R as a calculator for things like the mean. Include a screenshot of your (neat) workings in this report. Make sure to include:
Q5b:
Use R and the t.test command to carry out the t-test on the data above. Comment on whether your two results agree (e.g. did you make a mistake anywhere?). See Tutorial 7 for code.
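To check your hand calculation, compute the test statistic yourself and compare it with the output of `t.test()`. The statistic is t = (x̄ - μ0) / (s / √n), with n - 1 degrees of freedom. The sketch below does this for a two-sided test on made-up data (not the sleep data), with μ0 = 5 chosen purely for illustration. For Q5 you would use `sleep` and the control-group mean instead.

```r
# Made-up example data (NOT the sleep data from Q5)
x   <- c(5.1, 6.3, 4.8, 7.0, 5.9, 6.1, 5.5, 6.8)
mu0 <- 5  # illustrative comparison value

# Manual one-sample t statistic and two-sided p-value
n      <- length(x)
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))
p_val  <- 2 * pt(-abs(t_stat), df = n - 1)

# The same test using t.test()
res <- t.test(x, mu = mu0, alternative = "two.sided", conf.level = 0.99)
res$statistic  # should match t_stat
res$p.value    # should match p_val
```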
We are now going to work on “real data analysis”, filling in the sections in the rest of your report.
You have three dataset choices. Choose ONE of:
Note: there are missing values in the penguins dataset. To ignore them in a given command, try adding na.rm=TRUE as an extra argument to the command, e.g.
example <- c(1,4,5,2,3,NA,2,4)
mean(example)
## [1] NA
mean(example, na.rm=TRUE)
## [1] 3
To remove any row with missing data, try the na.omit() command, e.g.
test <- data.frame(A=c(1,3,4),B=c(NA,3,1))
test
##   A  B
## 1 1 NA
## 2 3  3
## 3 4  1
test2 <- na.omit(test)
test2
##   A B
## 2 3 3
## 3 4 1
Choose one column of continuous numeric data that interests you as your response variable. Choose a DIFFERENT dataset from your friends.
Note: putting the code above into an R code chunk will load the data, but you may have to retype the quote marks if you get an error.
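Once your data are loaded, a quick way to shortlist possible response variables is to list the numeric columns. For each one, show how many values are missing and how many distinct values it has. Very few distinct values can suggest a count or category rather than continuous data.

Below is a sketch with a hypothetical helper, `candidate_summary()`. It uses R's built-in `iris` data as a stand-in for whichever dataset you choose.

```r
# Hypothetical helper: summarise numeric columns as candidate response variables
candidate_summary <- function(df) {
  # keep only numeric (double or integer) columns
  num <- df[vapply(df, is.numeric, logical(1))]
  data.frame(
    column    = names(num),
    n_missing = unname(colSums(is.na(num))),
    n_unique  = unname(vapply(num, function(col) length(unique(col[!is.na(col)])),
                              integer(1))),
    stringsAsFactors = FALSE
  )
}

candidate_summary(iris)  # iris is a stand-in; use your chosen dataset instead
```

Numeric is not the same as continuous, so still check each shortlisted column yourself.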
NEW TUTORIALS HAVE APPEARED: Tutorial 6 (summary stats), Tutorial 7 (plots), Tutorial 8 (distributions).
Using the tutorials and teaching notes on Canvas, fill in the report template up to the end of EDA (the end of step 2b in the teaching notes), making your plots and analyses as professional as you can.
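`skimr::skim()` (loaded in your library chunk) gives a full summary. For a compact set of numbers for your response variable, here is a base-R sketch with a hypothetical helper, `describe_response()`. It ignores missing values via `na.rm = TRUE`.

```r
# Hypothetical helper: compact summary of one numeric response variable
describe_response <- function(x) {
  c(n         = sum(!is.na(x)),
    n_missing = sum(is.na(x)),
    mean      = mean(x, na.rm = TRUE),
    sd        = sd(x, na.rm = TRUE),
    median    = median(x, na.rm = TRUE),
    min       = min(x, na.rm = TRUE),
    max       = max(x, na.rm = TRUE))
}

# Example vector from the missing-values note above
describe_response(c(1, 4, 5, 2, 3, NA, 2, 4))
```

Numbers alone are not enough for the EDA section, so pair them with clearly labelled plots (Tutorial 7) and text explaining what they show.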
SEE THE PDF FROM CANVAS WITH MORE DETAIL ABOUT WHAT TO DO: https://psu.instructure.com/courses/2174925/files/132549205
As you are choosing your own dataset and your own response variable, we do not have the worked answers. So here is how we are grading things:
29-30: Just exceptional. It’s clear from your text and code
25-28: You did most of this, but your R plots might have been less professional (say, automatic column names as axis labels), or you didn’t do a spell check (IT’S NEXT TO THE KNIT BUTTON), or it wasn’t fully clear from your write-up what the study design or sample distribution was.
Below 25: you lose marks for big things missing or, say, lots of code but no text explaining the output.
Note, this is not a writing class and we are not grading you on your grammar or English literacy, especially as English is not everyone’s first language. We are grading you on being able to clearly communicate the suitability of your dataset for a model analysis at an undergraduate level, even if you don’t use jargon to do it. Being able to do regression analysis also means being able to communicate your results to a non-expert, so you do need to do more than just the code/maths. If you are worried, you are welcome to send me a message on Teams/Canvas and we can chat.
Remember that an A is 94%, so you can ignore this section and still easily get an A. But here is your time to shine. Also, if you are struggling in another part of the lab, you can use this to gain back points.
To get the final 4 marks in the lab, you need to show me something new, i.e. you need to go above and beyond the lab questions in some way.
Here are some ideas:
Remember to save your work throughout and to spell check your writing (next to the save button).
Now, press the knit button for the final time.
If you have not made any mistakes in the code, R should create an html file in your Lab 2 folder which includes your answers. If you look at your Lab 2 folder, you should see it there, complete with a very recent time-stamp.
In that folder, double click on the html file. This will open it in your browser. CHECK THAT THIS IS WHAT YOU WANT TO SUBMIT.
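If you are not sure the knit worked, you can also check the time-stamp from R. The sketch below defines a hypothetical helper, `latest_html()`, that lists the .html files in a folder, newest first, using base R's `list.files()` and `file.mtime()`. It assumes your working directory is your Lab 2 project folder, which it should be if you opened the project file.

```r
# Hypothetical helper: list .html files in a folder, newest first
latest_html <- function(folder = ".") {
  files <- list.files(folder, pattern = "\\.html$", full.names = TRUE)
  info  <- data.frame(file = basename(files), modified = file.mtime(files),
                      stringsAsFactors = FALSE)
  info[order(info$modified, decreasing = TRUE), , drop = FALSE]
}

latest_html()  # the top row should be your report, with a very recent time
```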
If you are on RStudio Cloud, see Tutorial 1 for how to download your files.
Now go to Canvas and submit BOTH your html and your .Rmd file in Lab 2.
See the table below for what this means - 100% is hard to get!
HTML FILE SUBMISSION: 8 MARKS
RMD CODE SUBMISSION: 8 MARKS
WRITING/CODE STYLE: 10 MARKS
Your code and document are neat and easy to read. LOOK AT YOUR HTML FILE IN YOUR WEB BROWSER BEFORE YOU SUBMIT. There is also a spell check next to the save button.
You have written your answers below the relevant code chunks in full sentences, in a way that is easy to find and grade. For example, it is clear what your answers are referring to, you have used units, and you have explained your workings.
MARKDOWN SHOWCASE: 10 MARKS
You use full sentences and units, and you have great Markdown formatting.
R-CODE SHOWCASE: 30 MARKS
You have successfully completed all the code challenges.
EDA: 30 MARKS
See above for ideas on grading.
ABOVE AND BEYOND: 4 MARKS
See above for ideas on grading.
[100 marks total]
Overall, here is what your lab should correspond to:
Grade | % Mark | Rubric |
---|---|---|
A* | 98-100 | Exceptional. Not only was it near perfect, but the graders learned something. THIS IS HARD TO GET. |
NA | 96+ | You went above and beyond |
A | 94+ | Everything asked for with high quality. Class example |
A- | 90+ | The odd minor mistake. All code done but not written up in full sentences, etc. A little less care |
B+ | 87+ | More minor mistakes. Things like missing units, getting the odd question wrong, no workings shown |
B | 84+ | Solid work but the odd larger mistake or missing answer. Completely misinterpreted something, that type of thing |
B- | 80+ | Starting to miss entire questions/sections, or multiple larger mistakes. Still a solid attempt. |
C+ | 77+ | You made a good effort and did some things well, but there were a lot of problems. (e.g. you wrote up the text well, but messed up the code) |
C | 70+ | It’s clear you tried and learned something. Just attending labs will get you this much as we can help you get to this stage |
D | 60+ | You attempt the lab and submit something. Not clear you put in much effort or you had real issues |
F | 0+ | Didn’t submit, or incredibly limited attempt. |