MATH1005 Wednesday 11am Semester 2

Assessment Dates

Rough assessment dates are provided below, purely to give you an idea of what is coming up. Do not treat these dates as definitive. The onus is on students to check Canvas, EdStem and the Unit of Study outline.

Extra Help/Staff Contact

Questions about course material should be asked on EdStem. Students are free to email about personal issues, but questions about content will be redirected to EdStem.

Calendar

Week Slides Class Notes Misc. Further Learning Assessments
Week 1
(Aug 2)
Introduction Lab sheet from class Britannica Simpson’s Paradox Article
R Markdown Cheat Sheet
RQuiz1: Design of Experiments (Aug 6)
Week 2
(Aug 9)
Lab 2 Lab sheet from class
Please read clarification 3
Please see below for notes and clarifications from today’s lab. Article on how to pick the right chart type
Interpreting skewness from a boxplot
RQuiz2: Data & Graphical Summaries (Aug 13)
Week 3
(Aug 16)
Lab 3
Code for graph in slides
Lab sheet from class RQuiz3: Numerical Summaries (Aug 20)
Week 4
(Aug 23)
Lab 4 Lab sheet from class Download the lab sheet for some tips on how to approach manual calculations for the normal distribution. RQuiz4: Normal Model (Aug 27)
Week 5
(Aug 30)
No slides this week. Please scroll below for week 5 notes Lab sheet from class RQuiz5: Linear Model (Sep 3)
Census Date
(Aug 31)
Last day to drop a unit without incurring financial or academic penalty.
Week 6
(Sep 6)
Lab sheet from class Can computers generate random numbers? RQuiz6: Understanding Chance (Sep 10)
Week 7
(Sep 13)
Lab sheet from class
Box model question 1.1
See Week 7 Notes! First Group Project Peer Review (Sep 14)
RQuiz7: Chance Variability (Sep 17)
Week 8
(Sep 20)
See Lab8Solution.html on Canvas Second Group Project Peer Review (Sep 22)
RQuiz8: Normal Approximation (Sep 24)
Assessment
(Sep 22)
Group Project 1 Due
Mid Semester Break
(Sep 25-29)
No class.
Week 9
(Oct 4)
See Lab9Solution.html on Canvas RQuiz9: Sample Survey + Bias (Oct 8)
Week 10
(Oct 11)
z-test question
Uber question
RQuiz10: z-test (October 15)
Week 11
(Oct 18)
Caffeine Question
Vitamin-C Question
See Lab11Solution.html on Canvas
How to include figure captions in RMarkdown RQuiz11: t-test (October 22)
Assessment
(Oct 20)
Individual Project 2 Due
Week 12
(Oct 25)
Week 13
(Nov 1)
Last week of classes.
STUVAC
(Nov 6-10)
Study vacation.
Exam Period
(Nov 13-25)
Exam date to be released by the University.

Week 2: Notes/Clarifications


Clarification 1

In the week 2 slides, on slide 10, the “qualitative” and “quantitative” labels in the flow chart were in the wrong location. The slides have been re-uploaded with the “qualitative” and “quantitative” labels in the correct position.

Clarification 2

In the tutorial, I mentioned that we would soon be learning how to use ggplot to create our plots. This is actually not the case in MATH1005. If you are enjoying R and think that data science is something you would like to pursue, I would definitely recommend checking out ggplot!

Clarification 3

I was asked in class what las=2 inside the barplot code does. It draws the axis labels perpendicular to the axis, so the x-axis category names appear vertically.

I incorrectly said in class that this removes the column without a category name. The real reason there appeared to be a missing label was that the string “Wednesday” was too long to be shown horizontally.
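To see the difference for yourself, here is a minimal sketch. The day counts are made up for illustration (they are not the class data), and the variable names are my own. Whether a label actually goes missing in the first plot depends on how wide your plot window is.

```r
# Hypothetical counts per weekday (made-up numbers, not the class data)
day_counts <- c(Monday = 12, Tuesday = 9, Wednesday = 15,
                Thursday = 11, Friday = 7)

# Default (las = 0): labels run parallel to the axis. In a narrow plot,
# R may skip a long label such as "Wednesday" rather than overlap it.
mids_default <- barplot(day_counts, main = "Default labels")

# las = 2: labels are drawn perpendicular to the axis,
# so every category name has room to be shown.
mids_vertical <- barplot(day_counts, las = 2, main = "las = 2")
```

Try switching between the two calls and resizing the plot window to see when labels disappear.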

Notes - Complex Boxplot

In class, we didn’t quite get enough time to finish some of the harder plotting questions. One such question involved investigating the pattern between “age” and “crash type”. Here we are considering one quantitative variable (age) and one qualitative variable (crash type). This means that boxplots would be a good choice.

The first thing we do is select the “age” data: age = road$Age

Now, if we were to run class(age), we would see that age is of type character (which means R is currently treating it as a qualitative variable). We want it to be numeric, and we can convert it by running ageN = as.numeric(age). We now have a variable called ageN which stores the ages as a numeric type.

We also need to extract the crash type, which we can do by: crash_type = road$Crash_Type

Now we can create the boxplot using the following code: boxplot(ageN ~ crash_type, horizontal=TRUE, col=c("light blue", "light green", "light pink"), main = "Age distribution by crash type")

How does this work? Well, the main thing is that we need to tell R how we want the boxplots to be formed. ageN ~ crash_type informs R that we want “ageN” to be the value that we are finding the distribution of, and that we want to separate the ages by crash_type. The other parameters should be fairly straightforward to understand - if they’re not, change them and see what happens!
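If you don’t have the class data handy, the steps above can be sketched end to end with a small stand-in. The road data frame below is made up for illustration (nine invented rows, and the crash type labels are my own), so the plot will not match the one from class.

```r
# Hypothetical stand-in for the class `road` data (values are made up)
road <- data.frame(
  Age = c("23", "35", "41", "19", "67", "52", "30", "28", "45"),
  Crash_Type = c("Single", "Multiple", "Pedestrian",
                 "Single", "Multiple", "Pedestrian",
                 "Single", "Multiple", "Pedestrian"),
  stringsAsFactors = FALSE
)

# Select the age column; here it is stored as text
age <- road$Age
class(age)    # "character"

# Convert the ages to numbers so R treats them as quantitative
ageN <- as.numeric(age)
class(ageN)   # "numeric"

# Extract the grouping (qualitative) variable
crash_type <- road$Crash_Type

# One horizontal boxplot of age per crash type
boxplot(ageN ~ crash_type, horizontal = TRUE,
        col = c("light blue", "light green", "light pink"),
        main = "Age distribution by crash type")
```

The formula reads as “values ~ groups”: whatever is on the left of ~ is summarised, and whatever is on the right decides which box each value goes into.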

To visualise what this plot looks like you can check out 2.0.2 of the “Lab Sheet From Class” in the “Class Notes” section of week 2.

There is one more boxplot in the explore section, but try this out yourself! We’ll go through it next tutorial.


Week 5: Notes


For the first part of the tutorial, we went through the group project. Here, I wanted to provide some broad thoughts about how to go about sourcing additional research in scientific reports.

Whenever we write something in a scientific report that did not come from us, it’s important that we also include a source to add some weight behind what we have just claimed. Citing is really important, because it shows that we didn’t simply make up what we claimed.

Here is a short excerpt (that has been slightly modified) from a report I wrote for one of my university classes:

Despite concerns that in industry, women only account for 25% of computer-science related jobs (Daley, 2021), this does not hold among the people which responded to the DATA2x02 survey according to the Chi-squared goodness of fit test. In fact, 36% of DATA2x02 students identify as female, which is much larger than 25%.

Now, I’m not claiming that my above excerpt is a masterpiece, but I do think it does a good job of showing how we can intertwine additional research into our reports. In the above excerpt, I am making a claim that in “industry, women only account for 25% of computer-science related jobs.” But I am no expert in this field, and so I have to share where I got this “25%” value from.

I do this by including what is called an in-text citation, which is the “Daley, 2021” in brackets. This comes straight after I mention the “25%” number, which I gathered from a source online. I include the citation to indicate that this figure is not my own observation, but comes from someone else (who would know much more about this than me).

In the reference list later in my report, I would have a full citation for Daley, written as:

Daley, S. (2021) Women in Tech Statistics Show the Industry Has a Long Way to Go. https://builtin.com/women-tech/women-in-tech-workplace-statistics

In a nutshell, whenever we are making a claim that does not come from our own research, we have to include a reference to where we found that claim.

There are many different referencing styles that you can use, but in this course we require APA citations. You can find more information about how to write APA citations and in-text citations here:


Week 7: Notes


In the original lab sheet, we were told to use the multicon package to work out the population standard deviation. However, this package has become outdated, and is not supported in the newest versions of R (the versions we are using). Hence, we need to find some other way to work out the population sd. Here are some methods:

Method 1: Using the rafalib Library

I did not know that this option existed until we used it in another class that I teach. The rafalib library allows us to calculate the population standard deviation directly (it is a very easy method)!

Before using this method, you first have to install the rafalib package. To do this, type the following into the console, and then press enter:

install.packages("rafalib")

Notice here the use of quotation marks around rafalib when installing the package.

Now, to load rafalib, type the following in an R-chunk towards the top of your R-Markdown document:

library(rafalib)

Notice that this time we don’t include the quotation marks around rafalib.

To find the population sd of a list/vector/column of data, we type:

popsd(variable)

This is the method that I used in the “lab sheet from class” in the week 7 class notes section.

When we have access to R, I would definitely recommend method 1 as the best of the three methods!

Method 2: Working out the Population SD by Hand

This is probably the most tiring of the options, but you could work out the population sd by using the formula (see this article - the population sd part).

For the example from class, where our data is the numbers 0,0,0,1 (which have a mean of 0.25), the following would yield the population sd:

sqrt( ( (1-0.25)^2+ (0-0.25)^2+ (0-0.25)^2+ (0-0.25)^2 ) / 4 )
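Typing out every term gets tedious for longer lists, so here is a minimal sketch of the same calculation written so that R works out the mean and the squared gaps for us. The variable names x, mu and pop_sd are my own choices, not something from the lab sheet.

```r
# The example data from class
x <- c(0, 0, 0, 1)

# Step 1: the mean of the data
mu <- mean(x)                      # 0.25

# Step 2: squared gaps from the mean, averaged over all n values
#         (divide by n, not n - 1, because this is the whole population)
mean_sq_gap <- mean((x - mu)^2)    # 0.1875

# Step 3: square root of that average gives the population sd
pop_sd <- sqrt(mean_sq_gap)
pop_sd                             # 0.4330127
```

This gives the same answer as the long expression above, but works for a list of any length.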

Method 3: Multiply the Sample SD by a Factor

Another solution that you could use is to multiply the sample standard deviation (which is built into R) by the factor sqrt( (n-1) /n ), where n is the number of elements in our population.

Using the example from class with the population 0,0,0,1, in R, we can find the population sd by doing:

sqrt((4-1)/4) * sd(population)

Here, we use n = 4 as we have four elements in our population.
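As a rough sketch of how this could be made reusable, the snippet below stores the population in a vector and uses length() so that n does not have to be typed by hand. The helper name pop_sd_from_sample is hypothetical (I made it up for this example; it is not a built-in R function).

```r
# The example population from class
population <- c(0, 0, 0, 1)

# Let R count the elements instead of hard-coding n = 4
n <- length(population)

# Sample sd (divides by n - 1), scaled to the population sd (divides by n)
pop_sd_factor <- sqrt((n - 1) / n) * sd(population)
pop_sd_factor                      # 0.4330127

# Hypothetical helper wrapping the same idea for any numeric vector
pop_sd_from_sample <- function(x) {
  n <- length(x)
  sqrt((n - 1) / n) * sd(x)
}

pop_sd_from_sample(population)     # 0.4330127
```

The factor works because sd() divides the sum of squared gaps by n - 1, while the population sd divides by n. Multiplying by sqrt((n-1)/n) swaps one divisor for the other.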