DATA1001 Semester 1

Extra Resources

R Nuggets Videos

Articles

Blog Post: A Quick Guide to Counting (Permutations)

Interactive Websites

Note: these are additional resources. You should stick to official course resources for correctness.

Hypothesis Tests Visualised

Lab 9 Tutorial Flowchart

Lab 10 Solutions

1 Sample z and t Test Question Solutions

Lab 11 Solutions

Lab 4.3 Challenge 2c Tutor Exercise Solutions

Short Explanation

For N(10, 3²), we know from the empirical rule that:

At x = 7 (−1 SD), 16% of the data is to the left
At x = 10 (the mean), 50% of the data is to the left
So 34% of the data sits between 7 and 10

We want the 30th percentile — the value with 30% of the data to its left.

Start with a simple guess. If the data were spread evenly (uniformly) between 7 and 10, the midpoint x = 8.5 would split that 34% in half — giving 17% on each side. That would put 16% + 17% = 33% to the left of 8.5.

Now correct for the curve shape. The normal distribution isn’t uniform — it’s taller near the mean (10) and shorter near −1 SD (7). This means more of the 34% is packed into the right half (8.5 to 10) and less is in the left half (7 to 8.5). So the left half actually holds less than 17%, which brings our 33% estimate down to approximately 30%.

The 30th percentile is approximately x = 8.5.

Detailed Explanation

What are we looking for?

The 30th percentile is the value where 30% of the data falls to its left. We need to estimate where this value sits using only the 68-95-99.7% rule (the empirical rule).

Step 1: Use the empirical rule to narrow it down

From the empirical rule and symmetry, we know:

At x = 7 (one standard deviation below the mean), the cumulative area is 16%
At x = 10 (the mean), the cumulative area is 50%

Since 30% falls between 16% and 50%, the 30th percentile must be somewhere between x = 7 and x = 10.

We also know that the total area between 7 and 10 is 50% − 16% = 34%.
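As a quick check on these rounded figures, the sketch below compares the empirical-rule values with the exact normal areas. Python's standard-library `statistics.NormalDist` is used only for illustration (it is not part of the lab), and the variable names are made up for this example.

```python
from statistics import NormalDist

# N(10, 3^2): mean 10, standard deviation 3, as in the exercise
dist = NormalDist(mu=10, sigma=3)

area_left_of_7 = dist.cdf(7)     # exact area left of -1 SD (rule says 16%)
area_left_of_10 = dist.cdf(10)   # exact area left of the mean (rule says 50%)
area_between = area_left_of_10 - area_left_of_7  # rule says 34%

print(f"Left of 7:  {area_left_of_7:.1%}")
print(f"Left of 10: {area_left_of_10:.1%}")
print(f"Between:    {area_between:.1%}")
```

The exact areas come out at roughly 15.9%, 50% and 34.1%, so the rounded 16%, 50% and 34% are good enough for an estimate.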

Step 2: Make a simple first estimate using a uniform assumption

To get a starting estimate, let’s temporarily pretend the data is spread evenly (uniformly) between 7 and 10. Under this assumption, the midpoint x = 8.5 would split the 34% exactly in half:

17% between x = 7 and x = 8.5
17% between x = 8.5 and x = 10

Adding up from the left: 16% + 17% = 33% to the left of x = 8.5.

That’s close to 30%, but slightly too high. So our uniform estimate overshoots by about 3 percentage points.
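The uniform assumption amounts to a straight-line interpolation between the two known points. A minimal sketch in Python, where the helper name `uniform_cumulative` is hypothetical and the areas are given in percent:

```python
def uniform_cumulative(x, lo, hi, area_at_lo, area_at_hi):
    """Cumulative area at x if the area between lo and hi were spread evenly."""
    fraction = (x - lo) / (hi - lo)  # how far x sits along the interval
    return area_at_lo + fraction * (area_at_hi - area_at_lo)

# 16% lies to the left of x = 7 and 50% to the left of x = 10
estimate = uniform_cumulative(8.5, lo=7, hi=10, area_at_lo=16, area_at_hi=50)
print(estimate)  # 33.0
```

This only reproduces the first guess; it does not account for the shape of the curve.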

Step 3: Correct for the shape of the normal curve

Why did we overshoot? Because the normal distribution is not uniform between 7 and 10. The curve is:

Shorter near x = 7 (further from the mean, so less data per unit width)
Taller near x = 10 (closer to the mean, so more data per unit width)

This means the 34% is not split equally at the midpoint. In reality:

The left half (7 to 8.5) contains less than 17% — because the curve is shorter here
The right half (8.5 to 10) contains more than 17% — because the curve is taller here

So when we assumed 17% was in the left half, we overestimated. The true cumulative area at x = 8.5 is somewhat less than 33%, so we adjust our estimate down to roughly 30%. That makes x = 8.5 a reasonable estimate of the 30th percentile. Remember, we are just after an estimate.

The 30th percentile of N(10, 3²) is approximately x = 8.5.
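If software is available, the estimate can be checked against the exact normal areas. The sketch below uses Python's standard-library `statistics.NormalDist` purely as an illustration; the lab itself only asks for the hand estimate, and the variable names are assumptions.

```python
from statistics import NormalDist

dist = NormalDist(mu=10, sigma=3)

# Split the region between 7 and 10 at the midpoint 8.5
left_half = dist.cdf(8.5) - dist.cdf(7)    # area between 7 and 8.5
right_half = dist.cdf(10) - dist.cdf(8.5)  # area between 8.5 and 10

area_left_of_8_5 = dist.cdf(8.5)     # exact cumulative area at the estimate
percentile_30 = dist.inv_cdf(0.30)   # exact 30th percentile

print(f"Left half:  {left_half:.1%}")
print(f"Right half: {right_half:.1%}")
print(f"Area left of 8.5: {area_left_of_8_5:.1%}")
print(f"30th percentile:  {percentile_30:.2f}")
```

The left half holds roughly 15% and the right half roughly 19%, which matches the argument in Step 3. The exact 30th percentile is about 8.43, so 8.5 is a reasonable estimate.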

Lab 4.3 Challenge 2b Olympic Swimmers Exercise Solutions

Short Explanation

We know from the empirical rule that at x = 189.8 (1 SD above the mean), 84% of the data is to the left. That means 16% of men are taller than 189.8cm.

But we want 189cm, not 189.8cm. Since 189cm is slightly below +1 SD, there will be slightly more than 16% of men above it.

How much more? The region between the mean (179.8) and +1 SD (189.8) contains 34% of the data spread over 10cm. If we assume this is spread uniformly, each cm accounts for roughly 3.4%. Since 189cm is 0.8cm below 189.8, that adds back about 0.8 × 3.4% ≈ 2.7%.

Now correct for the curve shape. The normal curve is taller near the mean and shorter near +1 SD. So near 189.8, each cm actually holds less data than the uniform 3.4% estimate. This brings our adjustment down slightly from 2.7% to roughly 2%.

Our estimate: approximately 16% + 2% = 18% of Australian men are 189cm or taller.

Detailed Explanation

What are we looking for?

We want to estimate the proportion of Australian men whose height is 189cm or greater, given that heights follow a normal distribution with mean 179.8cm and standard deviation 10cm.

Step 1: Locate 189cm relative to the SD boundaries

The key SD boundaries are:

Mean = 179.8cm
+1 SD = 179.8 + 10 = 189.8cm
+2 SD = 179.8 + 20 = 199.8cm

So 189cm sits just below +1 SD (it is 0.8cm short of 189.8).

Step 2: Start with what we know at +1 SD

From the empirical rule, 68% of the data falls within ±1 SD of the mean. By symmetry, 34% is between the mean and +1 SD. This means:

Area to the left of +1 SD (189.8cm) = 50% + 34% = 84%
Area to the right of +1 SD (189.8cm) = 16%

So 16% of Australian men are taller than 189.8cm. But we want the area to the right of 189cm, which is slightly further left on the curve — so the answer will be a bit more than 16%.

Step 3: Estimate the extra area between 189cm and 189.8cm

The interval from the mean (179.8) to +1 SD (189.8) is 10cm wide and contains 34% of the data. Under a uniform assumption, each cm in this interval holds about:

34% ÷ 10 = 3.4% per cm

The gap between 189cm and 189.8cm is 0.8cm, so the uniform estimate gives:

0.8 × 3.4% ≈ 2.7%
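The arithmetic in this step can be written out as a short Python sketch. The variable names are illustrative, and the 34% figure is the rounded empirical-rule value used above.

```python
mean_height = 179.8            # cm
sd = 10                        # cm
area_mean_to_plus_1sd = 34.0   # percent, from the empirical rule

per_cm = area_mean_to_plus_1sd / sd    # percent per cm under the uniform assumption
gap_cm = (mean_height + sd) - 189      # distance from 189cm up to +1 SD
extra_uniform = gap_cm * per_cm        # extra area between 189cm and 189.8cm

print(round(per_cm, 1))         # 3.4
print(round(extra_uniform, 1))  # 2.7
```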

Step 4: Correct for the curve shape

The normal curve is not uniform between the mean and +1 SD. It is:

Taller near the mean (179.8cm) — so more data per cm there
Shorter near +1 SD (189.8cm) — so less data per cm there

Since 189cm is close to +1 SD where the curve is shorter, each cm actually holds less than 3.4%. So our uniform estimate of 2.7% is a slight overestimate. Adjusting down, the extra area is approximately 2%.

16% + 2% = 18%, so approximately 18% of Australian men are 189cm or taller.
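As a sanity check on the hand estimate, the exact proportion can be computed from the normal model stated in the question. This sketch uses Python's standard-library `statistics.NormalDist` for illustration only, and the variable names are assumptions.

```python
from statistics import NormalDist

# Heights modelled as N(179.8, 10^2), as given in the exercise
heights = NormalDist(mu=179.8, sigma=10)

prop_at_least_189 = 1 - heights.cdf(189)      # area to the right of 189cm
prop_above_plus_1sd = 1 - heights.cdf(189.8)  # area to the right of +1 SD
extra = prop_at_least_189 - prop_above_plus_1sd  # area between 189cm and 189.8cm

print(f"189cm or taller:     {prop_at_least_189:.1%}")
print(f"Taller than 189.8cm: {prop_above_plus_1sd:.1%}")
print(f"Extra area:          {extra:.1%}")
```

The exact proportion is roughly 17.9%, and the extra area between 189cm and 189.8cm is about 2%, in line with the adjusted estimate.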

Credit: The Lab 4.3 explanations were generated using Claude, guided by me.