The Art of Statistics

Modern statistics

  • motivate by problem solving
  • start with visualization and exploring data
  • focus on what can reasonably be learned from data, on biases in data, and on when causation can be concluded
  • models and algorithms
  • assessing uncertainty through re-sampling data (bootstrap)
  • probability theory as a neat way of turning random variation into uncertainty about what is true
  • hypothesis testing and its potential problems
  • Bayesian methods
  • PPDAC: problem -> plan -> data -> analysis -> conclusion
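
The bootstrap bullet above can be made concrete with a minimal sketch. It assumes a percentile bootstrap for the mean, uses only the Python standard library, and runs on made-up numbers; the function name `bootstrap_interval` and its parameters are hypothetical, chosen for illustration.

```python
import random
import statistics

def bootstrap_interval(data, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a statistic (illustrative sketch)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(data)
    # Resample the data with replacement and recompute the statistic each time.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    # Take the alpha/2 and 1 - alpha/2 percentiles of the resampled statistics.
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

sample = [2, 4, 4, 5, 7, 9, 10, 12, 15, 21]  # made-up numbers, not real data
low, high = bootstrap_interval(sample)
print(f"sample mean = {statistics.mean(sample)}, 95% interval = ({low}, {high})")
```

The spread of the resampled means stands in for the uncertainty about the true mean, without any formula from probability theory.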

What problems do I care about

  • housing prices
  • employability

Looking at data: What was the pattern of Harold Shipman’s murders?

  • Problem: can more detail tell us more about what Shipman did?
  • Plan: compare actual times at which his patients died with the times of deaths recorded by other local GPs
  • Data: a huge exercise requiring examination of death certificates
  • Analysis: simple plotting

Inference and bias: How many sexual partners have people in Britain had in their lifetime?

  • Problem: cannot know this as a fact
  • Plan: survey in which people are carefully asked about their sexual activity
  • Data: reports of numbers of partners
  • Analysis: plotting and summary statistics
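
As a rough illustration of the summary-statistics step, the sketch below compares the mean, median, and quartiles of a skewed set of counts. The numbers are made up for illustration and are not survey results; the variable names are hypothetical.

```python
import statistics

# Made-up reported counts, for illustration only (not survey data).
reported = [0, 1, 1, 2, 3, 3, 4, 5, 6, 8, 10, 15, 40]

mean = statistics.mean(reported)
median = statistics.median(reported)
q1, q2, q3 = statistics.quantiles(reported, n=4)  # quartile cut points

print(f"mean = {mean:.1f}, median = {median}, quartiles = ({q1}, {q2}, {q3})")
```

With a long right tail, a few large values pull the mean above the median, which is one reason to report both.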

Regression, prediction and algorithms: Who was the luckiest person on the Titanic?

The mysteries of the P-value

  • P-value: a measure of the conflict between the data and a null hypothesis of no effect
  • Specifically, P = probability of getting a result at least as extreme as the one observed, were the null hypothesis true
  • Not the probability of the null hypothesis
  • Traditional threshold of 5% to declare statistically significant
  • not significant does not mean no effect
  • if many tests are run or the decision is crucial, use a more stringent threshold
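
To make the definition above concrete, here is a minimal sketch that computes an exact two-sided P-value for a coin-flip example. The scenario (16 heads in 20 flips, with a null hypothesis of a fair coin) and the function name `two_sided_binomial_p` are assumptions for illustration. The stricter threshold at the end uses the Bonferroni correction, one common choice that these notes do not name.

```python
from math import comb

def two_sided_binomial_p(heads, flips):
    """Probability, under a fair coin (H0), of a head count at least as far
    from flips/2 as the observed one."""
    observed_gap = abs(heads - flips / 2)
    extreme_outcomes = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if abs(k - flips / 2) >= observed_gap
    )
    return extreme_outcomes / 2 ** flips

p = two_sided_binomial_p(heads=16, flips=20)
significant = p < 0.05  # traditional 5% threshold

# Assumed stricter rule when running many tests: divide the threshold
# by the number of tests (Bonferroni correction).
n_tests = 10
still_significant = p < 0.05 / n_tests

print(f"p = {p:.4f}, significant = {significant}, after correction = {still_significant}")
```

Note that `p` is the probability of the data given the null hypothesis, not the probability that the coin is fair.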

Quick analogy

  • H0: The defendant is innocent
  • Evidence (data): test results, testimony
  • p-value: how likely you would be to see evidence this strong if the defendant were actually innocent
  • A very low p-value is like very strong evidence against the defendant being innocent -> convict the defendant (reject H0)
  • If p = 0.01: if the defendant is truly innocent, there is only a 1% chance of seeing evidence this strong. The evidence is very unlikely under the assumption of innocence, which leads you to reject that assumption.

Another analogy - explain the p-value to me like I’m 5

  • H0: I didn’t eat all the cookies
  • T (data): cookie crumbs on their shirt, chocolate on their hands
  • Question: if they really didn’t eat the cookies, how likely is it that you would see this evidence? p = 0.01 => a 1% chance of seeing it => extremely rare
  • p = 0.01 < 0.05 => reject H0 (the null hypothesis): you don’t believe them
  • p > 0.05 => can’t reject H0, which means you can’t tell whether they ate all the cookies
Jasmine Nguyen