Stat 200: Introduction to Statistical Inference

This website is from Autumn 2018/19.

Overview

This course is about theoretical statistics. It is aimed at Masters level students in statistics and advanced undergraduates, and it is also suitable for doctoral students in the sciences, engineering and other programs. This is the place in our MS program where you get statistics theory. Most of the other courses are applied, or are organized around a topic like multivariate analysis or time series, where theory will be mixed in with applications.

For our MS and undergraduate students the sequencing is

Stats 116 \(\longrightarrow\) Stats 200 \(\longrightarrow\) all the other 200 level Statistics courses
If you have not had Stats 116, you might not be able to follow the course. There will be a bit of review at the start. If this course is not enough theory for you there is also a PhD-level theory sequence, Stats 300ABC. It is best to do both. Retaking something at a much higher level deepens your understanding. It is not a waste.

The homework will require some computation. The best language for what will be asked is R. Many people who are new to R start with RStudio. It may be feasible to do the first few problem sets in MATLAB or Python or some other language. Spreadsheets are not suitable.


Goals

Here are our 3 primary goals:
  1. Learn principles underlying statistical methods. For instance: moments, likelihood, Bayes, sample vs population.
  2. Learn about inferential tasks. For instance: testing, estimation, confidence intervals, model selection.
  3. Learn to use models based on a few specific distributions. For instance: normal, binomial, Poisson.
The "for instance" examples above are the most important ones, but others will be included so you can start to see patterns. Those goals are intermediate. The underlying goal is for you to deeply understand how statistics works, so you absorb the next courses most efficiently.

From ExploreCourses

Modern statistical concepts and procedures derived from a mathematical framework. Statistical inference, decision theory; point and interval estimation, tests of hypotheses; Neyman-Pearson theory. Bayesian analysis; maximum likelihood, large sample theory. Prerequisite: 116.
The prerequisite is to be taken seriously. The prerequisite for Stats 116 is: MATH 52 (integral calculus of several variables) and familiarity with infinite series, or equivalent. Test yourself on problem set 0.

We will use a little bit of matrix algebra, such as matrix multiplication, determinants and inversion. We do not require a full course in linear algebra.


Classes

Building 200-002 (History corner of the quad)
Tuesday, Thursday 1:30 to 2:50

Syllabus

This is our plan. There are two places that will be used for catch-up or enrichment, or maybe a bit of both, depending on how the timing works out. Think of them as expansion joints like in a bridge. We follow the text by Rice for the most part, in close to his order, with some omissions and some additions.
The strongly advised cadence is to read before each week starts. Then the lecture material will seem easier. It's actually less work than reading after.

Week | Readings | Topics | Notes
Sep 25, 27 | Rice Ch 4-6 | Intro to statistics, review of probability, outline of course | lec01 lec02
Oct 2, 4 | Rice Ch 8.1-8.5 | Method of moments and maximum likelihood (ML) estimation | lec03 lec04
Oct 9, 11 | Rice 8.6-8.8 | ML theory: sufficiency, Fisher info, Cramer-Rao. Bayesian approach | lec05 lec06
Oct 16, 18 | Rice 9.1-9.6 | Testing: Neyman-Pearson, goodness of fit, multiple comparisons | lec07 lec08
Oct 23, 25 | Rice 9.8, 10.1-10.4, 10.6-10.7 | Graphical summaries: QQ-plots, empirical distributions, density estimates | lec09 lec10
Oct 30, Nov 1 | | Catch up and/or enrichment + smidgen of review. Then midterm (25%) (location = Bishop Auditorium) | lec11
Nov 6, 8 | Rice 6, 11.1-11.3.1, 14.1-14.3 | Normal theory, paired samples, simple linear regression | lec12 lec13 lec_t
Nov 13, 15 | Rice 13 | Categorical data | lec14 lec15
Nov 20, 22 | \(\varnothing\) | Week is off for Thanksgiving |
Nov 27, 29 | 4.6, 10.4.6, 14.6, 11.4 | Delta method, plug-in, bootstrap, permutations, randomization | lec16 lec17
Dec 4, 6 | 11.3.2, sign test (p365,461) | Nonparametrics. Followups to this course. Review. |
Dec 10 | | Final exam. Bishop Auditorium. (50%) |

Potential enrichment topics include: more about Bayes, logistic regression, the reproducibility crisis, sparsity, regularization. Some of these things can come in as examples in class or homework.


Instructor

Art Owen
Sequoia Hall 130
My userid is owen on stanford.edu
Office hour: Monday 11am to noon

Or catch me just after class for an office minute. That is often very efficient.
If you want to ask about the midterm or final, please ask in class so everybody hears the same information.

TAs

Day | Time | TA | Office | Email | Meeting room
Tuesday | 3:00-5:00 | Samyak Rajanala | Sequoia 241 | samyak@stanford.edu | Sequoia 105
Tuesday | 5:00-7:00 | Jaime Gimenez | Sequoia 240 | roquero@stanford.edu | Sequoia 207
Wednesday | 4:00-6:00 | Zhimei Ren | Sequoia 242 | zren@stanford.edu | Sequoia 207
Thursday | 8:30-10:30 | Kevin Han | Sequoia 241 | kevinwh@stanford.edu | Sequoia 105
Thursday | 3:30-5:30 | Andy Tsao | Sequoia 235 | andytsao@stanford.edu | 460 429
Friday | 3:30-5:30 | Yuchen Wu | Sequoia 241 | wuyc14@stanford.edu | 380 (Math corner) 381T

Notes and texts

The main text is "Mathematical Statistics and Data Analysis", third edition (2009) by John Rice.

This is a standout book that has proved its worth over the years.
We will avoid Ch 7 on survey sampling. For a one quarter course it is difficult to segue between finite and infinite populations when it comes to concepts and notation. See Stats 204 for finite populations.

Problems (25%)

Here is a blurb about office hours for this course. Here is a guide for TAs grading this course.

The problem sets are available to students registered in the class.
I post them on Canvas as they are added.
Be sure to give Axess a working email address:
I expect to send a small number of important emails about problem sets and the homework there. Most other announcements will be made in class. If you email me about the class, be sure to have stat 200 in your subject line. Otherwise, your email won't show when I search for course-related emails.
Late penalties apply:
We will count days late on each problem set. Each day late is penalized by 10% of the homework value. Homework more than 3 days late will ordinarily get 0. If you're travelling, you can email a pdf file. For sickness, interviews and other events, up to 3 late days total are forgiven at the end of the quarter. (Work late enough to get zero does not get redeemed though.)
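The late-penalty arithmetic above can be sketched in a few lines of Python. This is only an illustration, not the official grading procedure. It assumes that "10% of the homework value" means 10% of the maximum points available on that problem set, and it does not model the end-of-quarter forgiveness of up to 3 late days, since the page does not say how forgiven days are allocated. The function name and arguments are hypothetical.

```python
def penalized_score(raw_score: float, max_score: float, days_late: int) -> float:
    """Score on one problem set after the late penalty, before any forgiveness.

    Assumption: each day late costs 10% of max_score (the homework value).
    Homework more than 3 days late ordinarily gets 0.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    if days_late > 3:
        return 0.0
    # days_late * max_score / 10 is 10% of the homework value per day late.
    penalty = days_late * max_score / 10
    # A score cannot go below zero.
    return max(0.0, raw_score - penalty)


if __name__ == "__main__":
    # A 90/100 problem set handed in 0, 2 and 4 days late.
    for days in (0, 2, 4):
        print(days, penalized_score(90, 100, days))
```

Under these assumptions, a problem set worth 100 points that earns 90 and arrives 2 days late would be recorded as 70 until any late days are forgiven.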

Midterm Exam (25%)

The midterm is on Thursday November 1 in class.

The midterm is closed book and is also closed to notes, calculators and phones. You may be asked to supply short derivations or proofs, to give advice on how to handle some hypothetical data, or diagnose a problem.

Final Exam (50%)

The exam is on Monday December 10 from 12:15 to 3:15 in a room to be announced. Do not book travel that conflicts with this date. University policy is that students may not register for two classes with exams at the same time.

The exam is closed book and is also closed to notes, calculators and phones. You may be asked to supply short derivations or proofs, to give advice on how to handle some hypothetical data, or diagnose a problem. Exam questions are different from homework questions. HW questions are largely exercises to deepen your understanding. Exam questions are designed to measure your understanding.
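The section headings give the weights of the three graded components: problems 25%, midterm 25%, final 50%. A minimal sketch of how such weights combine is below. It assumes a plain weighted average of percentage scores; the page does not say how letter grades are assigned or whether any curve is applied, so the names and the formula here are illustrative only.

```python
# Weights taken from the section headings of this page.
WEIGHTS = {"problems": 0.25, "midterm": 0.25, "final": 0.50}


def course_percentage(scores: dict) -> float:
    """Weighted average of component scores, each given on a 0-100 scale.

    Assumption: components combine linearly with the stated weights.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise KeyError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    example = {"problems": 80, "midterm": 70, "final": 90}
    print(course_percentage(example))
```

Because the final carries half the weight, a given change in the final exam score moves this average twice as much as the same change in the midterm or homework score.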