3 Inferences
\[ \require{physics} \require{braket} \]
\[ \newcommand{\dl}[1]{{\hspace{#1mu}\mathrm d}} \newcommand{\me}{{\mathrm e}} \]
\[ \newcommand{\pdfbinom}{{\tt binom}} \newcommand{\pdfbeta}{{\tt beta}} \newcommand{\pdfpois}{{\tt poisson}} \newcommand{\pdfgamma}{{\tt gamma}} \newcommand{\pdfnormal}{{\tt norm}} \newcommand{\pdfexp}{{\tt expon}} \]
\[ \newcommand{\distbinom}{\operatorname{B}} \newcommand{\distbeta}{\operatorname{Beta}} \newcommand{\distgamma}{\operatorname{Gamma}} \newcommand{\distexp}{\operatorname{Exp}} \newcommand{\distpois}{\operatorname{Poisson}} \newcommand{\distnormal}{\operatorname{\mathcal N}} \]
3.1 Inferential statistics
Definition 3.1 (Population and sample [1])
- A population data set is a collection (or set) of data measured on all experimental units of interest to you.
- A sample is a subset of data selected from a population.
- A random sample of \(n\) experimental units is one selected from the population in such a way that every different sample of size \(n\) has an equal probability of selection (see the code sketch after this list).
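To make the definition concrete (a sketch, not part of [1]; the population values, seed, and sample size are made up), a simple random sample can be drawn without replacement so that every subset of size \(n\) is equally likely:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# A hypothetical finite population of N = 1000 measurements.
population = rng.normal(loc=50.0, scale=10.0, size=1000)

# Draw a simple random sample of size n = 25: every subset of
# size 25 has the same probability of being selected.
sample = rng.choice(population, size=25, replace=False)

print("sample mean    :", sample.mean())
print("population mean:", population.mean())
```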
Definition 3.2 (Statistical inference [1])
- A statistical inference is an estimate, prediction, or some other generalization about a population based on information contained in a sample.
- A measure of reliability is a statement about the degree of uncertainty associated with a statistical inference.
A statistical inference problem involves the following elements:
- Identify population
- Identify variable(s)
- Collect sample data
- Inference about population based on sample
- Measure of reliability for inference
3.2 Estimators
This section is based on [2, Chapter 4].
3.2.1 Sampling
Consider a random variable \(X\) with an unknown distribution. Our information about the distribution of \(X\) comes from a sample on \(X\): \(\qty{X_1,\ldots,X_n}\).
- The sample observations \(\qty{X_1,\ldots,X_n}\) have the same distribution as \(X\).
- \(n\) denotes the sample size.
- When the sample is actually drawn, we use \(x_1,\ldots,x_n\) as the realizations of the sample.
Definition 3.3 (Random sample) If the random variables \(X_1,\ldots, X_n\) are i.i.d., then these random variables constitute a random sample of size \(n\) from the common distribution.
Definition 3.4 (Statistics) Let \(X_1,\ldots,X_n\) denote a sample on a random variable \(X\). Let \(T=T(X_1,\ldots,X_n)\) be a function of the sample. \(T\) is called a statistic. Once a sample is drawn, \(t=T(x_1,\ldots,x_n)\) is called the realization of \(T\).
Definition 3.5 (Sampling distribution)
- The distribution of \(T\) is called the sampling distribution.
- The standard deviation of the sampling distribution is called the standard error of estimate.
Theorem 3.1 (The Central Limit Theorem) For large sample sizes, the sample mean \(\bar{X}\) from a population with mean \(\mu\) and a standard deviation \(\sigma\) has a sampling distribution that is approximately normal, regardless of the probability distribution of the sampled population.
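As an illustration of the theorem (a sketch, not taken from [1] or [2]), the following simulation draws many samples from a clearly non-normal population (\(\distexp(1)\), so \(\mu=\sigma=1\)) and compares the empirical distribution of \(\bar X\) with the normal approximation \(\distnormal(\mu,\sigma^2/n)\); the sample size and number of repetitions are arbitrary choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Non-normal population: Exp(1), so mu = 1 and sigma = 1.
n = 50           # sample size
reps = 10_000    # number of simulated samples

# Each row is one sample of size n; the row means are realizations of X-bar.
samples = rng.exponential(scale=1.0, size=(reps, n))
xbar = samples.mean(axis=1)

print("simulated mean of X-bar:", xbar.mean())          # close to mu = 1
print("simulated std of X-bar :", xbar.std(ddof=1))     # close to sigma / sqrt(n)
print("theoretical std error  :", 1.0 / np.sqrt(n))

# Normality check: compare empirical quantiles with N(mu, sigma^2 / n).
q = [0.025, 0.5, 0.975]
print("empirical quantiles :", np.quantile(xbar, q))
print("normal approximation:", stats.norm.ppf(q, loc=1.0, scale=1.0 / np.sqrt(n)))
```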
3.2.2 Point estimation
Assume that the distribution of \(X\) is known down to an unknown parameter \(\theta\) where \(\theta\) can be a vector. Then the pdf of \(X\) can be written as \(f(x;\theta)\). In this case we might find some statistic \(T\) to estimate \(\theta\). This is called a point estimator of \(\theta\). A realization \(t\) is called an estimate of \(\theta\).
Definition 3.6 (Unbiasedness) Let \(X_1,\ldots,X_n\) be a sample on a random variable \(X\) with pdf \(f(x;\theta)\). Let \(T\) be a statistic. We say that \(T\) is an unbiased estimator of \(\theta\) if \(\operatorname{E}(T)=\theta\).
The typical examples of estimators are the sample mean and the sample variance. Let \(X\) be a random variable with mean \(\mu\) and variance \(\sigma^2\). Consider a sample \(\set{X_i}\) of size \(n\). By definition all the \(X_i\)’s are i.i.d. Therefore \(\operatorname{E}\qty(X_i)=\mu\) and \(\operatorname{Var}\qty(X_i)=\sigma^2\) for any \(i=1,\ldots, n\).
Theorem 3.2 The following statistics are unbiased estimators of \(\mu\) and \(\sigma^2\) of \(X\), respectively.
- \(\bar{X}=\dfrac1n\sum_{i=1}^nX_i\) is called the sample mean.
- \(s^2=\dfrac{1}{n-1}\sum_{i=1}^n(X_i-\bar{X})^2\) is called the sample variance.
That is, \[ \operatorname{E}(\bar X)=\mu,\quad \operatorname{E}(s^2)=\sigma^2. \]
Proof.
\[ \begin{aligned} \operatorname{E}\qty(\bar{X})&=\operatorname{E}\qty(\frac1n\sum_{i=1}^nX_i)=\frac1n\sum_{i=1}^n\operatorname{E}\qty(X_i)=\frac1n\sum_{i=1}^n\mu=\mu,\\ \operatorname{E}\qty(s^2)&=\frac{1}{n-1}\operatorname{E}\qty[\sum_{i=1}^n(X_i-\bar{X})^2]=\frac{1}{n-1}\sum_{i=1}^n\operatorname{E}\qty[\qty(X_i-\bar{X})^2]\\ &=\frac{1}{n-1}\sum_{i=1}^n\qty(\operatorname{Var}\qty(X_i-\bar{X})+\qty(\operatorname{E}\qty(X_i-\bar{X}))^2)\\ &=\frac{1}{n-1}\sum_{i=1}^n\qty(\operatorname{Var}\qty(\frac{n-1}{n}X_i-\frac1n\sum_{j\neq i}X_j)+\qty(\operatorname{E}\qty(X_i)-\operatorname{E}\qty(\bar{X}))^2)\\ &=\frac{1}{n-1}\sum_{i=1}^n\qty(\frac{(n-1)^2}{n^2}\operatorname{Var}\qty(X_i)+\frac1{n^2}\sum_{j\neq i}\operatorname{Var}\qty(X_j))\\ &=\frac{1}{n-1}\sum_{i=1}^n\qty(\frac{(n-1)^2}{n^2}\sigma^2+\frac{n-1}{n^2}\sigma^2)\\ &=\frac{n}{n-1}\frac{(n-1)^2+n-1}{n^2}\sigma^2=\sigma^2. \end{aligned} \]
Please pay attention to the denominator of the sample variance. The \(n-1\) comes from the degrees of freedom: the deviations \(X_i-\bar{X}\) are not independent of each other, since they always satisfy \(\sum_{i=1}^n(X_i-\bar{X})=0\), which leaves only \(n-1\) free quantities.
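The effect of the denominator can be checked numerically. The sketch below (illustrative code with an arbitrary normal population) estimates \(\operatorname{E}(s^2)\) by simulation: dividing by \(n-1\) (`ddof=1` in numpy) is unbiased, while dividing by \(n\) (`ddof=0`) systematically underestimates \(\sigma^2\) by the factor \((n-1)/n\).

```python
import numpy as np

rng = np.random.default_rng(1)

mu, sigma, n, reps = 3.0, 2.0, 5, 100_000   # a small n makes the bias visible
samples = rng.normal(mu, sigma, size=(reps, n))

var_unbiased = samples.var(axis=1, ddof=1)  # divide by n - 1
var_biased = samples.var(axis=1, ddof=0)    # divide by n

print("true variance        :", sigma**2)             # 4.0
print("mean of s^2 (ddof=1) :", var_unbiased.mean())  # approximately 4.0
print("mean with ddof=0     :", var_biased.mean())    # approximately (n-1)/n * 4.0 = 3.2
```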
3.2.3 Confidence intervals
Definition 3.7 (Confidence interval) Consider a sample of \(X\). Fix a number \(0<\alpha<1\). Let \(L\) and \(U\) be two statistics. We say the interval \((L,U)\) is a \((1-\alpha)100\%\) confidence interval for \(\theta\) if
\[ 1-\alpha=\Pr[\theta\in(L,U)]. \]
Theorem 3.3 (Large-Sample \(100(1-\alpha)\%\) Confidence interval) \[ L,U=\bar{X}\pm z_{\alpha/2}\qty(\frac{s}{\sqrt{n}}), \] where \(z_{\alpha/2}\approx1.96\) when \(\alpha=5\%\).
- For any \(n\), if \(X_i\sim \mathcal N(\mu, \sigma^2)\), then \(T_n=\dfrac{\bar{X}-\mu}{S/\sqrt{n}}\) has a Student’s \(t\)-distribution with \(n-1\) degrees of freedom.
- When \(n\) is large enough, for any distribution of the \(X_i\)’s, \(Z_n=\dfrac{\bar{X}-\mu}{S/\sqrt{n}}\) is approximately \(\mathcal N(0,1)\).
- Student’s \(t\)-distribution with \(n-1\) degrees of freedom approaches \(\mathcal N(0,1)\) as \(n\) increases; at \(n=30\) the two are already very close. This is why statisticians often require a sample size \(\geq30\) for large-sample methods.
- The critical values used to build confidence intervals are \(z_{\alpha/2}\) for large samples and \(t_{\alpha/2}\) for small samples; they come from the standard normal distribution and Student’s \(t\)-distribution, respectively.
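A minimal sketch of how these intervals are computed in practice (hypothetical data and seed; the critical values come from `scipy.stats.norm` and `scipy.stats.t`):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(loc=10.0, scale=3.0, size=40)   # hypothetical sample data

alpha = 0.05
n = x.size
xbar = x.mean()
s = x.std(ddof=1)
se = s / np.sqrt(n)

# Large-sample interval: z_{alpha/2} from the standard normal.
z = stats.norm.ppf(1 - alpha / 2)              # about 1.96 for alpha = 0.05
ci_z = (xbar - z * se, xbar + z * se)

# Small-sample interval: t_{alpha/2} with n - 1 degrees of freedom.
t = stats.t.ppf(1 - alpha / 2, df=n - 1)
ci_t = (xbar - t * se, xbar + t * se)

print("z-based 95% CI:", ci_z)
print("t-based 95% CI:", ci_t)                 # slightly wider than the z interval
```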
3.3 Hypothesis test
Elements of a Statistical Test of Hypothesis
- Null Hypothesis \(H_0\)
- Alternative Hypothesis \(H_a\)
- Test Statistic
- Level of significance \(\alpha\)
- Rejection Region
- \(P\)-Value
- Conclusion
Given some data, we would like to know whether these data are “exotic” enough, under the assumption that the null hypothesis \(H_0\) is true, to justify rejecting \(H_0\). In other words, we compute the probability of obtaining a test statistic at least as extreme as the observed value, assuming \(H_0\) is true. This probability is called the p-value. If the p-value is smaller than the chosen level of significance \(\alpha\), we reject \(H_0\).
Consider a test statistic \(T(X)\), and suppose we observe \(T(X)=t_{\text{obs}}\). Then
\[ p\text{-value}=\Pr(T(X)\geq t_{\text{obs}}\mid H_0 \text{ is true}) \] is the p-value for a right-tailed test. The key idea in constructing a hypothesis test is to choose a test statistic and a rejection region so that
\[ \Pr(\text{the statistic falls in the rejection region}\mid H_0\text{ is true})=\alpha. \] This ensures that the test has Type I error rate \(\alpha\).
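As a small numerical illustration (hypothetical numbers, assuming the test statistic is standard normal under \(H_0\)), the right-tailed p-value can be computed with the survival function of \(\mathcal N(0,1)\):

```python
from scipy import stats

alpha = 0.05
t_obs = 2.1                      # hypothetical observed value of the test statistic

# Right-tailed p-value, assuming the statistic is N(0, 1) under H0.
p_value = stats.norm.sf(t_obs)   # sf(x) = 1 - cdf(x) = Pr(T >= t_obs | H0)

print("p-value  :", p_value)     # about 0.018
print("reject H0:", p_value < alpha)
```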
- Type I error: rejecting \(H_0\) when \(H_0\) is actually true.
- Type II error: failing to reject \(H_0\) when \(H_0\) is actually false.
A hypothesis test is designed to control the Type I error rate (that is, to keep the false positive rate low and hence the precision high), since the significance level \(\alpha\) is the probability of a Type I error:
\[ \alpha=\Pr(\text{reject }H_0\mid H_0\text{ is true}). \]
When using a hypothesis test, the typical scenario is that people look for a signal in the data in order to show that an effect is present. The null hypothesis \(H_0\) is taken to be the default case, and the goal is to ensure that once the signal is detected, the effect is indeed there. In this setting it is acceptable to miss some cases where the effect occurs without a detectable signal; in other words, people prioritize not making a false claim over not missing an opportunity.
We can balance Type I and Type II errors by controlling \(\alpha\), as the simulation after this list illustrates.
- Reducing \(\alpha\) makes the test less likely to commit a Type I error, but it increases the likelihood of a Type II error.
- Increasing the sample size reduces the probability of making both types of errors, which improves the test’s reliability.
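These trade-offs can be seen in a small simulation (a sketch with made-up parameters): the Type I error rate is estimated from data generated under \(H_0\), and the Type II error rate from data generated under a specific alternative, here using a two-sided \(z\)-test with known \(\sigma\).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

alpha, sigma, n, reps = 0.05, 1.0, 30, 20_000
z_crit = stats.norm.ppf(1 - alpha / 2)          # two-sided critical value

def reject(data):
    """z-test of H0: mu = 0 with known sigma, two-sided alternative."""
    z = data.mean() / (sigma / np.sqrt(len(data)))
    return abs(z) > z_crit

# Type I error rate: data generated under H0 (mu = 0).
type1 = np.mean([reject(rng.normal(0.0, sigma, n)) for _ in range(reps)])

# Type II error rate: data generated under a specific alternative (mu = 0.3).
type2 = np.mean([not reject(rng.normal(0.3, sigma, n)) for _ in range(reps)])

print("estimated Type I error rate :", type1)   # close to alpha = 0.05
print("estimated Type II error rate:", type2)
```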
3.3.1 t-test
A t-test is used to test a hypothesis about a population mean when the population standard deviation is unknown and the sample size is small or moderate. In our case, we consider the standard one-sample t-test: given a set of random observations, we want to determine whether the population mean of the underlying random variable is equal to 0 or not.
Assume that the given values are \(\{x_1,x_2,\ldots,x_n\}\), and the underlying random variable is \(X\sim N(\mu, \sigma^2)\). The hypotheses are
- \(H_0\): \(\mu=0\).
- \(H_a\): \(\mu\neq0\).
The values can be treated as realizations of i.i.d random variables \(X_i\)’s. We have the following statistics:
- Sample mean: \(\bar{X}=\frac1n\sum_{i=1}^nX_i\).
- Sample standard deviation: \(s=\sqrt{\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar X)^2}\)
- Sample size: \(n\).
These statistics have sampling distributions that can be described exactly under the normality assumption.
Based on the discussion in Section 2.4.2, we have that
- \(Z=\frac{\bar X-\mu}{\sigma/\sqrt n}\sim N(0,1)\)
- \(U=\frac{(n-1)s^2}{\sigma^2}\sim \chi^2_{n-1}\)
- \(Z\) and \(U\) are independent, and the degree of freedom of \(U\) is \(\nu = n-1\),
- \(t=\frac{Z}{\sqrt{U/(n-1)}}\sim t_{n-1}\).
Since the alternative hypothesis is \(\mu\neq0\), we need to consider both tails, corresponding to \(\mu>0\) and \(\mu<0\). Therefore the boundary of the rejection region is chosen so that each tail has probability \(\alpha/2\). In other words, the critical value \(t_{1-\alpha/2,n-1}\) satisfies, under \(H_0\), \[ \Pr(|t|>t_{1-\alpha/2,n-1})=\alpha. \] This critical value \(t_{1-\alpha/2,n-1}\) is usually found from a t-distribution table or via the inverse c.d.f. of the t-distribution.
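The whole procedure can be carried out by hand or with `scipy.stats.ttest_1samp`; the sketch below uses made-up data and checks that the two computations agree.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(loc=0.4, scale=1.0, size=20)    # hypothetical observations

alpha = 0.05
n = x.size
xbar = x.mean()
s = x.std(ddof=1)

# Manual computation of the t statistic and the two-sided p-value.
t_stat = xbar / (s / np.sqrt(n))               # under H0: mu = 0
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)  # critical value t_{1-alpha/2, n-1}
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)

print("t statistic:", t_stat)
print("reject H0  :", abs(t_stat) > t_crit, "  p-value:", p_value)

# The same test via scipy.
res = stats.ttest_1samp(x, popmean=0.0)
print("scipy      :", res.statistic, res.pvalue)
```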