3 Inferences
\[ \require{physics} \require{braket} \]
\[ \newcommand{\dl}[1]{{\hspace{#1mu}\mathrm d}} \newcommand{\me}{{\mathrm e}} \]
\[ \newcommand{\Exp}{\operatorname{E}} \newcommand{\Var}{\operatorname{Var}} \newcommand{\Mode}{\operatorname{mode}} \]
\[ \newcommand{\pdfbinom}{{\tt binom}} \newcommand{\pdfbeta}{{\tt beta}} \newcommand{\pdfpois}{{\tt poisson}} \newcommand{\pdfgamma}{{\tt gamma}} \newcommand{\pdfnormal}{{\tt norm}} \newcommand{\pdfexp}{{\tt expon}} \]
\[ \newcommand{\distbinom}{\operatorname{B}} \newcommand{\distbeta}{\operatorname{Beta}} \newcommand{\distgamma}{\operatorname{Gamma}} \newcommand{\distexp}{\operatorname{Exp}} \newcommand{\distpois}{\operatorname{Poisson}} \newcommand{\distnormal}{\operatorname{\mathcal N}} \]
3.1 General theory
Cited from [1, Chapter 4].
3.1.1 Sampling
Consider a random variable \(X\) with an unknown distribution. Our information about the distribution of \(X\) comes from a sample on \(X\): \(\qty{X_1,\ldots,X_n}\).
- The sample observations \(\qty{X_1,\ldots,X_n}\) have the same distribution as \(X\).
- \(n\) denotes the sample size.
- When the sample is actually drawn, we write \(x_1,\ldots,x_n\) for the realizations of the sample.
Definition 3.1 (Random sample) If the random variables \(X_1,\ldots, X_n\) are iid, then these random variables constitute a random sample of size \(n\) from the common distribution.
Definition 3.2 (Statistics) Let \(X_1,\ldots,X_n\) denote a sample on a random variable \(X\). Let \(T=T(X_1,\ldots,X_n)\) be a function of the sample. \(T\) is called a statistic. Once a sample is drawn, \(t=T(x_1,\ldots,x_n)\) is called the realization of \(T\).
Definition 3.3 (Sampling distribution)
- The distribution of \(T\) is called the sampling distribution.
- The standard deviation of the sampling distribution of a statistic is called its standard error.
Theorem 3.1 (The Central Limit Theorem) For large sample sizes, the mean \(\bar{X}\) of a sample from a population with mean \(\mu\) and standard deviation \(\sigma\) has a sampling distribution that is approximately normal, with mean \(\mu\) and standard deviation \(\sigma/\sqrt{n}\).
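A minimal simulation sketch of the theorem (assuming only numpy; the exponential population and all sizes are illustrative choices): the means of repeated samples from a skewed population behave approximately like a normal distribution with mean \(\mu\) and standard deviation \(\sigma/\sqrt{n}\).

```python
# A CLT sanity check: sample means from a skewed Exp(1) population.
import numpy as np

rng = np.random.default_rng(0)
n, reps = 50, 10_000                 # sample size, number of repeated samples
means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

# For Exp(1): mu = 1 and sigma = 1, so the CLT predicts that the sample mean
# is approximately normal with mean 1 and standard deviation 1/sqrt(50) ~ 0.141.
print(means.mean(), means.std())     # both close to the predicted values
```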
3.1.2 Point estimation
Assume that the distribution of \(X\) is known down to an unknown parameter \(\theta\) where \(\theta\) can be a vector. Then the pdf of \(X\) can be written as \(f(x;\theta)\). In this case we might find some statistic \(T\) to estimate \(\theta\). This is called a point estimator of \(\theta\). A realization \(t\) is called an estimate of \(\theta\).
Definition 3.4 (Unbiasedness) Let \(X_1,\ldots,X_n\) be a sample on a random variable \(X\) with pdf \(f(x;\theta)\). Let \(T\) be a statistic. We say that \(T\) is an unbiased estimator of \(\theta\) if \(\Exp(T)=\theta\).
Let \(X\) be a random variable with mean \(\mu\) and variance \(\sigma^2\). Consider a sample \(\set{X_i}\) of size \(N\). By definition the \(X_i\)'s are iid, so \(\Exp\qty(X_i)=\mu\) and \(\Var\qty(X_i)=\sigma^2\) for every \(i=1,\ldots, N\).
Consider the following statistics:
- \(\bar{\mu}=\dfrac1N\sum_{i=1}^NX_i\),
- \(\bar{\sigma}^2=\dfrac{1}{N-1}\sum_{i=1}^N(X_i-\bar{\mu})^2\).
Lemma 3.1
- \(\Exp(\bar{\mu})=\mu\).
- \(\Exp(\bar{\sigma}^2)=\sigma^2\).
Both follow from linearity of expectation; for the second, use \(\Exp\qty(Y^2)=\Var\qty(Y)+\qty(\Exp\qty(Y))^2\) and the independence of the \(X_i\)'s:
\[ \begin{aligned} \Exp\qty(\bar{\mu})&=\Exp\qty(\frac1N\sum_{i=1}^NX_i)=\frac1N\sum_{i=1}^N\Exp\qty(X_i)=\frac1N\sum_{i=1}^N\mu=\mu,\\ \Exp\qty(\bar{\sigma}^2)&=\frac{1}{N-1}\Exp\qty[\sum_{i=1}^N\qty(X_i-\bar{\mu})^2]=\frac{1}{N-1}\sum_{i=1}^N\Exp\qty[\qty(X_i-\bar{\mu})^2]\\ &=\frac{1}{N-1}\sum_{i=1}^N\qty(\Var\qty(X_i-\bar{\mu})+\underbrace{\qty(\Exp\qty(X_i)-\Exp\qty(\bar{\mu}))^2}_{=\,(\mu-\mu)^2\,=\,0})\\ &=\frac{1}{N-1}\sum_{i=1}^N\Var\qty(\frac{N-1}{N}X_i-\frac1N\sum_{j\neq i}X_j)\\ &=\frac{1}{N-1}\sum_{i=1}^N\qty(\frac{(N-1)^2}{N^2}\Var\qty(X_i)+\frac1{N^2}\sum_{j\neq i}\Var\qty(X_j))\\ &=\frac{1}{N-1}\sum_{i=1}^N\qty(\frac{(N-1)^2}{N^2}\sigma^2+\frac{N-1}{N^2}\sigma^2)\\ &=\frac{N}{N-1}\cdot\frac{(N-1)^2+(N-1)}{N^2}\sigma^2=\sigma^2. \end{aligned} \]
Definition 3.5 The following statistics are unbiased estimators of the mean \(\mu\) and the variance \(\sigma^2\) of \(X\).
- \(\bar{\mu}=\dfrac1N\sum_{i=1}^NX_i\) is called the sample mean.
- \(\bar{\sigma}^2=\dfrac{1}{N-1}\sum_{i=1}^N(X_i-\bar{\mu})^2\) is called the sample variance.
Pay attention to the denominator of the sample variance. The \(N-1\) counts the degrees of freedom: the deviations \(X_i-\bar{\mu}\) are not independent, since they satisfy the linear constraint \(\sum_{i=1}^N\qty(X_i-\bar{\mu})=0\), so only \(N-1\) of them are free.
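Lemma 3.1 can also be checked empirically. Below is a minimal sketch (assuming numpy; the normal population with \(\sigma^2=4\) and the sizes are illustrative): averaging each estimator over many samples recovers the true parameter only when the variance uses the \(N-1\) denominator (numpy's `ddof=1`).

```python
# Unbiasedness check: sample mean and sample variance over many samples.
import numpy as np

rng = np.random.default_rng(0)
N, reps = 10, 100_000
samples = rng.normal(loc=0.0, scale=2.0, size=(reps, N))   # true sigma^2 = 4

print(samples.mean(axis=1).mean())          # ~0.0: sample mean is unbiased for mu
print(samples.var(axis=1, ddof=1).mean())   # ~4.0: N-1 denominator is unbiased
print(samples.var(axis=1, ddof=0).mean())   # ~3.6 = 4(N-1)/N: N denominator is biased
```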
3.1.3 Confidence intervals
Definition 3.6 (Confidence interval) Consider a sample on \(X\), whose pdf is \(f(x;\theta)\). Fix a number \(0<\alpha<1\). Let \(L\) and \(U\) be two statistics. We say the interval \((L,U)\) is a \((1-\alpha)100\%\) confidence interval for \(\theta\) if
\[ 1-\alpha=\Pr[\theta\in(L,U)]. \]
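The probability statement is over repeated sampling: \(L\) and \(U\) are random while \(\theta\) is fixed. A minimal simulation sketch of this interpretation (assuming numpy and scipy; the population \(\mathcal N(5,4)\) is an illustrative choice):

```python
# Coverage check: ~95% of the intervals, recomputed per sample, contain theta.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps, alpha = 30, 10_000, 0.05
z = stats.norm.ppf(1 - alpha / 2)                 # ~1.96

samples = rng.normal(loc=5.0, scale=2.0, size=(reps, n))
mu_bar = samples.mean(axis=1)
se = samples.std(axis=1, ddof=1) / np.sqrt(n)     # estimated standard error

covered = (mu_bar - z * se <= 5.0) & (5.0 <= mu_bar + z * se)
print(covered.mean())                             # close to 0.95
```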
Theorem 3.2 (Large-Sample \(100(1-\alpha)\%\) Confidence interval) \[ L,U=\bar{\mu}\pm z_{\alpha/2}\qty(\frac{\bar{\sigma}}{\sqrt{n}}), \] where \(z_{\alpha/2}\) is the value satisfying \(\Pr\qty(Z>z_{\alpha/2})=\alpha/2\) for \(Z\sim\mathcal N(0,1)\); for example, \(z_{\alpha/2}\approx1.96\) when \(\alpha=5\%\).
- For any \(n\), if \(X_i\sim \mathcal N(\mu, \sigma^2)\), then \(T_n=\dfrac{\bar{\mu}-\mu}{\bar{\sigma}/\sqrt{n}}\) has a Student's \(t\)-distribution with \(n-1\) degrees of freedom.
- When \(n\) is large, whatever the distribution of the \(X_i\)'s, \(Z_n=\dfrac{\bar{\mu}-\mu}{\bar{\sigma}/\sqrt{n}}\) is approximately \(\mathcal N(0,1)\).
- Student's \(t\)-distribution with \(n-1\) degrees of freedom approaches \(\mathcal N(0,1)\) as \(n\) increases; at \(n=30\) the two are already very close. This is why statisticians often require a sample size \(\geq30\).
- Whether the sample is large or small, the coefficient used to compute the confidence interval is \(z_{\alpha/2}\) or \(t_{\alpha/2}\), taken from the normal distribution or from Student's \(t\)-distribution respectively; see the sketch after this list.
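A minimal sketch of both intervals (assuming numpy and scipy; the data are a hypothetical sample): the coefficients come from `scipy.stats.norm` and `scipy.stats.t`, and for small \(n\) the \(t\)-interval is slightly wider.

```python
# z-based versus t-based 95% confidence intervals for the mean.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=2.0, size=25)   # hypothetical sample, n = 25

n, alpha = len(x), 0.05
mu_bar = x.mean()
se = x.std(ddof=1) / np.sqrt(n)                # estimated standard error

z = stats.norm.ppf(1 - alpha / 2)              # ~1.960, from N(0, 1)
t = stats.t.ppf(1 - alpha / 2, df=n - 1)       # ~2.064, from Student's t

print("z-interval:", (mu_bar - z * se, mu_bar + z * se))
print("t-interval:", (mu_bar - t * se, mu_bar + t * se))
```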
3.2 Hypothesis test
Elements of a Statistical Test of Hypothesis
- Null Hypothesis \(H_0\)
- Alternative Hypothesis \(H_a\)
- Test Statistic
- Level of significance \(\alpha\)
- Rejection Region
- \(P\)-Value
- Conclusion
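A minimal sketch tying these elements together (assuming scipy; the data and the hypothesized mean are illustrative): a one-sample \(t\)-test of \(H_0:\mu=10\) against \(H_a:\mu\neq10\) at level \(\alpha=0.05\).

```python
# Elements of a test: H0, Ha, test statistic, alpha, P-value, conclusion.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=10.5, scale=2.0, size=40)   # hypothetical data

alpha = 0.05                                   # level of significance
t_stat, p_value = stats.ttest_1samp(x, popmean=10.0)

print("test statistic:", t_stat)
print("P-value:", p_value)
# Reject H0 when the P-value falls below alpha (equivalently, when the test
# statistic lands in the rejection region).
print("reject H0" if p_value < alpha else "fail to reject H0")
```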