3 Inferences
\[ \require{physics} \require{braket} \]
\[ \newcommand{\dl}[1]{{\hspace{#1mu}\mathrm d}} \newcommand{\me}{{\mathrm e}} \]
\[ \newcommand{\Exp}{\operatorname{E}} \newcommand{\Var}{\operatorname{Var}} \newcommand{\Mode}{\operatorname{mode}} \]
\[ \newcommand{\pdfbinom}{{\tt binom}} \newcommand{\pdfbeta}{{\tt beta}} \newcommand{\pdfpois}{{\tt poisson}} \newcommand{\pdfgamma}{{\tt gamma}} \newcommand{\pdfnormal}{{\tt norm}} \newcommand{\pdfexp}{{\tt expon}} \]
\[ \newcommand{\distbinom}{\operatorname{B}} \newcommand{\distbeta}{\operatorname{Beta}} \newcommand{\distgamma}{\operatorname{Gamma}} \newcommand{\distexp}{\operatorname{Exp}} \newcommand{\distpois}{\operatorname{Poisson}} \newcommand{\distnormal}{\operatorname{\mathcal N}} \]
3.1 General theory
Cited from [1, Chapter 4].
3.1.1 Sampling
Consider a random variable \(X\) with an unknown distribution. Our information about the distribution of \(X\) comes from a sample on \(X\): \(\qty{X_1,\ldots,X_n}\).
- The sample observations \(\qty{X_1,\ldots,X_n}\) have the same distribution as \(X\).
- \(n\) denotes the sample size.
- When the sample is actually drawn, we write \(x_1,\ldots,x_n\) for the realizations of the sample.
Definition 3.1 (Random sample) If the random variables \(X_1,\ldots, X_n\) are iid, then these random variables constitute a random sample of size \(n\) from the common distribution.
Definition 3.2 (Statistics) Let \(X_1,\ldots,X_n\) denote a sample on a random variable \(X\). Let \(T=T(X_1,\ldots,X_n)\) be a function of the sample. \(T\) is called a statistic. Once a sample is drawn, \(t=T(x_1,\ldots,x_n)\) is called the realization of \(T\).
Definition 3.3 (Sampling distribution)
- The distribution of \(T\) is called the sampling distribution.
- The standard deviation of the sampling distribution of a statistic is called its standard error.
Theorem 3.1 (The Central Limit Theorem) For large sample sizes, the mean \(\bar{X}\) of a sample from a population with mean \(\mu\) and standard deviation \(\sigma\) has a sampling distribution that is approximately normal, with mean \(\mu\) and standard deviation \(\sigma/\sqrt{n}\).
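A minimal simulation sketch of the theorem (assuming only numpy; the exponential population and all sizes are illustrative choices): the means of repeated samples from a skewed population behave approximately like a normal distribution with mean \(\mu\) and standard deviation \(\sigma/\sqrt{n}\).

```python
# A CLT sanity check: sample means from a skewed Exp(1) population.
import numpy as np

rng = np.random.default_rng(0)
n, reps = 50, 10_000                 # sample size, number of repeated samples
means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)

# For Exp(1): mu = 1 and sigma = 1, so the CLT predicts that the sample mean
# is approximately normal with mean 1 and standard deviation 1/sqrt(50) ~ 0.141.
print(means.mean(), means.std())     # both close to the predicted values
```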
3.1.2 Point estimation
Assume that the distribution of \(X\) is known down to an unknown parameter \(\theta\) where \(\theta\) can be a vector. Then the pdf of \(X\) can be written as \(f(x;\theta)\). In this case we might find some statistic \(T\) to estimate \(\theta\). This is called a point estimator of \(\theta\). A realization \(t\) is called an estimate of \(\theta\).
Definition 3.4 (Unbiasedness) Let \(X_1,\ldots,X_n\) be a sample on a random variable \(X\) with pdf \(f(x;\theta)\). Let \(T\) be a statistic. We say that \(T\) is an unbiased estimator of \(\theta\) if \(\Exp(T)=\theta\).
Let \(X\) be a random variable with mean \(\mu\) and variance \(\sigma^2\). Consider a sample \(\set{X_i}\) of size \(N\). By definition the \(X_i\)'s are iid, so \(\Exp\qty(X_i)=\mu\) and \(\Var\qty(X_i)=\sigma^2\) for every \(i=1,\ldots, N\).
Consider the following statistics:
- \(\bar{\mu}=\dfrac1N\sum_{i=1}^NX_i\),
- \(\bar{\sigma}^2=\dfrac{1}{N-1}\sum_{i=1}^N(X_i-\bar{\mu})^2\).
Lemma 3.1
- \(\Exp(\bar{\mu})=\mu\).
- \(\Exp(\bar{\sigma}^2)=\sigma^2\).
Both follow from linearity of expectation; for the second, use \(\Exp\qty(Y^2)=\Var\qty(Y)+\qty(\Exp\qty(Y))^2\) and the independence of the \(X_i\)'s:
\[ \begin{aligned} \Exp\qty(\bar{\mu})&=\Exp\qty(\frac1N\sum_{i=1}^NX_i)=\frac1N\sum_{i=1}^N\Exp\qty(X_i)=\frac1N\sum_{i=1}^N\mu=\mu,\\ \Exp\qty(\bar{\sigma}^2)&=\frac{1}{N-1}\Exp\qty[\sum_{i=1}^N\qty(X_i-\bar{\mu})^2]=\frac{1}{N-1}\sum_{i=1}^N\Exp\qty[\qty(X_i-\bar{\mu})^2]\\ &=\frac{1}{N-1}\sum_{i=1}^N\qty(\Var\qty(X_i-\bar{\mu})+\underbrace{\qty(\Exp\qty(X_i)-\Exp\qty(\bar{\mu}))^2}_{=\,(\mu-\mu)^2\,=\,0})\\ &=\frac{1}{N-1}\sum_{i=1}^N\Var\qty(\frac{N-1}{N}X_i-\frac1N\sum_{j\neq i}X_j)\\ &=\frac{1}{N-1}\sum_{i=1}^N\qty(\frac{(N-1)^2}{N^2}\Var\qty(X_i)+\frac1{N^2}\sum_{j\neq i}\Var\qty(X_j))\\ &=\frac{1}{N-1}\sum_{i=1}^N\qty(\frac{(N-1)^2}{N^2}\sigma^2+\frac{N-1}{N^2}\sigma^2)\\ &=\frac{N}{N-1}\cdot\frac{(N-1)^2+(N-1)}{N^2}\sigma^2=\sigma^2. \end{aligned} \]
Definition 3.5 The following statistics are unbiased estimators of the mean \(\mu\) and the variance \(\sigma^2\) of \(X\).
- \(\bar{\mu}=\dfrac1N\sum_{i=1}^NX_i\) is called the sample mean.
- \(\bar{\sigma}^2=\dfrac{1}{N-1}\sum_{i=1}^N(X_i-\bar{\mu})^2\) is called the sample variance.
Pay attention to the denominator of the sample variance. The \(N-1\) counts the degrees of freedom: the deviations \(X_i-\bar{\mu}\) are not independent, since they satisfy the linear constraint \(\sum_{i=1}^N\qty(X_i-\bar{\mu})=0\), so only \(N-1\) of them are free.
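Lemma 3.1 can also be checked empirically. Below is a minimal sketch (assuming numpy; the normal population with \(\sigma^2=4\) and the sizes are illustrative): averaging each estimator over many samples recovers the true parameter only when the variance uses the \(N-1\) denominator (numpy's `ddof=1`).

```python
# Unbiasedness check: sample mean and sample variance over many samples.
import numpy as np

rng = np.random.default_rng(0)
N, reps = 10, 100_000
samples = rng.normal(loc=0.0, scale=2.0, size=(reps, N))   # true sigma^2 = 4

print(samples.mean(axis=1).mean())          # ~0.0: sample mean is unbiased for mu
print(samples.var(axis=1, ddof=1).mean())   # ~4.0: N-1 denominator is unbiased
print(samples.var(axis=1, ddof=0).mean())   # ~3.6 = 4(N-1)/N: N denominator is biased
```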
3.1.3 Confidence intervals
Definition 3.6 (Confidence interval) Consider a sample on \(X\), whose pdf is \(f(x;\theta)\). Fix a number \(0<\alpha<1\). Let \(L\) and \(U\) be two statistics. We say the interval \((L,U)\) is a \((1-\alpha)100\%\) confidence interval for \(\theta\) if
\[ 1-\alpha=\Pr[\theta\in(L,U)]. \]
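The probability statement is over repeated sampling: \(L\) and \(U\) are random while \(\theta\) is fixed. A minimal simulation sketch of this interpretation (assuming numpy and scipy; the population \(\mathcal N(5,4)\) is an illustrative choice):

```python
# Coverage check: ~95% of the intervals, recomputed per sample, contain theta.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps, alpha = 30, 10_000, 0.05
z = stats.norm.ppf(1 - alpha / 2)                 # ~1.96

samples = rng.normal(loc=5.0, scale=2.0, size=(reps, n))
mu_bar = samples.mean(axis=1)
se = samples.std(axis=1, ddof=1) / np.sqrt(n)     # estimated standard error

covered = (mu_bar - z * se <= 5.0) & (5.0 <= mu_bar + z * se)
print(covered.mean())                             # close to 0.95
```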
Theorem 3.2 (Large-Sample \(100(1-\alpha)\%\) Confidence interval) \[ L,U=\bar{\mu}\pm z_{\alpha/2}\qty(\frac{\bar{\sigma}}{\sqrt{n}}), \] where \(z_{\alpha/2}\) is the value satisfying \(\Pr\qty(Z>z_{\alpha/2})=\alpha/2\) for \(Z\sim\mathcal N(0,1)\); for example, \(z_{\alpha/2}\approx1.96\) when \(\alpha=5\%\).
- For any \(n\), if \(X_i\sim \mathcal N(\mu, \sigma^2)\), then \(T_n=\dfrac{\bar{\mu}-\mu}{\bar{\sigma}/\sqrt{n}}\) has a Student's \(t\)-distribution with \(n-1\) degrees of freedom.
- When \(n\) is large, whatever the distribution of the \(X_i\)'s, \(Z_n=\dfrac{\bar{\mu}-\mu}{\bar{\sigma}/\sqrt{n}}\) is approximately \(\mathcal N(0,1)\).
- Student's \(t\)-distribution with \(n-1\) degrees of freedom approaches \(\mathcal N(0,1)\) as \(n\) increases; at \(n=30\) the two are already very close. This is why statisticians often require a sample size \(\geq30\).
- Whether the sample is large or small, the coefficient used to compute the confidence interval is \(z_{\alpha/2}\) or \(t_{\alpha/2}\), taken from the normal distribution or from Student's \(t\)-distribution respectively; see the sketch after this list.
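A minimal sketch of both intervals (assuming numpy and scipy; the data are a hypothetical sample): the coefficients come from `scipy.stats.norm` and `scipy.stats.t`, and for small \(n\) the \(t\)-interval is slightly wider.

```python
# z-based versus t-based 95% confidence intervals for the mean.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=2.0, size=25)   # hypothetical sample, n = 25

n, alpha = len(x), 0.05
mu_bar = x.mean()
se = x.std(ddof=1) / np.sqrt(n)                # estimated standard error

z = stats.norm.ppf(1 - alpha / 2)              # ~1.960, from N(0, 1)
t = stats.t.ppf(1 - alpha / 2, df=n - 1)       # ~2.064, from Student's t

print("z-interval:", (mu_bar - z * se, mu_bar + z * se))
print("t-interval:", (mu_bar - t * se, mu_bar + t * se))
```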
3.2 Hypothesis test
Elements of a Statistical Test of Hypothesis
- Null Hypothesis \(H_0\)
- Alternative Hypothesis \(H_a\)
- Test Statistic
- Level of significance \(\alpha\)
- Rejection Region
- \(P\)-Value
- Conclusion
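A minimal sketch tying these elements together (assuming scipy; the data and the hypothesized mean are illustrative): a one-sample \(t\)-test of \(H_0:\mu=10\) against \(H_a:\mu\neq10\) at level \(\alpha=0.05\).

```python
# Elements of a test: H0, Ha, test statistic, alpha, P-value, conclusion.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=10.5, scale=2.0, size=40)   # hypothetical data

alpha = 0.05                                   # level of significance
t_stat, p_value = stats.ttest_1samp(x, popmean=10.0)

print("test statistic:", t_stat)
print("P-value:", p_value)
# Reject H0 when the P-value falls below alpha (equivalently, when the test
# statistic lands in the rejection region).
print("reject H0" if p_value < alpha else "fail to reject H0")
```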