4 Intro to regression

We want to use sample data to investigate the relationships among a group of variables, ultimately to create a model for some variable that can be used to predict its value in the future.

Language:

Predicting the value of \(X\) by \(\operatorname{E}(X)\) is tantamount to using \(\operatorname{E}(X)\) as a model for the true \(X\).
The variable to be modeled is called the dependent (or response) variable.
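Why is \(\operatorname{E}(X)\) a sensible prediction for \(X\)? For any constant \(c\), \(\operatorname{E}[(X-c)^2]=\operatorname{Var}(X)+(\operatorname{E}(X)-c)^2\), which is smallest at \(c=\operatorname{E}(X)\). The short sketch below checks the sample version of this fact on simulated data, with the sample mean standing in for \(\operatorname{E}(X)\). The normal distribution, its parameters, and the helper `mse` are illustrative assumptions, not part of these notes.

```python
import random
import statistics

random.seed(0)
# Hypothetical sample of observed X values (mean 50, sd 10 chosen only for illustration)
sample = [random.gauss(50, 10) for _ in range(1000)]
mean_x = statistics.fmean(sample)  # sample stand-in for E(X)

def mse(c, data):
    """Average squared error when every value in data is predicted by the constant c."""
    return sum((x - c) ** 2 for x in data) / len(data)

# Compare the sample mean against nearby constant predictions
candidates = [mean_x - 5, mean_x - 1, mean_x, mean_x + 1, mean_x + 5]
for c in candidates:
    print(f"predict {c:7.3f}: MSE = {mse(c, sample):.3f}")

best = min(candidates, key=lambda c: mse(c, sample))
print("smallest MSE at the sample mean:", best == mean_x)
```

Moving the prediction one unit away from the mean raises the average squared error by exactly one unit squared, matching the \((\operatorname{E}(X)-c)^2\) term above.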

4.1 Probabilistic model for \(y\)

The probabilistic model for \(y\) is \[ y=\operatorname{E}(y)+\text{Random error}. \] It is called probabilistic since we can make a probability statement about the magnitude of the deviation between \(y\) and \(\operatorname{E}(y)\).

General Form of Probabilistic Model in Regression

\[ y=\operatorname{E}(y)+\varepsilon \] where

\(y\): Dependent variable
\(\operatorname{E}(y)\): Mean value of \(y\)
\(\varepsilon\): Unexplainable or random error
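A minimal simulation of \(y=\operatorname{E}(y)+\varepsilon\) makes the "probability statement" concrete. It assumes, purely for illustration, that \(\operatorname{E}(y)\) is a constant 10 and that \(\varepsilon\) is normal with mean 0 and \(\sigma=2\) (the names `E_Y` and `SIGMA` are hypothetical). Under normality, \(P(|y-\operatorname{E}(y)|\le 2\sigma)=P(|Z|\le 2)\approx 0.954\), and the simulated share should land close to that value.

```python
import math
import random

random.seed(42)
# Hypothetical setup: E(y) is a fixed 10.0 and eps ~ N(0, sigma^2) with sigma = 2.0
E_Y, SIGMA = 10.0, 2.0
ys = [E_Y + random.gauss(0, SIGMA) for _ in range(100_000)]

# Empirical share of observations whose deviation |y - E(y)| is at most 2*sigma
empirical = sum(abs(y - E_Y) <= 2 * SIGMA for y in ys) / len(ys)

# Exact value under normality: P(|Z| <= 2) = erf(2 / sqrt(2))
exact = math.erf(2 / math.sqrt(2))

# The random errors average out: mean deviation from E(y) is close to 0
mean_dev = sum(y - E_Y for y in ys) / len(ys)

print(f"empirical={empirical:.4f}, exact={exact:.4f}, mean deviation={mean_dev:.4f}")
```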

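This model suggests that \(y\) tends to return to its mean: individual observations deviate from \(\operatorname{E}(y)\) by the random error \(\varepsilon\), but they do not drift away from it. This tendency to "regress" toward the mean is why it is called a regression model.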

Definition 4.1 The variables used to predict \(y\) are called independent variables and are denoted by \(x_i\).

Regression Modeling

Hypothesize the form of the model for \(\operatorname{E}(y)\).
Collect the sample data.
Use the sample data to estimate unknown parameters in the model.
Specify the probability distribution of \(\varepsilon\), and estimate any unknown parameters of this distribution.
Statistically check the usefulness of the model.
Check the validity of the assumptions on \(\varepsilon\), and make model modifications if necessary.
When satisfied that the model is useful and that the assumptions are met, use the model to make inferences.
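The seven steps above can be sketched for the simplest case: a straight-line model \(\operatorname{E}(y)=\beta_0+\beta_1 x\) with one independent variable \(x\). Everything here is an illustrative assumption rather than the procedure these notes develop later: the data are simulated from hypothetical true values \(\beta_0=1\), \(\beta_1=2\), \(\sigma=0.5\); \(\varepsilon\) is assumed normal; \(\sigma^2\) is estimated by \(s^2=\text{SSE}/(n-2)\); usefulness is checked with the usual \(t\) statistic for the slope; and the assumption check is only a rough count of residuals within \(2s\). The helpers `fit_simple_linear` and `predict` are hypothetical names.

```python
import math
import random

# Step 1: hypothesize E(y) = b0 + b1*x and fit it by least squares
def fit_simple_linear(xs, ys):
    """Least-squares estimates (b0, b1) for E(y) = b0 + b1*x, plus Sxx for later use."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1, sxx

# Step 2: collect sample data (simulated here; the true values are hypothetical)
random.seed(1)
TRUE_B0, TRUE_B1, TRUE_SIGMA = 1.0, 2.0, 0.5
xs = [i / 10 for i in range(50)]
ys = [TRUE_B0 + TRUE_B1 * x + random.gauss(0, TRUE_SIGMA) for x in xs]

# Step 3: estimate the unknown parameters of E(y)
b0, b1, sxx = fit_simple_linear(xs, ys)

# Step 4: assume eps ~ N(0, sigma^2) and estimate sigma by s = sqrt(SSE / (n - 2))
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
n = len(xs)
s = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))

# Step 5: check usefulness with t = b1 / SE(b1); a large |t| suggests the slope is not 0
t_stat = b1 / (s / math.sqrt(sxx))

# Step 6: crude assumption check; under normality roughly 95% of residuals lie within 2s
share_within_2s = sum(abs(r) <= 2 * s for r in residuals) / n

# Step 7: use the fitted model to predict y at a new x
def predict(x_new):
    return b0 + b1 * x_new

print(f"b0={b0:.3f}, b1={b1:.3f}, s={s:.3f}, t={t_stat:.1f}, within 2s={share_within_2s:.2f}")
print(f"prediction at x=6: {predict(6.0):.3f}")
```

In practice, step 6 would also look at residual plots and formal tests, and step 1 may involve several competing forms for \(\operatorname{E}(y)\); the sketch keeps each step to a single line of reasoning.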