4.4. Exercises and Projects
CHOOSE ONE: Please apply a random forest to one of the following datasets:
- the iris dataset;
- the dating dataset;
- the titanic dataset.
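As a starting point, here is a minimal sketch of fitting a random forest, assuming scikit-learn is installed and using the iris dataset because it ships with the library. The split ratio, `n_estimators=100`, and `random_state` values are illustrative choices, not requirements of the exercise.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Iris ships with scikit-learn; the dating and titanic data would need loading separately
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# A forest of 100 trees, each grown on a bootstrap sample of the training set
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

test_acc = rf.score(X_test, y_test)
print(f"Test accuracy: {test_acc:.3f}")
```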
Please answer the following questions.
- Use grid search to find good values of `max_leaf_nodes` and `max_depth`.
- Record the cross-validation score and the OOB (out-of-bag) score of your model, and compare them with the models you learned before (kNN, decision trees).
- Find some typical features (using the Gini importance) and draw the decision boundary against the features you choose.
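The three questions can be approached roughly as in the sketch below, again assuming scikit-learn and the iris data. The grid values, `n_neighbors=5`, and the choice of the two most important features for the boundary plot are illustrative assumptions. The boundary is computed as a grid of predictions; drawing it (for example with matplotlib's `contourf`) is left as a comment so the snippet runs without a plotting library.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
X, y, feature_names = data.data, data.target, data.feature_names

# 1. Grid search over the two tree-size parameters named in the exercise
param_grid = {"max_leaf_nodes": [4, 8, 16, None], "max_depth": [2, 4, 6, None]}
search = GridSearchCV(
    RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0),
    param_grid,
    cv=5,
)
search.fit(X, y)
best_rf = search.best_estimator_  # refit on all of X with the best parameters
print("Best params:", search.best_params_)

# 2. CV score vs. OOB score, plus the earlier models for comparison
cv_rf = search.best_score_   # mean 5-fold CV accuracy of the best setting
oob_rf = best_rf.oob_score_  # out-of-bag accuracy of the refitted forest
cv_knn = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5).mean()
cv_tree = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5).mean()
print(f"RF   CV={cv_rf:.3f}  OOB={oob_rf:.3f}")
print(f"kNN  CV={cv_knn:.3f}")
print(f"Tree CV={cv_tree:.3f}")

# 3. Gini importances (mean decrease in impurity); keep the two largest
importances = best_rf.feature_importances_
top2 = np.argsort(importances)[::-1][:2]
print("Top features:", [feature_names[i] for i in top2])

# 4. Decision boundary: refit on the two chosen features, predict on a grid
X2 = X[:, top2]
rf2 = RandomForestClassifier(n_estimators=100, random_state=0).fit(X2, y)
xx, yy = np.meshgrid(
    np.linspace(X2[:, 0].min() - 0.5, X2[:, 0].max() + 0.5, 200),
    np.linspace(X2[:, 1].min() - 0.5, X2[:, 1].max() + 0.5, 200),
)
zz = rf2.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
# To draw it: plt.contourf(xx, yy, zz, alpha=0.3); plt.scatter(X2[:, 0], X2[:, 1], c=y)
```

The OOB score comes for free from the bootstrap samples, so it is often close to the CV score while needing only one fit; comparing the two is part of what the exercise asks you to observe.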
Please use the following code to generate the mgq dataset.

```python
import numpy as np
from sklearn.datasets import make_gaussian_quantiles

# Two Gaussian-quantile clouds with different centres and spreads
X1, y1 = make_gaussian_quantiles(cov=2.0, n_samples=200, n_features=2,
                                 n_classes=2, random_state=1)
X2, y2 = make_gaussian_quantiles(mean=(3, 3), cov=1.5, n_samples=300,
                                 n_features=2, n_classes=2, random_state=1)
# Stack the samples; the labels of the second cloud are flipped (0 <-> 1)
X = np.concatenate((X1, X2))
y = np.concatenate((y1, -y2 + 1))
```
Please build an AdaBoost model for this dataset.
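A minimal AdaBoost sketch, assuming scikit-learn. Using decision stumps (`max_depth=1`) as weak learners and `n_estimators=200` are common choices but are assumptions here, not something the exercise prescribes. The folds are shuffled because the two clouds are stored one after the other.

```python
import numpy as np
from sklearn.datasets import make_gaussian_quantiles
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Rebuild the mgq dataset with the code given in the exercise
X1, y1 = make_gaussian_quantiles(cov=2.0, n_samples=200, n_features=2,
                                 n_classes=2, random_state=1)
X2, y2 = make_gaussian_quantiles(mean=(3, 3), cov=1.5, n_samples=300,
                                 n_features=2, n_classes=2, random_state=1)
X = np.concatenate((X1, X2))
y = np.concatenate((y1, -y2 + 1))

# Depth-1 trees ("decision stumps") as the weak learners. The stump is passed
# positionally because the keyword changed from base_estimator to estimator
# in scikit-learn 1.2.
ada = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                         n_estimators=200, random_state=0)

# Shuffled, stratified 5-fold cross-validation
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(ada, X, y, cv=cv)
print(f"AdaBoost CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# Final model on all the data
ada.fit(X, y)
train_acc = ada.score(X, y)
```

Each boosting round reweights the samples the previous stumps got wrong, which is what lets a sum of axis-aligned stumps approximate the curved class boundary of this dataset.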
Please use `RandomForestClassifier`, `ExtraTreesClassifier` and `KNeighborsClassifier` to form a voting classifier, and apply it to the MNIST dataset.
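A sketch of the voting classifier, assuming scikit-learn. To keep it runnable offline it uses the small 8x8 `load_digits` set as a stand-in for MNIST; the commented `fetch_openml("mnist_784", ...)` call is one way to load the real data but needs a network connection and is much slower. Soft voting and the hyperparameters shown are illustrative choices.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier, VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in for MNIST: the 8x8 digits set bundled with scikit-learn.
# For the real MNIST data, one option is:
#   from sklearn.datasets import fetch_openml
#   X, y = fetch_openml("mnist_784", version=1, as_frame=False, return_X_y=True)
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

members = [
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("et", ExtraTreesClassifier(n_estimators=200, random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=3)),
]

# Score each member on its own for comparison
scores = {}
for name, clf in members:
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)
    print(f"{name}: {scores[name]:.3f}")

# VotingClassifier clones the members, so the fits above do not leak into it.
# "soft" averages predict_proba; "hard" would take a majority vote of labels.
voter = VotingClassifier(estimators=members, voting="soft")
voter.fit(X_train, y_train)
voting_acc = voter.score(X_test, y_test)
print(f"voting: {voting_acc:.3f}")
```

Soft voting requires every member to implement `predict_proba`, which all three classifiers do; with hard voting that requirement disappears, but ties between three voters become possible.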