BoS_88 Asks: Is there a point in hyperparameter tuning for Random Forests?
I have a binary classification task with substantial class imbalance (99% negative, 1% positive). I want to develop a Random Forest model to make predictions, and after establishing a baseline (with default parameters), I proceeded to hyperparameter tuning with scikit-learn's GridSearchCV.
After setting up a grid for some parameters (e.g. `max_depth`, `min_samples_split`, etc.), I noticed that the best parameters, once the grid search was done, were the highest values of the max parameters (`max_depth`) and the smallest values of the min parameters (`min_samples_split`, `min_samples_leaf`). In other words, GridSearchCV favored the combination of parameters that fits the training set most closely, i.e. overfits it. I always thought that cross-validation would protect against this scenario.
Therefore, my question is: 'What is the point of GridSearch if the outcome is overfitting?' Have I misunderstood its purpose?
My code:
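```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# random_state, scoring_metric, X_train and y_train are defined elsewhere (not shown)
rf = RandomForestClassifier(random_state=random_state)

# Candidate values for each hyperparameter; every combination is evaluated
param_grid = {
    'n_estimators': [100, 200],
    'criterion': ['entropy', 'gini'],
    'max_depth': [5, 10, 20],
    'min_samples_split': [5, 10],
    'min_samples_leaf': [5, 10],
    'max_features': ['sqrt'],
    'bootstrap': [True],
    'class_weight': ['balanced']
}

rf_grid = GridSearchCV(estimator=rf,
                       param_grid=param_grid,
                       scoring=scoring_metric,
                       cv=5,
                       verbose=False,
                       n_jobs=-1)

# fit() returns the fitted GridSearchCV object itself
best_rf_grid = rf_grid.fit(X_train, y_train)
```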
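One way to check whether the selected combination really overfits is to compare the mean training score with the mean validation score for each candidate. Below is a minimal sketch, not the original setup. It assumes synthetic data from `make_classification` in place of `X_train` / `y_train`, a milder imbalance (about 5% positives) so it runs quickly, a reduced grid, and `average_precision` as the scoring metric, since the original `scoring_metric` is not shown.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in for X_train / y_train (assumption): roughly 5% positives
X, y = make_classification(n_samples=600, n_features=10,
                           weights=[0.95, 0.05], random_state=0)

# Reduced grid (assumption) so the sketch runs in a few seconds
param_grid = {
    "max_depth": [5, 20],
    "min_samples_leaf": [5, 10],
}

search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, class_weight="balanced",
                           random_state=0),
    param_grid=param_grid,
    scoring="average_precision",   # assumed metric, not from the original post
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
    return_train_score=True,       # also record scores on the training folds
    n_jobs=1,
)
search.fit(X, y)

results = search.cv_results_
# Difference between mean training-fold and mean validation-fold score
gap = results["mean_train_score"] - results["mean_test_score"]

# Print candidates from best to worst validation score
for i in np.argsort(-results["mean_test_score"]):
    print(results["params"][i],
          f"train={results['mean_train_score'][i]:.3f}",
          f"val={results['mean_test_score'][i]:.3f}",
          f"gap={gap[i]:.3f}")
```

GridSearchCV ranks candidates by the validation-fold score (`mean_test_score`), not by the training score, so a deep, lightly regularized forest is only selected if it also scores best on held-out folds. A large gap alongside the best validation score is worth a closer look. Best values that sit at the edge of the grid may also mean the grid does not extend far enough in that direction.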