
Sklearn Ensemble in Python
Scikit-learn (sklearn) provides a module for ensemble methods in Python. Ensemble methods combine the predictions of multiple machine learning models to improve overall predictive performance. Scikit-learn offers several ensemble techniques, including bagging, boosting, and stacking. Here’s an overview of some commonly used ensemble methods in scikit-learn:
- Bagging (Bootstrap Aggregating):
- Bagging is an ensemble technique where multiple instances of the same base model are trained on different subsets of the training data, typically created by random sampling with replacement (bootstrap).
- Scikit-learn provides the BaggingClassifier and BaggingRegressor classes for bagging-based ensemble methods.
Python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Ten trees, each fit on a bootstrap sample of the training data;
# 'estimator' was named 'base_estimator' before scikit-learn 1.2
base_classifier = DecisionTreeClassifier()
bagging_classifier = BaggingClassifier(estimator=base_classifier, n_estimators=10, random_state=0)
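A bagging classifier is then fit and evaluated like any other scikit-learn estimator. Here is a minimal sketch of that workflow; the built-in iris dataset and the default 75/25 train/test split are illustrative choices, not part of the original example:
Python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy dataset purely for illustration
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

bagging_classifier = BaggingClassifier(estimator=DecisionTreeClassifier(),
                                       n_estimators=10, random_state=0)
bagging_classifier.fit(X_train, y_train)
print(bagging_classifier.score(X_test, y_test))  # mean accuracy on the test split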
- Random Forest:
- Random Forest is an ensemble technique that uses bagging with decision trees as base models. It adds randomness by selecting a random subset of features for each tree.
- Scikit-learn provides the RandomForestClassifier and RandomForestRegressor classes for Random Forests.
Python
from sklearn.ensemble import RandomForestClassifier

# 100 trees; each split considers a random subset of the features
random_forest_classifier = RandomForestClassifier(n_estimators=100, random_state=0)
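One practical benefit of a fitted forest is its impurity-based feature importances. A minimal sketch, again using the iris dataset as an illustrative choice:
Python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

random_forest_classifier = RandomForestClassifier(n_estimators=100, random_state=0)
random_forest_classifier.fit(X, y)

# One impurity-based importance score per input feature, summing to 1
print(random_forest_classifier.feature_importances_)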
- AdaBoost (Adaptive Boosting):
- AdaBoost is a boosting ensemble technique that assigns weights to training samples and combines the predictions of multiple weak learners to create a strong learner.
- Scikit-learn provides the AdaBoostClassifier and AdaBoostRegressor classes for AdaBoost.
Python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Decision stumps (depth-1 trees) are the classic AdaBoost weak learner;
# 'estimator' was named 'base_estimator' before scikit-learn 1.2
base_classifier = DecisionTreeClassifier(max_depth=1)
adaboost_classifier = AdaBoostClassifier(estimator=base_classifier, n_estimators=50, random_state=0)
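Because boosting is sequential, you can watch test accuracy improve as weak learners are added. A minimal sketch using staged_score; the iris dataset and the printing interval are illustrative choices:
Python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

adaboost_classifier = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                                         n_estimators=50, random_state=0)
adaboost_classifier.fit(X_train, y_train)

# staged_score yields the test accuracy after each boosting round
for i, score in enumerate(adaboost_classifier.staged_score(X_test, y_test), start=1):
    if i % 10 == 0:
        print(f"{i} estimators: accuracy {score:.3f}")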
- Gradient Boosting:
- Gradient Boosting is a boosting ensemble technique that builds an ensemble of decision trees sequentially, with each tree correcting the errors of the previous ones.
- Scikit-learn provides the GradientBoostingClassifier and GradientBoostingRegressor classes for Gradient Boosting.
Python
from sklearn.ensemble import GradientBoostingClassifier

# Each of the 100 trees is fit to correct the residual errors of the ensemble so far
gradient_boosting_classifier = GradientBoostingClassifier(n_estimators=100, random_state=0)
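The learning_rate parameter controls how strongly each new tree corrects the current ensemble. A minimal sketch of fitting and scoring; the iris dataset is an illustrative choice, and learning_rate=0.1 is simply the default made explicit:
Python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Smaller learning_rate shrinks each tree's contribution and
# usually calls for more estimators
gradient_boosting_classifier = GradientBoostingClassifier(n_estimators=100,
                                                          learning_rate=0.1,
                                                          random_state=0)
gradient_boosting_classifier.fit(X_train, y_train)
print(gradient_boosting_classifier.score(X_test, y_test))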
- Stacking:
- Stacking is an ensemble technique that combines the predictions of multiple base models using a meta-model.
- Scikit-learn provides the StackingClassifier and StackingRegressor classes for stacking (available since version 0.22).
Python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression

base_estimators = [
    ('random_forest', RandomForestClassifier(n_estimators=100, random_state=0)),
    ('gradient_boosting', GradientBoostingClassifier(n_estimators=100, random_state=0))
]
# LogisticRegression is the meta-model that combines the base models' predictions
stacking_classifier = StackingClassifier(estimators=base_estimators, final_estimator=LogisticRegression())
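Training then proceeds as usual; internally, StackingClassifier fits the meta-model on cross-validated (out-of-fold) predictions of the base models. A minimal end-to-end sketch, where the iris dataset is an illustrative choice and cv=5 simply makes the default explicit:
Python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base_estimators = [
    ('random_forest', RandomForestClassifier(n_estimators=100, random_state=0)),
    ('gradient_boosting', GradientBoostingClassifier(n_estimators=100, random_state=0))
]
# cv=5: the meta-model trains on 5-fold out-of-fold predictions,
# which guards against leaking the base models' training fit
stacking_classifier = StackingClassifier(estimators=base_estimators,
                                         final_estimator=LogisticRegression(),
                                         cv=5)
stacking_classifier.fit(X_train, y_train)
print(stacking_classifier.score(X_test, y_test))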
These are some of the ensemble methods available in scikit-learn. Depending on your problem and data, you can choose the appropriate ensemble technique and customize its parameters to achieve better predictive performance in your machine learning tasks.