
Sklearn Ensemble in Python

Scikit-learn (sklearn) provides a module for ensemble methods in Python. Ensemble methods combine the predictions of multiple machine learning models to improve overall predictive performance. Scikit-learn offers several ensemble techniques, including bagging, boosting, and stacking. Here’s an overview of some commonly used ensemble methods in scikit-learn:

  1. Bagging (Bootstrap Aggregating):
  • Bagging is an ensemble technique where multiple instances of the same base model are trained on different subsets of the training data, typically created by random sampling with replacement (bootstrap).
  • Scikit-learn provides the BaggingClassifier and BaggingRegressor classes for bagging-based ensemble methods.
Python
 from sklearn.ensemble import BaggingClassifier
 from sklearn.tree import DecisionTreeClassifier

 base_classifier = DecisionTreeClassifier()
 # The first positional argument is the base estimator (keyword name `estimator`
 # in scikit-learn >= 1.2, `base_estimator` in earlier versions)
 bagging_classifier = BaggingClassifier(base_classifier, n_estimators=10, random_state=0)
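A complete fit-and-evaluate round trip might look like the following sketch; the synthetic dataset from make_classification is purely illustrative and not part of the original example:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic dataset (an assumption, not from the article)
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# First positional argument is the base estimator
bagging_classifier = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=10, random_state=0
)
bagging_classifier.fit(X_train, y_train)
accuracy = bagging_classifier.score(X_test, y_test)  # mean accuracy on held-out data
```

Each of the 10 trees is trained on its own bootstrap sample of X_train, and their predictions are combined by majority vote.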
  2. Random Forest:
  • Random Forest is an ensemble technique that uses bagging with decision trees as base models. It adds randomness by selecting a random subset of features for each tree.
  • Scikit-learn provides the RandomForestClassifier and RandomForestRegressor classes for Random Forests.
Python
 from sklearn.ensemble import RandomForestClassifier

 random_forest_classifier = RandomForestClassifier(n_estimators=100, random_state=0)
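Once fitted, a Random Forest also exposes per-feature importance scores via its feature_importances_ attribute. A minimal sketch, assuming a synthetic dataset with only a few informative features:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Illustrative synthetic dataset (an assumption): 10 features, 3 informative
X, y = make_classification(n_samples=300, n_features=10, n_informative=3, random_state=0)

random_forest_classifier = RandomForestClassifier(n_estimators=100, random_state=0)
random_forest_classifier.fit(X, y)

# One importance score per feature; the scores are normalized to sum to 1
importances = random_forest_classifier.feature_importances_
```

Inspecting importances is a quick way to see which features the forest actually relied on.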
  3. AdaBoost (Adaptive Boosting):
  • AdaBoost is a boosting ensemble technique that assigns weights to training samples and combines the predictions of multiple weak learners to create a strong learner.
  • Scikit-learn provides the AdaBoostClassifier and AdaBoostRegressor classes for AdaBoost.
Python
 from sklearn.ensemble import AdaBoostClassifier
 from sklearn.tree import DecisionTreeClassifier

 base_classifier = DecisionTreeClassifier(max_depth=1)  # depth-1 "stumps" as weak learners
 adaboost_classifier = AdaBoostClassifier(base_classifier, n_estimators=50, random_state=0)
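AdaBoost's sequential nature can be inspected with staged_score, which reports accuracy after each boosting round. A sketch on an assumed toy dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic dataset (an assumption)
X, y = make_classification(n_samples=400, random_state=0)

# Depth-1 trees ("decision stumps") are the classic weak learner for AdaBoost
adaboost_classifier = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1), n_estimators=50, random_state=0
)
adaboost_classifier.fit(X, y)

# Training accuracy after each boosting round (boosting may stop early)
stage_scores = list(adaboost_classifier.staged_score(X, y))
```

Plotting stage_scores is a common way to see how quickly the weighted combination of weak learners improves.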
  4. Gradient Boosting:
  • Gradient Boosting is a boosting ensemble technique that builds an ensemble of decision trees sequentially, with each tree correcting the errors of the previous ones.
  • Scikit-learn provides the GradientBoostingClassifier and GradientBoostingRegressor classes for Gradient Boosting.
Python
 from sklearn.ensemble import GradientBoostingClassifier

 gradient_boosting_classifier = GradientBoostingClassifier(n_estimators=100, random_state=0)
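Because each tree corrects its predecessors, the learning_rate parameter (which shrinks each tree's contribution) matters alongside n_estimators. A hedged sketch on an assumed synthetic dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Illustrative synthetic dataset (an assumption)
X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# learning_rate shrinks each tree's contribution; it trades off against n_estimators
gradient_boosting_classifier = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, random_state=0
)
gradient_boosting_classifier.fit(X_train, y_train)
test_accuracy = gradient_boosting_classifier.score(X_test, y_test)
```

A smaller learning_rate usually needs more trees to reach the same fit, but often generalizes better.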
  5. Stacking:
  • Stacking is an ensemble technique that combines the predictions of multiple base models using a meta-model.
  • Scikit-learn provides dedicated StackingClassifier and StackingRegressor classes (available since version 0.22) for stacking.
Python
 from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, StackingClassifier
 from sklearn.linear_model import LogisticRegression

 base_estimators = [
     ('random_forest', RandomForestClassifier(n_estimators=100, random_state=0)),
     ('gradient_boosting', GradientBoostingClassifier(n_estimators=100, random_state=0))
 ]
 stacking_classifier = StackingClassifier(estimators=base_estimators, final_estimator=LogisticRegression())
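The stacked model behaves like any other estimator and can be evaluated with, for example, cross_val_score. In this sketch the dataset and the reduced n_estimators are assumptions made to keep it quick to run:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Illustrative synthetic dataset (an assumption); n_estimators reduced for speed
X, y = make_classification(n_samples=300, random_state=0)

base_estimators = [
    ('random_forest', RandomForestClassifier(n_estimators=50, random_state=0)),
    ('gradient_boosting', GradientBoostingClassifier(n_estimators=50, random_state=0)),
]
stacking_classifier = StackingClassifier(
    estimators=base_estimators, final_estimator=LogisticRegression()
)

# StackingClassifier internally uses cross-validated predictions of the base
# models as the input features for the final (meta) estimator
scores = cross_val_score(stacking_classifier, X, y, cv=3)
```

The meta-model (here a logistic regression) learns how to weight the base models' predictions rather than the raw features.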

These are some of the ensemble methods available in scikit-learn. Depending on your problem and data, you can choose the appropriate ensemble technique and customize its parameters to achieve better predictive performance in your machine learning tasks.
