
Sklearn Regression Models in Python
Scikit-learn (sklearn) is a popular Python library for machine learning that provides a wide range of regression models for different kinds of regression problems. In this guide, I’ll demonstrate how to use three of the most commonly used ones: linear regression, decision tree regression, and random forest regression.
Step 1: Import Necessary Libraries
First, you need to import the necessary libraries:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
Step 2: Load and Prepare the Dataset
You need a dataset for regression analysis. For this example, I’ll use a synthetic dataset:
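# Create a synthetic dataset: y = 2x + 1 plus Gaussian noise
np.random.seed(42)  # fix the seed so the generated data is reproducible
X = np.random.rand(100, 1) * 10
# Keep y one-dimensional; RandomForestRegressor warns when given a column-vector target
y = 2 * X.ravel() + 1 + np.random.randn(100)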
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
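As a quick sanity check on the split, the sketch below rebuilds a similar synthetic dataset and prints the resulting array shapes. It is a minimal illustration rather than part of the main walkthrough: the use of `np.random.default_rng` and the one-dimensional `y` are assumptions made to keep the snippet self-contained.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Assumption: a seeded generator stands in for the guide's random data
rng = np.random.default_rng(42)
X = rng.random((100, 1)) * 10                      # 100 samples, 1 feature
y = 2 * X.ravel() + 1 + rng.standard_normal(100)   # 1-D target

# Hold out 20% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)  # (80, 1) (20, 1)
print(y_train.shape, y_test.shape)  # (80,) (20,)
```

With 100 samples and `test_size=0.2`, 80 rows go to training and 20 to testing.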
Step 3: Train the Regression Models
Now, let’s train three different regression models and generate predictions on the test set: Linear Regression, Decision Tree Regression, and Random Forest Regression.
# Linear Regression
linear_reg = LinearRegression()
linear_reg.fit(X_train, y_train)
y_pred_linear = linear_reg.predict(X_test)
# Decision Tree Regression
decision_tree_reg = DecisionTreeRegressor(random_state=42)
decision_tree_reg.fit(X_train, y_train)
y_pred_dt = decision_tree_reg.predict(X_test)
# Random Forest Regression
random_forest_reg = RandomForestRegressor(n_estimators=100, random_state=42)
random_forest_reg.fit(X_train, y_train)
y_pred_rf = random_forest_reg.predict(X_test)
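Because the synthetic data follows y = 2x + 1 plus noise, one way to check that the linear model has learned something sensible is to look at its fitted slope and intercept. The sketch below is a standalone illustration under that assumption; the seeded generator and the variable names `slope` and `intercept` are my own and not part of the walkthrough.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Assumption: same generating process as the guide, y = 2x + 1 + noise
rng = np.random.default_rng(0)
X = rng.random((100, 1)) * 10
y = 2 * X.ravel() + 1 + rng.standard_normal(100)

model = LinearRegression().fit(X, y)

# coef_ holds one weight per feature; intercept_ is the bias term
slope = model.coef_[0]
intercept = model.intercept_
print(f"Estimated slope: {slope:.2f} (true value: 2)")
print(f"Estimated intercept: {intercept:.2f} (true value: 1)")
```

The estimates should land close to 2 and 1, though not exactly, because of the added noise. Tree-based models have no such coefficients to inspect, which is one reason a linear model is often a useful baseline.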
Step 4: Evaluate the Models
You can evaluate the models using metrics like Mean Squared Error (MSE), where lower is better, and R-squared (R2), where values closer to 1 indicate a better fit:
# Evaluate Linear Regression
mse_linear = mean_squared_error(y_test, y_pred_linear)
r2_linear = r2_score(y_test, y_pred_linear)
print("Linear Regression:")
print(f"Mean Squared Error: {mse_linear:.2f}")
print(f"R-squared: {r2_linear:.2f}")
# Evaluate Decision Tree Regression
mse_dt = mean_squared_error(y_test, y_pred_dt)
r2_dt = r2_score(y_test, y_pred_dt)
print("\nDecision Tree Regression:")
print(f"Mean Squared Error: {mse_dt:.2f}")
print(f"R-squared: {r2_dt:.2f}")
# Evaluate Random Forest Regression
mse_rf = mean_squared_error(y_test, y_pred_rf)
r2_rf = r2_score(y_test, y_pred_rf)
print("\nRandom Forest Regression:")
print(f"Mean Squared Error: {mse_rf:.2f}")
print(f"R-squared: {r2_rf:.2f}")
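The three evaluation blocks above repeat the same pattern, so they can also be written as a single loop. The following is one possible sketch, not the only way to structure it: the `models` and `results` dictionaries are hypothetical names introduced here, and RMSE (the square root of MSE) is added because it is expressed in the same units as the target.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Assumption: synthetic data generated as in the guide, y = 2x + 1 + noise
rng = np.random.default_rng(42)
X = rng.random((100, 1)) * 10
y = 2 * X.ravel() + 1 + rng.standard_normal(100)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Hypothetical registry mapping a display name to an unfitted estimator
models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree Regression": DecisionTreeRegressor(random_state=42),
    "Random Forest Regression": RandomForestRegressor(n_estimators=100, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    results[name] = {"mse": mse, "rmse": np.sqrt(mse), "r2": r2_score(y_test, y_pred)}
    print(f"{name}: MSE={mse:.2f}, RMSE={np.sqrt(mse):.2f}, R2={results[name]['r2']:.2f}")
```

Keeping the scores in a dictionary makes it easy to sort or tabulate them later, and adding a fourth model only requires one more entry in `models`.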
This code will train and evaluate three different regression models and display the Mean Squared Error (MSE) and R-squared (R2) values for each model.
Feel free to replace the synthetic dataset with your own data for real-world regression tasks. Scikit-learn offers a wide range of regression models and tools for hyperparameter tuning and model selection, allowing you to choose the best model for your specific problem.
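As an illustration of the hyperparameter tuning mentioned above, here is a minimal sketch using `GridSearchCV` to search over a small grid for the random forest. The grid values, the 5-fold setting, and the choice of scoring are assumptions picked for the example and would need adjusting for a real dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Assumption: synthetic data generated as in the guide, y = 2x + 1 + noise
rng = np.random.default_rng(42)
X = rng.random((100, 1)) * 10
y = 2 * X.ravel() + 1 + rng.standard_normal(100)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Illustrative grid: 2 x 3 = 6 candidate settings
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, 5, None],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    cv=5,                              # 5-fold cross-validation on the training set
    scoring="neg_mean_squared_error",  # scikit-learn maximizes scores, so MSE is negated
)
search.fit(X_train, y_train)

best_cv_mse = -search.best_score_
print("Best parameters:", search.best_params_)
print(f"Best cross-validated MSE: {best_cv_mse:.2f}")

# best_estimator_ is refit on the full training set by default
test_r2 = search.best_estimator_.score(X_test, y_test)
print(f"Test R-squared of the tuned model: {test_r2:.2f}")
```

The search only ever sees the training split, so the test set remains an untouched estimate of how the tuned model generalizes.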