
Sklearn Logistic Regression in Python
Logistic regression is a widely used algorithm for binary and multiclass classification. Scikit-learn (sklearn) implements it as the LogisticRegression class in sklearn.linear_model. Here’s how you can use it for classification tasks:
Step 1: Import the Necessary Libraries
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
Step 2: Prepare Your Data
You need a feature matrix X (one row per sample, one column per feature) and a vector of target labels y (0 or 1 for binary classification, or one of several class labels for multiclass classification). Here’s an example using a tiny toy dataset:
# Sample data
X = np.array([[1.2, 2.3], [0.5, 2.7], [2.4, 3.2], [1.7, 1.8]])
y = np.array([0, 1, 0, 1])
Step 3: Split the Data
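Split the dataset into training and testing sets so the model is evaluated on data it was not trained on. Here test_size=0.2 holds out 20% of the samples (with only four samples, that is a single test sample), and random_state=42 makes the split reproducible: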
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
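On larger or imbalanced datasets it can help to keep the class proportions the same in both splits. A minimal sketch using the stratify parameter of train_test_split follows. The synthetic X_demo and y_demo arrays are made up for illustration and are not part of this tutorial's data; the four-sample toy dataset is too small to stratify with a single test sample.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced dataset: 70 samples of class 0, 30 of class 1
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 2))
y_demo = np.array([0] * 70 + [1] * 30)

# stratify=y_demo keeps the 70/30 class ratio in both the train and test sets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print("train class counts:", np.bincount(y_tr))
print("test class counts: ", np.bincount(y_te))
```

Without stratify, a random split of a small imbalanced dataset can leave a class underrepresented or missing in the test set.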
Step 4: Create and Train the Logistic Regression Model
Create an instance of the Logistic Regression model and train it on the training data:
model = LogisticRegression()
model.fit(X_train, y_train)
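After fitting, the model exposes what it learned through the classes_, coef_, and intercept_ attributes. The sketch below refits on the full toy dataset (an assumption made only so the block runs on its own) and checks that decision_function is the linear score built from those parameters.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same toy data as in Step 2, fitted on all four samples for illustration
X = np.array([[1.2, 2.3], [0.5, 2.7], [2.4, 3.2], [1.7, 1.8]])
y = np.array([0, 1, 0, 1])
model = LogisticRegression().fit(X, y)

print("classes:  ", model.classes_)    # labels seen during fit
print("weights:  ", model.coef_)       # shape (1, n_features) for a binary problem
print("intercept:", model.intercept_)  # shape (1,)

# The decision function is the linear score X @ w + b
manual_scores = (X @ model.coef_.T + model.intercept_).ravel()
print(np.allclose(manual_scores, model.decision_function(X)))
```

A positive weight means that larger values of that feature push the prediction toward class 1, and a negative weight pushes it toward class 0.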
Step 5: Make Predictions
Use the trained model to make predictions on the testing data:
y_pred = model.predict(X_test)
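Besides hard labels, LogisticRegression can return class probabilities through predict_proba. The sketch below refits on the full toy dataset (an assumption, so the block is self-contained) and shows that, for a binary problem, the class-1 probability is the sigmoid of the decision function and that predict picks the more probable class.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same toy data as in Step 2, fitted on all four samples for illustration
X = np.array([[1.2, 2.3], [0.5, 2.7], [2.4, 3.2], [1.7, 1.8]])
y = np.array([0, 1, 0, 1])
model = LogisticRegression().fit(X, y)

# One row per sample, one column per class (column order follows model.classes_)
proba = model.predict_proba(X)
print(proba)

# Binary case: P(y = 1 | x) is the sigmoid of the linear score
sigmoid_scores = 1.0 / (1.0 + np.exp(-model.decision_function(X)))
print(np.allclose(proba[:, 1], sigmoid_scores))

# predict() returns the class with the highest probability
labels_from_proba = model.classes_[np.argmax(proba, axis=1)]
print(np.array_equal(labels_from_proba, model.predict(X)))
```

Working with probabilities is useful when a decision threshold other than 0.5 is needed, for example when false negatives cost more than false positives.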
Step 6: Evaluate the Model
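Evaluate the model’s performance with accuracy, a confusion matrix, and a classification report. With a test set as small as the one in this toy example, the numbers are not statistically meaningful; they only show how the functions are called: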
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)
print(f"Accuracy: {accuracy}")
print(f"Confusion Matrix:\n{conf_matrix}")
print(f"Classification Report:\n{class_report}")
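To make the confusion matrix easier to read, it can help to fix the label order and unpack the four cells. The sketch below uses hypothetical label vectors (y_true_demo and y_pred_demo are invented for illustration, not produced by the model above) and passes labels=[0, 1] so the matrix stays 2x2 even if one class is missing from a small test set.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical true and predicted labels, for illustration only
y_true_demo = [1, 0, 1, 1, 0]
y_pred_demo = [1, 0, 0, 1, 0]

# Rows are true classes, columns are predicted classes, in the order given by labels
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)  # of the samples predicted 1, how many were truly 1
recall = tp / (tp + fn)     # of the samples that are truly 1, how many were found
acc = accuracy_score(y_true_demo, y_pred_demo)

print(cm)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"precision={precision:.2f} recall={recall:.2f} accuracy={acc:.2f}")
```

These are the same precision and recall values that classification_report prints for class 1.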
Here’s a complete example:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
# Sample data
X = np.array([[1.2, 2.3], [0.5, 2.7], [2.4, 3.2], [1.7, 1.8]])
y = np.array([0, 1, 0, 1])
# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and train the model
model = LogisticRegression()
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)
print(f"Accuracy: {accuracy}")
print(f"Confusion Matrix:\n{conf_matrix}")
print(f"Classification Report:\n{class_report}")
This example demonstrates how to use sklearn’s Logistic Regression for a simple binary classification problem. The same steps (prepare the data, split, fit, predict, evaluate) apply to larger datasets and to multiclass problems.
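As one illustration of the same workflow on a multiclass problem, here is a sketch using the Iris dataset bundled with scikit-learn. The choice of dataset, the StandardScaler preprocessing step, and max_iter=1000 are assumptions made for this example rather than part of the tutorial above.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Iris: 150 samples, 4 features, 3 classes
X_iris, y_iris = load_iris(return_X_y=True)

X_train_i, X_test_i, y_train_i, y_test_i = train_test_split(
    X_iris, y_iris, test_size=0.2, random_state=42, stratify=y_iris
)

# Scaling the features first usually helps the solver converge
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train_i, y_train_i)

y_pred_i = clf.predict(X_test_i)
iris_accuracy = accuracy_score(y_test_i, y_pred_i)

print(f"Accuracy: {iris_accuracy:.3f}")
print(classification_report(y_test_i, y_pred_i))
```

Putting the scaler and the classifier in one pipeline means the scaling statistics are learned from the training data only and then reused at prediction time.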