Ensemble Learning#
Introduction to Ensemble Learning:
Ensemble learning is a powerful technique in machine learning where multiple models, often called “weak learners,” are combined to create a stronger, more accurate model. The main idea is that by aggregating the predictions of several models, we can achieve better performance than any single model could on its own.
Two popular and simple approaches to ensemble learning are Bagging and Boosting:
Bagging (Bootstrap Aggregating):
Bagging involves training multiple models independently on different random subsets of the training data. The final prediction is made by averaging the predictions of these models (for regression) or by voting (for classification). This method helps to reduce the variance of the model and prevent overfitting, particularly with high-variance models like decision trees.
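As a small illustration of the variance-reduction idea, the toy simulation below (with made-up noise levels, not a real dataset) treats each "model" as the true value plus independent noise and shows that averaging many such models gives a far more stable prediction than any single one:

import numpy as np

rng = np.random.default_rng(0)

# Toy setting: each "model" predicts the true value 1.0 plus independent zero-mean noise
true_value, noise_std, n_models, n_trials = 1.0, 0.5, 25, 10000

# Predictions of a single model vs. the bagged (averaged) prediction
single = true_value + noise_std * rng.standard_normal(n_trials)
ensemble = true_value + noise_std * rng.standard_normal((n_trials, n_models))
bagged = ensemble.mean(axis=1)

print("variance of a single model :", single.var().round(4))   # roughly noise_std**2
print("variance of the average    :", bagged.var().round(4))   # roughly noise_std**2 / n_models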
Boosting:
Boosting, in contrast, trains models sequentially, with each new model focusing on correcting the errors made by the previous ones. The models are combined as a weighted sum, where more weight is given to models that perform better. This process primarily reduces bias, and often variance as well, resulting in a strong overall model.
Boosting#
Suppose we have a weak classifier denoted as $h_t$, one that performs only slightly better than random guessing. By appropriately combining several such weak learners $h_1, \dots, h_T$, we can obtain a strong classifier $H$.

Follow the algorithm below:

Input: Sample distribution $\mathcal{D}$;
Base learning algorithm $\mathcal{L}$;
Number of learning rounds $T$.

Process:

1. $\mathcal{D}_1 = \mathcal{D}$. % Initialize the distribution.
2. For $t = 1, \dots, T$:
   - $h_t = \mathcal{L}(\mathcal{D}_t)$; % Train a weak learner from distribution $\mathcal{D}_t$.
   - $\epsilon_t = P_{x \sim \mathcal{D}_t}\big(h_t(x) \neq f(x)\big)$; % Evaluate the error of $h_t$.
   - $\mathcal{D}_{t+1} = \text{Adjust\_Distribution}(\mathcal{D}_t, \epsilon_t)$.
3. End for.

Output: $H(x) = \text{Combine\_Outputs}\big(\{h_1(x), \dots, h_T(x)\}\big)$
Homework: Correct the following code#
The base learner must be a Bayesian classifier, such as Gaussian Naive Bayes.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load the MNIST dataset from a local .npz file
mnist_data = np.load('..//mnist.npz')

# Extract the training and test sets
x_train = mnist_data['x_train']
y_train = mnist_data['y_train']
x_test = mnist_data['x_test']
y_test = mnist_data['y_test']

# Preprocess the data: scale pixels to [0, 1] and flatten 28x28 images to 784-dimensional vectors
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
x_train = x_train.reshape(-1, 784)
x_test = x_test.reshape(-1, 784)

# The dataset already contains digits 0 through 9, so no filtering is needed.

def base_learner(D_t):
    # Extract features, labels, and sample weights from the weighted distribution
    X = np.array([x for x, y, w in D_t])
    y = np.array([y for x, y, w in D_t])
    sample_weights = np.array([w for x, y, w in D_t])
    # Train a Naive Bayes classifier on the weighted samples
    clf = GaussianNB()
    clf.fit(X, y, sample_weight=sample_weights)
    # Return a function that makes predictions using the trained model, together with the model itself
    return lambda x: clf.predict([x])[0], clf

def adjust_distribution(D_t, h_t, epsilon_t):
    # Adjust the distribution based on the error rate epsilon_t
    new_D = []
    for x, y, w in D_t:
        if h_t(x) != y:
            w *= np.exp(epsilon_t)
        new_D.append((x, y, w))
    # Normalize the weights so they sum to one
    total_weight = sum(w for _, _, w in new_D)
    new_D = [(x, y, w / total_weight) for x, y, w in new_D]
    return new_D

def combine_outputs(weak_learners, x):
    # Majority voting for the final classifier
    votes = np.zeros(10)  # For 10 classes
    for h in weak_learners:
        prediction = h(x)
        votes[prediction] += 1
    return np.argmax(votes)

def boosting(D, L, T):
    # Initialize the distribution with equal weights
    D_t = [(x, y, 1 / len(D)) for x, y in D]
    weak_learners = []
    base_learner_accuracies = []
    for t in range(T):
        # Train a weak learner on the current distribution
        h_t, clf = L(D_t)
        weak_learners.append(h_t)
        # Evaluate the weighted error of the weak learner
        epsilon_t = sum(w for (x, y, w) in D_t if h_t(x) != y)
        # Calculate the training accuracy of the current weak learner
        predictions = clf.predict(x_train)
        accuracy = accuracy_score(y_train, predictions)
        base_learner_accuracies.append(accuracy)
        print(f"Weak Learner {t + 1} Accuracy: {accuracy:.4f}")
        # Adjust the distribution based on the error
        D_t = adjust_distribution(D_t, h_t, epsilon_t)
    # Combine the outputs of all weak learners
    return lambda x: combine_outputs(weak_learners, x), base_learner_accuracies

# Prepare the training data for boosting
D_train = [(x, y) for x, y in zip(x_train, y_train)]

# Number of boosting rounds
T = 10

# Train the boosted classifier
H, base_learner_accuracies = boosting(D_train, base_learner, T)

# Evaluate the final model on the test set
y_pred = [H(x) for x in x_test]

# Calculate the accuracy of the final model
final_accuracy = accuracy_score(y_test, y_pred)
print("Final Model Accuracy on the test set:", final_accuracy)

# Compare accuracies
print("Base Learner Accuracies:", base_learner_accuracies)
The AdaBoost method#
The general boosting procedure described above requires two key operations: Adjust_Distribution and Combine_Outputs.
The cost function we aim to minimize involves both the base classifiers and the weights of each classifier used in the combination. The cost function is the exponential loss:

$$\ell_{\exp}(H \mid \mathcal{D}) = \mathbb{E}_{x \sim \mathcal{D}}\left[ e^{-f(x)H(x)} \right],$$

where $H(x) = \sum_{t=1}^{T} \alpha_t h_t(x)$ is the combined classifier, $f(x) \in \{-1, +1\}$ is the true label, and $\alpha_t$ is the weight of the $t$-th base classifier $h_t$.

For a two-class classifier, the error of $h_t$ under the distribution $\mathcal{D}_t$ is

$$\epsilon_t = P_{x \sim \mathcal{D}_t}\big(h_t(x) \neq f(x)\big).$$

Here, $f(x), h_t(x) \in \{-1, +1\}$; then,

$$f(x)\,h_t(x) = \begin{cases} +1, & h_t(x) = f(x), \\ -1, & h_t(x) \neq f(x). \end{cases}$$

This expression shows that the product $f(x)h_t(x)$ indicates whether a sample is classified correctly ($+1$) or incorrectly ($-1$).

The cost function can be further expanded as follows. At round $t$, the part that depends on $\alpha_t$ and $h_t$ is

$$E_t = \sum_{i=1}^{m} \mathcal{D}_t(x_i)\, e^{-\alpha_t f(x_i) h_t(x_i)}.$$

Substituting $f(x_i)h_t(x_i) = +1$ for correctly classified samples and $f(x_i)h_t(x_i) = -1$ for misclassified samples, this can be rewritten as:

$$E_t = e^{-\alpha_t} \sum_{i:\, h_t(x_i) = f(x_i)} \mathcal{D}_t(x_i) \;+\; e^{\alpha_t} \sum_{i:\, h_t(x_i) \neq f(x_i)} \mathcal{D}_t(x_i),$$

or:

$$E_t = W_c\, e^{-\alpha_t} + W_e\, e^{\alpha_t},$$

where $W_c$ is the total weight of correctly classified samples and $W_e$ is the total weight of misclassified samples.

Given:

- For correctly classified samples: $f(x_i)h_t(x_i) = +1$.
- For misclassified samples: $f(x_i)h_t(x_i) = -1$.

The cost function $E_t$ therefore splits into two contributions. The general formula for each term is $\mathcal{D}_t(x_i)\, e^{-\alpha_t f(x_i) h_t(x_i)}$.

For correctly classified samples: in this case, $e^{-\alpha_t f(x_i) h_t(x_i)} = e^{-\alpha_t}$; thus, the contribution to $E_t$ is $W_c\, e^{-\alpha_t}$.

For misclassified samples: in this case, $e^{-\alpha_t f(x_i) h_t(x_i)} = e^{\alpha_t}$; thus, the contribution to $E_t$ is $W_e\, e^{\alpha_t}$.

Combining both contributions:

$$E_t = W_c\, e^{-\alpha_t} + W_e\, e^{\alpha_t}.$$
Weights of combination#
Given the definitions:

- $W_c = \sum_{i:\, h_t(x_i) = f(x_i)} \mathcal{D}_t(x_i)$ (true classifications),
- $W_e = \sum_{i:\, h_t(x_i) \neq f(x_i)} \mathcal{D}_t(x_i)$ (false classifications),

and using the error rate definition:

$$\epsilon_t = \frac{W_e}{W_c + W_e},$$

we can substitute these into the derivative calculation.

Rewrite the cost as $E_t = W_c\, e^{-\alpha_t} + W_e\, e^{\alpha_t}$.

Substitute it into the derivative with respect to $\alpha_t$ and set the result to zero:

$$\frac{\partial E_t}{\partial \alpha_t} = -W_c\, e^{-\alpha_t} + W_e\, e^{\alpha_t} = 0,$$

which becomes:

$$e^{2\alpha_t} = \frac{W_c}{W_e}.$$

Express $W_c$ and $W_e$ in terms of $\epsilon_t$: from the definition of the error rate, with the weights normalized so that $W_c + W_e = 1$, we have $W_e = \epsilon_t$ and $W_c = 1 - \epsilon_t$.

Rearrange to express the ratio. So:

$$\frac{W_c}{W_e} = \frac{1 - \epsilon_t}{\epsilon_t}.$$

Now substitute this into the exponential term:

$$e^{2\alpha_t} = \frac{1 - \epsilon_t}{\epsilon_t}.$$

Solve for $\alpha_t$ by taking the natural logarithm of both sides:

$$2\alpha_t = \ln\!\left(\frac{1 - \epsilon_t}{\epsilon_t}\right).$$

So:

$$\alpha_t = \frac{1}{2}\ln\!\left(\frac{1 - \epsilon_t}{\epsilon_t}\right).$$
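As a quick numerical sanity check, the minimal sketch below (with assumed weights $W_c = 0.8$, $W_e = 0.2$, chosen only for illustration) compares the closed-form $\alpha_t$ with a brute-force minimization of $E_t$:

import numpy as np

# Assumed weighted-error split: 80% of the weight is classified correctly
W_c, W_e = 0.8, 0.2
epsilon_t = W_e / (W_c + W_e)

# Closed-form classifier weight from the derivation above
alpha_closed = 0.5 * np.log((1 - epsilon_t) / epsilon_t)

# Brute-force minimization of E_t(alpha) = W_c * exp(-alpha) + W_e * exp(alpha)
alphas = np.linspace(0.01, 3.0, 10000)
E = W_c * np.exp(-alphas) + W_e * np.exp(alphas)
alpha_brute = alphas[np.argmin(E)]

print(f"closed-form alpha_t = {alpha_closed:.4f}")   # about 0.6931
print(f"brute-force alpha_t = {alpha_brute:.4f}")    # should match closely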
Using Taylor expansion of $e^{-f(x)H(x)}$ (and noting that $f(x)^2 = h(x)^2 = 1$), the exponential loss can be approximated by

$$\ell_{\exp}(H \mid \mathcal{D}) \approx \mathbb{E}_{x \sim \mathcal{D}}\left[1 - f(x)H(x) + \frac{f(x)^2 H(x)^2}{2}\right].$$

The desired classifier at round $t$ is therefore

$$h_t(x) = \arg\max_{h}\; \mathbb{E}_{x \sim \mathcal{D}}\left[e^{-f(x)H_{t-1}(x)}\, f(x)h(x)\right],$$

by noticing that $\mathbb{E}_{x \sim \mathcal{D}}\left[e^{-f(x)H_{t-1}(x)}\right]$ is a constant that does not depend on $h$.

Denote a distribution

$$\mathcal{D}_t(x) = \frac{\mathcal{D}(x)\, e^{-f(x)H_{t-1}(x)}}{\mathbb{E}_{x \sim \mathcal{D}}\left[e^{-f(x)H_{t-1}(x)}\right]},$$

so that the desired classifier can be written as

$$h_t(x) = \arg\max_{h}\; \mathbb{E}_{x \sim \mathcal{D}_t}\left[f(x)h(x)\right].$$

In the expression $f(x)h(x)$:

- If $h(x) = f(x)$: $f(x)h(x) = 1$, so the expression becomes $1$.
- If $h(x) \neq f(x)$: $f(x)h(x) = -1$, so the expression becomes $-1$.

Therefore:

- When the prediction $h(x)$ is correct, the expression equals 1.
- When the prediction $h(x)$ is incorrect, the expression equals -1.

Since $f(x)h(x) = 1 - 2\,\mathbb{I}\big(f(x) \neq h(x)\big)$, the ideal classifier is:

$$h_t(x) = \arg\min_{h}\; \mathbb{E}_{x \sim \mathcal{D}_t}\left[\mathbb{I}\big(f(x) \neq h(x)\big)\right].$$

As can be seen, the ideal $h_t$ simply minimizes the classification error under the distribution $\mathcal{D}_t$. Here, $H_t(x) = H_{t-1}(x) + \alpha_t h_t(x)$. If we move from round $t$ to round $t+1$, the distribution must be updated accordingly. Thus, $\mathcal{D}_{t+1}$ is obtained from $\mathcal{D}_t$, which is calculated as follows:

$$\mathcal{D}_{t+1}(x) = \frac{\mathcal{D}(x)\, e^{-f(x)H_t(x)}}{\mathbb{E}_{x \sim \mathcal{D}}\left[e^{-f(x)H_t(x)}\right]} = \mathcal{D}_t(x)\, e^{-\alpha_t f(x) h_t(x)}\, \frac{\mathbb{E}_{x \sim \mathcal{D}}\left[e^{-f(x)H_{t-1}(x)}\right]}{\mathbb{E}_{x \sim \mathcal{D}}\left[e^{-f(x)H_t(x)}\right]}.$$

This is the distribution-update rule used by AdaBoost.
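The update rule can be traced on a toy round. In the minimal sketch below (the five sample weights, labels, and predictions are made up for illustration), misclassified samples have their weights multiplied by $e^{\alpha_t}$ and correctly classified ones by $e^{-\alpha_t}$, followed by normalization:

import numpy as np

# Toy round: 5 samples with current weights D_t (assumed values)
D_t = np.array([0.2, 0.2, 0.2, 0.2, 0.2])
y   = np.array([+1, +1, -1, -1, +1])   # true labels f(x_i)
h   = np.array([+1, -1, -1, +1, +1])   # predictions h_t(x_i)

# Weighted error and classifier weight
epsilon_t = D_t[h != y].sum()                        # = 0.4
alpha_t = 0.5 * np.log((1 - epsilon_t) / epsilon_t)  # = 0.5 * ln(1.5)

# Distribution update: D_{t+1}(i) proportional to D_t(i) * exp(-alpha_t * y_i * h_t(x_i))
D_next = D_t * np.exp(-alpha_t * y * h)
D_next /= D_next.sum()                               # normalization (the Z_t factor)

print("epsilon_t =", epsilon_t)
print("alpha_t   =", round(alpha_t, 4))
print("D_{t+1}   =", np.round(D_next, 4))            # misclassified samples now carry more weight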
AdaBoost Algorithm#
Input:
- Dataset $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_m, y_m)\}$ with $y_i \in \{-1, +1\}$;
- Base learning algorithm $\mathcal{L}$;
- Number of learning rounds $T$.

Process:

1. Initialize the weight distribution $\mathcal{D}_1(i) = 1/m$ for $i = 1, \dots, m$.
2. For $t = 1, \dots, T$:
   - Train a classifier $h_t = \mathcal{L}(D, \mathcal{D}_t)$ using the base algorithm on dataset $D$ under the distribution $\mathcal{D}_t$.
   - Calculate the error $\epsilon_t$ of $h_t$, where $\epsilon_t = \sum_{i:\, h_t(x_i) \neq y_i} \mathcal{D}_t(i)$.
   - If $\epsilon_t > 0.5$, then discard $h_t$ and continue.
   - Determine the weight $\alpha_t = \frac{1}{2}\ln\frac{1 - \epsilon_t}{\epsilon_t}$ of the classifier $h_t$.
   - Update the distribution for the next round: $\mathcal{D}_{t+1}(i) = \frac{\mathcal{D}_t(i)\,\exp\big(-\alpha_t y_i h_t(x_i)\big)}{Z_t}$, where $Z_t$ is a normalization factor chosen so that $\mathcal{D}_{t+1}$ is a valid distribution.
3. End the loop.

Output:
The final hypothesis is given by $H(x) = \operatorname{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$.
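The pseudocode translates almost line by line into code. The following is a minimal sketch only: the synthetic dataset, the decision-stump base learner, and the number of rounds are illustrative choices, not part of the algorithm statement above.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Small synthetic two-class problem; labels mapped to {-1, +1}
X, y = make_classification(n_samples=300, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
y = 2 * y - 1

m, T = len(X), 20
D = np.full(m, 1 / m)          # D_1(i) = 1/m
stumps, alphas = [], []

for t in range(T):
    # Train a decision stump under the current distribution D_t
    h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=D)
    pred = h.predict(X)

    # Weighted error epsilon_t
    eps = D[pred != y].sum()
    if eps > 0.5:              # discard learners worse than random guessing
        continue
    eps = max(eps, 1e-10)      # numerical safeguard

    # Classifier weight alpha_t and distribution update
    alpha = 0.5 * np.log((1 - eps) / eps)
    D = D * np.exp(-alpha * y * pred)
    D /= D.sum()               # normalization factor Z_t

    stumps.append(h)
    alphas.append(alpha)

# Final hypothesis: sign of the weighted vote
F = sum(a * h.predict(X) for a, h in zip(alphas, stumps))
H = np.sign(F)
print("Training accuracy of the boosted ensemble:", (H == y).mean())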
How AdaBoost works#
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
import matplotlib.pyplot as plt

# Generate a synthetic 2D dataset
X, y = make_classification(n_samples=200, n_features=2, n_informative=2, n_redundant=0,
                           n_clusters_per_class=1, flip_y=0, random_state=42)

# Initialize the base classifier (a decision stump)
base_classifier = DecisionTreeClassifier(max_depth=1)

# Initialize the AdaBoost classifier (SAMME avoids the SAMME.R deprecation in recent scikit-learn)
adaboost = AdaBoostClassifier(base_classifier, n_estimators=5, algorithm='SAMME', random_state=42)

# Fit the AdaBoost classifier to the data
adaboost.fit(X, y)

# Predict using the fitted model
y_pred = adaboost.predict(X)

# Plot the decision boundary
def plot_decision_boundary(X, y, model):
    h = .02  # Step size in the mesh
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.8, cmap=plt.cm.coolwarm)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor='k', marker='o', cmap=plt.cm.coolwarm)
    plt.title('AdaBoost Decision Boundary')
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.show()

# Plot the decision boundary
plot_decision_boundary(X, y, adaboost)
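To see the effect of the number of rounds without refitting, one can inspect the staged predictions of the fitted model above; the snippet below is a small illustrative addition, not part of the original example.

from sklearn.metrics import accuracy_score

# Training accuracy after each boosting round of the model fitted above
for t, y_stage in enumerate(adaboost.staged_predict(X), start=1):
    print(f"n_estimators = {t}: training accuracy = {accuracy_score(y, y_stage):.3f}")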
Homework: AdaBoost#
Play with different values of n_estimators.
Write AdaBoost without using sklearn.ensemble.
What we do with AdaBoost#
What we do with AdaBoost can be explained by first reviewing the following work:
Bahraini, T., Hosseini, S. M., Ghasempour, M., & Sadoghi Yazdi, H. (2022). Density-oriented linear discriminant analysis. Expert Systems with Applications, 187, 115946.
In the study “Density-oriented linear discriminant analysis,” the authors tackle big data challenges by integrating AdaBoost with a novel base learner, DLDA (Density-Oriented Linear Discriminant Analysis). This combination enhances classification accuracy and efficiency, especially in high-dimensional and imbalanced datasets. The approach is scalable and suitable for various big data applications, offering an innovative solution to improve machine learning performance in complex environments. Future work includes optimizing DLDA and exploring integrations with other advanced techniques.
Miniproject: The LogitBoost algorithm#
Miniproject: The LPBoost algorithm#
Bagging#
Bagging, short for Bootstrap Aggregating, reduces errors by combining multiple independent base learners. It achieves this by generating different training subsets using bootstrap sampling, where each subset is created by sampling with replacement from the original dataset. Multiple base learners are trained on these subsets, and their outputs are aggregated via voting for classification or averaging for regression. Bagging is effective for both binary and multi-class classification problems.
In the context of bagging, after combining $T$ independent base classifiers by majority voting, each with error rate $\epsilon < 0.5$, the ensemble misclassifies a sample only when at least half of the base classifiers err. By Hoeffding's inequality,

$$P\big(H(x) \neq f(x)\big) = \sum_{k=0}^{\lfloor T/2 \rfloor} \binom{T}{k}(1-\epsilon)^{k}\,\epsilon^{T-k} \leq \exp\!\left(-\frac{1}{2}\,T\,(1 - 2\epsilon)^2\right),$$

so the ensemble error decreases exponentially with the number of base learners, under the idealized assumption that their errors are independent.
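A quick numerical illustration of this bound (a minimal sketch with an assumed per-learner error rate; independence of the base learners is an idealization):

import numpy as np
from math import comb

# Assumed setup: T independent base classifiers, each with error rate epsilon
epsilon = 0.3
for T in [1, 5, 11, 21, 51]:
    # Exact probability that more than half of the T voters are wrong
    ensemble_error = sum(comb(T, k) * epsilon**k * (1 - epsilon)**(T - k)
                         for k in range(T // 2 + 1, T + 1))
    # Hoeffding-style upper bound from the inequality above
    bound = np.exp(-0.5 * T * (1 - 2 * epsilon) ** 2)
    print(f"T = {T:3d}: majority-vote error = {ensemble_error:.4f}, bound = {bound:.4f}")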
Bagging algorithm#
Input:
- Dataset $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_m, y_m)\}$;
- Base learning algorithm $\mathcal{L}$;
- Number of base learners $T$.

Process:

1. For $t = 1, \dots, T$:
   - Train the base learner: $h_t = \mathcal{L}(D, \mathcal{D}_{bs})$, where $\mathcal{D}_{bs}$ is the bootstrap distribution (sampling with replacement).
2. End for.

Output:

$$H(x) = \arg\max_{y \in \mathcal{Y}} \sum_{t=1}^{T} \mathbb{I}\big(h_t(x) = y\big)$$
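A minimal from-scratch sketch of this procedure (the synthetic dataset, decision-tree base learner, and number of rounds below are illustrative assumptions, not taken from the algorithm statement):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Illustrative binary dataset and settings
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
m, T = len(X), 25
rng = np.random.default_rng(0)

# Train T base learners, each on a bootstrap sample (sampling with replacement)
learners = []
for t in range(T):
    idx = rng.integers(0, m, size=m)          # bootstrap distribution D_bs
    tree = DecisionTreeClassifier(random_state=t).fit(X[idx], y[idx])
    learners.append(tree)

# Aggregate by majority voting: H(x) = argmax_y sum_t I(h_t(x) = y)
all_preds = np.stack([h.predict(X) for h in learners])   # shape (T, m)
votes_for_class_1 = (all_preds == 1).sum(axis=0)
H = (votes_for_class_1 > T / 2).astype(int)

print("Training accuracy of the bagged ensemble:", (H == y).mean())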