# Learning Rule
## Solution Method (Learning Rule)
The learning rule, or solution method, is the process by which a model adjusts its parameters to minimize the loss function and improve its performance. Here are some common solution methods used in machine learning:
### 1. Gradient Descent
**Overview:** Gradient descent is an optimization algorithm that minimizes the loss function by iteratively moving in the direction of steepest descent, i.e., along the negative gradient of the loss function.
**Variants:**

- **Batch Gradient Descent:** Computes the gradient of the loss function with respect to the entire dataset before each update.
- **Stochastic Gradient Descent (SGD):** Computes the gradient for a single sample and updates the parameters after each one.
- **Mini-batch Gradient Descent:** Combines the advantages of batch and stochastic gradient descent by updating the parameters with a small subset of the data at each step.
Example of the gradient descent update rule: $w \leftarrow w - \eta \, \nabla_w L(w)$, where $\eta$ is the learning rate and $\nabla_w L(w)$ is the gradient of the loss.
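The update rule above can be sketched on a toy least-squares problem. This is a minimal mini-batch illustration, not a tuned implementation: the data, learning rate, batch size, and epoch count are all illustrative assumptions.

```python
import numpy as np

# Toy linear-regression data: y = X @ true_w + small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=100)

def gradient(w, Xb, yb):
    # Gradient of the mean squared error 0.5 * mean((Xb @ w - yb)**2)
    return Xb.T @ (Xb @ w - yb) / len(yb)

w = np.zeros(3)
learning_rate = 0.1
for epoch in range(200):
    idx = rng.permutation(len(X))
    for start in range(0, len(X), 10):          # mini-batches of size 10
        batch = idx[start:start + 10]
        w = w - learning_rate * gradient(w, X[batch], y[batch])
# Batch GD would pass the full X, y each step; plain SGD a single sample.
# After training, w should approximate true_w.
```

Switching between the three variants only changes which rows of `X` feed each gradient evaluation; the update rule itself is unchanged.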
### 2. Newton’s Method
**Overview:** Newton’s method uses the second-order derivatives (the Hessian matrix) of the loss function to compute the parameter updates. It converges in fewer iterations than gradient descent but is computationally expensive for large datasets, since it requires forming and inverting the Hessian.
**Update Rule:** $w \leftarrow w - H^{-1} \nabla_w L(w)$, where $H$ is the Hessian of the loss at $w$.
Related methods include the Conjugate Gradient Method, Quasi-Newton methods such as the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm, and regularization techniques such as L1 regularization (Lasso) and L2 regularization (Ridge).
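The Newton update can be sketched on a small convex test function chosen so that the gradient and Hessian have closed forms; the function, starting point, and iteration count are assumptions for illustration only.

```python
import numpy as np

# Test function: f(w) = sum(w**4) / 4 + sum(w**2) / 2, minimized at w = 0.
def grad(w):
    return w**3 + w                    # gradient of f

def hessian(w):
    return np.diag(3 * w**2 + 1)       # Hessian of f (diagonal here)

w = np.array([2.0, -1.5])
for _ in range(20):
    # Newton step: solve H @ step = grad instead of inverting H explicitly
    w = w - np.linalg.solve(hessian(w), grad(w))
# w should now be essentially at the minimizer, the zero vector.
```

Solving the linear system with `np.linalg.solve` rather than computing `H**-1` is the standard practice; for high-dimensional models even this is too costly per step, which motivates the quasi-Newton methods listed above.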
### 3. Genetic Algorithms
**Overview:** These are search heuristics that mimic the process of natural selection to find optimal solutions.

**Process:** Includes selection, crossover, and mutation to evolve the parameters over generations.
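The selection–crossover–mutation loop can be sketched on the sphere function $f(x) = \sum_i x_i^2$ as a minimization target. The population size, mutation rate, and generation count below are illustrative assumptions, not tuned values.

```python
import random

random.seed(0)
DIM, POP, GENS = 5, 40, 100

def fitness(ind):
    return -sum(x * x for x in ind)        # higher is better (minimize sum of squares)

def crossover(a, b):
    cut = random.randrange(1, DIM)         # single-point crossover
    return a[:cut] + b[cut:]

def mutate(ind, rate=0.1, scale=0.3):
    # Perturb each gene with small probability
    return [x + random.gauss(0, scale) if random.random() < rate else x
            for x in ind]

pop = [[random.uniform(-5, 5) for _ in range(DIM)] for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:POP // 2]               # selection: keep the fitter half
    children = [mutate(crossover(random.choice(parents),
                                 random.choice(parents)))
                for _ in range(POP - len(parents))]
    pop = parents + children               # elitism: parents survive unchanged

best = max(pop, key=fitness)
# best should be close to the optimum at the zero vector
```

Keeping the parents in the next generation (elitism) guarantees the best fitness never regresses, which is a common design choice in simple genetic algorithms.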
### 4. Bayesian Optimization
**Overview:** Bayesian optimization minimizes a function by constructing a probabilistic (surrogate) model of it, then using that model to select the most promising parameters to evaluate on the actual function.
**Example — Particle Filters:** Particle filters, also known as Sequential Monte Carlo methods, estimate the posterior distribution of state variables in dynamic systems. They use a collection of particles (samples) to represent the posterior distribution and iteratively update them via Bayesian inference.
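The predict–weight–resample cycle of a particle filter can be sketched on a one-dimensional random walk observed through Gaussian noise; the noise levels, particle count, and horizon are assumed values for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N = 50, 500                         # time steps, number of particles
process_std, obs_std = 0.5, 1.0

# Simulate a random-walk state and its noisy observations
true_x = np.cumsum(rng.normal(0, process_std, T))
obs = true_x + rng.normal(0, obs_std, T)

particles = rng.normal(0, 1, N)
estimates = []
for t in range(T):
    # 1. Predict: propagate particles through the transition model
    particles = particles + rng.normal(0, process_std, N)
    # 2. Weight: score each particle by the Gaussian observation likelihood
    weights = np.exp(-0.5 * ((obs[t] - particles) / obs_std) ** 2)
    weights /= weights.sum()
    # 3. Estimate the posterior mean, then resample particles by weight
    estimates.append(np.dot(weights, particles))
    particles = rng.choice(particles, size=N, p=weights)

rmse = np.sqrt(np.mean((np.array(estimates) - true_x) ** 2))
# rmse should come in below obs_std: the filter beats the raw observations
```

Resampling at every step, as here, is the simplest scheme; practical implementations often resample only when the effective sample size drops, to reduce particle impoverishment.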