Learning Rule#

Outline of the Introduction to Pattern Recognition (PR)#

  1. Introduction

  2. Dataset

  3. Model

  4. Cost Function

  5. Solution Method

Solution Method (Learning Rule)#

The learning rule, or solution method, is the process by which a model adjusts its parameters to minimize the loss function and improve its performance. Here are some common solution methods used in machine learning:

1. Gradient Descent#

  • Overview: Gradient descent is an optimization algorithm that minimizes the loss function by iteratively moving in the direction of steepest descent, i.e., along the negative gradient of the loss function.

  • Variants:

    • Batch Gradient Descent: Computes the gradient of the loss function with respect to the entire dataset.

    • Stochastic Gradient Descent (SGD): Computes the gradient for each sample and updates the parameters iteratively.

    • Mini-batch Gradient Descent: Combines the advantages of batch and stochastic gradient descent by updating parameters iteratively with a small subset of the data.

Example of the gradient descent update rule:

\[ w = w - \eta \, \nabla L(w) \]

where \( \eta \) is the learning rate and \( \nabla L(w) \) is the gradient of the loss function.
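A minimal NumPy sketch of batch gradient descent on a toy least-squares problem; the data, learning rate, and iteration count below are illustrative assumptions, not values from the text:

```python
import numpy as np

# Toy least-squares problem: minimize L(w) = ||X w - y||^2 / (2 n)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)

w = np.zeros(3)        # initial parameters
learning_rate = 0.1    # step size (eta)

for _ in range(500):
    gradient = X.T @ (X @ w - y) / len(y)  # gradient of the mean squared error
    w = w - learning_rate * gradient       # gradient descent update rule

print(w)  # should end up close to true_w
```

Replacing the full-batch gradient with the gradient of a single sample gives SGD, and using a small random subset of the data gives mini-batch gradient descent.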

2. Newton’s Method#

  • Overview: Newton’s method uses the second-order derivative (Hessian) of the loss function to find the parameter updates. It converges faster than gradient descent but is computationally expensive for large datasets.

  • Update Rule:

\[ w = w - H^{-1} \nabla L(w) \]
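A minimal sketch of Newton's method on the same toy least-squares loss (reusing the setup from the gradient descent sketch above is an assumption for illustration; for this quadratic loss the Hessian is constant, so a single Newton step already reaches the minimum):

```python
import numpy as np

# Newton's method on the least-squares loss L(w) = ||X w - y||^2 / (2 n)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

w = np.zeros(3)
for _ in range(5):
    grad = X.T @ (X @ w - y) / len(y)  # gradient of the loss
    H = X.T @ X / len(y)               # Hessian (matrix of second derivatives)
    w = w - np.linalg.solve(H, grad)   # w = w - H^{-1} grad, via a linear solve

print(w)
```

Solving the linear system H d = grad avoids forming H^{-1} explicitly, which is cheaper and numerically safer than a direct matrix inverse.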

Other solution methods include the Conjugate Gradient method, Quasi-Newton methods such as the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm, and regularization techniques such as L1 regularization (Lasso) and L2 regularization (Ridge); a short BFGS example using SciPy is sketched below.
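As a way to try a quasi-Newton method without implementing it by hand, SciPy's general-purpose optimizer can minimize a function with BFGS; the Rosenbrock test function below is only an illustrative choice:

```python
import numpy as np
from scipy.optimize import minimize

def rosenbrock(w):
    # Classic non-convex test function with its minimum at w = (1, 1)
    return 100.0 * (w[1] - w[0] ** 2) ** 2 + (1.0 - w[0]) ** 2

result = minimize(rosenbrock, x0=np.array([-1.0, 2.0]), method="BFGS")
print(result.x)  # converges towards [1, 1]
```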

3. Genetic Algorithms#

  • Overview: These are search heuristics that mimic the process of natural selection to find optimal solutions.

  • Process: Includes selection, crossover, and mutation to evolve the parameters over generations, as in the toy sketch below.
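A toy genetic algorithm minimizing a simple quadratic loss; the population size, selection scheme, and mutation scale are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(w):
    # Simple quadratic loss with its minimum at w = (2, -1)
    return (w[0] - 2.0) ** 2 + (w[1] + 1.0) ** 2

population = rng.normal(size=(50, 2))  # random initial parameter vectors

for generation in range(100):
    fitness = np.array([loss(w) for w in population])
    parents = population[np.argsort(fitness)[:10]]   # selection: keep the 10 fittest
    children = []
    for _ in range(len(population)):
        a, b = parents[rng.integers(10, size=2)]
        child = np.where(rng.random(2) < 0.5, a, b)  # crossover: mix parent genes
        child = child + 0.1 * rng.normal(size=2)     # mutation: small random perturbation
        children.append(child)
    population = np.array(children)

best = min(population, key=loss)
print(best)  # close to (2, -1)
```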

4. Bayesian Optimization#

  • Overview: Bayesian optimization uses Bayesian statistics to minimize a function by constructing a probabilistic model of it. This model is then used to select the most promising parameters to evaluate on the actual function.

  • Example: Particle Filter in Bayesian Optimization: Particle filters, also known as Sequential Monte Carlo methods, are used to estimate the posterior distribution of state variables in dynamic systems. They represent the posterior with a collection of particles (samples) and update these particles iteratively using Bayesian inference, as in the sketch below.
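A minimal bootstrap particle filter for a one-dimensional random-walk state observed with Gaussian noise; the model, noise levels, and particle count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hidden 1-D random-walk state observed with Gaussian noise (toy dynamic system)
T, n_particles = 50, 1000
true_state = np.cumsum(rng.normal(scale=0.5, size=T))
observations = true_state + rng.normal(scale=1.0, size=T)

particles = rng.normal(scale=1.0, size=n_particles)  # initial samples of the state

for y in observations:
    # Predict: propagate each particle through the random-walk dynamics
    particles = particles + rng.normal(scale=0.5, size=n_particles)
    # Update: weight each particle by the likelihood of the observation y
    weights = np.exp(-0.5 * (y - particles) ** 2)
    weights /= weights.sum()
    # Resample: draw a new particle set in proportion to the weights
    particles = rng.choice(particles, size=n_particles, p=weights)

print(particles.mean())  # posterior mean estimate of the final hidden state
```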