Cost Function

Table of Contents: Introduction to PR

  1. Introduction

  2. Dataset

  3. Model

  4. Cost Function

  5. Solution method

A cost function has three components: the error, the loss function, and an adjustment (regularization) term.

\[ \text{E}\left\{ l\left( e \right) \right\} + \lambda \cdot \text{Regularization Term} \]

The optimal parameter is obtained by minimizing the expected (average) loss $\text{E}\{l(e)\}$, where the error $e$ is defined with respect to the parameter in question, together with the regularization term, which encodes prior knowledge. The weight $\lambda$ controls the balance between the two, so the parameter achieves its best fit by trading off a small loss against agreement with what is already known.
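To make the equation concrete, here is a minimal sketch under assumptions that are not part of the text: a one-parameter linear model $\hat{y} = w x$, the squared loss $l(e) = e^2$, an L2 regularization term $w^2$, and a brute-force search over a grid of candidate values of $w$. The helper names `cost` and `best_parameter` are illustrative only.

```python
# Sketch of the cost E{l(e)} + lambda * regularization term.
# Assumptions (illustrative, not from the text): model y_hat = w * x,
# squared loss l(e) = e**2, L2 penalty w**2, grid search over w.

def cost(w, xs, ys, lam):
    """Average loss over the data plus lambda times the regularization term."""
    errors = [y - w * x for x, y in zip(xs, ys)]           # e = y - y_hat
    avg_loss = sum(e ** 2 for e in errors) / len(errors)   # empirical E{l(e)}
    return avg_loss + lam * w ** 2                         # + lambda * regularization

def best_parameter(candidates, xs, ys, lam):
    """Return the candidate w with the smallest cost."""
    return min(candidates, key=lambda w: cost(w, xs, ys, lam))

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]                     # generated exactly by w = 2
grid = [i / 10 for i in range(0, 41)]    # candidate values 0.0, 0.1, ..., 4.0

print(best_parameter(grid, xs, ys, lam=0.0))  # 2.0
print(best_parameter(grid, xs, ys, lam=1.0))  # 1.6
```

With $\lambda = 1$ the selected value is 1.6 rather than 2.0: the regularization term pulls the parameter toward zero even though $w = 2$ fits the data perfectly, which is exactly the balance described above.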

Explanation of the Above Equation by Example

What is the meaning of $\text{E}\{l(e)\}$? Suppose you want to get married and have received many suggestions from your family and friends, and you are also considering some candidates yourself. However, these candidates might not perfectly match your criteria. This discrepancy is called the error $e$. It is important to note that the loss caused by an error is not necessarily equal to the error itself.

For a better understanding, consider this example: suppose one of your criteria for an ideal spouse is large, honey-colored eyes. If a proposed person's eyes are slightly larger than expected, this discrepancy, or error, is small. Therefore, despite the error, the loss can be negligible; in other words, the loss can be zero even when the error is not. A loss that behaves this way is called a relaxed (error-tolerant) loss function $l(e)$. Depending on the problem and data type, different functions can be applied to the error to obtain the desired loss.
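One standard example of a relaxed loss is the epsilon-insensitive loss (used, for instance, in support vector regression): errors whose magnitude is within a tolerance cost nothing, and larger errors grow linearly. The sketch below is illustrative; the tolerance `eps = 0.5` is an arbitrary choice, not a value from the text.

```python
# Sketch of a relaxed loss: the epsilon-insensitive loss.
# Errors with |e| <= eps are ignored ("eyes slightly larger than expected");
# beyond the tolerance the loss grows linearly. eps = 0.5 is arbitrary.

def eps_insensitive_loss(e, eps=0.5):
    """Zero loss inside the tolerance band, |e| - eps outside it."""
    return max(0.0, abs(e) - eps)

for e in [0.0, 0.3, -0.5, 1.0, -2.0]:
    print(e, eps_insensitive_loss(e))
```

The errors 0.3 and -0.5 are nonzero, yet their loss is 0.0, while the errors 1.0 and -2.0 produce losses of 0.5 and 1.5.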

A further point is that many different errors, and therefore many losses, must be taken into account. For each matchmaking option, a loss is associated with the person being considered: $l(e_{1})$ is the loss caused by choosing the first option, and $l(e_{n})$ is the loss caused by choosing the last option. Here, the need for a cost function becomes apparent: the losses of all options must be combined, and the choice should be made so that the chosen spouse yields the minimum combined loss.

Types of Loss Functions

Loss functions are used to evaluate the performance of a model. They fall into two main types:

Supervised Loss Functions

In this type, the desired outcome (the label) is known. For example, we know that this pattern is an apple, the price of this item is $2, or this MRI image belongs to a healthy individual. Two families of supervised loss functions are as follows:

Classification Loss Functions:

  • These functions are used for classification problems, where the goal is to categorize samples into different classes. An example of these functions is the 0-1 loss function, which assigns a loss of 1 for each misclassification and a loss of 0 for correct classifications.

  • Another common classification loss function is the cross-entropy loss, which is widely used to train neural networks for multi-class classification. It measures the performance of a classification model whose output is a probability between 0 and 1, and it penalizes the model more heavily the lower the probability it assigns to the true class.
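Both classification losses can be written in a few lines. The sketch below averages each loss over a small, made-up batch; the function names and the small constant used to avoid `log(0)` are illustrative assumptions, and labels are taken to be integer class indices.

```python
import math

def zero_one_loss(y_true, y_pred):
    """Average 0-1 loss: 1 per misclassified sample, 0 per correct one."""
    mistakes = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return mistakes / len(y_true)

def cross_entropy(y_true, probs):
    """Average multi-class cross-entropy.

    y_true: integer class indices.
    probs:  one probability vector per sample (entries in [0, 1], summing to 1).
    """
    eps = 1e-12  # guard against log(0) when the true class gets probability 0
    total = 0.0
    for t, p in zip(y_true, probs):
        total += -math.log(max(p[t], eps))  # only the true-class probability matters
    return total / len(y_true)

labels = [0, 2, 1]
print(zero_one_loss(labels, [0, 1, 1]))  # one of three samples wrong

probs = [[0.7, 0.2, 0.1],
         [0.1, 0.1, 0.8],
         [0.3, 0.4, 0.3]]
print(cross_entropy(labels, probs))
```

The 0-1 loss only counts mistakes, whereas cross-entropy also distinguishes a confident correct prediction (0.8 on the true class) from a hesitant one (0.4), which makes it usable as a smooth training objective.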

Regression Loss Functions: These functions are used for regression problems, where the goal is to predict numerical values.

  • Square loss: An example of these functions is the Mean Squared Error (MSE), which calculates the mean of the squares of the differences between predicted and actual values.

  • Absolute loss: Another example is the Mean Absolute Error (MAE), which measures the mean of the absolute differences between predicted and actual values.

  • Huber loss: The Huber loss function is another regression loss that is less sensitive to outliers in the data than the MSE. It combines properties of MSE and MAE: it is quadratic when the error is small and linear when the error is large.
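The three regression losses above can be compared on one small example with a single outlier. This is a sketch: the data are made up, and the Huber threshold `delta = 1.0` is an assumed default (the half-squared form below is the usual textbook definition).

```python
def mse(y_true, y_pred):
    """Mean of squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean of absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def huber(y_true, y_pred, delta=1.0):
    """Quadratic for |e| <= delta, linear beyond it (continuous at delta)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        e = abs(t - p)
        if e <= delta:
            total += 0.5 * e ** 2
        else:
            total += delta * (e - 0.5 * delta)
    return total / len(y_true)

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 14.0]   # the last prediction is an outlier (error 10)
print(mse(y_true, y_pred))    # 25.0
print(mae(y_true, y_pred))    # 2.5
print(huber(y_true, y_pred))  # 2.375
```

Squaring makes the single outlier dominate the MSE, while the Huber loss grows only linearly for that large error and so stays close to the MAE.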

Unsupervised Loss Functions

In this type, the desired outcome is not known. For instance, in determining an index from a set of data, only the data is available, and the index is unknown. Similarly, in clustering data, we do not know to which cluster a particular data point belongs. Examples include:

Clustering Loss Functions:

These functions are used in clustering problems where the goal is to group similar data points together. An example is the K-means loss function, which aims to minimize the within-cluster sum of squares, effectively measuring the variance within each cluster.
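The K-means loss can be evaluated for any given assignment of points to clusters. The sketch below only scores an assignment; it does not run the full K-means algorithm (the alternating assignment and centroid-update steps), and the helper names `centroids` and `wcss` are illustrative. It assumes every cluster receives at least one point.

```python
def centroids(points, labels, k):
    """Mean of the points assigned to each cluster (clusters must be non-empty)."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for x, j in zip(points, labels):
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += x[d]
    return [[s / counts[j] for s in sums[j]] for j in range(k)]

def wcss(points, labels, k):
    """Within-cluster sum of squares: squared distance of each point to its centroid."""
    cents = centroids(points, labels, k)
    return sum(
        sum((xd - cd) ** 2 for xd, cd in zip(x, cents[j]))
        for x, j in zip(points, labels)
    )

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good = [0, 0, 1, 1]   # groups nearby points together
bad = [0, 1, 0, 1]    # mixes far-apart points
print(wcss(points, good, 2))  # 4.0
print(wcss(points, bad, 2))   # 100.0
```

The assignment that groups similar points yields a much smaller within-cluster sum of squares, which is why K-means seeks the assignment that minimizes it.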

Dimensionality Reduction Loss Functions:

These functions are used in dimensionality reduction techniques, whose goal is to reduce the number of variables under consideration; such techniques can be divided into feature selection and feature extraction. An example is the Principal Component Analysis (PCA) loss function, which minimizes the reconstruction error; for PCA this is equivalent to retaining as much variance as possible in the reduced dimensions.
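The following sketch illustrates the PCA reconstruction error for a reduction to one dimension. To stay dependency-free it finds the leading principal direction with power iteration on the scatter matrix, a stand-in for the eigendecomposition or SVD a library would use; the helper names and the toy data are illustrative assumptions.

```python
import math

def mean_center(points):
    """Subtract the per-dimension mean from every point."""
    n, dim = len(points), len(points[0])
    mu = [sum(p[d] for p in points) / n for d in range(dim)]
    return [[p[d] - mu[d] for d in range(dim)] for p in points]

def top_component(centered, iters=200):
    """Leading eigenvector of the scatter matrix, found by power iteration."""
    dim = len(centered[0])
    # Scatter matrix S = sum over points of x x^T
    S = [[sum(x[i] * x[j] for x in centered) for j in range(dim)] for i in range(dim)]
    v = [1.0] * dim  # generic start; fails only if orthogonal to the top eigenvector
    for _ in range(iters):
        w = [sum(S[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    return v

def reconstruction_error(centered, v):
    """Mean squared distance between each point and its projection onto unit vector v."""
    total = 0.0
    for x in centered:
        t = sum(xi * vi for xi, vi in zip(x, v))   # 1-D coordinate along v
        recon = [t * vi for vi in v]               # mapped back to the original space
        total += sum((xi - ri) ** 2 for xi, ri in zip(x, recon))
    return total / len(centered)

points = [(2.0, 0.0), (-2.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
centered = mean_center(points)
v = top_component(centered)
print(v)                                           # close to [1.0, 0.0]
print(reconstruction_error(centered, v))           # about 0.5
print(reconstruction_error(centered, [0.0, 1.0]))  # 2.0
```

Projecting onto the high-variance direction (the x-axis) discards only the small spread along y, giving a lower reconstruction error than projecting onto the y-axis, which illustrates why minimizing reconstruction error and retaining variance go together.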