Clustering Concept#
Introduction#
From this point on, every topic is framed as the minimization of a cost function. The solution method, the way data are used for learning, the learning rule, and the type of model are each explained in terms of the cost function being minimized.
Index of Data Calculation#
In data analysis and pattern recognition, the term “Index of Data Calculation” refers to the statistical and mathematical indices used to describe and summarize a dataset. These indices characterize the distribution, central tendency, and variability of the data. This chapter covers the mean, the variance, alternative measures of center (the median and the mode), and support vector data description.
Mean of Data#
The mean, often referred to as the average, is a measure of the central tendency of a dataset. It is calculated by summing all the data points and dividing by the number of points. The mean provides a single value that represents the center of the data distribution.
\[\mu = \frac{1}{n} \sum_{i=1}^{n} x_i\]
where \(n\) is the number of data points, and \(x_i\) represents each data point.
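To tie the mean back to the cost-function view used in this chapter, the sketch below computes the mean and checks numerically that it gives a lower sum of squared distances than nearby candidate centers. The function names `mean` and `squared_error_cost` and the sample values are illustrative assumptions, not part of the original text.

```python
def mean(xs):
    """Arithmetic mean: sum of the points divided by their count."""
    if not xs:
        raise ValueError("mean of an empty dataset is undefined")
    return sum(xs) / len(xs)


def squared_error_cost(xs, c):
    """Sum of squared distances from each point to a candidate center c."""
    return sum((x - c) ** 2 for x in xs)


# Hypothetical sample data, chosen only for illustration.
data = [2.0, 4.0, 4.0, 5.0, 10.0]
mu = mean(data)

# Moving the center away from the mean in either direction raises the cost.
cost_at_mean = squared_error_cost(data, mu)
cost_below = squared_error_cost(data, mu - 0.5)
cost_above = squared_error_cost(data, mu + 0.5)
print(mu, cost_at_mean, cost_below, cost_above)
```

For one-dimensional data the cost at any center \(c\) exceeds the cost at the mean by exactly \(n (c - \mu)^2\), which is why both shifted centers above give the same, larger value.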
Variance of Data#
Variance measures the spread or dispersion of the data points around the mean. It provides insight into how much the data varies. A higher variance indicates that the data points are more spread out from the mean.
\[\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2\]
where \(\mu\) is the mean of the data.
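As a small illustration, the following sketch computes the variance as the average squared deviation from the mean, assuming the population form (dividing by \(n\)). The function name `population_variance` and the sample values are assumptions made for this example; the standard-library `statistics` module is used only as a cross-check.

```python
import statistics


def population_variance(xs):
    """Average squared deviation from the mean (divides by n)."""
    if not xs:
        raise ValueError("variance of an empty dataset is undefined")
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


# Hypothetical sample data, chosen only for illustration.
data = [2.0, 4.0, 4.0, 5.0, 10.0]
var = population_variance(data)

# Cross-check against the standard library's population variance.
reference = statistics.pvariance(data)

# The sample variance divides by n - 1 instead, so it is slightly larger.
sample_var = statistics.variance(data)
print(var, reference, sample_var)
```

Whether to divide by \(n\) or by \(n - 1\) depends on whether the data are treated as the whole population or as a sample from it; the sketch shows both so the difference is visible.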
Center of Data#
The center of the data can also be represented using other measures like the median and mode:
Median: The middle value when the data points are arranged in ascending order. If the number of data points is even, the median is the average of the two middle values.
Mode: The value that appears most frequently in the dataset.
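The two definitions above can be sketched in a few lines. The hand-written `median` below follows the odd/even rule stated in the text, and `statistics.multimode` from the standard library (Python 3.8 or later) returns every most-frequent value, which matters when a dataset has more than one mode. The sample values are assumptions chosen for illustration.

```python
import statistics


def median(xs):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    if not xs:
        raise ValueError("median of an empty dataset is undefined")
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


# Hypothetical sample data, chosen only for illustration.
odd_data = [10, 2, 4, 5, 4]
even_data = [7, 1, 5, 3]

median_odd = median(odd_data)      # middle of [2, 4, 4, 5, 10]
median_even = median(even_data)    # average of 3 and 5

# multimode returns all values tied for the highest frequency.
modes_single = statistics.multimode(odd_data)
modes_tied = statistics.multimode([1, 1, 2, 2, 3])
print(median_odd, median_even, modes_single, modes_tied)
```

Unlike the mean, the median is not pulled toward the single large value 10 in the first dataset, which is one reason it is often preferred when outliers are present.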
Support Vector Data Description#
Support Vector Data Description (SVDD) is a machine learning technique that describes a dataset by the boundary that encloses it. It is particularly useful for anomaly detection and one-class classification. SVDD fits the smallest sphere (a hypersphere in higher dimensions) that encompasses most of the data points, with the support vectors lying on its boundary.
Support Vectors: These are the data points that lie on the boundary of the sphere and are crucial for defining the shape and position of the boundary.
Radius and Center: The radius of the sphere and its center are determined during training by minimizing a cost function: the sphere is made as small as possible while any data points left outside it are penalized.
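The idea of a center, a radius, and boundary points can be sketched for the simplest special case only: no kernel and no points allowed outside the sphere, where the description reduces to the minimum enclosing ball of the data. The code below is not a full SVDD solver (which is normally posed as a quadratic program with slack variables); it swaps in the simple iterative approximation of Bădoiu and Clarkson, which repeatedly moves the center toward the farthest point with a shrinking step. The names `enclosing_ball`, `boundary_points`, `is_outlier`, the tolerance, and the sample points are all assumptions made for this illustration.

```python
import math


def enclosing_ball(points, iterations=2000):
    """Approximate the minimum enclosing ball (center, radius) of the points.

    Starts at the first point and, at step i, moves the center a fraction
    1 / (i + 1) of the way toward the currently farthest point.
    """
    center = list(points[0])
    for i in range(1, iterations + 1):
        farthest = max(points, key=lambda p: math.dist(p, center))
        step = 1.0 / (i + 1)
        center = [c + step * (f - c) for c, f in zip(center, farthest)]
    radius = max(math.dist(p, center) for p in points)
    return center, radius


def boundary_points(points, center, radius, tol=1e-2):
    """Points lying (approximately) on the sphere: candidates for support vectors."""
    return [p for p in points if radius - math.dist(p, center) <= tol * radius]


def is_outlier(x, center, radius):
    """A new point is flagged as anomalous if it falls outside the sphere."""
    return math.dist(x, center) > radius


# Hypothetical 2-D data: four corners of a square plus two interior points.
data = [(1, 1), (1, -1), (-1, -1), (-1, 1), (0, 0), (0.5, -0.2)]
center, radius = enclosing_ball(data)
support = boundary_points(data, center, radius)
print(center, radius, support)
print(is_outlier((3, 3), center, radius), is_outlier((0.2, 0.1), center, radius))
```

In this example only the four corner points end up on the boundary, so they alone determine the sphere; removing either interior point would leave the center and radius unchanged. That is the sense in which the support vectors define the description.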