Dimensionality and High Dimensional Data in Machine Learning
Dimensionality
In machine learning and data science, dimensionality refers to the number of attributes (features) a dataset has. For example, a telecommunication dataset may have a large number of attributes (region, tenure, age, address, etc.). When such a dataset is stored in a CSV file, each column holds one attribute and therefore represents one dimension. This usage differs from the sense of the word in mathematics or physics, where dimension typically describes spatial extent or the size of a vector space.
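As a small illustration of counting dimensions, the sketch below reads a tiny CSV and reports its number of columns. The column names come from the telecommunication example above; the three data rows are invented purely for illustration, and the variable names are hypothetical.

```python
import csv
import io

# Invented sample rows; only the column names come from the example above.
csv_text = """region,tenure,age,address
2,13,44,9
3,11,33,7
3,68,52,24
"""

reader = csv.reader(io.StringIO(csv_text))
header = next(reader)   # first line holds the attribute names
rows = list(reader)     # remaining lines are the observations

dimensionality = len(header)   # one dimension per column
n_observations = len(rows)

print(f"{dimensionality} dimensions, {n_observations} observations")
```

With a real file, the same idea applies: open the file, read the header row, and count its entries.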
High Dimensional Data
High dimensional data is a dataset in which the number of features exceeds the number of observations. Such a dataset has a very large number of attributes, which makes computation on it more complex. For example, suppose a dataset has n observations (data points) and p features (attributes). If n = 1000 and p = 2000, the data is high dimensional.
In simple words, no matter how big or small the dataset is, if the number of attributes is greater than the number of observations (p > n), the data is high dimensional, even if both values are in the single digits. One of the most popular machine learning algorithms for working with high dimensional data is the Support Vector Machine (SVM). To learn more about the SVM algorithm, click here.
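The definition used in this section (more features than observations) can be written as a one-line check. This is only a sketch of that rule; the function name is_high_dimensional is made up for this example.

```python
def is_high_dimensional(n_observations, n_features):
    """Return True when the features outnumber the observations (p > n)."""
    return n_features > n_observations


# The example from the text: n = 1000 observations, p = 2000 features.
print(is_high_dimensional(1000, 2000))

# Size does not matter, only the comparison: 3 observations, 5 features.
print(is_high_dimensional(3, 5))

# A large dataset with fewer features than observations does not qualify.
print(is_high_dimensional(2000, 1000))
```

The check depends only on how n and p compare, not on how large either number is.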
Dimensionality Reduction
In simple terms, dimensionality reduction means simplifying the data so that it is easier to understand, either numerically or visually. There are different methods for reducing the dimensionality of data, such as multidimensional scaling, which places similar data points close together in a lower-dimensional space. To learn more about dimensionality reduction, click here.
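To make multidimensional scaling a little more concrete, here is a minimal sketch of the classical (metric) variant, written with NumPy. It assumes Euclidean distances and randomly generated sample data; the function names classical_mds and pairwise_distances are chosen for this example, and libraries such as scikit-learn offer other MDS variants that may behave differently.

```python
import numpy as np


def pairwise_distances(points):
    """Euclidean distance between every pair of rows."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def classical_mds(distances, n_components=2):
    """Embed points in n_components dimensions from a distance matrix."""
    n = distances.shape[0]
    # Double-center the squared distances to get a Gram (inner product) matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (distances ** 2) @ centering
    # Keep the directions with the largest eigenvalues.
    eigenvalues, eigenvectors = np.linalg.eigh(gram)
    top = np.argsort(eigenvalues)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigenvalues[top], 0.0, None))
    return eigenvectors[:, top] * scale


# Random sample data: 6 observations with 5 features that really
# vary along only 2 underlying directions.
rng = np.random.default_rng(0)
flat = rng.normal(size=(6, 2))
data = flat @ rng.normal(size=(2, 5))

embedding = classical_mds(pairwise_distances(data), n_components=2)
print(embedding.shape)
```

Because the sample data only varies along two directions, the two-dimensional embedding keeps the pairwise distances of the original five-dimensional points. Real data rarely has this property, so some distortion should be expected in practice.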