Geometric significance of eigenvalues and eigenvectors in linear algebra and their potential value in data analysis
Published: 24 Sep 2025
Received: 17 Jan 2025
Accepted: 05 May 2025
DOI: https://doi.org/10.2478/amns-2025-0985
© 2025 Yingdi Li, published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.
Eigenvalues and eigenvectors are important attributes of matrices, with important applications in quantum mechanics, machine learning, signal and image processing, and other fields, and their concepts should be firmly mastered and deeply understood by science and engineering students. Let a square matrix
Eigenvalues and eigenvectors have a large number of application cases in big data analysis. When processing big data, it is usually necessary to reduce the dimensionality of the sample data in order to reduce the processing difficulty, and principal component analysis (PCA) is a method that uses a linear transformation to project high-dimensional data into a low-dimensional space while losing as little information as possible [1-5]. For example, when observing different cars one may measure the number of seats, number of tires, number of doors, number of windows, cylinder size, and so on; some of these indicators are strongly correlated, and this redundant information needs to be removed during data processing. Since an indicator whose values barely vary across samples contributes little to distinguishing them, principal component analysis reasonably treats directions of high variance as the ones most useful for differentiation [6-7]. PCA is one of the commonly used algorithms in big data analysis and machine learning, and it appears in the study of computer science, electronic information, economics, medicine, and other disciplines [8-10]. Besides eigenvalues and eigenvectors, this example also involves the diagonalization of square matrices and coordinate transformations from linear algebra [11-14].
In this paper, we study the trajectory of vector
Eigenvalues and eigenvectors are two important concepts in linear algebra and are now widely used in active areas such as dynamical systems, machine learning, image processing and data analysis [15]. In this paper, we take the second-order (2×2) square matrix as an example and focus on explaining the geometric significance of eigenvalues and eigenvectors.
In the plane, vector
Consider the linear transformation
Using
The distribution of the trajectories of vector
When matrix
The following is an example:
Example 1: Given matrix
Solution: Matrix
Equation (3) represents an ellipse whose major and minor axes do not lie on the coordinate axes.
A Matlab program is used to plot the geometry of equation (3); the result is shown in Fig. 1, from which it can be seen that the unit circle is transformed into an ellipse whose major and minor axes do not lie on the coordinate axes by the linear transformation

Figure 1. The trajectory of linear transformation
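Since the concrete matrix of Example 1 is not reproduced above, the following Python sketch uses a hypothetical symmetric 2×2 matrix to illustrate the same geometric picture: the unit circle is mapped to an ellipse, and the eigenvectors mark the directions that are merely scaled by their eigenvalues.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical symmetric 2x2 matrix standing in for the matrix of Example 1;
# the paper's actual entries are not reproduced here.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# Points on the unit circle and their images under the linear transformation y = A x.
theta = np.linspace(0.0, 2.0 * np.pi, 400)
circle = np.vstack((np.cos(theta), np.sin(theta)))   # 2 x 400 array of circle points
image = A @ circle                                   # the image traces an ellipse

# Eigenvalues/eigenvectors: each eigenvector direction is mapped onto itself,
# stretched by its eigenvalue.
eigvals, eigvecs = np.linalg.eig(A)
print("eigenvalues:", eigvals)            # 3.0 and 1.0 for this choice of A
print("eigenvectors (columns):\n", eigvecs)

plt.plot(circle[0], circle[1], label="unit circle")
plt.plot(image[0], image[1], label="image under A")
for lam, v in zip(eigvals, eigvecs.T):
    plt.plot([0, lam * v[0]], [0, lam * v[1]], "--")  # scaled eigenvector directions
plt.axis("equal")
plt.legend()
plt.show()
```

For a symmetric matrix the eigenvectors are orthogonal, so they coincide with the axes of the image ellipse; for a general invertible matrix the ellipse axes are instead given by the singular vectors.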
Taking
Thus, define

The geometric meaning of eigenvalues and eigenvectors
Equation (3) is now examined from the point of view of quadratic forms. Equation (3) represents an ellipse, as shown in Fig. 3(a), whose major and minor axes do not lie on the coordinate axes. By choosing a suitable orthogonal linear transformation, its major and minor axes can be made to fall on the coordinate axes.
The quadratic form (3) corresponds to matrix
Equation (4) is an ellipse whose major and minor axes fall on the coordinate axes, as shown in Figure 3(b).
Let
Equation (5) is also an ellipse whose major and minor axes fall on the coordinate axes, as shown in Figure 3(c). Figure 3(b) is obtained by rotating Figure 3(a) clockwise by 45°, and Figure 3(c) by rotating Figure 3(a) counterclockwise by 45°.

Figure 3. The curves corresponding to equations (3)-(5)
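The concrete coefficients of equations (3)-(5) are not reproduced above, so the following LaTeX sketch records the general principal-axis argument used here: an orthogonal change of variables built from the eigenvectors of the symmetric matrix of the quadratic form reduces it to a weighted sum of squares.

```latex
% Principal-axis transformation for a generic real symmetric matrix A
% (the concrete entries of equation (3) are not reproduced here).
\[
  \mathbf{x}^{\mathsf{T}} A \mathbf{x}, \qquad
  A = A^{\mathsf{T}} = Q \Lambda Q^{\mathsf{T}}, \qquad
  Q^{\mathsf{T}} Q = I, \qquad
  \Lambda = \operatorname{diag}(\lambda_1, \lambda_2).
\]
% Substituting the orthogonal change of variables x = Q y:
\[
  \mathbf{x}^{\mathsf{T}} A \mathbf{x}
    = \mathbf{y}^{\mathsf{T}} Q^{\mathsf{T}} A Q\, \mathbf{y}
    = \lambda_1 y_1^{2} + \lambda_2 y_2^{2},
\]
% so for \lambda_1, \lambda_2 > 0 the curve x^T A x = 1 is an ellipse whose axes lie along
% the eigenvectors (the columns of Q) and whose semi-axis lengths are 1/sqrt(\lambda_i).
```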
Matrix
An example of the curve represented by equation (6) is given below.
Example 2: Consider matrix
This shows that, under this linear transformation, the unit circle becomes the line segment

The trajectory of linear transformation
A linear transformation changes
As:
Example 3: Consider the matrix
This shows that, under this linear transformation, the unit circle becomes a line segment:

The trajectory of linear transformation
The only eigenvalue of this matrix is 0, and it is an eigenvalue of multiplicity 2. The corresponding normalized eigenvector is:
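As the entries of the matrix in Example 3 are not shown above, the following Python sketch uses a hypothetical nilpotent matrix with the same spectral structure (eigenvalue 0 of multiplicity 2 and a single independent eigenvector) to illustrate how such a transformation collapses the unit circle onto a line segment.

```python
import numpy as np

# Hypothetical stand-in for the matrix of Example 3: a nonzero 2x2 matrix whose only
# eigenvalue is 0 with algebraic multiplicity 2 (the paper's entries are not shown here).
B = np.array([[0.0, 1.0],
              [0.0, 0.0]])

eigvals, eigvecs = np.linalg.eig(B)
print("eigenvalues:", eigvals)              # both 0: a double eigenvalue
print("eigenvectors (columns):\n", eigvecs)  # numerically both columns are parallel to (1, 0):
                                             # only one independent eigenvector

# Image of the unit circle: B maps (cos t, sin t) to (sin t, 0), so the whole circle
# collapses onto the segment from (-1, 0) to (1, 0) along the eigenvector direction.
t = np.linspace(0.0, 2.0 * np.pi, 7)
points = np.vstack((np.cos(t), np.sin(t)))
print("images of sample points:\n", (B @ points).T)
```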
Eigenvalues and eigenvectors have very important applications in modern science. As fundamental notions of linear algebra, they play an important role both in theory and in practice, and a great deal of data analysis work is closely related to them. This section mainly introduces the application of eigenvalues and eigenvectors in data analysis, including principal component analysis and spectral clustering, and uses example studies to explain how the various methods rely on eigenvalues and eigenvectors, so as to reveal in greater depth the important role that the geometric significance of eigenvalues and eigenvectors plays in data analysis.
The starting point of principal component analysis is to compute, from a set of original features, a set of new features [16] arranged in descending order of importance; the new features are linear combinations of the original features and are mutually uncorrelated.
Denote
Here the coefficients of the linear combination are required to have modulus 1 (unit norm), in order to unify the
Eq. (10) is written in matrix form as:
Consider the first new feature
Its variance is:
That is, the eigen-equation of the covariance matrix
So, the optimal
The second new feature
Substituting into Eq. (10) and rearranging, we obtain:
Considering (16), the uncorrelatedness requirement is equivalent to the requirement
Maximizing the variance of
There are
It is equal to the sum of the variances of the individual original features.
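Using Σ for the covariance matrix of the original features, u_i for its unit-norm eigenvectors and ξ_i for the new features (notation introduced here for illustration), the statements above can be summarized as:

```latex
% Variance of the i-th new feature and the trace identity (generic sketch; notation is ours).
\[
  \operatorname{Var}(\xi_i) = \mathbf{u}_i^{\mathsf{T}} \Sigma\, \mathbf{u}_i,
  \qquad \Sigma \mathbf{u}_i = \lambda_i \mathbf{u}_i, \qquad \lVert \mathbf{u}_i \rVert = 1
  \;\Longrightarrow\; \operatorname{Var}(\xi_i) = \lambda_i,
\]
\[
  \sum_i \lambda_i = \operatorname{tr}(\Sigma) = \sum_i \operatorname{Var}(x_i),
\]
% i.e. the total variance of the new features equals the sum of the variances of the original features.
```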
The individual column vectors of the transformation matrix
As a feature extraction method, it is generally desirable to represent the data with fewer principal components. If the first
Figure 6 shows an example of the magnitudes of the individual eigenvalues on some dataset. It can be seen that the first three eigenvalues, i.e., the variances of the first three principal components, account for most of the total variance, and one can decide how many principal components to use to represent the data from such an eigenvalue plot. In many cases, the proportion of the total variance that the new features are expected to retain can be specified in advance, and one then tries to calculate the appropriate

Figure 6. Eigenvalues in principal component analysis
Selecting relatively few principal components to represent the data can be used not only to reduce the dimensionality of the features, but also to eliminate noise from the data. The principal components (also called minor components) that appear toward the back of the eigenvalue spectrum generally represent random noise in the data. In this case, if the very small component of
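The whole procedure described in this subsection (eigen-decomposition of the covariance matrix, choosing the number of components from the cumulative eigenvalue proportion, and reconstructing the data from the leading components to suppress noise) can be sketched in Python as follows; the function name, the threshold and the random test data are illustrative, not the paper's implementation.

```python
import numpy as np

def pca_denoise(X, variance_ratio=0.95):
    """Sketch of PCA-based dimensionality reduction / denoising.

    X: (n_samples, n_features) data matrix; variance_ratio: fraction of the
    total variance the retained principal components should explain.
    """
    mean = X.mean(axis=0)
    Xc = X - mean                                   # center the data
    cov = np.cov(Xc, rowvar=False)                  # covariance matrix of the features
    eigvals, eigvecs = np.linalg.eigh(cov)          # symmetric matrix -> eigh (ascending order)
    order = np.argsort(eigvals)[::-1]               # sort eigenvalues in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Choose the smallest k whose eigenvalues account for the requested share of the variance.
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cumulative, variance_ratio) + 1)

    scores = Xc @ eigvecs[:, :k]                    # coordinates along the k leading directions
    X_denoised = scores @ eigvecs[:, :k].T + mean   # reconstruction from the leading components
    return X_denoised, eigvals, k

# Illustrative use on random data (not the paper's spectra).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 10))
X_hat, eigvals, k = pca_denoise(X, 0.95)
print("retained components:", k)
print("explained share of variance:", eigvals[:k].sum() / eigvals.sum())
```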
Spatial heterodyne spectroscopy (SHS) is a new hyperspectral remote sensing detection technology. The two-dimensional interferometric data acquired by a spatial heterodyne spectrometer are affected by a variety of influences that reduce the accuracy of the recovered spectra, so the experiments in this section focus on a correction method for spatial heterodyne interferometric data based on principal component analysis.
Experimental data: The test data were collected using the spatial heterodyne spectrometer HEP-765-S, which has a fundamental-frequency wavelength of 764.8 nm and a spectral resolution of 0.01 nm. The monochromatic light source was a potassium hollow-cathode lamp, and the continuous light source was a GY-10 high-pressure spherical xenon lamp. The raw interferograms were collected with the spatial heterodyne spectrometer in a darkroom environment. Both images have a size of 2048×2048 pixels, and each row represents one set of interferometric data. Both interferograms show an uneven intensity distribution, with irregularly shaped spots or patches in some areas; these effects reduce the accuracy of the recovered spectra and need to be corrected.

Data processing and analysis: There is a Fourier-transform relationship between the interferogram and the spectrogram, so the spectral data can be obtained by Fourier transforming the preprocessed two-dimensional interferogram. The two-dimensional interferograms processed in this experiment contain 2048 rows of one-dimensional interferometric data, and 2048 sets of spectral data are obtained after Fourier transforming each row. According to the above analysis, 2048 rows of spectral data are obtained after de-baselining and Fourier transforming the original interferograms. These 2048 sets of spectral data are processed by principal component analysis, and the eigenvalues, eigenvectors and projection values of each principal component are sorted; the eigenvalues, contribution rates and cumulative contribution rates of the first 10 principal components are shown in Table 1. As can be seen from Table 1, the contributions of the first two principal components are larger than, and not of the same order of magnitude as, those of the other principal components, and their cumulative contribution rate reaches 97.71%.
Table 1. Results for the first ten principal components
| Principal component | Eigenvalue | Contribution rate/% | Cumulative contribution/% |
|---|---|---|---|
| 1 | 2847.83 | 51.43 | 51.43 |
| 2 | 2672.39 | 46.28 | 97.71 |
| 3 | 61.27 | 0.83 | 98.54 |
| 4 | 54.78 | 0.46 | 99.00 |
| 5 | 10.83 | 0.63 | 99.63 |
| 6 | 6.28 | 0.21 | 99.84 |
| 7 | 3.12 | 0.05 | 99.89 |
| 8 | 2.76 | 0.03 | 99.92 |
| 9 | 2.14 | 0.02 | 99.94 |
| 10 | 1.82 | 0.01 | 99.95 |
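As a rough, illustrative sketch of the processing chain described before Table 1 (de-baselining, a row-wise Fourier transform of the interferogram, and principal component analysis of the resulting spectra), one might write something like the following; the simple mean-subtraction baseline step and the synthetic input are placeholders rather than the instrument's actual pipeline.

```python
import numpy as np

def interferogram_to_spectra(interferogram):
    """Rough sketch: each row of the 2D interferogram is one interference record,
    and its spectrum is obtained (up to calibration) from the Fourier transform of that row."""
    rows = interferogram - interferogram.mean(axis=1, keepdims=True)  # crude de-baselining (placeholder)
    return np.abs(np.fft.rfft(rows, axis=1))        # magnitude spectra, one per row

def leading_eigenvalues(spectra, n_components=10):
    """Eigenvalues of the spectra covariance matrix and their contribution rates."""
    centered = spectra - spectra.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals = np.linalg.eigvalsh(cov)[::-1]         # eigenvalues in descending order
    contribution = eigvals / eigvals.sum()
    return eigvals[:n_components], contribution[:n_components]

# Illustrative run on synthetic data (much smaller than the real 2048 x 2048 interferogram).
rng = np.random.default_rng(1)
fake_interferogram = rng.normal(size=(256, 512))
spectra = interferogram_to_spectra(fake_interferogram)
eigvals, contrib = leading_eigenvalues(spectra)
print("contribution rates of the first 10 components (%):", np.round(100 * contrib, 2))
```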
Figure 7 shows the average spectrum and the spectra of the first ten principal components. The two characteristic peaks of the potassium lamp are at 766.70 nm and 770.59 nm. It can be seen from the figure that in the first and second principal components the noise around the two characteristic peaks is small and the peak intensity is large. From the third to the tenth principal component the noise gradually increases, the intensity of the two peaks gradually decreases, and the peaks are no longer located at 766.70 nm and 770.59 nm, indicating that noise has become the dominant contribution in these principal components.

Figure 7. Spectra of the potassium lamp
To verify that the principal component analysis method has the same denoising effect on interferograms of continuous light, the xenon lamp spectral data were processed in the same way. The contributions of the first three principal components of the processed xenon lamp spectral data were 42.78%, 36.94%, and 0.83%, respectively; the contributions of the first two principal components each exceeded 35%, with a cumulative contribution of 79.72%. The average spectrum and the spectra of the first ten principal components are shown in Figure 8. As can be seen from Fig. 8, the intensity of the xenon characteristic peaks in the first two principal components is large, while from the third to the tenth principal component the peak intensity is essentially at the same level as the noise, i.e., the first two principal components can be taken as the xenon spectral components.

Figure 8. Spectra of the xenon lamp
To quantitatively evaluate the effectiveness of the principal component analysis method, 300 rows of less noisy data were randomly selected from the 2048 rows, and the mean square errors before and after spectral correction were calculated for rows 524, 596, and 974; the results are shown in Table 2. From Table 2, the mean square error values of the three rows of spectra after correction are 0.134, 0.108 and 0.114, respectively, all smaller than the values before correction, indicating that the correction effect of principal component analysis is good.
Table 2. MSE of xenon lamp spectra before and after denoising
| Mean square error | Row 524 | Row 596 | Row 974 |
|---|---|---|---|
| Before denoising | 0.548 | 0.476 | 0.503 |
| After denoising | 0.134 | 0.108 | 0.114 |
Cluster analysis is a common method in data analysis, and among data clustering methods spectral clustering is one of the most popular. Spectral clustering is based on spectral graph partitioning theory. Compared with traditional clustering algorithms, this class of algorithms considers a continuous relaxation of the partitioning problem, transforming the original problem into finding the eigenvalues and eigenvectors of a Laplacian matrix. The algorithms can recognize non-convex distributions and have been applied to many practical problems, with good performance in image segmentation, text mining, and bioinformatics research. In this section, the spectral clustering algorithm is investigated.
The spectral clustering algorithm is based on spectral graph partitioning theory and regards data clustering as a multiway partitioning problem on an undirected graph [17]. Each data sample is regarded as a vertex V of the graph, and each edge E between vertices is assigned a weight W according to the similarity between the corresponding data points, yielding an undirected weighted graph G = (V, E) based on sample similarity. From the point of view of optimal graph partitioning, the goal is to minimize the similarity between any two subgraphs of the partition and to maximize the similarity within each subgraph.
Solving the graph partitioning problem exactly is NP-hard. A better approach is to consider the continuous relaxed form of the problem, whereupon the original problem can be transformed into a spectral decomposition of the Laplacian matrix, thus obtaining a globally optimal solution of the graph partition criterion in the relaxed continuous domain.
The similarity matrix, also known as the affinity matrix, is usually denoted by W or A and is commonly defined by a Gaussian kernel, W_ij = exp(−‖x_i − x_j‖² / (2σ²)) for i ≠ j and W_ii = 0.
Depending on the criterion function and the spectral mapping method, spectral clustering has a variety of different implementations. A representative one is the NJW algorithm, whose main steps are as follows:
Step 1: Construct the similarity matrix W of the data samples.
Step 2: Construct the Laplacian matrix L.
Step 3: Find the first K largest eigenvalues and the corresponding eigenvectors
Step 4: Regard each row of the eigenvector matrix V as a point in space and cluster these points into
This paper focuses on improving how the number of clusters is determined and how the eigenvectors are selected. The first step is to solve for the eigenvalues of the Laplacian matrix of the network to be partitioned; that is: compute the adjacency matrix of the network, construct the Laplacian matrix, compute the eigenvalues and eigenvectors of this matrix, calculate the differences between successive eigenvalues, determine the number of clusters from the eigenvalue gaps, and cluster the rows of the selected eigenvectors.
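A minimal sketch of this kind of spectral clustering pipeline is given below, assuming a Gaussian-kernel affinity, the symmetric normalized Laplacian, an eigengap rule for the number of clusters, and K-means on the row-normalized eigenvector matrix; these concrete choices are illustrative assumptions, since the paper's exact selection formula is not reproduced above.

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(X, sigma=1.0, max_k=10):
    """Sketch of NJW-style spectral clustering with an eigengap-based choice of K.

    X: (n_samples, n_features) data; sigma: Gaussian-kernel width.
    The normalization and the gap criterion are illustrative assumptions."""
    # Step 1: Gaussian-kernel similarity (affinity) matrix with zero diagonal.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Step 2: symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # Step 3: eigen-decomposition; choose K from the largest gap among the smallest eigenvalues.
    # (The smallest eigenvalues of L correspond to the largest eigenvalues of the
    #  normalized affinity used in the NJW formulation.)
    eigvals, eigvecs = np.linalg.eigh(L)            # ascending eigenvalues
    gaps = np.diff(eigvals[:max_k])
    k = int(np.argmax(gaps)) + 1                    # number of clusters suggested by the eigengap

    # Step 4: row-normalize the first k eigenvectors and cluster the rows with K-means.
    V = eigvecs[:, :k]
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(V)
    return labels, k

# Illustrative use on two synthetic blobs (not the Zachary network data).
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 0.3, (30, 2)), rng.normal(3, 0.3, (30, 2))])
labels, k = spectral_clustering(X)
print("chosen number of clusters:", k)
```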
To test the feasibility of the improved algorithm and the accuracy of its partitioning results, the Karate Club relationship network (Zachary network), which is commonly used in community detection for complex networks, is selected for testing. For the karate club member relationship network, the eigenvalues of its Laplacian matrix, together with the differences between successive eigenvalues, are computed as follows.
Eigenvalues of the Laplacian matrix and differences between successive eigenvalues
| Eigenvalue | Difference | Eigenvalue | Difference | Eigenvalue | Difference |
|---|---|---|---|---|---|
| 0.0043 | ─ | 0.0257 | 0.0001 | 0.0582 | 0.0013 |
| 0.0107 | 0.0064 | 0.0257 | 0.0000 | 0.0637 | 0.0055 |
| 0.0138 | 0.0031 | 0.0314 | 0.0057 | 0.0708 | 0.0071 |
| 0.0157 | 0.0019 | 0.0348 | 0.0034 | 0.0812 | 0.0104 |
| 0.0198 | 0.0041 | 0.0376 | 0.0028 | 0.0843 | 0.0031 |
| 0.0216 | 0.0018 | 0.0403 | 0.0027 | 0.0916 | 0.0073 |
| 0.0228 | 0.0012 | 0.0429 | 0.0026 | 0.1164 | 0.0248 |
| 0.0241 | 0.0013 | 0.0431 | 0.0011 | 0.1368 | 0.0204 |
| 0.0256 | 0.0015 | 0.0439 | 0.0008 | 0.1672 | 0.0304 |
| 0.0256 | 0.0000 | 0.0532 | 0.0093 | 0.2179 | 0.0507 |
| 0.0256 | 0.0000 | 0.0569 | 0.0037 | 0.2308 | 0.0129 |
Selected eigenvectors and node categories
| Node | Eigenvector 1 | Eigenvector 2 | Category | Node | Eigenvector 1 | Eigenvector 2 | Category |
|---|---|---|---|---|---|---|---|
| 1 | -0.1038 | 0.0647 | 1 | 18 | -0.1001 | 0.1498 | 1 |
| 2 | -0.0405 | 0.0936 | 1 | 19 | 0.1632 | -0.0583 | 2 |
| 3 | 0.0231 | 0.0418 | 2 | 20 | -0.0128 | 0.0647 | 1 |
| 4 | -0.0527 | 0.1039 | 1 | 21 | 0.1539 | -0.0594 | 1 |
| 5 | -0.2876 | -0.1203 | 2 | 22 | -0.1000 | 0.1497 | 2 |
| 6 | -0.3178 | -0.1986 | 1 | 23 | 0.1579 | -0.0597 | 2 |
| 7 | -0.3194 | -0.2006 | 1 | 24 | 0.1546 | -0.0601 | 2 |
| 8 | -0.5219 | 0.1007 | 1 | 25 | 0.1528 | -0.0641 | 2 |
| 9 | 0.0504 | 0.0138 | 2 | 26 | 0.1489 | -0.0732 | 1 |
| 10 | 0.0915 | 0.0129 | 2 | 27 | 0.1863 | -0.0892 | 2 |
| 11 | -0.02769 | -0.1208 | 1 | 28 | 0.1176 | -0.0359 | 1 |
| 12 | -0.2117 | 0.7549 | 2 | 29 | 0.0948 | -0.0059 | 2 |
| 13 | -0.1093 | 0.1647 | 2 | 30 | 0.1634 | -0.0698 | 2 |
| 14 | -0.0139 | 0.0654 | 2 | 31 | 0.0726 | 0.0139 | 2 |
| 15 | 0.1576 | -0.0613 | 1 | 32 | 0.0976 | -0.0281 | 1 |
| 16 | 0.1643 | -0.0619 | 2 | 33 | 0.1194 | -0.0381 | 2 |
| 17 | -0.4218 | -0.3576 | 1 | 34 | 0.1173 | -0.0285 | 1 |
Figure 9 shows the results of the Zachary network division. The Zachary karate club network is a common experimental network used to evaluate the effectiveness of community division. The network consists of 34 nodes and 75 edges. Owing to an internal dispute, the club split into two smaller groups centered on the administrator and the instructor, respectively. Applying the algorithm proposed in this paper, the network is divided into 2 parts, and the division results show that the proposed algorithm can accurately and automatically determine the cluster categories. At the same time, the eigenvectors corresponding to the two smallest non-trivial eigenvalues are selected according to the selection formula, and after clustering with the K-means algorithm the accuracy of the club node division reaches 98.03%, further indicating that the eigenvectors automatically selected by the algorithm in this paper are effective.

Figure 9. Division results of the Zachary network
This paper investigates the geometric significance of eigenvalues and eigenvectors under the classification criteria of invertibility and non-invertibility of matrix
Principal component analysis is used to process and reduce the dimensionality of the 2048 sets of interferometric data collected by the spatial heterodyne spectrometer, and the first and second principal components are selected as the main feature components of the monochromatic and continuous light sources, thereby realizing the correction of the spatial heterodyne interferometric data. Taking the spectral data in rows 524, 596, and 974 as examples, the mean square error values of the corrected data are 0.134, 0.108, and 0.114, respectively, reductions of 75.55%, 77.31%, and 77.34% compared with the values before correction. This demonstrates the effective application of principal component analysis, whose core principle rests on eigenvalues and eigenvectors and their geometric significance, in the correction of spatial heterodyne interferometric data. The improvement of the spectral clustering algorithm in this paper successfully realizes the automatic determination of the number of clusters and the automatic selection of eigenvectors in the partitioning of complex networks. Using the improved algorithm, the karate club network is divided into 2 groups, and the accuracy of the community node division results reaches 98.03%. A comparison with the real situation of the karate club shows that the network division results are highly consistent with reality, highlighting the effectiveness of the improved spectral clustering algorithm and the important value of eigenvalues and eigenvectors and their geometric significance in data analysis.
