Feature selection for high-dimensional data based on scaled cross operator threshold filtering specific memory algorithm
Published online: 26 Mar 2025
Received: 23 Oct 2024
Accepted: 17 Feb 2025
DOI: https://doi.org/10.2478/amns-2025-0805
© 2025 Wulue Zheng et al., published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
With the rapid development of the information age, big data has become one of the most representative symbols of this era. How to mine the most valuable information from high-dimensional data of huge scale and complex structure is a major concern for researchers and practitioners [1-3]. Feature selection is an effective preprocessing method for high-dimensional data. It aims to remove redundant and irrelevant information from the data, reducing the data dimensionality and the computational complexity of learning algorithms, and it is a hot research topic in the basic theory of artificial intelligence [4-6].
Feature selection methods can be categorized as embedded, wrapper (encapsulated), and filter methods. Embedded feature selection integrates feature selection with the learner and performs it automatically during training, but this approach lacks interpretability of the selected features [7-8]. Wrapper feature selection repeatedly selects a candidate subset from the initial feature set, trains the learner on it, and evaluates the subset by the learner's performance until the best subset is found; the repeated training of the learner leads to high computational complexity [9-10]. Filter methods select representative features by designing feature evaluation metrics or correlation statistics and applying thresholding or heuristic search [11]. Commonly used feature evaluation metrics include entropy, mutual information, information gain, and the Gini coefficient. Feature selection methods based on rough sets, neighborhood rough sets, and fuzzy neighborhood rough sets form an important class of heuristic feature selection methods; they are widely studied because, by designing evaluation functions with good properties, they ensure that the selected feature subset maintains or even exceeds the classification ability of the original features [12-14].
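As a simple illustration of the filter approach just described, the following sketch scores every feature by its mutual information with the class label and keeps the top-k features. It is a generic example built on scikit-learn and synthetic data, not the method proposed in this paper; the dataset parameters and the choice of k are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

# Synthetic high-dimensional data: 500 samples, 2000 features, only a few informative ones.
X, y = make_classification(n_samples=500, n_features=2000,
                           n_informative=20, n_redundant=50, random_state=0)

# Filter step: score every feature by its mutual information with the class label.
scores = mutual_info_classif(X, y, random_state=0)

# Keep the k highest-scoring features (simple top-k / threshold filtering).
k = 50
selected = np.argsort(scores)[::-1][:k]
X_reduced = X[:, selected]
print(X_reduced.shape)   # (500, 50)
```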
In the era of big data, as data sizes keep increasing, it is increasingly difficult to construct machine learning models for classification tasks. Feature selection is an important data preprocessing technique that aims to reduce the error rate of the classification task and decrease the number of features required for training [15-16]. High-dimensional datasets in particular contain a large number of redundant features that tend to degrade the effectiveness of the classifier. Therefore, in high-dimensional feature selection for classification, it is particularly important to filter out features that are highly relevant to the classification labels and not redundant with each other [17-18]. In recent years, scholars have proposed many feature selection algorithms for the classification problem; although they can obtain good classification results, their feature selection performance still needs to be improved further when facing high-dimensional feature classification [19-20].
This paper analyzes the coherence of features, identifies the application value of feature selection, and sets out its general process as the basis for studying the feature selection problem in high-dimensional data. The basic principle of the wavelet threshold filtering algorithm, the selection of the threshold parameter and the threshold function, and the limitations of this algorithm are described. The feature selection process is then optimized with a genetic algorithm based on scaling crossover operator threshold filtering, and a weighted attenuation memory factor is introduced to reduce the classification error rate of feature selection. Through comparative experiments, the differences between the optimization method of this paper and other advanced algorithms in feature causality selection are studied, the classification error rates of several algorithms are analyzed and compared, and the effectiveness of the scaling crossover operator in reducing the classification error rate of the proposed method is verified.
The evaluation criterion for feature selection focuses on the direction of assessing the coherence and redundancy between the features and the target variable, i.e., the ultimate goal of the research on feature selection methods is to remove the features that are incoherent or redundant with the target variable in the original dataset. The coherence of features is broadly categorized into three types, i.e., strong coherence, weak coherence, and incoherence.
Assume that the entire set of features in the training data is denoted as $\Omega$, the target information (class variable) as $C$, and for a feature $f_i \in \Omega$ let $S_i = \Omega \setminus \{f_i\}$. A sufficient and necessary condition for $f_i$ to be strongly coherent with the target information $C$ is
$$P(C \mid f_i, S_i) \neq P(C \mid S_i),$$
and a sufficient and necessary condition for $f_i$ to be weakly coherent is that $f_i$ is not strongly coherent and $\exists S_i' \subset S_i$ such that
$$P(C \mid f_i, S_i') \neq P(C \mid S_i').$$
A feature that is neither strongly nor weakly coherent is incoherent with $C$.
From the above definition of feature coherence, it can be seen that features strongly coherent with the target information are indispensable and must be retained, weakly coherent features may be useful within certain feature subsets, and incoherent features can be removed.
Feature selection treats all possible feature combinations as search objects in a search space; the goal is to remove the features that are irrelevant to the target variable or contain redundant information, and then find the feature subset with the best discriminative ability for modeling the empirical data. The simplified dataset can effectively shorten model training time and improve the accuracy of the constructed model. Additionally, models built on the simplified data help researchers comprehend and analyze the original data.
The theoretical value of the definition of feature selection lies in three aspects: 1) the definition is formulated from the perspective of coherence and redundancy between features and the target variable, which points out the direction of research on the feature selection problem; 2) it clarifies the purpose of that research, i.e., searching the original feature space for the feature combination with the best performance, which unifies researchers' lines of inquiry; 3) the definition does not restrict the optimization objective of the problem but is guided by discovering the knowledge hidden behind the original dataset, which gives researchers more room for algorithmic choices when solving specific problems and greatly expands the scope of application of the definition of feature selection.
Figure 1 shows the process of the feature selection algorithm, which contains five basic steps:
1. Initialization. The first step of the algorithm, which is based on all the original features of the problem; for example, in PSO-based feature selection algorithms the search space dimension is usually set to the total number of available features.
2. Candidate subset generation. The search can start from no features, all features, or a random subset of features. Many search techniques, from conventional methods to evolutionary computation, are applied in this step to search for the best feature subset.
3. Evaluation. The subsets generated by the search process are examined by an evaluation function to determine their merit. The evaluation function plays an important role because it guides the algorithm toward the optimal feature subset.
4. Stopping criterion. Stopping can be based on the search process (e.g., a predetermined number of selected features or a maximum number of iterations has been reached) or on the evaluation function (e.g., adding or removing any feature no longer produces a better subset, or the best subset according to a given evaluation function has been obtained). The loop continues until the stopping criterion is satisfied.
5. Validation. Validation is not part of feature selection itself, but it is necessary to verify the effectiveness of the algorithm by testing the pre-selected feature subset on a test set.

Figure 1. General process of feature selection
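The five steps of Figure 1 can be written as one generic search loop. The sketch below illustrates that flow with a simple wrapper-style forward search, a 1-NN cross-validation evaluator, and a no-improvement stopping rule; these concrete choices are illustrative assumptions and differ from the search and evaluation components used later in this paper.

```python
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def evaluate(X, y, subset):
    """Evaluation step: mean cross-validated accuracy of 1-NN on the candidate subset."""
    if not subset:
        return 0.0
    clf = KNeighborsClassifier(n_neighbors=1)
    return cross_val_score(clf, X[:, list(subset)], y, cv=5).mean()

def forward_selection(X, y, max_features=20):
    # Step 1, initialization: start from the empty subset over all original features.
    remaining = set(range(X.shape[1]))
    selected, best_score = [], 0.0
    while remaining and len(selected) < max_features:
        # Step 2, candidate generation: add one not-yet-selected feature at a time.
        # Step 3, evaluation: score each candidate subset with the evaluation function.
        score, best_f = max((evaluate(X, y, selected + [f]), f) for f in remaining)
        # Step 4, stopping criterion: stop when no addition improves the score.
        if score <= best_score:
            break
        selected, best_score = selected + [best_f], score
        remaining.remove(best_f)
    return selected, best_score

# Step 5, validation: the returned subset should be re-checked on a held-out test set.
```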
The two key factors in a feature selection algorithm are the search strategy and the evaluation criterion. Feature selection requires searching a space of $2^{n}$ candidate feature subsets, where $n$ is the number of original features, so exhaustive search quickly becomes computationally infeasible and an efficient search strategy is needed.
Evaluation criteria are used to form an evaluation function, which determines the merit of the selected features and guides the algorithm’s search. The optimal subset of features is always associated with an evaluation function. For the feature selection problem, the optimal subset of features selected using one evaluation function may be different from the optimal subset of features selected using another evaluation function. Due to the problem of feature interaction, the subset of features that can achieve the highest classification performance is usually a set of complementary features.
An evaluation function that rewards good features will preferentially guide the algorithm toward complementary feature subsets. Depending on whether a learning (classification) algorithm is used inside the evaluation function, existing feature selection algorithms can be broadly categorized into two groups: filter methods and wrapper methods. Filter feature selection algorithms are independent of any classification algorithm, whereas wrapper methods use a classification algorithm in the evaluation function. The wavelet threshold filtering algorithm described in the following section belongs to the filter methods.
This part describes the wavelet threshold algorithm, a filter method, covering its basic principle, the selection of the threshold, and the selection of the threshold function. Based on the limitations of the wavelet threshold algorithm, a genetic algorithm is then introduced for optimization, and the scaling crossover operator and the threshold filtering part of the genetic algorithm are designed.
The wavelet threshold filtering algorithm is a denoising method that is simple to implement, computationally light, and effective. Its basic principle is as follows: after the wavelet transform, a noisy image is decomposed into wavelet coefficients of different amplitudes. The energy of the image signal is concentrated in a few wavelet coefficients of large amplitude, while the noise energy is spread over coefficients of smaller amplitude. A threshold is therefore set to distinguish these coefficients: wavelet coefficients whose magnitude is below the threshold are eliminated, and coefficients whose magnitude is above the threshold are retained or shrunk accordingly. Finally, the remaining wavelet coefficients are used for image reconstruction, which denoises the image while preserving its detailed information.
When filtering with the threshold method, there are two key points: the first is the selection of the threshold parameter; the second is the selection of the threshold function.
The wavelet threshold filtering algorithm relies on threshold selection, whose size directly affects image denoising. If the threshold is too large, useful high-frequency information (such as edges and texture details) is lost, resulting in image distortion and blur; if the threshold is too small, too much noise remains and the filtering effect is poor.
The most commonly used image denoising threshold is the VisuShrink threshold, also called the unified or universal threshold. It is given by
$$\lambda = \sigma \sqrt{2 \ln N} \tag{5}$$
In equation (5), $\sigma$ is the standard deviation of the noise and $N$ is the length (number of coefficients) of the signal. Since $\sigma$ is usually unknown, it is commonly estimated from the finest-scale wavelet coefficients as
$$\hat{\sigma} = \frac{\operatorname{median}\left(\left|w_{1,k}\right|\right)}{0.6745} \tag{6}$$
In Eq. (6), $w_{1,k}$ denotes the high-frequency wavelet coefficients at the finest decomposition level.
The threshold function, also called the threshold shrinkage strategy, thresholds the noisy high-frequency wavelet coefficients: it applies different processing to coefficients whose magnitudes are greater than or less than the threshold, with the aim of suppressing the coefficients corresponding to noise as much as possible while retaining the coefficients corresponding to the image signal.
Let $w_{j,k}$ denote a high-frequency wavelet coefficient and $\lambda$ the threshold. The hard threshold function sets every coefficient falling in the interval $(-\lambda, \lambda)$ to zero and keeps the remaining coefficients unchanged:
$$\hat{w}_{j,k} = \begin{cases} w_{j,k}, & \left|w_{j,k}\right| \geq \lambda \\ 0, & \left|w_{j,k}\right| < \lambda \end{cases}$$
The soft threshold function additionally shrinks the retained coefficients toward zero by $\lambda$:
$$\hat{w}_{j,k} = \begin{cases} \operatorname{sgn}\left(w_{j,k}\right)\left(\left|w_{j,k}\right| - \lambda\right), & \left|w_{j,k}\right| \geq \lambda \\ 0, & \left|w_{j,k}\right| < \lambda \end{cases}$$
The thresholded coefficients $\hat{w}_{j,k}$ are then selected as the output measurements used for reconstruction.
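Combining the universal threshold of Eq. (5), the noise estimate of Eq. (6), and the soft threshold function, a minimal one-dimensional denoising sketch using the PyWavelets library might look as follows; the wavelet ('db4') and decomposition level are common defaults rather than choices prescribed by this paper.

```python
import numpy as np
import pywt

def wavelet_denoise(signal, wavelet="db4", level=4):
    # Decompose the noisy signal into approximation and detail coefficients.
    coeffs = pywt.wavedec(signal, wavelet, level=level)

    # Robust noise estimate from the finest-scale detail coefficients (as in Eq. (6)).
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745

    # Universal (VisuShrink) threshold of Eq. (5): sigma * sqrt(2 ln N).
    lam = sigma * np.sqrt(2 * np.log(len(signal)))

    # Soft-threshold every detail sub-band; keep the coarse approximation unchanged.
    coeffs[1:] = [pywt.threshold(c, lam, mode="soft") for c in coeffs[1:]]

    # Reconstruct the signal from the shrunken coefficients.
    return pywt.waverec(coeffs, wavelet)[: len(signal)]

# Example: denoise a noisy sine wave.
t = np.linspace(0, 1, 1024)
noisy = np.sin(2 * np.pi * 5 * t) + 0.3 * np.random.default_rng(0).normal(size=t.size)
clean = wavelet_denoise(noisy)
```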
The genetic algorithm is a global search and optimization algorithm that is independent of the domain of the problem under study and has good parallel computing performance and global search capability. Its basic process is shown in Fig. 2. With reference to the genetic phenomena of natural selection, gene crossover, and mutation, the solution objects are encoded, a reasonable fitness function is set, the better individuals in the population are selected and copied, and the group with greater fitness is inherited by the next generation; with reference to the mutation phenomenon, mutation operations are performed randomly on individual genes in order to expand the search space. Reproduction continues in this way until the individual with optimal fitness is obtained or the preset termination condition is satisfied.

Figure 2. Basic flow of the genetic algorithm
The computational steps generally include:
1. Initialization: encode the parameters to be solved (binary, floating-point, or symbolic encoding), set the population size, and generate the initial population and the number of evolutionary generations.
2. Selection: using fitness as the criterion, select genetically superior individuals from the current population and reproduce them for inheritance by the next generation.
3. Crossover: pair individuals within the selected population according to some probability and exchange part of the information between better-adapted individuals in order to produce individuals of the new population.
4. Mutation: for individuals selected with a certain probability, change the genes at certain positions according to the mutation probability, producing new individuals with broadened diversity.
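These four steps can be assembled into a compact skeleton, sketched below. The fitness function, encoding ranges, and parameter values here are placeholders (in this paper's setting the fitness would be, for example, the classification error of the encoded feature subset), so this is only an illustration of the loop structure, not the algorithm used in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(x):
    # Placeholder fitness: in this paper's setting it would be, e.g., the negative
    # classification error rate of the feature subset encoded by the individual x.
    return -np.sum((x - 0.5) ** 2)

def genetic_algorithm(dim=30, pop_size=50, generations=100, pc=0.8, pm=0.05):
    # Initialization: floating-point-coded individuals in [0, 1]^dim.
    pop = rng.random((pop_size, dim))
    for _ in range(generations):
        fit = np.array([fitness(ind) for ind in pop])
        # Selection: fitness-proportional (roulette-wheel) sampling of parents.
        probs = fit - fit.min() + 1e-12
        probs /= probs.sum()
        parents = pop[rng.choice(pop_size, size=pop_size, p=probs)]
        # Crossover: arithmetic crossover of consecutive parent pairs with probability pc.
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):
            if rng.random() < pc:
                a = rng.random()
                children[i] = a * parents[i] + (1 - a) * parents[i + 1]
                children[i + 1] = a * parents[i + 1] + (1 - a) * parents[i]
        # Mutation: replace individual genes with new random values with probability pm.
        mask = rng.random(children.shape) < pm
        children[mask] = rng.random(int(mask.sum()))
        pop = children
    best = max(pop, key=fitness)
    return best, fitness(best)

best_individual, best_fitness = genetic_algorithm()
```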
In genetic algorithms, the crossover operator is the main feature distinguishing them from other evolutionary algorithms. By simulating the biological hybrid reproduction process to perform gene crossover operations on individuals, it increases the diversity of individuals, enlarges the range of the optimality search, and helps the search reach the global optimal solution in the solution space.
The specific operation is as follows: in the selected population, two individuals are chosen according to the designed pattern, and part of the genes of both are exchanged in a designed way according to the crossover probability, producing new individuals.
In this paper, floating-point coding is used: the gene values of each individual are expressed as floating-point numbers in the corresponding ranges, and the coding length of an individual is equal to the number of variables. The single-point, two-point, and uniform crossover applicable to binary coding can also be applied to floating-point coding, and operators such as the arithmetic crossover operator and the scaling crossover operator can likewise be used with floating-point coding. Considering the large length of the individuals, this paper uses the scaling crossover operator. After the parameter values of the initial population are randomly dispersed in the solution space, the population size is generally large; after the selection operator is applied, the size of the existing population is less than or equal to the initial population size, and the subsequent operator operations are carried out on this existing population.
The scaling crossover probability is set up as follows: after sorting the individuals of the existing population by fitness, two individuals are selected and a new individual is generated by a non-uniform linear combination weighted by the proportions of their fitness values. Assuming an arithmetic crossover between individual $x_i$ with fitness $f_i$ and individual $x_j$ with fitness $f_j$, the new individual can be written as
$$x_{\text{new}} = \frac{f_i}{f_i + f_j}\, x_i + \frac{f_j}{f_i + f_j}\, x_j.$$
The selection strategy for the two crossover individuals is as follows: first, the largest and smallest fitness values are found from the sorted fitness of the existing population, which account for the largest and smallest shares of the selection probability, respectively.
As shown in the roulette-wheel form of Fig. 3, after the two individuals are selected, the scaling crossover operator is applied according to Eq. (9); at the end of the crossover step the new individual is added to the population, so that the size of the existing population increases accordingly.

Figure 3. Roulette selection of the crossover individuals
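Because Eq. (9) is not reproduced in this extract, the sketch below only illustrates the idea described above: two parents are picked by roulette-wheel selection over the fitness values, and the child is a linear combination whose weights are proportional to the parents' fitness. The exact scaling coefficients of the paper's operator may differ, so treat the weighting here as an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)

def roulette_pick_two(population, fitness_values):
    """Pick two distinct parents with probability proportional to their fitness."""
    f = np.asarray(fitness_values, dtype=float)
    probs = f - f.min() + 1e-12          # shift so that all probabilities are positive
    probs /= probs.sum()
    i, j = rng.choice(len(population), size=2, replace=False, p=probs)
    return i, j

def scaled_crossover(x_i, x_j, f_i, f_j):
    """Non-uniform linear combination weighted by the parents' fitness proportions."""
    w = f_i / (f_i + f_j)                # assumed weighting; the paper's Eq. (9) may differ
    return w * x_i + (1.0 - w) * x_j     # child lies between the two parents

# Example with a small floating-point-coded population and random positive fitness values.
pop = rng.random((6, 10))
fit = rng.random(6) + 0.1
i, j = roulette_pick_two(pop, fit)
child = scaled_crossover(pop[i], pop[j], fit[i], fit[j])
```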
In the threshold filtering process above, all historical measurement data are weighted equally, so as old data accumulate, the correcting effect of new measurements on the estimate becomes weaker and weaker and the filter output gradually disperses (diverges) from the true value. In order to overcome the dispersion, it is necessary to reduce the weight given to the old data.
Using an exponentially weighted approach, the data prior to the current moment are replaced by exponentially discounted values: the weight assigned to the measurement taken $i$ steps before the current moment is
$$s^{i}, \qquad 0 < s < 1, \quad i = 0, 1, 2, \ldots$$
where $s$ is the weighted attenuation memory factor. The initial value of the recursion is taken as the estimate at the starting moment. In the formulas, a variable marked with "*" is the value of that variable at the previous moment.
The threshold filtering dispersion criterion compares the actual filtering residual (the difference between the new measurement and its prediction) with its theoretically expected magnitude; the filter is judged to be dispersing when the actual residual exceeds the expected value by a preset multiple. In order to suppress dispersion and increase the weight of the new measurement value, the attenuation memory factor is applied in the recursion so that older data are gradually forgotten.
In practice, different data channels require different attenuation factors to suppress dispersion during the calculation. Therefore, multiple attenuation memory factors are introduced so that different data channels are faded at different rates, which yields a multiple-attenuation-memory-factor filter with higher performance.
Equation (16) can accordingly be improved by replacing the single attenuation memory factor with a diagonal matrix of factors,
$$S = \operatorname{diag}\left(s_{1}, s_{2}, \ldots, s_{m}\right),$$
where $s_{i}$ is the attenuation memory factor assigned to the $i$-th data channel, so that each channel is faded at its own rate.
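A minimal sketch of this exponential forgetting idea follows: a recursive estimate in which each data channel has its own attenuation memory factor, so older measurements are down-weighted at a channel-specific rate. This only illustrates the weighting scheme implied above; it is not the paper's exact filter recursion, and the factor values are arbitrary.

```python
import numpy as np

class FadingMemoryEstimator:
    """Recursive exponentially weighted estimate with one attenuation factor per channel."""

    def __init__(self, s):
        self.s = np.asarray(s, dtype=float)   # attenuation memory factors, 0 < s_i < 1
        self.estimate = None

    def update(self, z):
        z = np.asarray(z, dtype=float)
        if self.estimate is None:
            self.estimate = z.copy()          # initial value: the first measurement
        else:
            # The old estimate is faded by s_i; the new measurement gets weight (1 - s_i),
            # so each channel forgets its history at its own rate.
            self.estimate = self.s * self.estimate + (1.0 - self.s) * z
        return self.estimate

# Two channels faded at different rates: channel 0 forgets slowly, channel 1 quickly.
filt = FadingMemoryEstimator(s=[0.95, 0.6])
for z in np.random.default_rng(2).normal(size=(100, 2)):
    estimate = filt.update(z)
```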
This part focuses on causal feature selection methods for high-dimensional data. Feature selection is typically used to identify a subset of relevant features from a large number of features in order to construct better predictive models. In bioinformatics, feature (e.g., gene) selection identifies a small number of informative genes for disease prediction. In the era of big data, feature selection is more critical than ever because high-dimensional data are ubiquitous in a variety of applications; for example, a gene expression dataset in cancer genomics may easily exceed 10,000 features. In recent years, causal feature selection has attracted growing attention in the fields of machine learning and causal discovery, with main application areas in bioinformatics, neuroscience, and intelligent systems; it aims at identifying the Markov blanket (MB) of a target feature T, i.e., its potential causal relationships, in order to construct accurate predictive models. Under the faithfulness assumption, the MB of the target feature T in a Bayesian network (BN) is unique. MB discovery is crucial for scalable BN structure learning and optimal feature selection. For example, by finding the MB of each feature in the dataset, the discovered MBs can be used as constraints to develop efficient local-to-global structure learning methods. Meanwhile, given the MB of the target feature T, all the remaining features are conditionally independent of T; therefore, the MB of the class attribute is theoretically optimal for feature selection. The goal of this paper is thus to achieve efficient and accurate MB discovery.
In this section, the efficiency and efficacy of the optimization method of this paper are compared with four MB discovery techniques, namely IAMB, MMMB, STMB, and HITON-MB. There are two sets of experiments: the first measures the accuracy of the selected MB features, and the second measures the running time. Accuracy is assessed using conditional independence tests: the G2 test for discrete data and the Fisher z-test for continuous data.
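For the continuous datasets, a Fisher z-test of conditional independence can be implemented from partial correlations in the standard textbook form sketched below; the function and variable names are illustrative (the experiments themselves may use other tooling), and the significance level is passed in as in the experiments (0.02 or 0.06).

```python
import numpy as np
from scipy import stats

def fisher_z_ci_test(data, x, y, z=(), alpha=0.02):
    """Test X independent of Y given Z on continuous data with the Fisher z transform.

    data: (n_samples, n_vars) array; x, y: column indices; z: conditioning column indices.
    Returns True if conditional independence is NOT rejected at significance level alpha.
    """
    n = data.shape[0]
    cols = [x, y, *z]
    corr = np.corrcoef(data[:, cols], rowvar=False)
    prec = np.linalg.pinv(corr)                          # precision matrix of the block
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])   # partial correlation of x, y given z
    r = np.clip(r, -0.999999, 0.999999)
    fisher = 0.5 * np.log((1 + r) / (1 - r))
    statistic = np.sqrt(n - len(z) - 3) * abs(fisher)
    p_value = 2 * (1 - stats.norm.cdf(statistic))
    return p_value > alpha
```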
All experiments were conducted on Windows 11 with an Intel Core i7-8750H CPU at 4.10 GHz and 8 GB of memory. Two statistical conditional independence (CI) tests were used: the G2 test for the discrete datasets and the Fisher z-test for the continuous datasets, with significance levels of 0.02 and 0.06, respectively. Table 1 describes five benchmark BNs with a sample size of 500. These BNs have between 30 and 802 features (variables/nodes), and for each benchmark BN the true MB of every feature (variable/node) can be obtained from the network structure.
Benchmark BN’s (synthetic dataset)
| Bayesian Network | #Features | #Edges | Max In/Out-Degree | Min/Max \|PC set\| |
|---|---|---|---|---|
| Child | 30 | 45 | 3/7 | 2/8 |
| Child10 | 204 | 128 | 2/7 | 2/9 |
| Pig | 445 | 591 | 4/39 | 2/41 |
| Gene | 802 | 973 | 3/10 | 1/11 |
| Hailfinder | 58 | 65 | 5/16 | 2/17 |
Table 2 shows the experimental results of the optimization method in this paper compared with IAMB, MMMB, STMB, and HITON-MB on five synthetic datasets with a sample size of 500, and Figure 4 shows the corresponding runtime comparison. The experiments use the F1, Precision, and Recall metrics, together with classifiers from MATLAB's built-in functions and tools, to evaluate how closely the generated MBs match the true MBs.
Evaluation measures on different benchmark BNs
| Network | Algorithm | F1 | Precision | Recall | Run-Time (s) |
|---|---|---|---|---|---|
| Child | IAMB | 0.79/0.68 | 0.87/0.68 | 0.78/0.79 | |
| Child | MMMB | 0.75/0.69 | 0.88/0.78 | 0.72/0.71 | 0.25 |
| Child | STMB | 0.74/0.71 | 0.92/0.86 | 0.66/0.66 | 0.15 |
| Child | HITON-MB | 0.83/0.82 | 0.95/0.85 | 0.76/0.76 | 0.25 |
| Child | WTMB | | | | 0.13 |
| Child10 | IAMB | 0.63/0.63 | 0.61/0.62 | 0.66/0.65 | |
| Child10 | MMMB | 0.68/0.67 | 0.71/0.68 | 0.64/0.65 | 2.41 |
| Child10 | STMB | 0.57/0.51 | 0.71/0.57 | 0.61/0.56 | 2.26 |
| Child10 | HITON-MB | | | | 2.31 |
| Child10 | WTMB | 0.73/0.71 | 0.84/ | 0.65/0.61 | 2.22 |
| Pig | IAMB | 0.73/0.74 | 0.77/0.81 | 0.71/0.73 | |
| Pig | MMMB | 0.82/0.81 | 0.81/0.81 | 0.82/0.82 | 13.01 |
| Pig | STMB | 0.76/0.72 | 0.76/0.82 | 0.74/0.70 | |
| Pig | HITON-MB | 0.92/0.74 | 0.86/0.62 | 0.91/0.91 | 14.77 |
| Pig | WTMB | | | | 11.76 |
| Gene | IAMB | 0.62/0.67 | 0.61/0.63 | 0.67/0.67 | |
| Gene | MMMB | 0.71/0.71 | 0.67/0.67 | 0.77/0.77 | 23.16 |
| Gene | STMB | 0.55/0.17 | 0.43/0.11 | 0.72/0.72 | 30.23 |
| Gene | HITON-MB | 0.81/0.79 | 0.78/0.67 | 0.91/0.90 | 23.33 |
| Gene | WTMB | | | | 21.86 |
| Hailfinder | IAMB | 0.21/0.22 | 0.24/0.23 | 0.16/0.17 | |
| Hailfinder | MMMB | 0.22/0.23 | 0.28/0.27 | 0.16/0.17 | 0.31 |
| Hailfinder | STMB | 0.17/0.16 | 0.27/0.24 | 0.13/0.12 | 1.52 |
| Hailfinder | HITON-MB | 0.21/0.21 | 0.31/0.27 | 0.16/0.16 | 0.76 |
| Hailfinder | WTMB | | | | 0.22 |

Figure 4. Running time (seconds) of the algorithms on the five benchmark BNs
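The F1, Precision, and Recall values in Table 2 compare the feature set returned by each MB discovery algorithm with the true MB of the target node in the benchmark network. A direct set-based computation is sketched below (here in Python, independent of the MATLAB tooling mentioned above); the example node names are hypothetical.

```python
def mb_metrics(found_mb, true_mb):
    """Precision, Recall and F1 of a discovered Markov blanket against the true MB."""
    found, true = set(found_mb), set(true_mb)
    tp = len(found & true)                       # correctly recovered MB members
    precision = tp / len(found) if found else 0.0
    recall = tp / len(true) if true else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1

# Hypothetical example for one target node of a benchmark BN.
print(mb_metrics(found_mb={"A", "B", "C", "E"}, true_mb={"A", "B", "D", "E"}))
```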
Analysis of Table 2 and Fig. 4 reveals that on the Child network, this paper's optimization method is more accurate than IAMB, MMMB, STMB, and HITON-MB in terms of F1, Precision, and Recall, while its running time is faster than MMMB, STMB, and HITON-MB but slower than IAMB. On the Child10 network, the optimization method achieves higher accuracy than IAMB, MMMB, and STMB in terms of F1 and Precision, and also outperforms IAMB, MMMB, and STMB in Recall. HITON-MB is more accurate than the optimization method in F1 and Recall; however, at the 0.06 significance level the accuracy of the optimization method is comparable to that of HITON-MB. On Child10, the running time of the optimization method is lower than that of MMMB, STMB, and HITON-MB and higher than that of IAMB.
On large networks such as the Pig and Gene networks with a sample size of 500, the optimization method in this paper is more accurate and runs faster than MMMB, STMB, and HITON-MB; it is also more accurate than IAMB but runs slower. On a denser network such as Hailfinder, the optimization method is faster than the other three algorithms but slower than IAMB, and it outperforms all tested methods in terms of F1, Precision, and Recall.
The running time of the optimization method in this paper is generally faster than MMMB, STMB, and HITON-MB but slower than IAMB, because the optimization method reduces the number of conditional independence tests that condition on the entire candidate MB of the target T, testing only against the currently selected features at each estimation. IAMB, however, requires more data samples for each test because the number of samples required grows exponentially with the size of the conditioning set. Thus, the IAMB learning technique is time-efficient but not data-efficient. When the sample size of the dataset is insufficient, the IAMB algorithm is unable to reliably find MBs; due to the limited number of samples, the quality of the MBs learned by IAMB is greatly affected in practice and is overall not as good as that of the optimization method in this paper.
Comprehensive experimental results show that compared with other existing algorithms, the optimization method in this paper generally has better performance in terms of F1, Precision, Recall, and Run-Time, and is able to find the MB of the target feature T more reliably.
In order to verify the superiority of the optimization method in this paper, we experimentally compare the feature classification error rates of several algorithms on the datasets, and further compare the key scaling crossover operator of the optimization method with other operators in terms of feature subset size, so as to verify the effectiveness of the scaling crossover operator.
In terms of parameter settings, the optimization method in this paper uses the default parameters of Weka. Table 3 shows the parameter settings of the other four algorithms. All algorithms use classifier performance as the evaluation criterion for the feature subset. For fairness, KNN (k=1) is used as the classifier, and the maximum number of iterations and the population size of all algorithms are set to 60 and 350, respectively. It should be noted that the compared algorithms include both single-objective and multi-objective variants; since the classification error rate is a common concern of all feature selection algorithms, this paper selects from each algorithm's final solution set, at the last iteration, the individual with the lowest training error rate for comparison.
Experimental parameter settings
| Algorithm | Parameters |
|---|---|
| IAMB | Attenuation factor |
| MMMB | Mutation probability=0.2, Mutation retry number=2 |
| STMB | |
| HITON-MB | |
After setting up the experimental parameters, this paper compares the classification error rates of the five algorithms on the datasets. In a total of 20 comparisons over 5 datasets, the optimization method in this paper won 16 times, drew 1 time, and lost 3 times, and it ranks first in classification error rate on 4 of the 5 datasets. The experimental results therefore verify that the optimization method presented in this paper is effective in dealing with high-dimensional datasets. The advantage of this paper's optimization method over other more advanced feature selection algorithms mainly lies in its use of the scaling crossover operator to combine two search operators with causal relationships.
To further validate the effectiveness of the optimization method in this paper, Table 4 gives the results of comparing its nonparametric effect sizes with those of the other algorithms using Cliff's delta, which measures the degree of overlap between two sets of samples and ranges from -1 to 1. In general, when $|\delta| \geq 0.474$ the difference is considered large, when $0.33 \leq |\delta| < 0.474$ it is medium, when $0.147 \leq |\delta| < 0.33$ it is small, and otherwise it is negligible.
Comparison of nonparametric effect sizes using Cliff’s delta
| Dataset | Algorithm | δ |
|---|---|---|
| Child | IAMB | 0.76 |
| Child | MMMB | 0.39 |
| Child | STMB | 0.61 |
| Child | HITON-MB | 0.75 |
| Child | WTMB | 0.80 |
| Child10 | IAMB | 0.75 |
| Child10 | MMMB | 0.43 |
| Child10 | STMB | 0.58 |
| Child10 | HITON-MB | 0.71 |
| Child10 | WTMB | 0.75 |
| Pig | IAMB | 0.17 |
| Pig | MMMB | 0.39 |
| Pig | STMB | 0.42 |
| Pig | HITON-MB | 0.51 |
| Pig | WTMB | 0.49 |
| Gene | IAMB | 0.33 |
| Gene | MMMB | 0.54 |
| Gene | STMB | 0.41 |
| Gene | HITON-MB | 0.19 |
| Gene | WTMB | 0.40 |
| Hailfinder | IAMB | 0.61 |
| Hailfinder | MMMB | 0.42 |
| Hailfinder | STMB | 0.37 |
| Hailfinder | HITON-MB | 0.55 |
| Hailfinder | WTMB | 0.76 |
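Cliff's delta for two samples of results (for example, the error rates of two algorithms over repeated runs) can be computed directly from its definition, as in the sketch below; the example numbers are hypothetical, and the magnitude labels follow the commonly used rule of thumb cited above.

```python
import numpy as np

def cliffs_delta(a, b):
    """Cliff's delta: P(a > b) - P(a < b) over all sample pairs, in [-1, 1]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diff = a[:, None] - b[None, :]
    return (np.sum(diff > 0) - np.sum(diff < 0)) / (a.size * b.size)

def magnitude(delta):
    """Commonly used rule of thumb for interpreting |delta|."""
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"

# Hypothetical example: error rates of two algorithms over four repeated runs.
d = cliffs_delta([0.12, 0.11, 0.13, 0.10], [0.15, 0.16, 0.14, 0.17])
print(d, magnitude(d))
```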
In order to further validate the effectiveness of the scaling crossover operator proposed in the optimization method of this paper, three operators of the same class are used for performance comparison, namely the arithmetic crossover operator, the selection operator, and the mutation operator. Figure 5 shows the performance of the scaling crossover operator, the arithmetic crossover operator, the selection operator, and the mutation operator on the "Child10", "Pig", "Gene", and "Hailfinder" datasets, respectively.

Figure 5. Box diagram of the feature dimensions obtained by the four operators
As shown in Fig. 5, the scaling crossover operator in the optimization method of this paper shows a clear advantage in the number of selected features compared with the other three operators; specifically, it obtains a smaller feature subset on each dataset. The experimental results also show that the different selection probabilities of the different operators lead to different performance on different datasets: the higher the selection probability, the smaller the size of the selected feature subset but the higher the error rate, and vice versa. This shows that the selection probability of the scaling crossover operator has a key impact on the performance of the optimization method on different datasets. By keeping the selection probability of the scaling crossover operator low, the method can still obtain a small feature subset while improving the rate at which the correct MB of the target feature T is selected.
This paper investigates algorithms for the feature selection problem in high-dimensional data. The scaling crossover operator of the genetic algorithm is used to optimize the wavelet threshold filtering algorithm, and the weighted attenuation memory factor is introduced; the optimized method and the scaling crossover operator are then verified through comparative experiments such as feature selection on benchmark datasets. The optimized method and the other four algorithms are applied to five datasets for potential causality identification of target features at significance levels of 0.02 and 0.06. The experimental results show that the optimized method generally achieves higher potential causality identification accuracy than the other four algorithms in terms of F1, Precision, and Recall, and is also relatively fast in terms of running time. The optimization method in this paper is thus able to identify the potential causality of target features more reliably.
The classification error rate of this paper's optimization method is evaluated against the other algorithms. In a total of 20 comparisons over 5 datasets, the optimization method won 16 times, tied 1 time, and lost 3 times, an 80% winning rate, and it ranked first in classification error rate on 4 of the 5 datasets. Using Cliff's delta to compare the nonparametric effect sizes of the five algorithms, the |δ| values of the optimization method reach the medium-to-large level on most datasets, further confirming its advantage.
Comparing the scaled crossover operator with several other operators in terms of feature subset scales confirms that the scaled crossover operator has a clear advantage in terms of the number of selected features. The scaled crossover operator has a lower probability of selecting features and can obtain a smaller feature subset, which plays a key role in improving the correctness of feature potential causality identification of the optimization method in this paper.
