Research on Employment and Entrepreneurship Potential Mining and Cultivation Mechanism of College Students Based on Decision Tree Modeling
Published Online: Sep 26, 2025
Received: Jan 14, 2025
Accepted: Apr 18, 2025
DOI: https://doi.org/10.2478/amns-2025-1080
© 2025 Yan Kong, published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.
As the number of college graduates in China grows sharply while the number of available jobs grows slowly, a serious employment conflict has emerged [1-2]. At the same time, the conflict between the marketization of graduate employment and graduates’ lagging employment concepts is prominent: a trend of graduates seeking work in economically underdeveloped regions and at the grassroots level has not yet formed, and the structural contradiction between the supply of and demand for talent persists, which affects the employment of college graduates [3-6]. In this context, using big data technology to mine college students’ employment and entrepreneurship potential and to support talent cultivation is of great significance [7-9].
Cultivating college students’ employment and entrepreneurship ability is an important task in current higher education and requires a variety of paths and methods [10-12], including curriculum and teaching reform, entrepreneurial practice and internship opportunities, the construction of innovation and entrepreneurship education platforms, and employment and entrepreneurship guidance and counseling services [13-16]. Implementing these measures can effectively improve college students’ competitiveness in employment and entrepreneurship, cultivate their entrepreneurial spirit and practical ability, and lay a solid foundation for their future careers [17-19]. Students should also pursue self-improvement by actively participating in practical activities that strengthen their abilities. Only through the joint efforts of schools and students can the employment and entrepreneurship ability of college students be better cultivated and contribute to social and economic development [20-23].
In this paper, an improved decision tree C4.5 algorithm is used to construct a prediction model of college students’ employment and entrepreneurship potential, to make a preliminary prediction of that potential, and to pave the way for employment and entrepreneurship guidance. The main factors influencing college students’ employment and entrepreneurship potential are extracted by factor analysis. First, it is verified whether the student data used in this paper meet the requirements for factor analysis. Then the number of factors is determined using principal component extraction and the scree plot criterion. To widen the differences among the factor loadings so that the factors can be defined and interpreted, the factors are rotated and then named. Finally, factor scores are calculated to replace the original variables, completing the characterization of college students’ employment and entrepreneurial potential. The available data are analyzed to verify the effectiveness of the decision tree model and the factor analysis model on the task of mining and cultivating students’ employment and entrepreneurial potential.
Data collection. This study collects and organizes data on three aspects: basic information on college graduates, graduate performance information, and graduate employment information. A data information form is then established.

Data integration and summarization. The original graduate data are obtained and integrated, data with duplicate attributes are deleted, and the information is summarized into an information summary table; factors that have little impact on graduate employment are excluded.

Data conversion. The attribute values above are mapped to a small, finite value space, which benefits the generation of decision trees. The 9 relevant attributes are normalized in this way. The data conversion rules are shown in Table 1.
Attribute values and their conversion values
| Attribute classification | Attribute name | Attribute value | Conversion value |
|---|---|---|---|
| Basic attribute | Gender | Male, female | 1, 0 |
| | Political identity | Party member, non-party member | 1, 0 |
| | Source information | Eastern region, central region, western region | 1, 0, -1 |
| | Whether to be a student cadre | Yes, no | 1, 0 |
| | Comprehensive achievement | 85 points or above, below 85 points | 1, 0 |
| | Job category | Teaching personnel, non-teaching personnel | 1, 0 |
| | Job matching | Match, mismatch | Y, N |
| Predictive attribute | Employment situation | Employed, to be employed | 1, 0, -1 |
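The conversion rules in Table 1 amount to a lookup from each attribute value to a small integer code. The following is a minimal sketch of that lookup for four of the attributes; the field names (`gender`, `source`, and so on) and the lowercase labels are hypothetical, and only the numeric codes come from Table 1.

```python
# Hypothetical field names and labels; the numeric codes follow Table 1.
CONVERSION = {
    "gender": {"male": 1, "female": 0},
    "political_identity": {"party": 1, "non-party": 0},
    "source": {"eastern": 1, "central": 0, "western": -1},
    "cadre": {"yes": 1, "no": 0},
}


def encode(record):
    """Map each raw attribute value of one graduate record to its code."""
    return {field: CONVERSION[field][value] for field, value in record.items()}


# Illustrative record, not taken from the study's data.
sample = {"gender": "female", "political_identity": "party",
          "source": "western", "cadre": "no"}
print(encode(sample))
```

Keeping the codes in one table-like dictionary makes it easy to check the program against Table 1 and to reject values that the table does not define (an unknown label raises `KeyError`).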
Correlation analysis of attributes affecting college graduates’ successful employment. To construct an effective decision tree model for predicting whether graduates are successfully employed, it is necessary to analyze the correlation of the attributes affecting graduate employment and to select the test attributes, which helps ensure the accuracy of the prediction model. SPSS software is used to conduct the correlation analysis.
Table 2 shows the correlation analysis of the attributes affecting the employment of college graduates. Four attributes, namely graduates’ comprehensive achievement, whether they are student cadres, student source information, and political identity, are selected as the test attributes affecting the successful employment of college graduates.
Correlation analysis of the attributes affecting college graduates’ employment
| | | Employment | Cadre | Source information | Grade | Political identity | Gender |
|---|---|---|---|---|---|---|---|
| Employment | Cor | 1.000 | 0.755 | 0.705 | 0.822 | 0.584 | 0.085 |
| | Sig. (2-tailed) | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.156 |
| | N | 1200 | 1200 | 1200 | 1200 | 1200 | 1200 |
| Cadre | Cor | 0.728 | 1.000 | 0.516 | 0.638 | 0.451 | 0.003 |
| | Sig. (2-tailed) | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.566 |
| | N | 1200 | 1200 | 1200 | 1200 | 1200 | 1200 |
| Source information | Cor | 0.705 | 0.511 | 1.000 | 0.584 | 0.412 | 0.035 |
| | Sig. (2-tailed) | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.563 |
| | N | 1200 | 1200 | 1200 | 1200 | 1200 | 1200 |
| Grade | Cor | 0.825 | 0.634 | 0.595 | 1.000 | 0.0477 | 0.035 |
| | Sig. (2-tailed) | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| | N | 1200 | 1200 | 1200 | 1200 | 1200 | 1200 |
| Political identity | Cor | 0.559 | 0.454 | 0.412 | 0.485 | 1.000 | 0.042 |
| | Sig. (2-tailed) | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| | N | 1200 | 1200 | 1200 | 1200 | 1200 | 1200 |
| Gender | Cor | 0.085 | 0.001 | 0.033 | 0.031 | 0.045 | 1.000 |
| | Sig. (2-tailed) | 0.158 | 0.981 | 0.566 | 0.000 | 0.000 | 0.000 |
| | N | 1200 | 1200 | 1200 | 1200 | 1200 | 1200 |
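To illustrate what one cell of such a correlation table measures, the sketch below computes a correlation coefficient between two coded attributes. It assumes the Pearson coefficient, since the text does not state which coefficient SPSS was configured to report, and the two short lists are hypothetical toy data rather than the study's 1200 records.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: 1 = student cadre / employed, 0 = otherwise.
cadre = [1, 1, 0, 0, 1, 0, 1, 0]
employed = [1, 1, 0, 0, 1, 1, 1, 0]
print(round(pearson(cadre, employed), 3))
```

An attribute whose coefficient with the employment outcome is close to zero, as gender is in Table 2, carries little linear information about the outcome and is a natural candidate for exclusion from the test attributes.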
The C4.5 [24] algorithm is an improvement of the ID3 algorithm. Unlike the ID3 algorithm [25], the C4.5 algorithm selects attributes for each node of the tree based on the information gain rate. The algorithm selects the attribute with the highest gain rate as the test attribute for the current node. This attribute minimizes the amount of information needed to classify the samples in the resulting partition and reflects the minimal randomness or “impurity” of the partition. This theoretical approach minimizes the expected number of tests required to classify an object and ensures that a simple tree is found. For the sake of convenience, the concepts are explained below.
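Definition 1: Let dataset $S$ contain $s$ samples that belong to $m$ distinct classes $C_i$ $(i = 1, 2, \dots, m)$, and let $s_i$ be the number of samples in class $C_i$. The information entropy of $S$ is

$$I(s_1, s_2, \dots, s_m) = -\sum_{i=1}^{m} p_i \log_2 p_i$$

where $p_i = s_i / s$ is the probability that an arbitrary sample belongs to class $C_i$.
The entropy reflects the average uncertainty and purity of the sample set $S$: the smaller the entropy, the purer the set.
Definition 2: Let an attribute $A$ take $v$ distinct values $\{a_1, a_2, \dots, a_v\}$, which partition $S$ into subsets $S_1, S_2, \dots, S_v$. The expected information of the partition by $A$ is

$$E(A) = \sum_{j=1}^{v} \frac{s_{1j} + s_{2j} + \dots + s_{mj}}{s}\, I(s_{1j}, s_{2j}, \dots, s_{mj})$$

where $s_{ij}$ is the number of samples of class $C_i$ in subset $S_j$.
Definition 3: Information gain is defined as:

$$\mathrm{Gain}(A) = I(s_1, s_2, \dots, s_m) - E(A)$$

For sample set $S$, $\mathrm{Gain}(A)$ is the reduction in entropy obtained by partitioning on attribute $A$.
Definition 4: Information gain rate is defined as:

$$\mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitInfo}(A)}, \qquad \mathrm{SplitInfo}(A) = -\sum_{j=1}^{v} \frac{|S_j|}{s} \log_2 \frac{|S_j|}{s}$$

where $\mathrm{SplitInfo}(A)$ is the split information entropy of attribute $A$ and $|S_j| = s_{1j} + s_{2j} + \dots + s_{mj}$ is the number of samples in subset $S_j$.
Calculate the information gain rate for each attribute by using the above formula. The attribute with the highest information gain rate is selected as the test attribute for the given set $S$.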
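The quantities in Definitions 1 to 4 can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard C4.5 formulas (entropy, expected information, information gain, split information); the function names and the four-sample toy data are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (base 2) of a list of labels or attribute values."""
    total = len(items)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(items).values())


def gain_ratio(values, labels):
    """Information gain rate of one attribute with respect to the class labels."""
    total = len(labels)
    subsets = {}
    for v, y in zip(values, labels):
        subsets.setdefault(v, []).append(y)
    # Expected information after partitioning on the attribute (Definition 2).
    expected = sum(len(s) / total * entropy(s) for s in subsets.values())
    gain = entropy(labels) - expected          # Definition 3
    split_info = entropy(values)               # entropy of the partition sizes
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: one binary attribute and the class of each sample.
values = [1, 1, 0, 0]
labels = ["Y", "Y", "Y", "N"]
print(round(gain_ratio(values, labels), 4))
```

Dividing by the split information penalizes attributes that fragment the data into many small subsets, which is the main reason C4.5 prefers the gain rate over the raw information gain used by ID3.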
Algorithm description. According to the description of the C4.5 algorithm, its flowchart is given in Fig. 1.

Flowchart of the C4.5 algorithm
Pruning a decision tree means cutting out replaceable subtrees and replacing them with leaves in order to simplify the tree. Pruning is also used to reduce the prediction error and thus improve the quality of the classification model. If the expected misclassification rate of a subtree is greater than the error rate predicted for a single leaf, the subtree is replaced by that leaf.
The C4.5 algorithm utilizes a post pruning approach, which uses pessimistic pruning when evaluating the prediction error. The method uses the training sample set itself to estimate the error before and after pruning to decide whether to actually prune or not. The formulas used in the method are as follows:
Where
Determine the size of
To construct the decision tree prediction model for employment and entrepreneurship potential, the final classification result is whether a student is successfully employed: employment is represented by Y, and pending employment (to be employed) by N. After data preprocessing and attribute screening, the test attributes determined are “political identity”, “student source information”, “cadre situation”, and “grades”.
1) Calculate the amount of information of the categorized attribute. There are 2000 sample data in the training set, among which there are 150 data whose class is employment and 50 data whose class is to be employed; the amount of information of the categorized attribute can be obtained according to the formula. The test set contains 1200 samples.

2) Calculate the information entropy of each test attribute. The attribute “political identity”, for example, has two attribute values. First, the amount of information of the subset defined by each attribute value is calculated, and the information entropy of the attribute is then obtained on this basis.

3) Calculate the information gain of each test attribute.

4) Calculate the split information entropy of each test attribute.

5) Calculate the information gain rate of each test attribute.

6) Select the attribute with the largest information gain rate. From step 5), the information gain rate of the attribute “Grade” is 0.0126, which is the largest. Following the idea of the C4.5 algorithm, the “Grade” attribute is selected as the root node. Because “Grade” has two attribute values, the training set is divided into two parts.

7) Repeat steps 2)-6) to complete the division of each branch. The subset formed after each division is classified according to the same computation until all of its samples belong to the same category or all test attributes have been used, at which point the final decision tree model is formed.

To avoid the cumbersome calculation caused by large samples and many test attributes, this study designs and develops a simple college student employment prediction tool based on the core idea of the C4.5 algorithm, using the Python language and Excel as the store for the sample data. The tool constructs the prediction model, evaluates its prediction accuracy, and applies it to make predictions.
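The recursive procedure in the steps above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the tool developed in the study: it handles discrete attributes only, omits pruning, and uses hypothetical function names and a four-row toy training set keyed by the abbreviations CJ (grades) and GB (cadres).

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (base 2) of a list of labels or attribute values."""
    total = len(items)
    return -sum(c / total * math.log2(c / total) for c in Counter(items).values())


def gain_ratio(rows, labels, attr):
    """Information gain rate of attribute `attr` on the given samples."""
    values = [r[attr] for r in rows]
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    expected = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    split_info = entropy(values)
    return (entropy(labels) - expected) / split_info if split_info > 0 else 0.0


def build_tree(rows, labels, attrs):
    """Return a class label (leaf) or {"attr": name, "branches": {value: subtree}}."""
    # Stop when the node is pure or every test attribute has been used.
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain_ratio(rows, labels, a))
    remaining = [a for a in attrs if a != best]
    branches = {}
    for value in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        branches[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx], remaining)
    return {"attr": best, "branches": branches}


# Hypothetical toy training set.
rows = [{"CJ": 1, "GB": 1}, {"CJ": 1, "GB": 0},
        {"CJ": 0, "GB": 1}, {"CJ": 0, "GB": 0}]
labels = ["Y", "Y", "N", "N"]
tree = build_tree(rows, labels, ["CJ", "GB"])
print(tree)
```

On this toy set the grades attribute separates the classes perfectly, so it becomes the root and both branches are leaves; on real data the recursion continues down each branch with the remaining attributes.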
The decision tree prediction model of employment and entrepreneurship potential is constructed with the college student employment prediction tool. Clicking the “Select Training Set” button in the interface loads the data table named “Training Set 1” into the program, and clicking the “Generate Prediction Model” button generates the decision tree prediction model of employment and entrepreneurship potential, as shown in Figure 2.

Decision tree prediction model of “Whether employment can be smooth”
CJ, GB, SYXX, and ZZSF are the abbreviations of the attributes “grades”, “cadres”, “student source information”, and “political identity”, respectively. The 0 and 1 on the directed arrows represent the values of each attribute; their specific meaning can be found in Table 1 in Chapter 2. This completes the construction of the decision tree prediction model of employment and entrepreneurship potential. The classification rules are obtained from the final decision tree by traversing from the root node to each leaf node in turn. The employment and entrepreneurship potential prediction model yields 14 classification rules, which are listed in Table 3.
Rules for the classification of successful employment
| Number | Rule | Conclusion |
|---|---|---|
| 1 | Grades less than 85 points and cadres=No and political identity=Non-party and source=Central region | Employment |
| 2 | Grades less than 85 points and cadres=No and political identity=Non-party and source=Eastern region | Unemployment |
| 3 | Grades less than 85 points and cadres=No and political identity=Non-party and source=Western region | Employment |
| 4 | Grades less than 85 points and cadres=No and political identity=Party and source=Central region | Employment |
| 5 | Grades less than 85 points and cadres=No and political identity=Party and source=Eastern region | Employment |
| 6 | Grades less than 85 points and cadres=No and political identity=Party and source=Western region | Unemployment |
| 7 | Grades less than 85 points and cadres=Yes | Employment |
| 8 | Grades greater than 85 points and source=Central region | Employment |
| 9 | Grades greater than 85 points and source=Eastern region and political identity=Non-party | Employment |
| 10 | Grades greater than 85 points and source=Eastern region and political identity=Party and cadres=No | Employment |
| 11 | Grades greater than 85 points and source=Eastern region and political identity=Party and cadres=Yes | Unemployment |
| 12 | Grades greater than 85 points and source=Western region and cadres=No | Employment |
| 13 | Grades greater than 85 points and source=Western region and cadres=Yes and political identity=Non-party | Employment |
| 14 | Grades greater than 85 points and source=Western region and cadres=Yes and political identity=Party | Unemployment |
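Because each rule in Table 3 is a path from the root to a leaf, the 14 rules can be written as one nested conditional. The sketch below is an illustration of that reading; the function name and argument names are hypothetical, the codes follow Table 1, and the case of a grade of exactly 85 points is left to the caller because the rules do not address it.

```python
def predict_employment(grade_high, cadre, party, source):
    """Apply the 14 classification rules of Table 3.

    grade_high, cadre, party: 1 or 0.
    source: 1 = eastern, 0 = central, -1 = western region.
    Returns "Employment" or "Unemployment".
    """
    if not grade_high:
        if cadre:
            return "Employment"                                      # rule 7
        if party:
            return "Unemployment" if source == -1 else "Employment"  # rules 4-6
        return "Unemployment" if source == 1 else "Employment"       # rules 1-3
    if source == 0:
        return "Employment"                                          # rule 8
    if source == 1:
        if not party:
            return "Employment"                                      # rule 9
        return "Unemployment" if cadre else "Employment"             # rules 10-11
    # Remaining case: western region.
    if not cadre:
        return "Employment"                                          # rule 12
    return "Unemployment" if party else "Employment"                 # rules 13-14


# Rule 11: high grades, eastern region, Party member, student cadre.
print(predict_employment(grade_high=1, cadre=1, party=1, source=1))
```

Writing the rules this way also makes it easy to confirm that they are exhaustive: every combination of the four coded attributes reaches exactly one return statement.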
After the decision tree C4.5 prediction model is constructed, it needs to be tested whether its accuracy meets the requirements. Using the randomly selected test set data in the previous section, we can evaluate the accuracy and applicability of the decision tree C4.5 model.
The classification accuracy of the decision tree C4.5 model is shown in Table 4. The model predicts the training set with an accuracy of 88.59% and the test set with an accuracy of 84.56%. Both rates are high for predicting the employment and entrepreneurial potential of college students.
The accuracy of graduation prediction
| Model | Sample set | Accuracy/% | Error rate/% |
|---|---|---|---|
| C4.5 | Training set | 88.59 | 11.41 |
| | Test set | 84.56 | 15.44 |
The prediction accuracy for each employment category was calculated separately on the test set, as shown in Table 5. The decision tree C4.5 model achieves an accuracy of 82.53% for governmental organizations/institutions and state-owned enterprises, 85.03% for further education, 79.15% for foreign-funded and private enterprises, and 84.41% for freelancing. The accuracy varies across categories and is highest for further education, at just over 85%.
The accuracy of graduation prediction
| | Government agencies/institutions, state-owned enterprises | Further education | Foreign enterprises, private enterprises | Freelancing | Acc/% |
|---|---|---|---|---|---|
| Government agencies/institutions, state-owned enterprises | 241 | 15 | 17 | 19 | 82.53 |
| Further education | 15 | 267 | 14 | 18 | 85.03 |
| Foreign enterprises, private enterprises | 27 | 24 | 262 | 18 | 79.15 |
| Freelancing | 14 | 15 | 12 | 222 | 84.41 |
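The accuracy column of Table 5 can be reproduced from the counts in the table: each value is the diagonal count divided by the row total. A short check, with the counts copied from Table 5 and hypothetical variable names:

```python
# Counts copied from Table 5; each row is one employment category.
categories = ["Government/state-owned", "Further education",
              "Foreign/private", "Freelancing"]
matrix = [
    [241, 15, 17, 19],
    [15, 267, 14, 18],
    [27, 24, 262, 18],
    [14, 15, 12, 222],
]


def row_accuracy(matrix):
    """Diagonal count divided by row total, as a percentage per category."""
    return [100 * row[i] / sum(row) for i, row in enumerate(matrix)]


per_class = row_accuracy(matrix)
for name, acc in zip(categories, per_class):
    print(f"{name}: {acc:.2f}%")
```

The four row totals add up to 1200 samples, which matches the size of the test set.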
The gain assessment curves for the prediction results of the Decision Tree C4.5 model are shown in Figure 3. The top curve is the optimal gain curve and the middle curve is the gain curve of the decision tree C4.5 model. The overall trend of the gain curve of the Decision Tree C4.5 model is relatively similar to that of the best curve, with a small difference, and the difference between the two gains never exceeds 15%. In summary, it can be seen that the gain assessment curve of the decision tree C4.5 model fits the optimal curve to a high degree, and has a good classification prediction effect on the prediction of college students’ employment and entrepreneurship potential.

Gain evaluation curve
The lift curves for the prediction results of the decision tree C4.5 model are shown in Figure 4. The rightmost curve in the figure is the optimal curve, and the curve below it is the lift curve of the decision tree C4.5 model. In the 0 to 30th percentile range, the actual lift curve decreases, becoming steeper at around the 30th percentile, while the optimal curve remains straight. Beyond the 30th percentile, the lift decreases continuously, but its trend in this range is similar to that of the optimal lift curve. In summary, the lift curve of the decision tree C4.5 model fits the optimal curve well, and the confidence level of the model’s rules is high.

Lift curve
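The text does not state how the gain and lift curves were computed. Under the common definition, samples are ranked by predicted score, the cumulative gain at a given fraction is the share of all positive cases found in that top fraction, and the lift is the gain divided by the fraction. A minimal sketch under that assumption, with hypothetical scores and outcomes:

```python
def gains_and_lift(scores, actual, bins=10):
    """Cumulative gain and lift at each fraction of the score-ranked sample.

    scores: predicted probability of the positive class.
    actual: observed outcomes coded 1 (positive) or 0.
    Returns a list of (fraction, gain, lift) tuples.
    """
    ranked = [a for _, a in sorted(zip(scores, actual), key=lambda p: -p[0])]
    positives = sum(ranked)
    n = len(ranked)
    table = []
    for k in range(1, bins + 1):
        fraction = k / bins
        top = round(fraction * n)
        gain = sum(ranked[:top]) / positives   # share of positives captured
        table.append((fraction, gain, gain / fraction))
    return table


# Hypothetical scores and outcomes for ten samples.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
actual = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
table = gains_and_lift(scores, actual)
for fraction, gain, lift in table:
    print(f"{fraction:.1f}  gain={gain:.2f}  lift={lift:.2f}")
```

A model with no discriminating power has a lift of 1 at every fraction, so the distance of the lift curve above 1 in the top percentiles indicates how much better than random selection the model's ranking is.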
Factor analysis [26] is a technique for data simplification, which mainly reflects the idea of data dimensionality reduction. By studying the internal interdependence between variables, a few abstract variables are identified that can synthesize the main information of all variables, which cannot be measured directly, and the abstract variables are usually called factors.
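Factor analysis is similar to cluster analysis in that both are divided into R-type and Q-type. R-type factor analysis analyzes variables and Q-type factor analysis [27] analyzes samples. In this paper, R-type factor analysis is chosen to analyze the variables. Its characteristic is that the common factors cannot be observed directly, but they exist objectively. Given a sample of $n$ observations on $p$ variables $X = (X_1, X_2, \dots, X_p)^T$, the factor model is

$$X_i = a_{i1}F_1 + a_{i2}F_2 + \dots + a_{im}F_m + \varepsilon_i, \quad i = 1, 2, \dots, p$$

In the above equation, $F_1, F_2, \dots, F_m$ are the common factors and $\varepsilon_i$ is the special factor of $X_i$. In matrix form, $X = AF + \varepsilon$.
Among them:

$$A = (a_{ij})_{p \times m}$$

is called the factor loading matrix, and $a_{ij}$ is the loading of variable $X_i$ on factor $F_j$.
This mathematical model needs to satisfy the following four aspects:

1) $m \le p$;
2) $\mathrm{Cov}(F, \varepsilon) = 0$, that is, the common factors and the special factors are uncorrelated;
3) $D(F) = I_m$, that is, the common factors are uncorrelated with each other and have unit variance;
4) $D(\varepsilon) = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \dots, \sigma_p^2)$, that is, the special factors are uncorrelated with each other.

The decomposition of the covariance matrix of the original variable $X$ is

$$\Sigma = AA^T + D, \qquad D = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \dots, \sigma_p^2)$$

The smaller the value of $\sigma_i^2$, the larger the share of the variance of $X_i$ that is explained by the common factors.
The loading matrix is not unique: let $T$ be any $m \times m$ orthogonal matrix; then $X = (AT)(T^T F) + \varepsilon$, so $AT$ is also a factor loading matrix.

Statistical significance of factor loadings. For the factor model $X_i = \sum_{j=1}^{m} a_{ij}F_j + \varepsilon_i$:
The covariance between $X_i$ and $F_j$ is

$$\mathrm{Cov}(X_i, F_j) = a_{ij}$$

If $X_i$ has been standardized so that $\mathrm{Var}(X_i) = 1$, then $a_{ij}$ is the correlation coefficient between $X_i$ and $F_j$.

Statistical significance of variable commonality. For the factor model above:
The common degree of variable $X_i$ is

$$h_i^2 = \sum_{j=1}^{m} a_{ij}^2, \qquad \mathrm{Var}(X_i) = h_i^2 + \sigma_i^2$$

If $X_i$ has been standardized, then $h_i^2 + \sigma_i^2 = 1$.

Statistical significance of the variance contribution.

$$g_j^2 = \sum_{i=1}^{p} a_{ij}^2$$

is the contribution of public factor $F_j$ to all the variables; it measures the relative importance of $F_j$.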
First, original data with high reliability and authenticity are selected; such data are usually obtained from research on actual problems.
Second, all original variables are standardized to eliminate differences in order of magnitude, and the correlation matrix between variables is computed from the standardized data.
Third, principal component analysis [28] is used to solve for the common factors, and the factor loading matrix is derived.
Fourth, to make the coefficients in the factor loading matrix more distinct, the factor loading matrix is rotated. In this paper, the maximum variance (varimax) orthogonal rotation method is used, which maximizes the variance of the squared loadings within each factor, and the factors are then named and interpreted.
Fifth, the factor scores are calculated.
Sixth, the results are analyzed and conclusions are drawn.
The logic diagram of factor analysis is shown in Figure 5:

Logic diagram of factor analysis
In order to initially choose the measurement indicators to cover as many factors affecting the employment situation of graduates as possible, this paper establishes a relatively reasonable indicator system for the description of graduates.
Starting from the needs of enterprises, the factors affecting the employment quality of college students are shown in Figure 6. Based on a comprehensive understanding of the employment situation of fresh graduates in information-related majors, 15 measurement indicators are proposed following the five principles of indicator selection mentioned above.

Factors affecting the employment quality of college students
A sample survey was conducted on the graduates of the last three years from three universities, namely, University X, University Y and University Z, in the area of City B. A total of 620 questionnaires were distributed and 600 questionnaires were recovered, among which 600 questionnaires were valid, and the validity rate of the questionnaires was 100%.
This paper divides salary grades according to the ratio of an individual sample’s salary to the average salary of the sample set. The division is shown in Table 6, in which St denotes the salary of a single sample and Sd denotes the average salary of all samples.
Salary grade classification
| Evaluation index | Evaluation criteria | Evaluation level |
|---|---|---|
| Remuneration | St ≥ 1.5 Sd | A |
| | 1.5 Sd > St ≥ 1.2 Sd | B |
| | 1.2 Sd > St ≥ 0.8 Sd | C |
| | 0.8 Sd > St ≥ 0.5 Sd | D |
| | 0.5 Sd > St | E |
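The grading rule in Table 6 translates directly into a chain of threshold comparisons on the ratio St/Sd. A minimal sketch, with a hypothetical function name and illustrative salary figures that are not taken from the survey:

```python
def salary_grade(st, sd):
    """Grade one salary `st` against the sample mean salary `sd`, per Table 6."""
    ratio = st / sd
    if ratio >= 1.5:
        return "A"
    if ratio >= 1.2:
        return "B"
    if ratio >= 0.8:
        return "C"
    if ratio >= 0.5:
        return "D"
    return "E"


# Illustrative values only: a salary of 6500 against a mean of 5000.
print(salary_grade(6500, 5000))
```

Because the grade depends only on the ratio to the sample mean, the same function applies unchanged to samples from different years or regions with different salary levels.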
In this paper, 15 observable indicators are used as the factors that determine the salary level of college students’ employment, and these indicators are expressed as variables: X1 denotes GPA scores, X2 denotes English scores, X3 denotes scholarships won, X4 denotes course design scores, X5 denotes project experience, X6 denotes award-winning experience, X7 denotes award level, X8 denotes student cadre experience, X9 denotes cadre level, X10 denotes campus honors awarded, X11 denotes social situation, X12 denotes political identity, X13 denotes part-time job experience, X14 denotes internship experience, and X15 denotes interview experience.
To make the sample model describe the characteristics of the sample more objectively, the data of all observed variables are normalized to eliminate differences in scale, so that every value is transformed into the interval [0, 1].
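The text does not name the normalization method. One common way to map a variable onto [0, 1] is min-max scaling, sketched below as an assumption; the function name and the sample values are hypothetical.

```python
def min_max(values):
    """Rescale a list of numbers linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]   # constant column: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical GPA values for three students.
print(min_max([2.0, 3.0, 4.0]))
```

Applying the function to each of the 15 variables separately puts them on a common scale, so that a variable measured in large units does not dominate the correlation and factor analysis that follow.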
Before doing the factor analysis, the correlation analysis of the 15 original variables was conducted using the standardized data, and the matrix of correlation coefficients is shown in Table 7. It shows that the correlation coefficients between most of the variables are greater than 0.3. Among them, X1, X2 and X3 have a certain correlation with each other, X4, X5, X6 and X7 have a certain correlation with each other, X8, X9, X10, X11 and X12 have a certain correlation with each other, and X13, X14 and X15 have a certain correlation with each other.
Correlation matrix
| | X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | X10 | X11 | X12 | X13 | X14 | X15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| X1 | 1.000 | 0.643 | 0.524 | 0.492 | -0.152 | 0.153 | -0.311 | 0.308 | -0.252 | 0.365 | -0.189 | 0.303 | 0.109 | 0.061 | 0.251 |
| X2 | 0.636 | 1.000 | 0.762 | 0.162 | 0.043 | -0.112 | 0.118 | 0.308 | -0.174 | 0.275 | -0.557 | 0.456 | -0.426 | 0.036 | 0.317 |
| X3 | 0.526 | 0.761 | 1.000 | 0.246 | 0.23 | 0.027 | -0.428 | -0.383 | 0.1 | 0.469 | 0.158 | 0.122 | 0.169 | 0.353 | 0.281 |
| X4 | 0.491 | 0.168 | 0.25 | 1.000 | 0.647 | 0.369 | 0.338 | 0.405 | 0.005 | 0.063 | -0.103 | -0.033 | 0.198 | 0.384 | 0.398 |
| X5 | -0.554 | 0.049 | 0.212 | 0.378 | 1.000 | 0.434 | 0.401 | -0.434 | 0.194 | -0.01 | 0.299 | -0.301 | 0.173 | 0.306 | 0.209 |
| X6 | 0.145 | -0.118 | 0.029 | 0.337 | 0.438 | 1.000 | 0.388 | 0.314 | 0.18 | -0.067 | -0.509 | 0.248 | 0.498 | 0.25 | 0.351 |
| X7 | -0.318 | 0.127 | -0.425 | 0.407 | 0.414 | 0.398 | 1.000 | -0.198 | -0.084 | -0.448 | 0.063 | -0.555 | 0.206 | 0.135 | 0.263 |
| X8 | 0.302 | 0.299 | -0.372 | 0.006 | -0.431 | 0.313 | -0.193 | 1.000 | 0.658 | 0.524 | 0.752 | -0.307 | 0.457 | 0.143 | 0.273 |
| X9 | -0.261 | -0.185 | 0.098 | 0.07 | 0.192 | 0.173 | -0.104 | 0.653 | 1.000 | 0.295 | 0.473 | 0.029 | 0.39 | 0.056 | 0.416 |
| X10 | 0.36 | 0.283 | 0.468 | -0.103 | 0.009 | -0.072 | -0.448 | 0.532 | 0.311 | 1.000 | 0.399 | 0.086 | 0.22 | 0.124 | 0.416 |
| X11 | -0.18 | -0.558 | 0.155 | -0.031 | 0.298 | -0.501 | 0.056 | 0.75 | 0.467 | 0.393 | 1.000 | -0.027 | -0.02 | 0.127 | 0.494 |
| X12 | 0.305 | 0.451 | 0.119 | 0.192 | -0.299 | 0.24 | -0.55 | -0.318 | 0.03 | 0.08 | -0.035 | 1.000 | 0.667 | 0.396 | 0.399 |
| X13 | 0.101 | -0.42 | 0.161 | 0.382 | 0.174 | 0.494 | 0.198 | 0.458 | 0.389 | 0.227 | -0.019 | 0.654 | 1.000 | 0.471 | 0.524 |
| X14 | 0.062 | 0.032 | 0.37 | 0.397 | 0.313 | 0.257 | 0.143 | 0.138 | 0.062 | 0.119 | 0.134 | 0.392 | 0.471 | 1.000 | 0.351 |
| X15 | 0.25 | 0.313 | 0.294 | 0.109 | 0.215 | 0.347 | 0.26 | 0.272 | 0.416 | 0.402 | 0.485 | 0.4 | 0.524 | 0.347 | 1 |
To further determine whether the data are suitable for factor analysis, the KMO test and Bartlett’s test of sphericity were performed. Factor analysis usually requires that the KMO statistic be greater than 0.5 and that the significance (sig) value of Bartlett’s test be less than 0.05. In this paper, the KMO value is 0.771 and the sig value of Bartlett’s test is 0.000, so the data are suitable for factor analysis.
In this paper, the principal component method is used to extract the common factors, and the number of factors is determined according to the eigenvalue criterion and the scree plot criterion. The eigenvalues calculated by principal component analysis are shown in Table 8, and the scree plot, drawn in the order of factor extraction, is shown in Figure 7.
Eigenvalue and variance contribution rate
| Factor | Eigenvalue | Variance contribution (%) | Cumulative (%) |
|---|---|---|---|
| 1 | 5.278 | 36.955% | 36.955% |
| 2 | 2.196 | 15.356% | 52.311% |
| 3 | 1.906 | 13.335% | 65.646% |
| 4 | 1.113 | 7.752% | 73.398% |
| 5 | 0.868 | 6.083% | 79.481% |
| 6 | 0.798 | 5.602% | 85.083% |
| 7 | 0.685 | 4.750% | 89.833% |
| 8 | 0.473 | 3.287% | 93.120% |
| 9 | 0.350 | 2.451% | 95.571% |
| 10 | 0.263 | 1.803% | 97.374% |
| 11 | 0.161 | 1.189% | 98.563% |
| 12 | 0.095 | 0.657% | 99.220% |
| 13 | 0.077 | 0.541% | 99.761% |
| 14 | 0.022 | 0.123% | 99.884% |
| 15 | 0.021 | 0.116% | 100.000% |

Scree plot
Four factors have eigenvalues greater than 1 and lie on the steep part of the scree plot. Factor 1 has an explained variance of 36.955%, meaning that it alone explains 36.955% of the information in the original variables. Factor 2 has an explained variance of 15.356%, and the cumulative variance at Factor 2 is 52.311%, meaning that Factors 1 and 2 jointly explain 52.311% of the information in the original variables. Factor 3 has an explained variance of 13.335%, and the cumulative variance at Factor 3 is 65.646%, meaning that Factors 1, 2, and 3 jointly explain 65.646%. Likewise, Factor 4 has an explained variance of 7.752%. The table shows that the first 4 common factors contribute 73.398% of the total variance, which is sufficient to represent most of the information in the 15 original observed variables. Therefore, 4 factors are retained, and the initial factor loading matrix can be obtained.
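The selection of four factors can be checked directly from Table 8. The short sketch below applies the eigenvalue-greater-than-1 criterion and sums the variance contributions of the retained factors; the numbers are copied from Table 8 and the variable names are hypothetical.

```python
# Eigenvalues and variance contributions (%) copied from Table 8.
eigenvalues = [5.278, 2.196, 1.906, 1.113, 0.868, 0.798, 0.685, 0.473,
               0.350, 0.263, 0.161, 0.095, 0.077, 0.022, 0.021]
contribution = [36.955, 15.356, 13.335, 7.752, 6.083, 5.602, 4.750, 3.287,
                2.451, 1.803, 1.189, 0.657, 0.541, 0.123, 0.116]

# Eigenvalue criterion: keep the factors whose eigenvalue exceeds 1.
retained = sum(1 for ev in eigenvalues if ev > 1)

# Cumulative variance contribution of the retained factors.
cumulative = sum(contribution[:retained])
print(retained, round(cumulative, 3))
```

The eigenvalue criterion keeps only factors that explain more variance than a single standardized variable would on its own, which is why 1 is the conventional cutoff.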
The initial factor loading matrix is shown in Table 9. Before rotation, the loadings of each factor do not differ much across the original variables, which makes it impossible to interpret and name the factors. It is therefore necessary to apply factor rotation so that the loadings move toward 1 or 0. In this paper, the variance-maximizing (varimax) orthogonal rotation is used, and the rotated factor loading matrix is shown in Table 10.
Component matrix
| Measuring factor | Public factor 1 | Public factor 2 | Public factor 3 | Public factor 4 |
|---|---|---|---|---|
| X1 | 0.526 | 0.480 | 0.255 | -0.304 |
| X2 | 0.345 | 0.201 | 0.266 | -0.019 |
| X3 | 0.429 | 0.385 | 0.089 | 0.358 |
| X4 | 0.170 | 0.256 | 0.202 | 0.209 |
| X5 | 0.388 | 0.515 | 0.468 | 0.189 |
| X6 | 0.423 | 0.421 | 0.210 | 0.393 |
| X7 | 0.515 | 0.463 | 0.462 | -0.122 |
| X8 | 0.495 | -0.322 | 0.674 | 0.387 |
| X9 | 0.457 | 0.427 | 0.360 | 0.296 |
| X10 | -0.389 | 0.268 | 0.294 | 0.193 |
| X11 | 0.301 | 0.315 | 0.291 | 0.309 |
| X12 | -0.138 | -0.105 | 0.292 | 0.312 |
| X13 | 0.205 | 0.289 | 0.360 | 0.404 |
| X14 | -0.210 | 0.419 | 0.309 | 0.407 |
| X15 | 0.289 | 0.342 | 0.210 | 0.355 |
Rotated component matrix
| Measuring factor | Public factor 1 | Public factor 2 | Public factor 3 | Public factor 4 |
|---|---|---|---|---|
| X1 | 0.926 | 0.078 | 0.126 | 0.078 |
| X2 | 0.728 | 0.197 | 0.157 | -0.022 |
| X3 | 0.937 | 0.123 | 0.152 | 0.052 |
| X4 | 0.565 | 0.735 | 0.073 | 0.448 |
| X5 | 0.322 | 0.798 | 0.166 | 0.219 |
| X6 | 0.277 | 0.852 | 0.051 | 0.131 |
| X7 | 0.142 | 0.865 | 0.149 | -0.059 |
| X8 | 0.162 | -0.068 | 0.902 | 0.114 |
| X9 | 0.109 | 0.326 | 0.774 | 0.134 |
| X10 | -0.030 | 0.185 | 0.856 | -0.009 |
| X11 | 0.084 | 0.138 | 0.781 | 0.101 |
| X12 | -0.077 | -0.496 | 0.728 | 0.007 |
| X13 | 0.185 | 0.128 | 0.255 | 0.814 |
| X14 | -0.091 | 0.109 | 0.017 | 0.806 |
| X15 | 0.420 | 0.215 | 0.006 | 0.640 |
According to the factor rotation component matrix table, it can be summarized that the four public factors mainly contain the following variable indicators:
Factor 1 can be interpreted and named as students’ “learning ability”. It accounts for 36.955% of the total variance of the initial variables. This indicates that the most critical factor in deciding whether to hire a student is whether the student’s demonstrated learning ability meets the requirements of the company.
Factor 2 can be interpreted and named as students’ “practical ability”. It accounts for 15.356% of the total variance of the initial variables. Its loadings on the measures describing awards, i.e., award experience and award level, are large, which indicates that students should take part in more competitions inside and outside the university during their college years, so as to put what they have learned into practice and improve their practical ability.
Factor 3 can be interpreted and named as students’ “interpersonal skills”. It accounts for 13.335% of the total variance of the initial variables.
Factor 4 can be interpreted and named as “vocational ability”. It accounts for 7.752% of the total variance of the initial variables.
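The grouping of variables behind these four names can be read off the rotated loadings mechanically by assigning each variable to the factor on which its absolute loading is largest. A minimal sketch, with the loadings copied from Table 10 and hypothetical names for the helper and the result:

```python
# Rotated loadings copied from Table 10 (rows X1..X15, columns factors 1..4).
rotated = [
    [0.926, 0.078, 0.126, 0.078],
    [0.728, 0.197, 0.157, -0.022],
    [0.937, 0.123, 0.152, 0.052],
    [0.565, 0.735, 0.073, 0.448],
    [0.322, 0.798, 0.166, 0.219],
    [0.277, 0.852, 0.051, 0.131],
    [0.142, 0.865, 0.149, -0.059],
    [0.162, -0.068, 0.902, 0.114],
    [0.109, 0.326, 0.774, 0.134],
    [-0.030, 0.185, 0.856, -0.009],
    [0.084, 0.138, 0.781, 0.101],
    [-0.077, -0.496, 0.728, 0.007],
    [0.185, 0.128, 0.255, 0.814],
    [-0.091, 0.109, 0.017, 0.806],
    [0.420, 0.215, 0.006, 0.640],
]


def dominant_factor(row):
    """1-based index of the factor with the largest absolute loading."""
    return max(range(len(row)), key=lambda j: abs(row[j])) + 1


groups = {}
for i, row in enumerate(rotated, start=1):
    groups.setdefault(dominant_factor(row), []).append(f"X{i}")
print(groups)
```

The resulting groups (X1 to X3, X4 to X7, X8 to X12, X13 to X15) coincide with the blocks of mutually correlated variables noted in the discussion of the correlation matrix.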
To further simplify the raw data for subsequent analysis and to use the public factors as input variables for the classifier, the factor scores are calculated from the factor score function. The regression method is used to express each public factor as a linear combination of the observed variables, and the resulting factor score coefficient matrix is shown in Table 11.
Component score coefficient matrix
| Measuring factor | Public factor 1 | Public factor 2 | Public factor 3 | Public factor 4 |
|---|---|---|---|---|
| X1 | 0.088 | -0.019 | 0.056 | 0.047 |
| X2 | -0.268 | 0.019 | 0.282 | -0.026 |
| X3 | 0.099 | -0.039 | 0.172 | -0.102 |
| X4 | -0.157 | -0.040 | 0.360 | -0.016 |
| X5 | 0.067 | -0.272 | -0.003 | 0.065 |
| X6 | 0.071 | -0.258 | 0.009 | 0.025 |
| X7 | -0.080 | 0.055 | -0.212 | 0.014 |
| X8 | -0.009 | 0.080 | 0.199 | -0.103 |
| X9 | -0.011 | -0.120 | 0.253 | 0.124 |
| X10 | -0.003 | 0.157 | 0.175 | -0.179 |
| X11 | 0.266 | 0.243 | -0.084 | 0.100 |
| X12 | 0.128 | -0.204 | -0.074 | 0.007 |
| X13 | 0.136 | 0.021 | 0.122 | 0.202 |
| X14 | 0.107 | 0.072 | -0.047 | 0.070 |
| X15 | 0.005 | 0.120 | 0.031 | 0.137 |
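The regression-method score for factor j is a weighted sum of a student's standardized indicators, F_j = Σ_i w_ij · z_i, with the weights w_ij taken from Table 11. The sketch below illustrates that calculation in plain Python. The function names `standardize` and `factor_scores`, the use of the population standard deviation, and the example input are assumptions made for illustration; the original analysis may have standardized differently.

```python
# Score coefficients from Table 11: one row per indicator X1..X15,
# one column per public factor (1..4).
W = [
    [0.088, -0.019, 0.056, 0.047],    # X1
    [-0.268, 0.019, 0.282, -0.026],   # X2
    [0.099, -0.039, 0.172, -0.102],   # X3
    [-0.157, -0.040, 0.360, -0.016],  # X4
    [0.067, -0.272, -0.003, 0.065],   # X5
    [0.071, -0.258, 0.009, 0.025],    # X6
    [-0.080, 0.055, -0.212, 0.014],   # X7
    [-0.009, 0.080, 0.199, -0.103],   # X8
    [-0.011, -0.120, 0.253, 0.124],   # X9
    [-0.003, 0.157, 0.175, -0.179],   # X10
    [0.266, 0.243, -0.084, 0.100],    # X11
    [0.128, -0.204, -0.074, 0.007],   # X12
    [0.136, 0.021, 0.122, 0.202],     # X13
    [0.107, 0.072, -0.047, 0.070],    # X14
    [0.005, 0.120, 0.031, 0.137],     # X15
]


def standardize(column):
    """z-score one indicator across students (population std; an assumption)."""
    n = len(column)
    mean = sum(column) / n
    std = (sum((v - mean) ** 2 for v in column) / n) ** 0.5
    return [(v - mean) / std if std > 0 else 0.0 for v in column]


def factor_scores(z_row, weights=W):
    """Compute F_j = sum_i w_ij * z_i for one student's standardized indicators."""
    if len(z_row) != len(weights):
        raise ValueError("expected one standardized value per indicator")
    n_factors = len(weights[0])
    return [sum(w[j] * z for w, z in zip(weights, z_row)) for j in range(n_factors)]


# Hypothetical student: one standard deviation above the mean on X1, average elsewhere.
example = [1.0] + [0.0] * 14
print(factor_scores(example))
```

For this hypothetical input the four scores simply reproduce the X1 row of the coefficient matrix, which makes the role of each weight easy to see.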
Once the factor score function is obtained, each graduate can be described by the public factors instead of the original variables. The four extracted public factors, i.e., learning ability, practical ability, interpersonal skills, and vocational ability, are taken as the main factors influencing students’ salary ratings and serve as the input variables of the college students’ salary prediction classifier. The feature vector describing a college student is therefore composed of that student’s scores on these four factors.
Improving college students’ employment and entrepreneurship ability requires continuous improvement of their comprehensive quality. In cultivating their own abilities, students should actively absorb professional knowledge, participate enthusiastically in social practice, adapt to the talent demands of enterprises and institutions, and steadily raise their own quality, so as to ultimately achieve their employment and entrepreneurship goals.
Career planning, also called career design, combines the individual and the organization. It comprehensively considers the subjective and objective conditions of a person’s career, weighs the person’s interests, abilities, external evaluations and social relations, takes into account the needs of the DU and the person’s career inclination, determines the most reasonable career goal, and makes scientific and reasonable arrangements for realizing that goal. Cultivating employment and entrepreneurship ability is a long-term, systematic project whose prerequisite and foundation is scientific and reasonable career planning. The lack of career planning is especially prominent among students in independent colleges, and it largely affects the quality and level of their employment and entrepreneurship.
Only by emphasizing marketization and internationalization, strengthening the construction of distinctive characteristics, and running the school independently, according to the law and in service of society, can a school adapt to the trend of educational development and survive and develop amid increasingly fierce competition. This is also the fundamental way out and the inevitable choice in coping with the challenge of global economic integration. With the deepening of the new industrial revolution, the development of the knowledge-based economy, and economic globalization, higher education is becoming internationalized, and schools, which are closely linked to the market economy, cannot ignore this trend. In the face of intensifying competition, schools can adapt only by developing distinctive schooling characteristics. They must rise to the challenge, take an active part in market and international competition, position themselves accurately, give their schooling model its own characteristics, and actively align with market supply and demand. Only then can they turn crisis into opportunity, create good conditions for students’ employment and entrepreneurship, and win opportunities for their own survival, development and growth.
Public service is one of the important functions of modern government. As the provider and manager of this function, the government plays a key leading role in solving college students’ employment and entrepreneurship problems, whether by guiding employment and entrepreneurship activities at the micro level or by regulating them at the macro level.
The government should take the initiative, strengthen its sense of service, and continue to introduce policies and measures that promote college students’ employment and entrepreneurship in light of actual conditions. First, it should create a high-quality market economic environment and work with other parties to build a capital market for college students’ entrepreneurship. Financing is one of the biggest difficulties college students face when starting a business; solving it would mobilize their enthusiasm for entrepreneurship and solidly advance their entrepreneurial progress. In addition, enterprises, banks and government departments can jointly build a capital market for college students’ entrepreneurship, so as to ease the shortage of start-up capital. Second, the government should build a social support and guarantee system for student entrepreneurship. It should coordinate the relationships among the relevant ministries and departments, strengthen policy training, and promote the implementation of policies among college students. The relevant ministries and departments can set up entrepreneurship guidance organizations that provide employment and entrepreneurship training directly to college students or the general public, and they can also guide and supervise universities in offering relevant employment and entrepreneurship courses. The government should introduce more favorable policies to encourage and support university students’ entrepreneurial activities. For example, it should reduce or waive tuition fees, provide tax incentives, simplify registration procedures, and open entrepreneurial parks so that university students can pursue their entrepreneurial activities without too many worries.
In this paper, the C4.5 decision tree algorithm was used to construct a prediction model of college students’ employment and entrepreneurship potential, and the classical factor analysis model was applied to identify the factors with the most significant influence on college students’ employment and entrepreneurship training, so as to support the employment guidance departments of colleges and universities in making suitable employment decisions.
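C4.5 selects its splitting attribute by the gain ratio, i.e., information gain divided by split information. The sketch below shows that criterion for a single categorical attribute on toy data. The attribute levels ("high"/"low") and salary grades ("A"/"B") are hypothetical values chosen for illustration, not data from this study, and the sketch omits the threshold search that C4.5 applies to continuous attributes such as raw factor scores.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Gain ratio of a categorical attribute with respect to the class labels."""
    n = len(labels)
    # Group the class labels by attribute value.
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the labels after splitting on the attribute.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    # Split information penalizes attributes with many distinct values.
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: a discretized factor level and a salary grade per student.
learning_level = ["high", "high", "low", "low"]
salary_grade = ["A", "A", "B", "B"]

print(gain_ratio(learning_level, salary_grade))
```

In this toy case the attribute separates the two grades perfectly, so its gain ratio is 1.0; an attribute with a single value for every student yields 0.0 because it offers no split.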
The prediction accuracy of the decision tree model of college students’ employment and entrepreneurship potential was evaluated: it reaches 88.59% on the training set and 84.56% on the test set. Both values exceed 80%, indicating that the model predicts college students’ employment and entrepreneurship potential well.
From the perspective of enterprises, the article chose 15 measurement indicators to explore the main factors influencing college students’ employment and entrepreneurship and established a salary rating scale for their employment and entrepreneurship situation. Factor analysis extracted four determinants of college students’ employment and salary, i.e., learning ability, practical ability, interpersonal skills, and vocational ability, which account for 36.955%, 15.356%, 13.335% and 7.752% of the total variance, respectively. These four factors are the main influencing factors of college students’ employment and entrepreneurship potential.
By mining the main influencing factors of college students’ employment and entrepreneurship potential, this paper proposes a cultivation mechanism at three levels: individual students, the internal construction of schools, and the macro-control of the government. This is of practical significance for improving college students’ employment and entrepreneurship ability.
