Design and Construction of Recruitment Screening Model in Personnel Management System Based on Decision Tree

In the Internet era, the intelligent transformation of human resource management in traditional enterprises helps them adapt to the current market economy and supports their sustainable development. The human resource management department, which integrates all of an enterprise’s human resources within its organizational structure, is critical to the enterprise’s healthy future development [1-3]. If an enterprise’s human resource management fails to keep pace with the enterprise’s growth and with the times, it becomes the shortest stave of the barrel (the “bucket effect”) and constrains the enterprise’s development [4]. Enterprise managers in the Internet era should therefore adopt connected thinking, understand the development trends of the digital economy, and drive the intelligent transformation of traditional human resource management, so that the enterprise’s human resources can respond rapidly to its strategic objectives and the digital enterprise can develop sustainably [5-8].

Talent recruitment is an important research topic for job seekers, enterprises and the nation alike. Efficient matching of talent to company positions helps both individuals and companies grow, and supports the national strategy of strengthening the country through talent in the new era [9-11]. Traditional recruitment channels, such as newspaper and online job advertisements and internal employee referrals, have clear limitations and cannot meet today’s demand for talent [12-13]. With the development of artificial intelligence, AI-assisted recruitment has become far more efficient and accurate than the traditional recruitment mode, and using big data and artificial intelligence algorithms for this purpose is expected to become the prevailing trend [14-17]. Replacing manual in-depth analysis with computer analysis improves the efficiency and accuracy of screening large numbers of application resumes.

Allal-Chérif, O. et al. described an e-recruitment process in which candidates are first identified on social networks, recruitment and interview sessions are then gamified through chatbots, and artificial intelligence is finally used to select the candidates who best match the position [18]. Ore, O. et al. analyzed the advantages and risks of working with an artificial intelligence recruiting system: such a system executes repetitive daily tasks efficiently, but it can also cause fear and mistrust among job seekers during the application process [19]. Mohamed, A. et al. designed an intelligent recruitment system based on a ranking algorithm that semantically extracts and recognizes the position requirements and resume content and assigns ranking points to recommended candidates; the system both improves the match between talent and jobs and reduces enterprise costs [20]. Najjar, A. et al. developed an intelligent decision support system for talent recruitment that uses machine learning and natural language processing to rank candidates by the semantic similarity between resume content and job description, and their experiments show high recognition accuracy and performance [21]. Pessach, D. et al. proposed a two-stage human resource recruitment framework: the first stage uses a variable-order Bayesian network model to process recruiter information and obtain interpretable data, and the second stage uses a global optimization model to predict which candidates are qualified for the position; their experiments show that the framework improves both the diversity and the success rate of recruitment [22]. Mughaid, A. et al. developed an intelligent recruitment system that matches candidates to job positions by mining geolocation data from job seekers’ social media networks [23].

In this paper, an intelligent recruitment screening model for a personnel management system is constructed using data mining, user portrait (profiling) technology and the CART decision tree algorithm. First, we establish a statistical analysis model for talent label big data, design basic label assignment and aggregation rules suited to talent resource management, and combine expert models with machine learning methods to obtain an association rule clustering function; talent label generation is then optimized through talent label identification and big data fusion clustering analysis. Next, a talent portrait is constructed and analyzed, using the recruitment of editorial positions as an example. The CART algorithm is then used to build an intelligent recruitment screening model and to optimize the design of a talent recruitment system. Finally, the model is trained and applied to test the practical effectiveness of the proposed method.

2

Talent label generation method for user profiling based on big data mining

In order to enhance the screening ability of the recruitment model for talents in the personnel management system and improve the precision of talent label recognition, this chapter proposes a talent label generation method based on big data mining and user profiling technology.

2.1

Talent labeling big data mining and feature extraction

2.1.1

Talent Labeling Big Data Mining

To obtain the edge profile characteristics of talent user portraits from big data, a comprehensive input model of internal and external factors is used to establish a statistical analysis model of talent label big data. Using a fuzzy decision-making method, the sample set of business scenario talent labels is represented as $A = \{a_{1}, a_{2}, \dots, a_{n}\}$, where n is the number of samples, $\{A_{1}, A_{2}, \dots, A_{c}\}$ are the corresponding label classes, and the membership degree of sample $a_{i}$ in class $A_{k}$ is denoted $u_{k}(a_{i})$, abbreviated $u_{ik}$. In the talent behavior space, the fuzzy clustering statistical features of business scenario talent labels are obtained from the memberships of A, and the two-dimensional statistical distribution for business scenario talent label recognition is: (1) $H_{b} = \sum_{i = 1}^{n} \sum_{k = 1}^{c} {(u_{i k})}^{b} {(d_{i k})}^{2}$

where $d_{ik}$ is the Euclidean distance between the i-th sample $a_{i}$ and the center of the k-th cluster, and b is the class attribute parameter of the talent label at the decision moment: b = 0 means that business class k rejects the label attribute of this class of talent, and b = 1 indicates class-k talent flow and information access. Through the integrated input of internal and external factors, a is assigned to the set of talent label attributes corresponding to A, and in the talent label state behavior data distribution space $A_{x}$ the basic human resource data mining result can be expressed as: (2) $μ_{i k} = \sqrt{\sum_{i = 1}^{n} {(a_{i} - d_{i k})}^{2}}$
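Equation (1) has the form of the standard fuzzy c-means objective: each squared sample-to-center distance is weighted by the membership degree raised to the power b. The minimal sketch below only evaluates that objective for given memberships and centers; the sample vectors, centers, memberships and the choice b = 2 (the usual default in standard fuzzy c-means) are illustrative assumptions, not the paper’s data or settings.

```python
import math

def euclidean(a, v):
    """Euclidean distance d_ik between a sample a_i and a cluster center v_k."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, v)))

def fcm_objective(samples, centres, memberships, b):
    """Evaluate H_b = sum_i sum_k (u_ik)^b * (d_ik)^2, the form of Eq. (1).

    samples:     n feature vectors a_i
    centres:     c cluster centers
    memberships: n x c membership degrees u_ik
    b:           exponent applied to each membership degree
    """
    total = 0.0
    for i, a in enumerate(samples):
        for k, v in enumerate(centres):
            total += memberships[i][k] ** b * euclidean(a, v) ** 2
    return total

# Hypothetical example: two samples sitting exactly on two cluster centers.
samples = [(0.0, 0.0), (2.0, 0.0)]
centres = [(0.0, 0.0), (2.0, 0.0)]
u = [[0.9, 0.1], [0.2, 0.8]]
print(fcm_objective(samples, centres, u, b=2))  # ≈ 0.2
```

Only the off-center memberships (0.1 and 0.2) contribute, each multiplied by the squared distance 4, which is why a crisp assignment to the nearest center drives the objective to zero.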

Under the constraint of maximizing talent benefit, and according to the needs of HR business application scenarios such as person-post matching, selection, cultivation, employment and retention, a gain function $r(a_{i}, A_{k})$ is defined as the return obtained for a specific state $a \in A$ under a specific HR business behavior $A_{k}$. For the evolving distribution of talent label attribute features, the maximum benefit function is: (3) $r (a_{i}, A_{k}) = \sum_{k = 1}^{c} \sum_{i = 1}^{n} {(\frac{μ_{i k}}{w_{i k}})}^{\frac{b - 1}{2}}$

where $w_{ik}$ is the revenue weight of the application label attribute for a specific business scenario. Combining the resulting talent label big data mining model with fuzzy state evolutionary clustering and feature recognition yields talent label big data mining and user portrait outlining, and the extracted edge profile features improve the accuracy of talent label identification.

2.1.2

Talent label attribute feature extraction

To identify talent labels optimally, a comprehensive big data feature analysis model for talent labels in specific business scenarios is constructed from the talent label mining results, combined with user portrait feature analysis and big data evolutionary clustering, and the correlation features of talent labels in each scenario are extracted. Let the maximum interference of the main user be $T_{n}$. Under the condition of maximizing talent effectiveness, the feature state distribution of business scenario talent labels $x \in X$ is obtained, and in a specific business scenario the feature component of the behavioral state distribution for talent label identification can be expressed as: (4) $v_{i k} = \frac{\sum_{i = 1}^{n} {(u_{i k})}^{b} x_{i}}{\sum_{i = 1}^{n} {(μ_{i k})}^{b} T_{n}}, k = 1, 2, \dots, c$

where $x_{i}$ is the talent label attribute state of the i-th sample. The minimum conditional probability density for a transition from talent label attribute state x to state y can then be expressed as: (5) $y = \exp [- \frac{{(x - x_{i})}^{T} (x - x_{i})}{2 σ^{2}}]$

where σ denotes the decision model for each type of talent attribute. Drawing on professional human resources and psychology methodology, a label fusion feature clustering analysis method is adopted. Let $P_{n}^{*}$, $P_{v h}^{*}$ and $P_{h h}^{*}$ denote the maximum allowable talent demand that satisfies the market constraints in each type of HR business application scenario. A fuzzy optimization search then yields the correlation statistical features $O_{n}$, $O_{vh}$ and $O_{hh}$ for each scenario, which represent the feature scheduling priority of talent labels under the three label attributes. Using these priorities, feature extraction and correlation analysis of talent labels are carried out for each application scenario, and the correlation distribution terms satisfy $w_{hh} \geq w_{vh} \geq w_{n}$. The probability density features then satisfy $P_{C_{1}}^{*} \leq P_{C_{2}}^{*} \leq \cdots \leq P_{C_{K}}^{*}$ and $w_{c_{1}} \geq w_{c_{2}} \geq \cdots \geq w_{c_{K}}$. Combining these with the basic label identification requirements of the specific business scenario gives feature weights $η_{ik}$ that follow an exponential distribution, and the maximum revenue utility of the talent labels can be approximated as: (6) $S_{i} = \sum_{k = 1}^{c} \sum_{i = 1}^{n} y η_{i k}$
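Equation (5) is a Gaussian (radial basis) kernel of the distance between two attribute states. The sketch below evaluates it under the assumption that label attribute states are numeric vectors and that σ acts as a scalar bandwidth; both are illustrative assumptions, since the text does not specify how states are encoded.

```python
import math

def gaussian_kernel(x, x_i, sigma):
    """y = exp(-(x - x_i)^T (x - x_i) / (2 sigma^2)), the form of Eq. (5)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_i))  # (x - x_i)^T (x - x_i)
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

print(gaussian_kernel((1.0, 2.0), (1.0, 2.0), sigma=1.0))  # 1.0 (identical states)
print(gaussian_kernel((1.0, 2.0), (2.0, 3.0), sigma=1.0))  # exp(-1) ≈ 0.3679
```

The value is 1 for identical states and decays toward 0 as the states move apart; a larger σ makes the decay slower.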

Under the control of the equilibrium game, the talent label attribute feature mining output is: (7) $f (x, y) = \frac{1}{v_{i k} {(2 π)}^{\frac{η_{i k} + 1}{2}} σ^{η_{i k} + 1}} \exp [- \frac{{(x - x_{i})}^{T} (x - x_{i})}{2 σ^{2}}]$

Based on the above analysis, a three-level talent resource label system comprising basic labels, feature labels and high-level labels is constructed, and edge profile feature extraction and pixel fusion of the user portrait are used to generate labels and cluster their attributes.

2.2

Optimized Design for Talent Label Generation

2.2.1

Talent Labeling System Convergence Model

Based on the talent label big data mining and attribute feature extraction results, the equivalent feature component of talent label node k is obtained using edge rotational feature extraction and pixel fusion of the user portrait: (8) $\hat{Y} = \frac{\int_{- \infty}^{\infty} y f_{k} (x, y) d y}{\int_{- \infty}^{\infty} p_{i} (x, y) d y}$

where $f_{k}$ denotes a multi-label seed node and $p_{i}$ denotes the set of association distributions of the three-level label system. The basic labels are transformed through feature engineering with a binning (split-box) component, and in cognitive cooperation mode the talent label attribute quantization level optimization problem can be expressed as: (9) $F_{i} = \frac{(1 + S_{i}^{2}) Γ}{μ_{i k}^{2} \hat{Y} - y}$

where $Γ = \frac{- \ln (5 B_{k})}{1.5}$ is the output feature quantity encoded by the user through the three-level label system, and $B_{k}$ is the matching feature quantity of talent construction demand for a specific business scenario.

Using the two-by-two planning model [24], we construct the fuzzy correlation index distribution set for talent resource label identification and obtain the fusion relationship of talent resource labels as: (10) $E = \frac{y - \hat{Y}}{F}$

Applying user portrait support to each sub-talent attribute gives the fusion relationship model under the globally optimal talent constraints: (11) $M_{i} = \frac{1}{n} \sum_{i = 1}^{n} {(\frac{F_{i} - \hat{Y}}{E})}^{2}$

Following this construction of the talent label system and its application model, the basic label recognition method is combined with binning (split-box) component detection to fuse label information and detect the edge profile features of the talent portrait, thereby recognizing and optimizing talent labels for specific business scenarios.

2.2.2

Talent labeling big data clustering and business scenario matching

Basic label assignment and aggregation rules suited to talent resource management are designed from the talent label recognition results in different business scenarios, and, by combining expert models with machine learning methods, the association rule [25] clustering function is obtained as: (12) $Z = \prod_{i = 1}^{n} \frac{E + η_{i k}}{\sum_{k = 1}^{c} (y - μ_{i k})}$

A higher-order label prediction method is used to analyze the fuzzy correlation factors of talent resource management and the correlated attribute feature components of talent labels, giving the label and weight distribution: (13) $U = \lim_{n \to \infty} \frac{1}{n} E {\sum_{k = 1}^{c} \int_{0}^{\infty} a_{i k} \cdot f (x, y) d x}$

The basic labels are feature-transformed by the binning (split-box) component to extract features common to talent of the same type, and the clustering and matching result for the uploaded label information is: (14) $L = \sum_{k = 1}^{c} ρ_{k} \log_{2} (1 + \frac{{| v_{i k} |}^{2} Z}{\sum_{k = 1}^{c} {| η_{i k} |}^{2} M_{i} + Γ σ^{2}}) - U$

where $ρ_{k}$ is the detection statistic feature quantity. The talent resource big data fusion clustering result is: (15) $p_{k, n} = {[Δ_{k, n} - \frac{\sum_{k = 1}^{c} {| L |}^{2} β_{k, n} + Γ σ^{2}}{{| μ_{i k} |}^{2}}]}^{*}$

where: (16) $Δ_{k, n} = \frac{p_{i}}{\ln 2} (\frac{{| U |}^{2}}{B_{k}} + \frac{Γ σ^{2}}{{| Z |}^{2} B_{k}})$

In this way, the binning (split-box) feature engineering transformation of the basic labels extracts features common to talent of the same type, completing the optimized design of talent label generation.

2.3

Construction and Application of Talent Portrait

To verify the effectiveness of the proposed user portrait talent label generation method, this paper takes recruitment texts for editorial positions as its object and constructs a talent portrait model of editorial positions in terms of professional knowledge demand and professional skill demand.

2.3.1

Characterization of expertise requirements

Editorial positions often recruit from several disciplines at once. To highlight the differences between disciplines and specialties, this paper groups the nearly 40 disciplines and specialties mentioned in the recruitment texts into four categories: editing and publishing; science, engineering and medicine; language and writing; and humanities and social sciences (excluding editing and publishing, and language and writing). By fine-grained labeling and name normalization of the discipline and specialty requirements stated in the job requirements of 572 editorial posts (some posts list several specialties), we obtained the discipline and specialty requirements of editorial talent shown in Table 1. The editorial positions fall into 7 categories: copyright editor, planning editor, art editor, journal editor, digital editor, book editor and new media editor, and together they involve all four discipline categories. Among them, demand for editing and publishing majors is the smallest, while demand for language and writing majors is relatively high.

Table 1.

Discipline and specialty requirements for editorial talent

Professional requirements	Copyright editor	Planning editor	Art editor	Journal editor	Digital editor	Book editor	New media editor	Total
Editing and publishing	1	3	1	5	2	15	19	46
Medical science and engineering	0	12	0	19	2	34	68	135
Language and writing	15	12	0	4	2	35	112	180
Humanities and Social Sciences (excluding editing, publishing, language)	3	9	9	12	2	21	134	191
No requirement for major	3	18	10	8	1	18	112	170
Total	22	54	20	48	9	123	445	722

Across the 7 types of editorial positions, “no requirement for major” accounts for 23.55% of the discipline and specialty requirements (170 of 722), indicating an overall “broad caliber” trend. New media editing and book editing span the widest range of disciplines and specialties, and the disciplines involved are usually closely related to the specific work of the position. Copyright editors mainly handle the acquisition and export of book copyrights and deal frequently with foreign publishers and publishing agencies, so a strong command of English is required; moreover, as more books are exported to countries speaking less widely taught languages, publishing organizations increasingly need copyright editors for languages such as German, French and Spanish. Planning editor positions place high demands on topic selection, communication skills, user-centered thinking and market awareness. Book editor positions place high demands on manuscript processing, editing, proofreading and writing skills. Journal editor positions require specialized knowledge of the disciplines covered by the publication and the ability to carry out academic gatekeeping, manuscript organization, review and editing, with certain requirements for professional background and academic level. Art editors are responsible for typesetting and overall art design, which calls for strong artistic skills and aesthetic judgment, and their discipline requirements are dominated by the humanities and social sciences, such as art.
The surge in demand for new media editors is closely related to media digitization: these positions involve content creation, content operation, data analysis and user interaction tailored to each platform’s characteristics and user needs, and require strong digital innovation ability and platform operation thinking. Overall, the specialized knowledge required differs markedly across editorial positions.

2.3.2

Characterization of professional skills needs

In order to understand the overall demand for the skills of publishing talents in the survey sample, this paper selects the “job requirements” texts of 572 editorial positions as the text set, conducts semantic analysis of the “job requirements” texts in the recruitment information, extracts the feature words related to skill needs, and uses the LDA topic model to classify a large number of skill feature words, and generates the skill demand category labels for publishing talents.

First, skill-related keywords are extracted. The TF-IDF model is used to extract keywords from the “job requirements” texts, and the 50 highest-ranked keywords are listed in Table 2.

Table 2.

Top 50 keyword labels

Serial number	Key words	Serial number	Key words	Serial number	Key words	Serial number	Key words
1	Experience	14	Spirit	27	Office software	40	Conscientious
2	Majors	15	Cooperation	28	Japanese	41	Literacy
3	Communication	16	Writing ability	29	Down-to-earth	42	Copywriting
4	Text work	17	Learning	30	Topic selection	43	Meticulous
5	Undergraduate	18	News	31	Professionalism	44	Knowledge
6	Love	19	Language	32	Collaboration	45	Certificate
7	Team	20	Stress resistance	33	Qualification	46	Product
8	Book	21	Writing	34	Coordinate	47	New media
9	Master	22	Chinese	35	Merit base	48	Organizational coordination
10	Plan	23	Text	36	Certificate	49	Study abroad
11	English	24	Software	37	Design	50	Independence
12	Solid	25	Dedication	38	Comic book
13	Responsibility	26	Expressive power	39	Execution ability

As Table 2 shows, recruiters attach great importance to professional qualities such as “experience”, “communication”, “love”, “teamwork”, “sense of responsibility”, “stress resistance” and “solidity”, and also value vocational skills and digital literacy such as “writing skills”, “planning”, “design”, “expression ability”, “software”, “copywriting” and “new media”; soft skills such as communication and problem solving receive general attention. In short, the skill demand for publishing talent centers on professionalism, with particular emphasis on vocational skills and digital literacy.
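TF-IDF scores a word highly when it is frequent in a text but not common to every text. The paper does not state how per-text weights were combined into one ranking, so the minimal sketch below sums each term’s TF-IDF weight over all texts; the tokenization, the +1 added to the IDF and the three-text corpus are all assumptions for illustration, not the authors’ settings.

```python
import math
from collections import Counter

def tfidf_top_keywords(docs, top_n=50):
    """Rank terms by their TF-IDF weight summed over all documents.

    docs: non-empty tokenized "job requirements" texts (lists of words);
    word segmentation of the original Chinese texts is assumed to be done.
    """
    n_docs = len(docs)
    # Document frequency: in how many texts each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    scores = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            tf = count / len(doc)                    # term frequency in this text
            idf = math.log(n_docs / df[term]) + 1.0  # +1 keeps terms found everywhere
            scores[term] += tf * idf
    return [term for term, _ in scores.most_common(top_n)]

# Hypothetical mini-corpus of three job requirement texts.
docs = [
    ["experience", "communication", "writing"],
    ["experience", "team", "english"],
    ["experience", "communication", "design"],
]
print(tfidf_top_keywords(docs, top_n=2))  # ['experience', 'communication']
```

Here “experience” still ranks first despite its minimal IDF because it appears in every text; summing over texts rewards such widespread terms, which is one reason a corpus-level ranking like Table 2 is dominated by general qualities.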

Second, the skill demand categories for editorial positions are delineated. The optimal number of topics is determined from the intertopic distance maps generated by pyLDAvis: the largest K whose map shows no overlapping circles is taken as optimal. Comparing the pyLDAvis maps over iterative runs of the LDA model shows no overlapping circles at K = 4, two overlapping circles at K = 5, and more than two overlapping circles for K > 5. The optimal number of topics is therefore 4.
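The selection rule above can be written down directly. The minimal sketch assumes each pyLDAvis intertopic distance map has been summarized as a list of circles (center x, center y, radius); the coordinates below are hypothetical values chosen to mirror the reported pattern, not readings from the paper’s charts.

```python
import math

def circles_overlap(c1, c2):
    """Two circles (x, y, r) overlap if their centers are closer than r1 + r2."""
    (x1, y1, r1), (x2, y2, r2) = c1, c2
    return math.hypot(x1 - x2, y1 - y2) < r1 + r2

def count_overlaps(circles):
    """Number of overlapping circle pairs on one intertopic distance map."""
    return sum(
        circles_overlap(circles[i], circles[j])
        for i in range(len(circles))
        for j in range(i + 1, len(circles))
    )

def choose_num_topics(maps_by_k):
    """Largest K whose map has no overlapping circles, or None if there is none.

    maps_by_k: {K: [(x, y, r), ...]} summarizing the pyLDAvis map for each K.
    """
    candidates = [k for k, circles in maps_by_k.items() if count_overlaps(circles) == 0]
    return max(candidates) if candidates else None

# Hypothetical maps: at K = 5 one new topic lands on top of an existing one.
maps = {
    3: [(0, 0, 1), (5, 0, 1), (0, 5, 1)],
    4: [(0, 0, 1), (5, 0, 1), (0, 5, 1), (5, 5, 1)],
    5: [(0, 0, 1), (5, 0, 1), (0, 5, 1), (5, 5, 1), (0.5, 0.5, 1)],
}
print(choose_num_topics(maps))  # 4
```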

Figure 1 shows the pyLDAvis visualization of the topic labels. In Figure 1(a), each bubble represents a topic: bubble size reflects how often the topic occurs, and topics are numbered in descending order of frequency, so a smaller number means a more frequent topic. The distance between bubbles reflects how similar the topics are, and overlapping bubbles indicate that the two topics share co-occurring feature words. Figure 1(b) shows the 30 feature words most relevant to Topic 1, which can be used to summarize that topic. The bars give the frequency of each word in the whole text set, the dark cyan portion gives the estimated frequency of the word within the topic, and the purple symbols indicate the word’s frequency in the text set. The label “27.04% of tokens” means that Topic 1 accounts for 27.04% of the tokens in the text set, reflecting its prominence in the “editorial job requirements” texts.

Topic 1 contains words such as “love”, “responsibility”, “patience”, “carefulness” and “resistance to stress”, and reflects professionalism. Topic 2 contains words such as “communication”, “team”, “planning”, “design”, “presentation ability” and “short video”, and reflects basic professional skills. Topic 3 contains words such as “study”, “collaboration”, “solid”, “graduate”, “analysis”, “information”, “qualification” and “computing”, and reflects professional development skills. Topic 4 contains words such as “undergraduate”, “writing skills”, “English”, “content”, “creativity”, “writing” and “language”, and reflects knowledge literacy. The skill demand for editorial talent can therefore be interpreted as four topics: professionalism, basic professional skills, professional development skills and knowledge literacy. In the text set of 572 editorial “job requirements”, Topics 1 to 4 account for 27.04%, 25.37%, 24.62% and 23.47% of tokens, respectively. The four topics thus carry roughly equal weight, with professionalism weighted slightly highest.
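A topic’s share of tokens, such as the 27.04% for Topic 1, can be computed from the fitted document-topic distributions by weighting each text’s topic proportions by its length. The sketch below follows that approach, which to our understanding mirrors how pyLDAvis computes topic frequencies; the two-text input is hypothetical.

```python
def topic_token_shares(doc_topic, doc_lengths):
    """Percentage of corpus tokens attributed to each topic.

    doc_topic:   per-document topic distributions theta_d (each row sums to 1)
    doc_lengths: number of tokens in each document
    """
    n_topics = len(doc_topic[0])
    weighted = [0.0] * n_topics
    for theta, length in zip(doc_topic, doc_lengths):
        for k in range(n_topics):
            weighted[k] += theta[k] * length  # expected tokens of topic k in this text
    total = sum(weighted)
    return [100.0 * w / total for w in weighted]

# Hypothetical two-text, two-topic example.
print(topic_token_shares([[0.5, 0.5], [1.0, 0.0]], [10, 30]))  # [87.5, 12.5]
```

Because every row of `doc_topic` sums to 1, the weighted counts add up to the total number of tokens, so the shares of all topics sum to 100%.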

3

CART-based recruitment screening model design for personnel management systems

This chapter uses the CART decision tree algorithm to optimize the recruitment screening model of the personnel management system and presents the design and functions of the talent recruitment subsystem.

3.1

Intelligent Recruitment Model Based on Decision Tree CART

3.1.1

Classification and Regression Trees (CART)

A decision tree is a tree-shaped decision diagram with probabilistic outcomes attached, and provides an intuitive graphical way of applying statistical probability analysis. In machine learning, a decision tree is a predictive model that maps object attributes to object values: each internal node is a test on an object attribute, each branch corresponds to an outcome of that test, and each leaf node gives the predicted result for objects that reach it. A decision tree algorithm has two steps, tree generation and pruning, and the resulting tree predicts the class label Y from the input features X.

Commonly used decision tree algorithms include ID3, C4.5 and CART (Classification and Regression Tree). CART generally classifies better than the other two and can handle both continuous and discrete variables [26], so this paper uses the CART decision tree algorithm.

A decision tree is generated in the following steps:

First, the splitting attribute is determined, that is, the feature by which the sample data are divided. Choosing the optimal splitting attribute is the key step in building a decision tree: it is chosen to maximize the “purity” of the data in each node after the split, so that the samples in each branch node belong, as far as possible, to the same class. A well-chosen feature at a decision node separates the classes quickly and reduces the depth of the tree.

Second, the threshold is determined: a threshold that minimizes the classification error rate is chosen.

3.1.2

Feature selection in CART decision trees

The CART algorithm selects features by minimizing the Gini coefficient and generates a binary decision tree. The Gini value is the probability that two samples drawn at random from dataset D have different class labels, so a smaller Gini value indicates higher purity of sample set D.

In a classification problem with K categories, where $p_{k}$ is the probability that a sample belongs to the k-th category, the Gini coefficient of the probability distribution is: (17) $G i n i (p) = \sum_{k = 1}^{K} p_{k} (1 - p_{k}) = 1 - \sum_{k = 1}^{K} p_{k}^{2}$

For binary classification, if the probability that a sample belongs to the first class is p, the Gini coefficient of the distribution is: (18) $G i n i (p) = 2 p (1 - p)$
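Equation (18) is the special case of Eq. (17) with K = 2, $p_{1} = p$ and $p_{2} = 1 - p$:

$$\mathrm{Gini}(p) = 1 - \left[p^{2} + (1 - p)^{2}\right] = 1 - \left[2p^{2} - 2p + 1\right] = 2p - 2p^{2} = 2p(1 - p)$$

It is 0 for a pure node (p = 0 or p = 1) and reaches its maximum of 0.5 at p = 0.5, when both classes are equally likely.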

For a sample set D of size |D| with K categories, where $|C_{k}|$ is the number of samples in the k-th category, the Gini coefficient of D is: (19) $G i n i (D) = 1 - \sum_{k = 1}^{K} {(\frac{| C_{k} |}{| D |})}^{2}$

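Let D be the training dataset at a node, containing |D| samples. For each feature A and each of its possible values a, D is split into two subsets $D_{1}$ and $D_{2}$ according to whether A = a (for a continuous feature, whether A ≤ a), and the Gini coefficient of D conditional on this split of feature A is: (20) $G i n i (D, A) = \frac{| D_{1} |}{| D |} G i n i (D_{1}) + \frac{| D_{2} |}{| D |} G i n i (D_{2})$. CART evaluates every feature and candidate split point and chooses the pair with the smallest Gini(D, A) as the split for the node.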
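The split search of Eqs. (19) and (20) can be sketched as an exhaustive loop over features and thresholds. This is a minimal illustration assuming numeric features and a “value ≤ threshold” rule for $D_{1}$; the two features (years of experience, number of papers) and the labels (1 = entered the written test) are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini(D) = 1 - sum_k (|C_k| / |D|)^2, Eq. (19)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_split(left, right):
    """Weighted Gini of a binary split into D1 = left and D2 = right, Eq. (20)."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_split(rows, labels):
    """Try every feature and observed value; keep the split with the lowest Gini(D, A).

    rows: numeric feature vectors; samples with x[f] <= t go to D1.
    Returns (feature_index, threshold, gini_value).
    """
    best = (None, None, float("inf"))
    for f in range(len(rows[0])):
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue  # not a real binary split
            score = gini_split(left, right)
            if score < best[2]:
                best = (f, t, score)
    return best

rows = [(1, 0), (2, 1), (6, 3), (8, 2)]  # hypothetical (years of experience, papers)
labels = [0, 0, 1, 1]                    # hypothetical: 1 = entered the written test
print(best_split(rows, labels))  # (0, 2, 0.0)
```

Using observed values with “≤” yields the same partitions as the midpoints between sorted values that many CART implementations report as thresholds; a full tree applies this search recursively to $D_{1}$ and $D_{2}$.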

3.1.3

Pruning of decision trees

Besides correct feature selection, a decision tree algorithm must also keep the tree simple. Decision trees easily overfit the training set, which leads to poor generalization, so they are pruned; pruning acts as regularization, much as penalty terms do in linear regression. The main pruning strategies are pre-pruning and post-pruning, along with later variants of these.

Pre-pruning: during tree generation, each node is evaluated before it is split; if splitting the current node does not improve the generalization performance of the tree, splitting stops and the node is marked as a leaf. Pre-pruning effectively reduces the risk of overfitting and significantly reduces the time needed to train and test the tree.

Post-pruning: first generate a complete decision tree on the training set, and then examine the non-leaf nodes from the bottom upwards, and if replacing the corresponding subtree of the node with a leaf node brings an improvement in the generalization performance of the decision tree, the node will be replaced with a leaf node.

CART uses post-pruning: it first grows a full decision tree, then generates a sequence of progressively pruned subtrees, and finally uses cross-validation to evaluate them and select the subtree with the best generalization ability.
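The pruned subtree sequence is usually produced by cost-complexity (weakest-link) pruning: for every internal node t, compute $g(t) = (R(t) - R(T_{t})) / (|T_{t}| - 1)$, where R(t) is the error if t is collapsed into a leaf, $R(T_{t})$ is the error of its subtree and $|T_{t}|$ its number of leaves, then collapse the node with the smallest g(t) and repeat. The minimal sketch below assumes a hypothetical dict-based tree whose nodes store training misclassification counts; choosing the final subtree by cross-validation is not shown.

```python
def leaves_and_error(node):
    """Return (number of leaves, total training errors) of the subtree at node."""
    if "children" not in node:
        return 1, node["errors"]
    n_leaves, err = 0, 0
    for child in node["children"]:
        l, e = leaves_and_error(child)
        n_leaves += l
        err += e
    return n_leaves, err

def weakest_link(node, path=()):
    """(g, path) of the internal node with the smallest g(t), or None for a leaf."""
    if "children" not in node:
        return None
    n_leaves, subtree_err = leaves_and_error(node)
    # node["errors"] = errors if this node were collapsed into a leaf, i.e. R(t).
    best = ((node["errors"] - subtree_err) / (n_leaves - 1), path)
    for i, child in enumerate(node["children"]):
        candidate = weakest_link(child, path + (i,))
        if candidate is not None and candidate[0] < best[0]:
            best = candidate
    return best

def prune_at(node, path):
    """Copy of the tree with the node at `path` collapsed into a leaf."""
    if not path:
        return {"errors": node["errors"]}
    children = list(node["children"])
    children[path[0]] = prune_at(children[path[0]], path[1:])
    return {**node, "children": children}

def pruning_sequence(tree):
    """Nested subtrees from the full tree down to the root alone, with their alpha."""
    sequence = [(0.0, tree)]
    while "children" in tree:
        alpha, path = weakest_link(tree)
        tree = prune_at(tree, path)
        sequence.append((alpha, tree))
    return sequence

# Hypothetical grown tree (errors counted on the training set).
tree = {"errors": 12, "children": [
    {"errors": 2},
    {"errors": 5, "children": [{"errors": 1}, {"errors": 1}]},
]}
for alpha, subtree in pruning_sequence(tree):
    print(alpha, leaves_and_error(subtree))
# 0.0 (3, 4)
# 3.0 (2, 7)
# 5.0 (1, 12)
```

Each alpha is the complexity penalty per leaf at which collapsing that node starts to pay off; cross-validation then picks one subtree from this short list instead of searching all possible prunings.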

3.2

Optimized design of talent recruitment system

This paper uses the CART algorithm to optimize the recruitment screening model of the personnel management system. The workflow of the resulting talent recruitment system, shown in Figure 2, consists of five core parts: job posting, online resume submission by candidates, construction of a Hive data warehouse, generation of a Hive-based CART decision tree, and intelligent resume screening.

Job posting: includes modules for setting the post name and the requirements on education, professional and technical qualifications, work experience and other items.

Online resume submission: candidates fill in different information according to the requirements of each position.

Intelligent resume screening: a talent data warehouse is built from the resume data of the original recruitment system. Training data are first extracted from the warehouse using attributes such as gender, date of birth, place of origin, marital status, date of starting work, professional and technical title level and number of papers, and each record is labeled by whether the candidate entered the written test. Of the extracted data, 80% is used as the training set and 20% as the test set for evaluating the accuracy of the decision tree.
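The 80/20 division can be sketched as a seeded shuffle followed by a cut. The helper name `split_records`, the fixed seed and the records below (professional title level, number of papers, entered written test) are hypothetical; a real pipeline would read the records from the Hive warehouse and might stratify by label.

```python
import random

def split_records(records, train_fraction=0.8, seed=42):
    """Shuffle a copy of the records and cut it into training and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical records: (title level, number of papers, entered written test?)
records = [
    (1, 0, 0), (2, 1, 0), (2, 3, 1), (3, 2, 1), (1, 1, 0),
    (3, 4, 1), (2, 0, 0), (3, 1, 1), (1, 2, 0), (2, 2, 1),
]
train, test = split_records(records)
print(len(train), len(test))  # 8 2
```

Keeping the test records out of training is what allows the 20% portion to estimate how the tree will perform on new applicants.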

4

Application and analysis of recruitment screening models

4.1

Model training

To examine the practicality of the proposed recruitment screening model and talent recruitment system based on the CART decision tree algorithm, this paper trains both a linear model and a CART decision tree. The predicted and true values of the linear model and the CART model are shown in Fig. 3(a) and Fig. 3(b), respectively, where the shaded area indicates the difference between the predicted and true values.

A comparison of the two figures shows that the CART model's predicted values are very close to the true values and that its fit is clearly better than that of the linear model, i.e., the CART model used in this paper performs better and is more suitable for the talent recruitment screening task.
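Why a tree can fit where a line cannot is easy to see on a toy example. The sketch below compares an ordinary least-squares line with a depth-1 regression tree (a single CART-style split chosen by squared error) on hypothetical step-shaped data. It shows only in-sample fit, and the paper's actual target variable and data for Fig. 3 are not reproduced here.

```python
from statistics import mean
from typing import Callable, Sequence

Model = Callable[[float], float]


def fit_linear(x: Sequence[float], y: Sequence[float]) -> Model:
    """Ordinary least-squares line y = a + b*x (x must not be constant)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return lambda v: my + slope * (v - mx)


def fit_stump(x: Sequence[float], y: Sequence[float]) -> Model:
    """Depth-1 CART regression tree: one split minimizing summed squared error."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                       # no threshold separates equal x values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        ml, mr = mean(left), mean(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, ml, mr)
    if best is None:
        raise ValueError("need at least two distinct x values")
    _, threshold, ml, mr = best
    return lambda v: ml if v <= threshold else mr


def mse(model: Model, x: Sequence[float], y: Sequence[float]) -> float:
    return mean((model(xi) - yi) ** 2 for xi, yi in zip(x, y))


# Hypothetical step-shaped relation between a score and an outcome.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 0, 1, 1, 1, 1]
linear_mse = mse(fit_linear(x, y), x, y)   # 5/84, about 0.0595
stump_mse = mse(fit_stump(x, y), x, y)     # 0
```

The line must spread the jump over the whole range, whereas the tree places its split exactly at the jump; on real data, this flexibility also makes trees prone to overfitting, which is why the pruning step matters.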

4.2

Feature classification and mining

Candidates whose intention to join the company after the initial interview exceeds 80% are labeled 1, and those whose intention is 80% or lower are labeled 0. The confusion matrix is used to examine the TP, FN, FP, and TN counts, and several models are compared on precision, recall, and F1 value.
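The labeling rule and the metrics above can be sketched as follows. The 0.8 cutoff comes from the text; the function names, the intention values, and the predicted labels in the example are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def label_intention(intention: float, cutoff: float = 0.8) -> int:
    """Label 1 if the post-interview intention to join exceeds the cutoff, else 0."""
    return 1 if intention > cutoff else 0


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, FN, FP, TN with label 1 as the positive class."""
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1:
            counts["TP" if p == 1 else "FN"] += 1
        else:
            counts["FP" if p == 1 else "TN"] += 1
    return counts


def precision_recall_f1(c: Dict[str, int]) -> Tuple[float, float, float]:
    """Compute metrics from counts; returns 0.0 where a denominator is zero."""
    precision = c["TP"] / (c["TP"] + c["FP"]) if c["TP"] + c["FP"] else 0.0
    recall = c["TP"] / (c["TP"] + c["FN"]) if c["TP"] + c["FN"] else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical intentions and model predictions for six candidates.
intentions: List[float] = [0.95, 0.80, 0.85, 0.40, 0.90, 0.10]
y_true = [label_intention(v) for v in intentions]   # [1, 0, 1, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0]
counts = confusion_counts(y_true, y_pred)            # TP=2, FN=1, FP=1, TN=2
precision, recall, f1 = precision_recall_f1(counts)
```

Note that an intention of exactly 0.80 is labeled 0, matching the "less than or equal to 80%" rule.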

1)

Implementing logistic regression

The results of logistic regression on the test set and training set are shown in Fig. 4(a) and Fig. 4(b), respectively. The classification results give a precision of 0.875 and a recall of 0.750, which yields an F1 value of 0.808. The confusion matrix shows 39 misclassified samples in the training set, where 80 candidates want to join the company, and 10 misclassified samples in the test set.

2)

Implementing support vector machines

The results of the support vector machine on the test set and training set are shown in Fig. 5(a) and Fig. 5(b), respectively. Its precision, recall, and F1 values are 0.950, 0.731, and 0.826, respectively. The confusion matrix shows 30 misclassified samples in the training set, where 87 candidates want to join the company, and 8 misclassified samples in the test set.

3)

Implementing naive Bayes

The results of the naive Bayes algorithm are shown in Fig. 6, with precision, recall, and F1 values of 0.833, 0.862, and 0.847, respectively. The confusion matrix shows 56 misclassified samples in the training set, where 89 candidates want to join the company, and 9 misclassified samples in the test set.

4)

Implementing the CART decision tree

The results of the CART decision tree on the test set and training set are shown in Fig. 7(a) and Fig. 7(b), respectively. Its precision, recall, and F1 value are all 0.857. The confusion matrix shows 30 and 8 misclassified samples on the training and test sets, respectively, and 94 candidates in the training set want to join the company.

5)

Implementing the K-nearest neighbor classifier

The results of the K-nearest neighbor classifier are shown in Fig. 8, with precision, recall, and F1 values of 0.950, 0.679, and 0.792, respectively. The confusion matrix shows 39 misclassified samples in the training set, where 77 candidates want to join the company, and 10 misclassified samples in the test set.

Comparing logistic regression, support vector machine, naive Bayes, CART decision tree, and K-nearest neighbor classifier shows that all of the algorithms predict reasonably well, with F1 values between 0.792 and 0.857. However, the CART decision tree algorithm used in this paper has the largest F1 value (0.857), indicating the best prediction performance.
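As a consistency check, the F1 values can be recomputed from the reported precision and recall using F1 = 2PR / (P + R). The numbers below are taken from the text; the script only confirms the arithmetic and the ranking.

```python
# Precision and recall as reported in Section 4.2.
reported = {
    "LR": (0.875, 0.750),
    "SVM": (0.950, 0.731),
    "NB": (0.833, 0.862),
    "CART": (0.857, 0.857),
    "KNN": (0.950, 0.679),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


f1_scores = {model: round(f1(p, r), 3) for model, (p, r) in reported.items()}
best_model = max(f1_scores, key=f1_scores.get)

for model, score in sorted(f1_scores.items(), key=lambda kv: -kv[1]):
    print(f"{model}: F1 = {score:.3f}")
```

All five recomputed F1 values match the reported ones to three decimal places. The check also shows why SVM and KNN, despite the highest precision (0.950), rank lower: their recall pulls the harmonic mean down.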

4.3

Model results and analysis

The training results above show that, because the collected sample dataset is imbalanced, the accuracy and recall results take extreme values, while precision is generally high. Since the goal of the classification modeling in this paper is to obtain prediction scores, i.e., predicted probabilities, the balance between classes is not strictly adjusted, and the decision thresholds are left unchanged to preserve each model's predictive behavior. ROC curves and AUC values are therefore used to evaluate the models, because AUC reflects performance across all decision thresholds between classes. For the five models, logistic regression (LR), naive Bayes (NB), support vector machine (SVM), CART decision tree, and K-nearest neighbor classifier (KNN), AUC is used as the evaluation criterion for two prediction tasks: whether the candidate submits an application (Delivered) and whether the recruiter is satisfied (Satisfied). The ROC curves of the models for the two tasks are shown in Fig. 9(a) and Fig. 9(b).
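AUC can be computed without choosing any threshold: it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The quadratic-time sketch below implements that definition directly; the `roc_auc` name and the scores in the example are hypothetical, and a real evaluation would use each model's predicted probabilities on the test set.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, scores)   # 0.75: 3 of the 4 pairs are ranked correctly
```

An AUC of 0.5 corresponds to random ranking, which puts the KNN values in Table 3 (0.511 and 0.543) close to chance level.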

The AUC values corresponding to the ROC curves of the two types of models in Figure 9 are listed in Table 3.

Table 3.

AUC values of each model

Model	AUC (Delivered)	AUC (Satisfied)
KNN	0.511	0.543
LR	0.547	0.595
SVM	0.582	0.602
NB	0.596	0.678
CART	0.704	0.787

Table 3 shows that for the Delivered task (whether the candidate submits an application), the AUC of the CART decision tree model (0.704) is clearly higher than those of the other models, while the other four models, logistic regression, naive Bayes, support vector machine, and K-nearest neighbor classifier, differ little from one another (0.511 to 0.596). For the Satisfied task (whether the recruiter is satisfied), the AUC of the CART model (0.787) is also the highest of the five, so it performs best on this task as well. Overall, the CART model performs somewhat better on the Satisfied task than on the Delivered task (0.787 > 0.704), indicating that the Delivered model still has some room for improvement.

5

Conclusion

In this paper, we designed a method for generating a talent labeling system based on big data mining and user profiling, and constructed a system model for intelligent talent recruitment screening based on the CART decision tree algorithm.

First, recruitment texts for editorial positions were selected as the research object, and the construction and characterization of talent profiles were studied. In terms of professional knowledge, 23.55% of the recruitment requirements across seven types of editorial positions state “no restriction on major”, showing an overall trend toward broad-based recruitment, although the demand for professional knowledge differs markedly across editorial positions. In terms of professional skills, keywords were extracted from the “job requirements” text with a TF-IDF model; the results show that recruiters attach great importance to professionalism, vocational skills, and digital literacy, and generally value communication and problem-solving abilities among soft skills. According to the weight distribution of the four extracted themes, professionalism is the most important factor for editorial positions.
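The keyword-extraction step can be illustrated with a basic TF-IDF sketch. The paper does not give its TF-IDF variant or tokenization, so this sketch assumes term frequency normalized by document length, idf = log(N / df), and texts that are already segmented into words (Chinese recruitment text would first need word segmentation). The `tfidf_keywords` name and the tokens below are hypothetical.

```python
import math
from collections import Counter
from typing import List


def tfidf_keywords(docs: List[List[str]], top_k: int = 3) -> List[List[str]]:
    """Return the top_k highest-scoring terms of each tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))   # document frequency
    result = []
    for doc in docs:
        tf = Counter(doc)
        scores = {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        # Sort by descending score, breaking ties alphabetically for stable output.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result.append(ranked[:top_k])
    return result


# Hypothetical tokenized "job requirements" texts from three postings.
docs = [
    ["proofreading", "experience", "editing", "editing"],
    ["experience", "new_media", "digital", "editing"],
    ["experience", "publishing", "proofreading"],
]
keywords = tfidf_keywords(docs, top_k=2)
```

Words that appear in every posting, such as `experience` here, get an idf of zero and therefore never rank as keywords; counting how often a requirement appears across postings would need a separate frequency analysis.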

Secondly, the CART-based intelligent recruitment screening model was applied and analyzed. Compared with the linear model, the CART model's predicted values are very close to the true values and its fit is clearly better, making it more suitable for the talent recruitment screening task. In recruitment feature classification and mining, the precision, recall, and F1 value of the CART model are all 0.857, and its F1 value is higher than those of logistic regression, naive Bayes, support vector machine, and K-nearest neighbor classifiers. In the further comparison of prediction performance, the AUC values of the CART model for the Delivered task (whether the candidate submits an application) and the Satisfied task (whether the recruiter is satisfied) reach 0.704 and 0.787, respectively, both higher than those of the other models, indicating that the proposed recruitment screening model can perform the intelligent recruitment screening task better.

Language: English

Publication frequency: 1 issue per year
Journal subjects: Life Sciences; Life Sciences, other; Mathematics; Applied Mathematics; General Mathematics; Physics; Physics, other


Design and Construction of Recruitment Screening Model in Personnel Management System Based on Decision Tree

Yingying Cui

Published online: 21 March 2025

Received: 27 Oct. 2024

Accepted: 24 Feb. 2025

DOI: https://doi.org/10.2478/amns-2025-0596

Keywords: User profiling, Data mining, CART decision tree, Recruitment screening

© 2025 Yingying Cui, published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
