Study on the Enhancement of Personalized Borrowing Experience of Smart Library Users Based on Reinforcement Learning Framework
Published online: 26 Sep 2025
Received: 03 Feb 2025
Accepted: 10 May 2025
DOI: https://doi.org/10.2478/amns-2025-1045
Keywords
© 2025 Haiying Sun and Mingzhi Fan, published by Sciendo.
This work is licensed under the Creative Commons Attribution 4.0 International License.
A smart library is a library management system based on Internet of Things (IoT) technology. It realizes intelligent library management and services through an interconnected network of sensors, IoT devices and Internet technology, and gives readers a better personalized borrowing experience, which a reinforcement learning framework can enhance further [1-4].
The personalized service of a smart library is mainly reflected in the following aspects. Self-service borrowing and returning: with self-service borrowing and returning machines, readers can borrow and return books independently, without staff intervention. Readers only need to swipe a card or scan a code to complete the operation. This service mode saves readers’ time and improves borrowing efficiency [5-8]. Unattended library: the smart library adopts automated equipment and intelligent systems to operate unattended. Readers can use self-service lending machines, self-service query terminals and other equipment to borrow, return and look up books. The library is also equipped with a video surveillance system to ensure its security [9-12]. Personalized recommendation service: by analyzing readers’ borrowing records and reading preferences, the smart library uses artificial intelligence technology to provide personalized book recommendations. The library system recommends relevant books, periodicals and newspapers according to readers’ reading habits and interests, making recommendations more accurate [13-16]. Digital resource service: smart libraries digitize paper resources such as books, periodicals and newspapers to build a digital library platform [17-18]. Readers can access the digital library through electronic devices such as computers and mobile phones to read e-books, journals and newspapers. The digital resource service makes reading more convenient and saves library space and cost [19-22].
Literature [23] tested the feasibility of the self-service smart bookcase model by evaluating readers’ preferences. It showed that the convenience of borrowing books and the openness of information are factors that affect readers’ experience, and its results help promote the development of public cultural reading services. Literature [24] revealed the deficiencies of current libraries and, based on Internet of Things information technology, proposed measures such as establishing specialized institutions for information resource sharing and a sound investment mechanism for it. Literature [25] explored the impact of AI and ML on libraries, pointing out positive impacts such as automation and personalization as well as drawbacks such as large investment and ethical concerns. Literature [26] revealed that, as university libraries gradually become intelligent, innovating the content of library information services and constructing diversified information services require innovation in smart library service models. Literature [27] emphasized the important role of information technologies such as AR and VR in enhancing the library user experience, and outlined the impact of emerging technologies on the multifunctional transformation of libraries. Literature [28] explored the design of a Java-based book search system aimed at reducing the workload of library managers and enhancing the borrowing experience of patrons. The system not only facilitates the classification and organization of book information but also provides personalized recommendations for readers. Literature [29] examined the application of AI and IoT technologies in library services, proposed a unified framework aimed at improving efficiency, security and user satisfaction, and demonstrated the effectiveness of the proposed approach.
The above studies show that smart libraries are widely applied and contribute to better library management and a better borrowing experience for patrons. However, reinforcement learning frameworks are still rarely applied in this field, and their application to the personalized borrowing experience of smart library users is essentially absent. It is therefore of great significance to study the enhancement of the personalized borrowing experience of smart library users based on a reinforcement learning framework.
To study how the personalized borrowing experience in smart libraries can be enhanced, this paper first proposes an improved reinforcement learning recommendation algorithm that builds on the basic idea of reinforcement learning and combines it with dynamic user group clustering. Embedding logic is introduced into the multi-information representation layer to obtain user and item feature vectors. The Euclidean distance between feature vectors is calculated, users are clustered with k-means, and the cluster assignments are updated according to feedback. A multi-level item filtering module is designed to divide action groups and provide choices for the agent. Based on the preference characteristics of the user groups, a multi-armed bandit algorithm selects the arm with the highest expected value to obtain the maximum gain. A joint training method combining backpropagation and reinforcement learning is designed to capture dynamic user preferences and adjust the gain-maximization strategy in real time, completing the personalized recommendation of borrowed content. With the improved reinforcement learning algorithm as the framework, we analyze the requirements of reader management, book borrowing and statistical analysis of borrowing data in personalized borrowing, design a personalized borrowing management service system around the borrowing process, and evaluate the system’s functions and the actual borrowing experience through experiments.
Reinforcement learning (RL) is a machine learning method in which an agent maximizes cumulative reward through continuous trial and error in its environment. It is one of the three basic machine learning paradigms, alongside supervised and unsupervised learning. The main difference from supervised learning is that supervised learning uses the difference between the output and the label as feedback to correct the system parameters and minimize the error, whereas reinforcement learning only needs to tell the system whether its outputs are right or wrong. Based on this, the system receives rewards and penalties until the model converges, with the focus on balancing the exploration of unknown domains against the exploitation of existing knowledge [30].
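The theoretical basis of reinforcement learning is the Markov Decision Process (MDP), which contains the state space $S$, the action space $A$, the state transition probability $P$, the reward function $R$, and the discount factor $\gamma$.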
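The Policy-Gradient algorithm is a probability-based reinforcement learning method. Because a probability distribution is continuous, the algorithm is commonly used to solve policy problems in continuous action spaces. It updates the policy parameters by gradient ascent. Specifically, the objective function that maximizes the expected return is defined first:

$$J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[ R(\tau) \right]$$

where $\tau$ is a trajectory sampled under the policy $\pi_\theta$ and $R(\tau)$ is its cumulative return. The policy parameters are then updated as:

$$\theta_{k+1} = \theta_k + \alpha \nabla_\theta J(\theta_k)$$

where $\theta_k$ is the policy parameter at the $k$-th iteration and $\alpha$ is the learning rate. The gradient of the objective function is:

$$\nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[ \sum_{t} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, R(\tau) \right]$$

where $\log \pi_\theta(a_t \mid s_t)$ is the logarithm of the probability of choosing action $a_t$ in state $s_t$. In the Policy-Gradient algorithm, the gradient of the objective function is estimated from sampled trajectories:

$$\nabla_\theta J(\theta) \approx \frac{1}{N} \sum_{n=1}^{N} \sum_{t} \nabla_\theta \log \pi_\theta(a_t^{n} \mid s_t^{n})\, R(\tau^{n})$$

where the total number of sampled trajectories is $N$.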
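As an illustration only, the gradient-ascent update on the expected return can be tried on a toy problem. The three-action, single-step environment, its reward means, the noise level, the learning rate and the batch size below are all assumptions made for this sketch and do not come from the paper. The code uses a softmax policy and the sampled log-probability gradient (the REINFORCE estimator).

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array of action preferences."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def grad_log_pi(theta, action):
    """Gradient of log softmax(theta)[action] with respect to theta."""
    g = -softmax(theta)
    g[action] += 1.0
    return g

def reinforce(reward_means, iterations=2000, batch=10, lr=0.1, seed=0):
    """Gradient ascent on expected return, estimated from sampled actions."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(len(reward_means))
    for _ in range(iterations):
        grad = np.zeros_like(theta)
        for _ in range(batch):
            # Sample one single-step trajectory from the current policy.
            a = rng.choice(len(theta), p=softmax(theta))
            r = reward_means[a] + rng.normal(0.0, 0.1)  # noisy return (assumed)
            grad += grad_log_pi(theta, a) * r
        theta += lr * grad / batch  # average over the sampled trajectories
    return softmax(theta)

# Hypothetical reward means: action 1 is the best choice.
probs = reinforce(np.array([0.2, 1.0, 0.5]))
print(probs)
```

After training, most of the probability mass should sit on the action with the highest mean reward. A baseline term is often subtracted from the return to reduce variance; it is left out here to keep the sketch close to the plain form of the update.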
The model in this paper consists of four main parts: a multi-information representation module, a dynamic clustering module, a rating-based multi-level item filtering module and a joint training module. The model is named DJT-RS, and its overall framework is shown in Fig. 1.

DJT-RS model structure
In order to make the raw data more informative in terms of characterization, a basic embedding logic is introduced in the multi-information representation layer, and the feature vectors are dynamically updated in coordination with other modules. The update produces a loss function defined as
Where,
The raw features of the users as well as the raw features of the items are fed into the embedding layer to obtain additional user as well as item embeddings. The matrices obtained through matrix factorization and the feature matrices obtained from the Embedding layer are concatenated to obtain the user as well as item embedding vectors.
The user and item embedding vectors are expanded into several dimensions, then softmax is used to determine the importance of each weight and summed up, and the average is used to obtain the final interpretable item features, and the final feature values for each feature domain are synthesized as Eq:
A k-means clustering of the user feature vectors is performed, and the parameters of each user cluster are represented by the average of the user feature vectors within that cluster. The closest users are linked together, and the parameter of each user is the previously computed user feature vector, with the average parameter of each user group:
Personalized recommendations are made based on the user feature vectors and user clusters, and the distance between the user and each cluster is recalculated based on the feedback received to update the cluster assignments.
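The clustering and reassignment steps can be sketched as follows. This is a minimal illustration under stated assumptions: the two-dimensional user vectors, the value of `k`, and the helper names `assign_clusters`, `update_centroids` and `kmeans` are hypothetical and are not taken from the paper.

```python
import numpy as np

def assign_clusters(users, centroids):
    """Assign each user vector to the nearest centroid by Euclidean distance."""
    dists = np.linalg.norm(users[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def update_centroids(users, labels, centroids):
    """Each cluster parameter is the mean of the user vectors in that cluster."""
    new = centroids.copy()
    for c in range(len(centroids)):
        members = users[labels == c]
        if len(members):  # keep the old centroid if a cluster is empty
            new[c] = members.mean(axis=0)
    return new

def kmeans(users, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centroids = users[rng.choice(len(users), size=k, replace=False)].astype(float)
    for _ in range(iters):
        labels = assign_clusters(users, centroids)
        centroids = update_centroids(users, labels, centroids)
    return assign_clusters(users, centroids), centroids

# Toy user feature vectors: two groups with clearly different preferences.
users = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
                  [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
labels, centroids = kmeans(users, k=2)

# After feedback, user 0's vector has drifted toward the second group,
# so the distances are recomputed and the cluster assignment is updated.
users_after = users.copy()
users_after[0] = [4.9, 5.0]
new_labels = assign_clusters(users_after, centroids)
```

Recomputing only the assignment step after each round of feedback keeps the update cheap; the centroids can be refreshed less often if needed.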
User feature vectors are updated after each recommendation to capture changes in user preferences and provide more accurate recommendations. A gradient descent algorithm is used to synthesize and update the user and item embedding information to obtain updated user synthesis feature vectors:
The updated user feature vectors are merged into the dynamic user group module to regroup the current users again. Assuming
For each cluster
A filtering mechanism is created to better categorize items and obtain different sets of items to recommend to users. First, highly rated items are collected to form a set of highly rated item recommendations. Then the interaction logs of the current target user are analyzed, and the categories of the highly rated items in those logs are used as connecting points to find other items in the same categories. These items are evenly divided into item groups, and the number of items per group can be adjusted as a parameter. Item sets are thus built from the ratings and from the categories of the items in the interaction logs, and the smaller groups split from these two larger item sets are used together as candidate recommendation arms for the agent to choose from.
In this paper, the whole recommendation process is modeled as a Markov decision process, and a multi-armed bandit algorithm is used to select the arm with the largest expected value as output to obtain the maximum benefit.
Agent. In rating prediction, different item categories are classified based on labels and each category is used to characterize the externally exposed item feature vector. Each item in each item group is traversed using an agent and the selection formula is:
After all the items in all the item groups have been traversed to obtain the recommendation value of the current item group, all the actions are traversed in turn to obtain the recommendation values of all the arms, and the arm with the largest recommendation value is taken as the recommended list of items.
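To make the arm-selection step concrete, the sketch below scores every arm and returns the one with the largest value. The scoring rule here is the standard UCB1 rule, used as a stand-in; it is an assumption and not the paper's own selection formula. The arm names and statistics are hypothetical.

```python
import math

def select_arm(mean_reward, pulls, c=1.0):
    """Return the arm with the largest UCB1 score (stand-in scoring rule).

    mean_reward: dict mapping arm -> average reward observed so far
    pulls:       dict mapping arm -> number of times the arm was chosen
    c:           exploration weight (assumed parameter)
    """
    total = sum(pulls.values())
    best_arm, best_score = None, -math.inf
    for arm, mean in mean_reward.items():
        n = pulls[arm]
        if n == 0:
            return arm  # try every arm once before comparing scores
        score = mean + c * math.sqrt(2.0 * math.log(total) / n)
        if score > best_score:
            best_arm, best_score = arm, score
    return best_arm

# Hypothetical arms produced by the item filtering module.
mean_reward = {"high_rated": 0.6, "same_category": 0.5}
pulls = {"high_rated": 50, "same_category": 5}
chosen = select_arm(mean_reward, pulls)
```

With these numbers the rarely tried arm is selected, because its exploration bonus outweighs the small gap in average reward. Setting `c` to 0 reduces the rule to picking the arm with the highest observed average.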
State. The state is affected by the actions taken by the agent. In the scenario context, each element in the set represents a list of recommendations, and the state at any moment is as follows:
Actions. The agent influences the environment by selecting actions. At each moment, the system selects an action that represents an arm, groups all items in the candidate pool according to the user’s preferences, and generates a list of recommendations using a multi-level item filtering module:
Reward. Obtaining rewards provides the information the agent needs to make informed decisions in subsequent steps. If the target user’s rating of the item to be recommended in the current round equals the second threshold, the reward for that item is the first value. If the rating exceeds the second threshold, the reward is the first value plus the difference between the current rating and the second threshold.
Reinforcement learning algorithms are introduced to maximize returns, while changes in the embedding vectors representing users and items are used to reflect the evolution of user preferences. In this paper, the algorithm first trains the representation layer of users and items. Once it has quickly converged, the whole system is launched: the user and item embedding vectors are updated based on the recommendation results, while the parameters of users and user groups in reinforcement learning are adjusted, which realizes joint training.
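The reward rule can be written as a small function. This is a sketch under assumptions: the parameter names are hypothetical, and the reward for ratings below the threshold is not stated in the text, so a default of 0 is assumed here.

```python
def reward(rating, threshold, base_reward, below_threshold_reward=0.0):
    """Reward for one recommended item.

    rating == threshold -> base_reward
    rating >  threshold -> base_reward + (rating - threshold)
    rating <  threshold -> below_threshold_reward (assumed, not in the text)
    """
    if rating >= threshold:
        return base_reward + (rating - threshold)
    return below_threshold_reward

# Example with a hypothetical threshold of 4 on a 1-5 rating scale.
at_threshold = reward(4, threshold=4, base_reward=1.0)
above_threshold = reward(5, threshold=4, base_reward=1.0)
below_threshold = reward(3, threshold=4, base_reward=1.0)
```

The two stated cases collapse into one branch because the bonus term is zero when the rating equals the threshold.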
This optimization process not only focuses on the predictive accuracy of the model, but also considers the performance of the model in user interactions to provide more personalized and effective recommendations.
This subsection describes the requirements analysis of the personalized lending system from two aspects: the business requirements of this research, and the development and management requirements of the whole system.
Reader management requirements. These mainly cover the management and maintenance of readers’ basic information, including adding, deleting, modifying and querying it. In addition, the maximum number of books borrowed at one time and the borrowing period are set according to the reader’s educational level (i.e., the reader type) so that books deliver their full value: highly educated readers should not have to return books in a hurry because the loan has expired, and no reader should be able to monopolize book resources.
Book management requirements. These mainly cover the management and maintenance of basic book information and, as with reader management, include adding, deleting, modifying and querying it. Books are also classified according to the Chinese Library Classification. Books of the same class share the same general subject and are highly similar, so the system can recommend books on the principle of item similarity.
Book lending requirements. This part focuses on circulation. Book and reader information is relatively fixed and is generally organized and maintained in batches at particular times, whereas borrowing and returning is a dynamic process that includes borrowing, returning, renewing, penalties and other modular operations, and the data it produces is very important for book management. The system must not only store the many-to-many relationship between readers and books but also record borrowing and return times to support book recommendation.
System setup requirements. This part covers the maintenance of the whole system platform, including function maintenance, database maintenance and system tuning, as well as setting system parameters and calculating the recommendation sequence.
Data security issues. All basic data about readers, administrators and books used in this paper should be covered by a strict protection mechanism. Users’ private information and other confidential data are encrypted before being stored in the MySQL database, every database operation is checked for the required database privileges, and different modules are made available according to the roles of different users. This is a data security issue that the system must take into account.
Statistical analysis of loan data. Using the large amount of accumulated historical data, the system administrator can run large-scale correlation analysis on the collection and on reader information and generate statistical results, so that the borrowing and reading situation can be understood intuitively in a visual environment that managers can conveniently view and analyze. Examples include deciding whether to add copies of popular books, whether rarely borrowed books should be withdrawn or promoted through activities, and how to handle overdue loans.
Reader borrowing process: the reader first searches for the book. If the book is not in the library, the search fails and a record is sent to the back-end database so that administrators can decide whether to acquire the book according to how often it is searched for. If the book is found, the legitimacy of the reader’s identity is checked, and if the reader has overdue books, overdue processing must be carried out first. Once the identity is verified and the reader is within the maximum number of books allowed, the borrowing is completed and the borrowing information is added to the database.
The book return process is relatively simple: the borrowing information table is checked and, combined with the reader’s identity, the due date of the book is calculated to determine whether it is overdue. If the book is within the loan period, it is returned successfully. If it is overdue, the return goes to overdue processing.
The book information management submodule adds, deletes, modifies and queries the basic information of books, and also handles regular backup and maintenance of the book database. The reader management module is similar, providing operations for adding, deleting, modifying and maintaining reader information. The lending management module mainly covers borrowing registration, return registration, renewal and overdue processing. The key part is the lending recommendation submodule, which contains the main research content of this paper and uses historical data to recommend books to readers both according to readers’ needs and according to book information.
The architecture of the lending recommendation part is similar to the MVC design pattern. The interface layer is similar to the View layer and handles human-computer interaction between the system and the user: the reader enters a request, the system computes and returns a list of results, and the interface layer displays it. The personalized service layer is similar to the Controller layer. It handles different user requests with different processing flows, runs the information collection module and the system recommendation module automatically, and contains the main recommendation algorithm implementation. The basic database layer at the bottom is similar to the Model layer. It deals directly with the database, mainly adding, deleting, modifying and querying data, provides data interfaces for the upper layers, and is where database efficiency is tuned.
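The borrowing and return flows can be sketched as two small functions. This is an illustration under assumptions: the `Reader` and `Library` classes, the status strings, and the in-memory storage are hypothetical stand-ins for the system's database tables, and the loan limit and loan period are example values tied to a reader type.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

@dataclass
class Reader:
    reader_id: str
    max_loans: int   # maximum books held at once (set by reader type)
    loan_days: int   # loan period in days (set by reader type)
    loans: dict = field(default_factory=dict)  # book_id -> borrow date

@dataclass
class Library:
    available: set
    failed_searches: list = field(default_factory=list)

def has_overdue(reader, today):
    """True if any current loan is past its due date."""
    period = timedelta(days=reader.loan_days)
    return any(today > borrowed + period for borrowed in reader.loans.values())

def borrow(library, reader, book_id, today):
    if book_id not in library.available:
        # Record the failed search so administrators can review demand.
        library.failed_searches.append(book_id)
        return "not_found"
    if has_overdue(reader, today):
        return "overdue_pending"   # overdue processing must come first
    if len(reader.loans) >= reader.max_loans:
        return "limit_reached"
    library.available.remove(book_id)
    reader.loans[book_id] = today  # store the borrowing record
    return "borrowed"

def return_book(library, reader, book_id, today):
    borrowed_on = reader.loans.pop(book_id)  # KeyError if never borrowed
    library.available.add(book_id)
    due = borrowed_on + timedelta(days=reader.loan_days)
    return "returned" if today <= due else "overdue_processing"

lib = Library(available={"B1", "B2"})
reader = Reader("R1", max_loans=1, loan_days=30)
r1 = borrow(lib, reader, "B9", date(2025, 1, 1))       # not in the collection
r2 = borrow(lib, reader, "B1", date(2025, 1, 1))
r3 = borrow(lib, reader, "B2", date(2025, 1, 2))       # over the loan limit
r4 = borrow(lib, reader, "B2", date(2025, 3, 1))       # B1 is now overdue
r5 = return_book(lib, reader, "B1", date(2025, 3, 1))
r6 = borrow(lib, reader, "B2", date(2025, 3, 1))
```

In a real deployment the checks would run against the reader and loan tables inside one transaction, so that two concurrent requests cannot both pass the limit check.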
According to the functional module design of the book lending and recommending system, its administrator’s business operation flow is shown in Figure 2.

System business flowchart
In this section, experiments are carried out on the optimized parts of the reinforcement learning model and the results are analyzed. The experiments on the joint training mechanism focus on two indicators, time spent and cumulative gain. The effectiveness of the parameter update method is verified by comparing loss descent curves. Finally, the performance of the proposed model on two datasets is compared experimentally with that of mainstream models.
In this paper, the model is evaluated on the MovieLens dataset and on the RL4RS dataset, which is oriented to reinforcement learning recommender systems. The data of MovieLens originated from the website
NetEase Fuxi Lab released the RL4RS dataset specifically to address reinforcement learning-based recommendation. The data comes from a product recommendation scenario in a NetEase game and contains two sub-datasets, Dataset A and Dataset B. Dataset A only considers making a single recommendation to the user, while Dataset B considers a continuous scenario in which multiple recommendations are made to the user. The RL4RS dataset has a total of 186,841 users, 298 items, and 153,87489 interaction records.
In this paper, precision and normalized discounted cumulative gain (NDCG) are used as evaluation metrics. Precision is the proportion of true positive items among all items judged to be positive, and is calculated as follows:

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

where $TP$ is the number of true positives and $FP$ is the number of false positives.
The Normalized Discounted Cumulative Gain (NDCG) is calculated as:

$$\mathrm{NDCG@}k = \frac{\mathrm{DCG@}k}{\mathrm{IDCG@}k}$$

The numerator part of the formula is the Discounted Cumulative Gain (DCG), where a larger
Precision@k does not consider the order of items: an item counts as long as it is in the recommendation list. NDCG@k, on the other hand, takes the order of items into account, which matters when the recommendation sequence is important. Larger values of Precision@k and NDCG@k mean better model performance.
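The difference between the two metrics can be seen in a short sketch. The function names and the toy item lists are hypothetical, and the DCG here uses the common linear-gain form, relevance divided by log2(rank + 1), which is an assumption since other variants exist.

```python
import math

def precision_at_k(recommended, relevant, k):
    """Share of the top-k recommended items that are relevant."""
    top_k = recommended[:k]
    return sum(1 for item in top_k if item in relevant) / k

def dcg_at_k(gains, k):
    """Discounted cumulative gain: gain at rank i is divided by log2(i + 1)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))

def ndcg_at_k(recommended, relevance, k):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    gains = [relevance.get(item, 0.0) for item in recommended]
    ideal = sorted(relevance.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0

# Two rankings with the same items in the top 3 but in a different order.
relevance = {"a": 1.0, "c": 1.0}   # items the user actually liked
ranking_1 = ["a", "b", "c"]
ranking_2 = ["b", "a", "c"]

p1 = precision_at_k(ranking_1, set(relevance), 3)
p2 = precision_at_k(ranking_2, set(relevance), 3)
n1 = ndcg_at_k(ranking_1, relevance, 3)
n2 = ndcg_at_k(ranking_2, relevance, 3)
```

Both rankings get the same Precision@3 because they contain the same relevant items, while NDCG@3 is higher for the ranking that places a relevant item first.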
This section analyzes the effect of the model before and after the optimization of the sampling rule, starting with the time spent. The joint training mechanism is added to the original D-RS model to obtain DJT-RS, and Table 1 shows the comparative results for time spent. Although the raw data has already been represented as a user-item rating matrix that is convenient to look up, adding joint training still greatly increases the training time of the model: the time spent training one episode increased by 31.6% and 78.2% on the two datasets, respectively, and the larger the dataset, the greater the increase brought by joint training. This is because joint training requires updating the user and item embedding vectors based on the recommendation results, as well as real-time tuning of the parameters of users and user groups in reinforcement learning, both of which add to the time spent.
Results of comparative experiments on time spent
| Model | Time (s), MovieLens | Time (s), RL4RS |
|---|---|---|
| D-RS | 95 | 308 |
| DJT-RS | 125 | 549 |
The effect of the joint training mechanism on the cumulative reward is then investigated, as illustrated in Figure 3, where the horizontal axis represents episodes and the vertical axis represents the overall reward. JT stands for the joint training mechanism and T for the normal training mechanism. Comparing the two curves shows that the reward of the model using joint training fluctuates significantly less than that of the pre-optimization model, which implies that joint training makes model training more stable. From episode 65 onwards, the jointly trained model is able to learn (the agent receives rewards > 0), and as episodes increase, its cumulative reward grows progressively and converges slightly faster than under the normal training mechanism.

Experimental effects of the joint training mechanism
The reason is that the joint training mechanism keeps searching for the maximum output value in reinforcement learning while taking into account the constant changes in user preferences and emotion-based behaviors, so it converges faster, but the search itself is more time-consuming.
In summary, the joint training mechanism can somewhat improve the convergence speed and reduce the number of episodes needed for training, but it lengthens each training episode, so the trade-off needs to be weighed for each recommendation task.
This section verifies the effectiveness of the parameter update method for user groups. The methods compared are Adam and RAdam, the optimization algorithms most commonly used in reinforcement learning, and the gradient descent algorithm used in this paper. Figure 4 shows the loss descent curves of the policy network under the different algorithms. When Adam is used to update the network parameters, the loss descends less smoothly and fluctuates strongly. RAdam adds a rectification term on top of Adam, and the loss descent becomes relatively smooth. With the gradient descent algorithm, the loss descends very smoothly and converges quickly.

Comparison of parameter optimization methods
With the Adam optimizer the loss eventually dropped to 0.1157, with RAdam it dropped to 0.0098, and with the gradient descent algorithm it reached a minimum of 0.0032. This demonstrates the feasibility and effectiveness of the gradient descent algorithm used in this paper for recommender system tasks.
DJT-RS was compared with classical methods to verify the effectiveness of the model. The methods selected for comparison are: a popularity-based recommendation algorithm (POP), a Bayesian personalized ranking-based recommendation algorithm (BPR), a matrix factorization-based recommendation algorithm (FISM), a recommendation algorithm based on the DQN framework (DQN), a recommendation algorithm based on the Actor-Critic framework (RaCT), a recommendation algorithm based on graph neural networks (NGCF), a lightweight graph convolutional recommendation algorithm (LightGCF), a recommendation algorithm based on graph disentangling modules and intent graphs (DGCF), and a recommendation algorithm based on the VAE framework (RecVAE).
The results of the comparison experiments on the MovieLens and RL4RS datasets are shown in Tables 2 and 3, respectively, where the second-best results are marked in bold and the best results are marked with *. The experiments show that the proposed model performs well on both datasets and all four metrics. Except for NDCG@5 on the RL4RS dataset, where it is slightly inferior to RecVAE, the model achieves the highest precision and normalized discounted cumulative gain on every evaluation metric on both datasets.
Results of comparative experiments on MovieLens dataset
| Method | Precision@5 | NDCG@5 | Precision@10 | NDCG@10 |
|---|---|---|---|---|
| POP | 0.1232 | 0.1397 | 0.1032 | 0.1304 |
| BPR | 0.1520 | 0.1647 | 0.1252 | 0.1421 |
| FISM | 0.1624 | 0.1834 | 0.1505 | 0.1718 |
| DQN | 0.1924 | 0.2029 | 0.1762 | 0.2139 |
| RaCT | 0.3224 | 0.1154 | 0.3747 | 0.1698 |
| NGCF | 0.4368 | 0.3292 | 0.4728 | 0.3706 |
| LightGCF | 0.4110 | 0.2793 | 0.4746 | 0.3281 |
| DGCF | 0.4423 | 0.3386 | 0.5040 | 0.3554 |
| RecVAE | | | | |
| Ours | 0.4763* | 0.3706* | 0.5471* | 0.3863* |
Results of comparative experiments on RL4RS dataset
| Method | Precision@5 | NDCG@5 | Precision@10 | NDCG@10 |
|---|---|---|---|---|
| POP | 0.0919 | 0.0532 | 0.1075 | 0.0600 |
| BPR | 0.1507 | 0.0793 | 0.1428 | 0.0645 |
| FISM | 0.1700 | 0.1036 | 0.1803 | 0.0945 |
| DQN | 0.2328 | 0.1083 | 0.2285 | 0.1199 |
| RaCT | 0.2876 | 0.1143 | 0.3165 | 0.1483 |
| NGCF | 0.3231 | 0.1490 | 0.3861 | 0.1902 |
| LightGCF | 0.3040 | 0.1333 | 0.3604 | 0.1675 |
| DGCF | 0.3302 | 0.1603 | 0.3849 | 0.1925 |
| RecVAE | | 0.1832* | | |
| Ours | 0.3548* | | 0.4224* | 0.2321* |
The system requirements analysis shows that, to design a lending management system that uses the reinforcement learning framework to improve the lending experience, a series of requirements must first be met, including reader management, book management, book lending and statistical analysis of lending data. To verify how effectively the system designed in this paper meets these borrowing management requirements, the smart library of University G is taken as an example: the personalized borrowing system is installed in its management system, and its effectiveness in meeting the requirements is examined.
Borrowing management data of readers in different faculties and departments
Figure 5 shows the overall book borrowing of students from different faculties. Literature (I) has the largest borrowing volume, indicating particularly high popularity. Economics books (F) come second, which is consistent with University G being an economics-focused institution, so readers borrow a large number of economics books. Language books (H) come third. Because readers belong to different faculties and majors, the knowledge they study varies greatly, so their borrowing shows very distinct faculty characteristics. To illustrate this better, book categories with generally low borrowing volume were excluded, and the top ten categories were selected to show the group characteristics of readers from different faculties. Table 4 lists the average number of books borrowed per reader in each faculty for each book category, with outliers marked in bold. In the following, the College of Information and the College of Mathematics and Statistics are taken as examples to illustrate how borrowing data is managed with the system in this paper.
For the College of Information, the average borrowing volumes of category B (philosophy and religion), C (general social science), F (economics), I (literature), K (history and geography) and T (industrial technology) are outliers. The specific values in the table show that T (industrial technology) is an outlier because its average borrowing volume is the highest, while the other categories are outliers because their borrowing volume is low. The College of Information belongs to engineering and its majors are mainly electronic information and computer science, so it is not surprising that T (industrial technology) has the highest average borrowing volume. The low borrowing volume in the other categories, however, reflects readers’ bias towards professional knowledge at the expense of all-round development. The College of Mathematics and Statistics has outliers in the average borrowing volume of C (general social science) and O (mathematical sciences and chemistry), and the specific values show that both are high. The likely reason is that the majors of this college are mainly mathematics and statistics, so a high average borrowing volume in these two categories is not surprising. The average borrowing volume of the other categories is at a normal level, which reflects readers’ wide range of interests and all-round development.

Figure 5. Overall book borrowing
Table 4. Average book borrowing by school
| School | B | C | D | F | G | H | I | J | K | O | T |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Info. | | | 0.17 | | 0.32 | 1.28 | | 0.39 | | 0.59 | |
| Art | 0.08 | 0.15 | 0.43 | 0.03 | 1.62 | ||||||
| Hum. | 1.54 | 0.72 | 0.23 | 1.21 | 2.01 | 1.65 | 0.06 | 1.29 | |||
| Fore. | 0.18 | 0.98 | 0.46 | 6.78 | 0.44 | 1.15 | 0.03 | 0.25 | |||
| Law | 1.47 | 0.53 | 0.40 | 1.43 | 5.35 | 0.45 | 1.00 | 0.06 | 0.26 | ||
| Math | 1.62 | | 0.41 | 2.30 | 0.53 | 2.05 | 7.44 | 0.54 | 0.87 | | 0.97 |
| BA | 1.62 | 0.31 | 0.56 | 1.84 | 6.95 | 0.59 | 1.43 | 0.35 | 0.73 | ||
| Tour. | 1.00 | 0.24 | 3.22 | 0.80 | 1.76 | 7.29 | 0.51 | 0.12 | 0.49 | ||
| CS. | 1.43 | 0.69 | 0.34 | 0.43 | 3.29 | 6.87 | 0.55 | 1.47 | 0.36 | 0.65 | |
| Fin. | 1.22 | 0.54 | 0.35 | 0.39 | 1.67 | 6.19 | 0.47 | 1.02 | 0.34 | 0.59 | |
| Bio. | 1.28 | 0.93 | 0.14 | 0.33 | 1.90 | 7.25 | 0.55 | 1.16 | 0.69 | 1.14 | |
| Man. | 1.10 | 0.70 | 0.20 | 2.92 | 0.50 | 1.83 | 8.67 | 0.62 | 1.41 | 0.75 | 2.01 |
| FTM | 1.17 | 0.79 | 0.42 | 3.61 | 0.53 | 1.48 | 7.95 | 0.54 | 1.25 | 0.25 | 0.61 |
| Acc. | 1.22 | 0.54 | 0.45 | 3.62 | 0.40 | 1.78 | 6.19 | 0.46 | 1.11 | 0.28 | 0.49 |
| PA | 1.48 | 0.82 | 1.25 | 1.70 | 0.54 | 1.54 | 7.31 | 0.50 | 1.34 | 0.35 | 0.67 |
| EM | 1.22 | 0.61 | 0.24 | 2.20 | 0.38 | 1.40 | 5.82 | 0.41 | 1.22 | 0.16 | 0.47 |
In summary, the system reports the overall borrowing of books by readers as well as the borrowing of each college, and then identifies the outliers of each college on the basis of average borrowing volume. These outliers give an intuitive view of the reading preferences of each college, which supports personalized reading recommendation and a better reading experience.
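The text does not state which rule the system uses to flag outliers, so the following is only a minimal sketch under one common assumption: Tukey's interquartile-range (IQR) fences, applied to the average borrowing volume of a single category across colleges. The function name `iqr_outliers`, the multiplier `k`, and the sample values are hypothetical and are not taken from Table 4.

```python
from statistics import quantiles


def iqr_outliers(avg_by_college, k=1.5):
    """Flag colleges whose average lies outside the Tukey fences.

    avg_by_college maps a college name to its average borrowing volume
    for one book category. Returns {college: "high" | "low"}.
    The IQR rule is an assumption; the paper does not name its method.
    """
    values = list(avg_by_college.values())
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr

    flags = {}
    for college, value in avg_by_college.items():
        if value > high_fence:
            flags[college] = "high"
        elif value < low_fence:
            flags[college] = "low"
    return flags


# Hypothetical averages for one category (not values from Table 4).
category_t = {"A": 0.50, "B": 0.60, "C": 0.55, "D": 0.65, "E": 0.45, "F": 3.00}
flags = iqr_outliers(category_t)
print(flags)
```

Running the same check for every category yields, for each college, the set of categories in which it borrows unusually much or unusually little, which is the kind of signal the paragraph above uses to describe college preferences.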
Borrowing management data of readers in different grades
Readers' book borrowing behavior is affected by their grade level, so this part explores the stage characteristics of borrowing at different grade levels. Because readers move through different grades over time, the number of readers in each grade varies greatly, and the absolute borrowing volumes are not comparable. For this reason, this paper uses the average borrowing volume of each grade to represent its characteristics, and the results are shown in Figure 6. As the figure shows, the types of books borrowed in the four grades are basically the same. Borrowing is concentrated in ten categories: Literature (I), Economics (F), Philosophy (B), Social Sciences (C), Politics and Law (D), Languages (H), Arts (J), History (K), Mathematical Sciences and Chemistry (O), and Industrial Technology (T). The borrowing volume within each category does differ between grades, however, so it is sufficient to analyze these ten categories separately and compare their average borrowing volume in each grade. The grade-level characteristics of readers' borrowing can then be read from the size of the borrowing volume.
Borrowing traffic management data per unit time slot
Understanding how lending traffic is distributed over a fixed period can help optimize the management of the smart library, develop personalized lending services, and improve the lending experience. The library of University of G lends from 8:00 a.m. to 9:30 p.m., and readers can borrow on their own during this period. Based on the characteristics of the university library, the borrowing time is divided into 14 time slots at one-hour intervals. The number of books borrowed in each time slot is summarized in Figure 7.
The graph shows that borrowing peaks at about 11:00 in the morning and 19:00 in the evening each day. Analysis shows that readers usually do not go to the library very early to borrow books, because of their work schedules, the teaching schedule, and the opening time of the library. After the first two classes in the morning end, some students who have no further classes go to the library to borrow books, so the borrowing volume gradually increases and reaches a local maximum at around 11:00 a.m. It then gradually decreases as lunch approaches. In the evening, the borrowing volume peaks at around 19:00, following a pattern similar to that of the morning. However, the evening borrowing volume is larger than the morning volume and stays high for longer, because a considerable number of students use their free time in the evening to borrow and read books in the library to broaden their knowledge. The library can therefore arrange personalized lending services around these traffic characteristics to improve the lending experience.
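The hourly binning behind this analysis can be sketched as follows. This is an illustration only: it assumes that slots start on the hour at 8:00, that the 14th slot is the half hour from 21:00 to 21:30, and that closing time itself is excluded. The names `slot_index`, `loans_per_slot`, and `slot_label`, and the sample timestamps, are hypothetical.

```python
from collections import Counter
from datetime import datetime

OPEN_HOUR = 8                  # lending starts at 8:00
CLOSE_MINUTES = 13 * 60 + 30   # 21:30, measured in minutes after opening
N_SLOTS = 14                   # 13 full hours plus one final half hour


def slot_index(ts):
    """Map a loan timestamp to a slot 0..13, or None outside lending hours."""
    minutes = (ts.hour - OPEN_HOUR) * 60 + ts.minute
    if minutes < 0 or minutes >= CLOSE_MINUTES:
        return None
    return minutes // 60


def loans_per_slot(timestamps):
    """Count loans in each of the 14 slots, ignoring out-of-hours records."""
    counts = Counter(slot_index(ts) for ts in timestamps)
    return [counts.get(i, 0) for i in range(N_SLOTS)]


def slot_label(i):
    """Human-readable label such as '11:00-12:00' for a slot index."""
    start = OPEN_HOUR + i
    end = "21:30" if i == N_SLOTS - 1 else f"{start + 1:02d}:00"
    return f"{start:02d}:00-{end}"


# Hypothetical loan records for a single day.
demo = [
    datetime(2024, 6, 3, 7, 59),   # before opening, ignored
    datetime(2024, 6, 3, 8, 0),
    datetime(2024, 6, 3, 11, 15),
    datetime(2024, 6, 3, 11, 50),
    datetime(2024, 6, 3, 19, 40),
    datetime(2024, 6, 3, 21, 29),
    datetime(2024, 6, 3, 21, 30),  # closing time, ignored
]
counts = loans_per_slot(demo)
busiest = max(range(N_SLOTS), key=counts.__getitem__)
print(slot_label(busiest), counts[busiest])
```

Summing such per-day counts over many days gives the per-slot totals from which peak periods can be read.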

Figure 6. Book borrowing by grade

Figure 7. Book borrowing over a typical day
After the system in this paper had been applied in the university library for one week, the borrowing data from before and after its application were compared. The aim was to test directly whether the personalized borrowing service based on reinforcement learning improved the borrowing experience and attracted more readers to borrow books.
June 2024 was taken as the observation period, with the new personalized lending system introduced on June 15. The lending volume for each day in June is shown in Figure 8. Overall, the daily lending volume changes considerably within a week and follows a fairly regular pattern: it rises on Saturday and Sunday, with Sunday the highest of the week, and then generally declines from Monday to Friday, with Friday the lowest.

Figure 8. The school library's book borrowing in June
After the new personalized lending system was introduced on June 15, the number of loans increased noticeably in the following days. The peak number of daily loans in late June was close to 500, about 50 books higher than before personalized lending services were introduced (roughly 11% above the earlier peak of about 450). This indicates that the smart library personalized lending service system based on the reinforcement learning framework can recommend reading materials that are likely to interest readers, which helps improve the reading experience.
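Because lending follows a strong weekly cycle, one way to make a before-and-after comparison more robust is to compare each weekday with the same weekday, rather than comparing raw daily totals. The sketch below illustrates that idea; it is not the comparison procedure used in this paper. Only the June 15, 2024 launch date comes from the text, while the function names and the daily counts are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

LAUNCH = date(2024, 6, 15)  # deployment date given in the text


def weekday_means(daily_loans, launch=LAUNCH):
    """Mean daily loans per weekday (0 = Monday), split at the launch date."""
    groups = {"before": defaultdict(list), "after": defaultdict(list)}
    for day, loans in daily_loans.items():
        period = "before" if day < launch else "after"
        groups[period][day.weekday()].append(loans)
    return {
        period: {wd: mean(vals) for wd, vals in sorted(by_wd.items())}
        for period, by_wd in groups.items()
    }


def weekday_uplift(daily_loans, launch=LAUNCH):
    """Change in mean loans for each weekday observed in both periods."""
    means = weekday_means(daily_loans, launch)
    return {
        wd: means["after"][wd] - means["before"][wd]
        for wd in means["before"]
        if wd in means["after"]
    }


# Hypothetical counts for June 1-28: flat 300 per day, plus 40 after launch.
demo = {
    date(2024, 6, 1) + timedelta(days=i): 300 + (40 if i >= 14 else 0)
    for i in range(28)
}
uplift = weekday_uplift(demo)
print(uplift)
```

With real data, a weekday-matched uplift that is positive on most weekdays would be stronger evidence than a higher single-day peak, since a peak can also come from an unusually busy Sunday.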
In this paper, we improve the reinforcement learning personalized recommendation algorithm through dynamic user group clustering, and on this basis design a library management system aimed at enhancing the personalized borrowing experience. The joint training method used in the study takes more time, but the reward of the model fluctuates less and converges faster than under the normal training mechanism. The gradient descent algorithm is used to update the user group parameters, which yields a smaller loss function value (0.0032) and faster convergence. The model in this paper also achieves clearly leading recommendation accuracy and normalized discounted cumulative gain (NDCG) on both the MovieLens and RL4RS datasets. The personalized lending management system under the reinforcement learning framework successfully analyzes the reading preferences of readers from different faculties and grades, and reports the distribution pattern of lending traffic over fixed time periods. The comparison shows a clear increase in the number of loans after the system in this paper is applied. Together, these results indicate that the personalized lending service system based on the reinforcement learning framework can analyze lending data systematically and thereby improve the lending experience.
