Study on the Enhancement of Personalized Borrowing Experience of Smart Library Users Based on Reinforcement Learning Framework

A smart library is a library management system based on Internet of Things (IoT) technology. It uses an interconnected network of sensors, IoT devices and Internet technology to deliver intelligent management and services, and it offers readers a better personalized borrowing experience, which a reinforcement learning framework can further enhance [1-4].

The personalized services of a smart library are reflected mainly in the following aspects. Self-service borrowing and returning: using self-service borrowing and returning machines, readers can borrow and return books on their own without staff assistance, simply by swiping a card or scanning a code. This service mode saves readers' time and improves borrowing efficiency [5-8]. Unattended library: the smart library uses automated equipment and intelligent systems to operate unattended. Readers borrow, return and search for books through self-service lending machines, self-service query terminals and other equipment, while a video surveillance system keeps the library secure [9-12]. Personalized recommendation service: by analyzing readers' borrowing records and reading preferences, the smart library uses artificial intelligence to recommend books, periodicals and newspapers that match each reader's reading habits and interests, making reading recommendations more accurate [13-16]. Digital resource service: smart libraries digitize paper resources such as books, periodicals and newspapers to build a digital library platform [17-18]. Readers can access the digital library from computers, mobile phones and other electronic devices to read e-books, journals and newspapers. This service makes reading more convenient and saves library space and cost [19-22].

Literature [23] evaluates readers' preferences to test the feasibility of a self-service smart bookcase model and shows that borrowing convenience and information openness are factors that affect reader experience; its results support the development of public cultural reading services. Literature [24] identifies deficiencies in current libraries and, drawing on IoT information technology, proposes measures such as establishing specialized institutions and a sound investment mechanism for information resource sharing. Literature [25] explores the impact of AI and ML on libraries, identifying positive effects such as automation and personalization as well as drawbacks such as high investment costs and ethical concerns. Literature [26] argues that, as university libraries become increasingly intelligent, innovating the content of library information services and building diversified information services require innovation and development of smart library service models. Literature [27] emphasizes the important role of technologies such as AR and VR in enhancing the library user experience and outlines the impact of emerging technologies on the multifunctional transformation of libraries. Literature [28] designs a Java-based book search system intended to reduce the workload of library managers and improve patrons' borrowing experience, showing that it both facilitates the classification and organization of book information and provides personalized recommendations for readers. Literature [29] examines the application of AI and IoT technologies in library services, proposes a unified framework aimed at improving efficiency, security and user satisfaction, and demonstrates the effectiveness of the proposed approach.
The above studies show that smart libraries are widely applied and help improve both library management and patrons' borrowing experience. However, reinforcement learning frameworks have so far seen relatively little application in this field, and their use for enhancing the personalized borrowing experience of smart library users is essentially absent. It is therefore of considerable significance to study how a reinforcement learning framework can enhance the personalized borrowing experience of smart library users.

To study how the personalized borrowing experience in smart libraries can be enhanced, this paper first proposes an improved reinforcement learning recommendation algorithm that builds on the basic idea of reinforcement learning and combines it with dynamic user group clustering. Embedding logic is introduced into the multi-information representation layer to obtain user and item feature vectors. The Euclidean distances between feature vectors are computed, users are clustered with k-means, and cluster assignments are updated according to feedback. A multi-level item filtering module is designed to divide items into action groups that serve as choices for the agent. Based on the preference characteristics of each user group, a multi-armed bandit algorithm selects the arm with the highest expected value to maximize gain. A joint training method combining backpropagation and reinforcement learning is designed to capture dynamic user preferences and adjust the reward-maximization strategy in real time, yielding personalized recommendations of borrowing content. Using the improved reinforcement learning algorithm as the framework, we then analyze the requirements for reader management, book borrowing and statistical analysis of borrowing data in personalized borrowing, design a personalized borrowing management service system around the borrowing process, and evaluate experimentally both the system's functional implementation and the actual borrowing experience.

2

Improved reinforcement learning recommendation methods

2.1

Reinforcement learning

Reinforcement learning (RL) is a field of machine learning in which an agent maximizes its cumulative reward through continual trial and error in its environment. It is one of the three basic machine learning paradigms, alongside supervised and unsupervised learning. The main difference between reinforcement learning and supervised learning is that supervised learning uses the difference between the output and the label as feedback to correct the system's parameters and minimize error, whereas reinforcement learning receives only evaluative feedback, namely rewards for good outputs and penalties for bad ones, from which it learns a converged model. Reinforcement learning therefore focuses on balancing exploration of unknown regions against exploitation of existing knowledge [30].

The theoretical basis of reinforcement learning is the Markov decision process (MDP), which consists of the state space S of all possible states; the action space A; the state transition probability $P(s' \mid s, a)$ that the environment moves to the next state s′ after the agent takes action a in state s; the reward function R(s, a) that gives the reward obtained; and the discount factor γ for future rewards. The goal of the agent is to find an optimal policy that maximizes the total discounted reward. Here V(s) denotes the maximum expected return obtainable from state s by following the optimal policy, and Q(s, a) denotes the maximum expected return obtained by taking action a in state s and following the optimal policy thereafter. The value function can generally be computed by algorithms such as value iteration and policy iteration, and common reinforcement learning algorithms such as Q-learning, SARSA and Actor-Critic are all based on the Markov decision process [31].
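To make the roles of V(s) and Q(s, a) concrete, the sketch below runs value iteration, one of the algorithms named above, on a small finite MDP stored as NumPy arrays. It is a minimal illustration rather than part of the proposed method; the two-state MDP, its rewards and the tolerance are hypothetical values chosen so that the result can be checked by hand.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Value iteration on a finite MDP.

    P: shape (S, A, S), P[s, a, s2] = P(s2 | s, a)
    R: shape (S, A), immediate reward R(s, a)
    Returns V (S,), Q (S, A) and the greedy policy (S,).
    """
    n_states = R.shape[0]
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Bellman optimality backup: Q(s,a) = R(s,a) + gamma * sum_s2 P(s2|s,a) V(s2)
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = R + gamma * (P @ V)
    return V, Q, Q.argmax(axis=1)

# Hypothetical MDP: in state 0, action 1 moves to state 1 with reward 1;
# state 1 is absorbing, and action 0 there pays 2 per step.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0
P[0, 1, 1] = 1.0
P[1, :, 1] = 1.0
R = np.array([[0.0, 1.0], [2.0, 0.0]])
V, Q, policy = value_iteration(P, R, gamma=0.9)
print(V, policy)  # V is approximately [19, 20]; greedy policy is [1, 0]
```

By hand: V(1) = 2 + 0.9·V(1) gives 20, and V(0) = max(0.9·V(0), 1 + 0.9·20) gives 19, which is what the iteration converges to.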

The Policy-Gradient algorithm is a probability-based reinforcement learning method: it represents the policy directly as a probability distribution over actions, and because such a distribution can be continuous, the algorithm is commonly used for policy problems in continuous action spaces. It updates the policy parameters by gradient ascent. Specifically, the objective function to be maximized, the expected return, is defined first:

$$J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[R(\tau)\right] \tag{1}$$

where τ ∼ π_θ denotes a trajectory τ sampled by following the policy π_θ with parameters θ, and R(τ) is the return of that trajectory. The objective function J(θ) is then maximized by gradient ascent:

$$\theta_{t+1} = \theta_t + \alpha \nabla_\theta J(\theta) \tag{2}$$

where θ_t is the policy parameter at the t-th iteration, α is the learning rate, and ∇_θJ(θ) is the gradient of the objective function J(θ) with respect to the policy parameter θ, computed as:

$$\nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[\sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, R(\tau)\right] \tag{3}$$

where $\log \pi_\theta(a_t \mid s_t)$ is the logarithm of the probability of choosing action a_t in state s_t. Its gradient $\nabla_\theta \log \pi_\theta(a_t \mid s_t)$ can therefore be viewed as a measure of the preference of policy π_θ for action a_t in state s_t: it gives the direction in parameter space along which the probability of choosing a_t in s_t increases fastest [32].

In the Policy-Gradient algorithm, the gradient of the objective function J(θ) is usually estimated by sampling multiple trajectories. An initial state s₀ is drawn first, and actions sampled from the currently executed policy π_θ are then executed in sequence to obtain the state sequence s₀, s₁, …, s_T, where T denotes the length of the trajectory. The return of each trajectory is then computed with the reward function R(τ). After multiple trajectories have been sampled, the gradient of J(θ) is estimated from them as:

$$\nabla_\theta J(\theta) \approx \frac{1}{N}\sum_{i=1}^{N}\sum_{t=0}^{T_i} \nabla_\theta \log \pi_\theta(a_{i,t} \mid s_{i,t})\, R(\tau_i) \tag{4}$$

where N is the total number of sampled trajectories, τ_i is the i-th trajectory with length T_i, and a_{i,t} and s_{i,t} are the action and state of τ_i at time step t.
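The estimator in Eq. (4) can be sketched with NumPy for a tabular softmax policy, where π_θ(·|s) = softmax(θ[s]) and the score function has the closed form e_a − π_θ(·|s). The one-step toy environment, the batch size of 16 and the learning rate α = 0.5 are assumptions made purely for illustration; repeatedly applying the update of Eq. (2) should shift probability toward the rewarded action.

```python
import numpy as np

def softmax(logits):
    z = np.exp(logits - logits.max())
    return z / z.sum()

def grad_log_pi(theta, s, a):
    """Gradient of log pi_theta(a|s) for a tabular softmax policy.
    theta has shape (n_states, n_actions) and pi(.|s) = softmax(theta[s])."""
    g = np.zeros_like(theta)
    g[s] = -softmax(theta[s])
    g[s, a] += 1.0
    return g

def reinforce_gradient(theta, trajectories):
    """Monte Carlo estimate of Eq. (4).
    trajectories: list of (states, actions, total_return R(tau_i))."""
    grad = np.zeros_like(theta)
    for states, actions, ret in trajectories:
        for s, a in zip(states, actions):
            grad += grad_log_pi(theta, s, a) * ret
    return grad / len(trajectories)

def sample_trajectory(theta, rng):
    # Hypothetical one-step environment: single state 0; action 1 pays 1, action 0 pays 0
    a = rng.choice(2, p=softmax(theta[0]))
    return [0], [a], float(a == 1)

rng = np.random.default_rng(0)
theta = np.zeros((1, 2))
for _ in range(200):
    batch = [sample_trajectory(theta, rng) for _ in range(16)]
    theta = theta + 0.5 * reinforce_gradient(theta, batch)  # Eq. (2) with alpha = 0.5
print(softmax(theta[0]))  # probability of action 1 should now be close to 1
```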

2.2

Reinforcement learning recommendation methods incorporating dynamic user groups

The proposed model, named DJT-RS, consists of four main parts: a multi-information representation module, a dynamic clustering module, a rating-based multi-level item filtering module and a joint training module. The overall framework of this work is shown in Fig. 1.

2.2.1

Multi-information representation module

To make the raw data more informative for representation, basic embedding logic is introduced in the multi-information representation layer, and the feature vectors are updated dynamically in coordination with the other modules. The loss function for this update, L₁, is defined as:

$$L_1 = \sum_{(i, z)} \left(r_{iz} - y_{iz}\right)^2 + \lambda\left(\lVert I \rVert_F^2 + \lVert Z \rVert_F^2\right) \tag{5}$$

where r_iz is the true rating of user i for item z, y_iz is the predicted rating of user i for item z, the sum runs over the (user, item) pairs with known ratings, $\lVert I \rVert_F^2$ is the squared Frobenius norm of the matrix formed by all users, $\lVert Z \rVert_F^2$ is the squared Frobenius norm of the matrix formed by all items, and λ is the regularization coefficient.
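A minimal sketch of L₁ and a full-batch gradient descent step on it, assuming the prediction is the inner product y_iz = I[i]·Z[z] of a user vector and an item vector (plain matrix factorization; the section does not state the predictor explicitly). The toy ratings, λ and learning rate are hypothetical.

```python
import numpy as np

def mf_loss(I, Z, ratings, lam):
    """L1 from Eq. (5).
    I: (n_users, d) user vectors; Z: (n_items, d) item vectors;
    ratings: list of (user i, item z, rating r_iz) for observed pairs."""
    sq_err = sum((r - I[i] @ Z[z]) ** 2 for i, z, r in ratings)
    return sq_err + lam * (np.sum(I ** 2) + np.sum(Z ** 2))

def mf_gd_step(I, Z, ratings, lam, lr):
    """One full-batch gradient descent step on L1 (assumes y_iz = I[i] . Z[z])."""
    gI = 2 * lam * I                      # gradient of the regularization terms
    gZ = 2 * lam * Z
    for i, z, r in ratings:
        err = r - I[i] @ Z[z]
        gI[i] += -2 * err * Z[z]          # d/dI[i] of (r - I[i].Z[z])^2
        gZ[z] += -2 * err * I[i]          # d/dZ[z] of (r - I[i].Z[z])^2
    return I - lr * gI, Z - lr * gZ

rng = np.random.default_rng(0)
I = rng.normal(scale=0.1, size=(3, 2))
Z = rng.normal(scale=0.1, size=(4, 2))
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 1, 4.0), (2, 3, 1.0)]  # hypothetical ratings
before = mf_loss(I, Z, ratings, lam=0.01)
for _ in range(500):
    I, Z = mf_gd_step(I, Z, ratings, lam=0.01, lr=0.01)
after = mf_loss(I, Z, ratings, lam=0.01)
print(before, after)  # the loss should decrease substantially
```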

The raw features of users and items are fed into the embedding layer to obtain additional user and item embeddings. The matrices obtained through matrix factorization and the feature matrices produced by the embedding layer are then concatenated to form the user and item embedding vectors.

The user and item embedding vectors are expanded into several dimensions; softmax is then used to determine the importance weight of each component, the weighted components are summed, and averaging yields the final interpretable item features. The final feature value of each feature domain is synthesized as:

$$\frac{\sum_{i=1}^{n} \alpha_i x_i}{\sum_{i=1}^{n} x_i} \tag{6}$$

where x_i is the i-th expanded component, α_i is its softmax importance weight, and n is the number of components.
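The concatenation step and Eq. (6) can be sketched as follows. The section does not specify what the softmax is applied to, so this sketch assumes a hypothetical per-component score vector `scores`; the function evaluates Eq. (6) exactly as written and rejects the case where the components sum to zero, for which the expression is undefined.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def fuse_embeddings(mf_vec, side_vec):
    """Concatenate the matrix-factorization vector with the embedding-layer vector."""
    return np.concatenate([mf_vec, side_vec])

def field_value(x, scores):
    """Eq. (6): sum(alpha_i * x_i) / sum(x_i), with alpha = softmax(scores).
    `scores` is a hypothetical importance logit per component."""
    alpha = softmax(scores)
    denom = x.sum()
    if np.isclose(denom, 0.0):
        raise ValueError("Eq. (6) is undefined when the components sum to zero")
    return float((alpha * x).sum() / denom)

mf_part = np.array([0.2, 0.4])      # hypothetical matrix-factorization vector
side_part = np.array([0.1, 0.3])    # hypothetical embedding-layer vector
x = fuse_embeddings(mf_part, side_part)
value = field_value(x, scores=np.zeros_like(x))  # equal weights alpha_i = 1/4
print(x, value)  # [0.2 0.4 0.1 0.3] 0.25
```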

2.2.2

Dynamic user group module

The user feature vectors are clustered with k-means, and each user cluster is parameterized by the average of the user feature vectors within it. The closest users are grouped together, each user's parameter is the previously computed user feature vector, and the average parameter of each user group is:

$$W_s = \frac{1}{m}\sum_{i=1}^{m} w_{i,s}\, p_{i,s} \tag{7}$$

where W_s is the user group feature matrix of the s-th user cluster, m is the total number of users in the s-th cluster, w_{i,s} is the feature weight of the i-th user in the s-th cluster, and p_{i,s} is the composite feature vector of the i-th user in the s-th cluster.
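Eq. (7) is a weighted average over the members of one cluster. A minimal NumPy sketch, with hypothetical function and argument names, that returns one d-dimensional vector per cluster:

```python
import numpy as np

def group_feature(P_s, w_s):
    """Eq. (7): W_s = (1/m) * sum_i w_{i,s} * p_{i,s}.
    P_s: (m, d) composite feature vectors of the users in cluster s.
    w_s: (m,) per-user feature weights."""
    m = P_s.shape[0]
    return (w_s[:, None] * P_s).sum(axis=0) / m

print(group_feature(np.array([[1.0, 2.0], [3.0, 4.0]]), np.array([1.0, 1.0])))  # [2. 3.]
```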

Personalized recommendations are made based on the user feature vectors and user clusters, and the distance between the user and each cluster is recalculated based on the feedback received to update the cluster assignments.

User feature vectors are updated after each recommendation to capture changes in user preferences and provide more accurate recommendations. A gradient descent algorithm combines and updates the user and item embedding information to obtain the updated composite user feature vector:

$$P' = P + \frac{Z}{\theta} \cdot X^{T} \tag{8}$$

where P′ is the updated composite feature vector of the target user, P is the current composite feature vector of the target user, Z is the sum of the reward values of the item group to be recommended, X^T is the (transposed) mean of the feature vectors of all items in that group, and θ is a hyperparameter.
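A direct NumPy transcription of Eq. (8), assuming X^T is taken as the mean feature vector of the recommended item group and θ > 0; the function name and argument layout are hypothetical.

```python
import numpy as np

def update_user_vector(P, item_feats, rewards, theta):
    """Eq. (8): P' = P + (Z / theta) * mean item feature vector.
    P: (d,) composite user vector; item_feats: (n, d) features of the recommended
    item group; rewards: (n,) reward values; theta: hyperparameter (> 0)."""
    Z = float(np.sum(rewards))          # sum of the reward values of the item group
    X_mean = item_feats.mean(axis=0)    # mean feature vector of the item group
    return P + (Z / theta) * X_mean

P_new = update_user_vector(np.zeros(2), np.array([[1.0, 0.0], [0.0, 1.0]]),
                           rewards=np.array([1.0, 1.0]), theta=4.0)
print(P_new)  # [0.25 0.25]
```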

The updated user feature vectors are fed back into the dynamic user group module to regroup the current users. Let p_i denote the feature vector of a data point, q_j the center of cluster j, and n the dimension of the feature vectors; each data point p_i is assigned to the nearest cluster center q_j according to the distance:

$$d(p_i, q_j) = \sqrt{\sum_{k=1}^{n} \left(p_{i,k} - q_{j,k}\right)^2} \tag{9}$$

For each cluster j, the mean of all data points within the cluster is computed as its new center:

$$q' = \frac{1}{m}\sum_{i=1}^{m} p_i \tag{10}$$

where q′ is the updated cluster center, m is the total number of users in the current user cluster, and p_i is the composite feature vector of the i-th user in the current user cluster.
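Eqs. (9) and (10) together form one iteration of standard k-means. The sketch below performs a single reassignment-and-update step with NumPy; keeping the old center when a cluster becomes empty is an assumption the section does not address.

```python
import numpy as np

def kmeans_step(P, centers):
    """One k-means iteration: assign by Eq. (9), recompute centers by Eq. (10).
    P: (N, n) user feature vectors; centers: (k, n) current cluster centers."""
    # Euclidean distance from every user to every center, shape (N, k)
    dists = np.linalg.norm(P[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    new_centers = centers.copy()
    for j in range(centers.shape[0]):
        members = P[labels == j]
        if len(members) > 0:          # keep the old center if a cluster empties
            new_centers[j] = members.mean(axis=0)
    return labels, new_centers

P = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
labels, centers = kmeans_step(P, np.array([[0.0, 0.0], [10.0, 10.0]]))
print(labels, centers)  # [0 0 1 1] and centers [[0, 0.5], [10, 10.5]]
```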

2.2.3

Rating-based multi-level item filtering module

A filtering mechanism is created to categorize items so that different item sets can be recommended to users. First, highly rated items are collected into a high-rating recommendation set. Next, the interaction log of the user currently receiving recommendations is analyzed: the categories of the highly rated items in the log serve as anchors for finding other items in the same categories, which are divided evenly into item groups whose size is a tunable parameter. The two larger item sets, one built from ratings and one from the categories in the interaction log, are thus split into smaller groups, and the smaller groups from both sets are used together as candidate recommendation arms for the agent to choose from.
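The two-level filter can be sketched in Python as below. The rating threshold, the reuse of that threshold for the interaction log, the exclusion of already-seen items from the category-matched set, the group size and all names are hypothetical choices for illustration; the section fixes only the overall structure (a high-rated set, a category-matched set, and evenly sized groups used as arms).

```python
def build_arms(ratings, categories, user_log, high_threshold, group_size):
    """Sketch of the multi-level item filter.
    ratings: {item: average rating}; categories: {item: category};
    user_log: list of (item, user rating) from the current user's interactions."""
    def chunk(items):
        return [items[k:k + group_size] for k in range(0, len(items), group_size)]

    # Level 1: globally high-rated items, best first
    high = sorted((i for i, r in ratings.items() if r >= high_threshold),
                  key=lambda i: -ratings[i])
    # Level 2: unseen items sharing a category with items this user rated highly
    liked_cats = {categories[i] for i, r in user_log if r >= high_threshold}
    seen = {i for i, _ in user_log}
    same_cat = sorted(i for i, c in categories.items() if c in liked_cats and i not in seen)
    # Each small group becomes one candidate arm for the agent
    return chunk(high) + chunk(same_cat)

arms = build_arms(
    ratings={"a": 4.8, "b": 4.5, "c": 3.0, "d": 4.9},
    categories={"a": "CS", "b": "Lit", "c": "CS", "d": "Hist", "e": "CS"},
    user_log=[("a", 5), ("c", 2)],
    high_threshold=4.5,
    group_size=2,
)
print(arms)  # [['d', 'a'], ['b'], ['e']]
```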

2.2.4

Reinforcement Learning Modeling

In this paper, the whole recommendation process is modeled as a Markov decision process, and a multi-armed bandit algorithm selects the arm with the largest estimated value as output to obtain the maximum benefit.

1) Agent

For rating prediction, items are classified into categories by their labels, and each category is used to characterize the externally exposed item feature vector. The agent traverses each item in each item group and computes the recommendation value of each group as:

$$i_j = \sigma^2(x_j) + \beta\sqrt{{x'_j}^{T} W_n^{-1} x'_j \log(1+t)} \tag{11}$$

where i_j is the recommendation value of the j-th group of items to be recommended, σ²(x_j) is the variance of the j-th group, β is a reinforcement learning hyperparameter, x′_j is the mean feature vector of all items in the j-th group and ${x'_j}^{T}$ is its transpose, $W_n^{-1}$ is the inverse of the feature matrix of the n-th user group, to which the target user belongs, and t is the number of reinforcement learning rounds.
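The scoring rule of Eq. (11) resembles an upper-confidence-bound bandit score. The following NumPy sketch scores each arm and picks the largest, under two stated assumptions: σ²(x_j) is taken as the variance over all feature entries of the group, and W_n is supplied as a d × d positive-definite matrix (for example, as in LinUCB, an identity matrix plus accumulated outer products), since Eq. (11) requires it to be invertible.

```python
import numpy as np

def arm_score(group_feats, W_n, beta, t):
    """Recommendation value i_j of one item group (arm), following Eq. (11).
    group_feats: (n_items, d) feature vectors of the items in the group.
    W_n: (d, d) positive-definite matrix for the target user's group (assumed)."""
    x_mean = group_feats.mean(axis=0)                     # x'_j
    variance = float(group_feats.var())                   # sigma^2(x_j), assumed definition
    quad = float(x_mean @ np.linalg.solve(W_n, x_mean))   # x'_j^T W_n^{-1} x'_j
    return variance + beta * np.sqrt(quad * np.log(1 + t))

def select_arm(groups, W_n, beta, t):
    """Score every arm and return the index of the largest score."""
    scores = [arm_score(g, W_n, beta, t) for g in groups]
    return int(np.argmax(scores)), scores

groups = [np.array([[1.0, 1.0], [1.0, 1.0]]), np.zeros((2, 2))]
best, scores = select_arm(groups, np.eye(2), beta=1.0, t=1)
print(best, scores)  # arm 0 wins with score sqrt(2 * log 2)
```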

After all items in all item groups have been traversed to obtain the recommendation value of each item group, all actions are traversed in turn to obtain the recommendation values of all arms, and the item list of the arm with the largest recommendation value is output as the recommendation list.

2) State

The state is affected by the actions taken by the agent. In this scenario, each element of the state is a recommendation list, and the state at any moment is:

$$S_u = (a_1, a_2, a_3, \dots) \tag{12}$$

where S_u is the state of a particular user u, and a₁, a₂, a₃, … are the recommendations associated with that user at a particular moment. This state effectively captures the dynamic nature of the environment.

3) Actions

The agent influences the environment by selecting actions. At each moment, the system selects an action, which corresponds to an arm: all items in the candidate pool are grouped according to the user's preferences, and the multi-level item filtering module generates a recommendation list:

$$a_u = (x_1, x_2, x_3, \dots, x_{t,a}) \tag{13}$$

where a_u is the candidate action of user u and x_{t,a} is the feature vector obtained at moment t after the user takes action a_u and pulls the arm.

4) Reward

Rewards provide the information the agent needs to make informed decisions in subsequent steps. If the target user's rating of the item recommended in the current round equals a preset threshold, the reward for that item is a preset base value. If the rating exceeds the threshold, the reward is the base value plus the difference between the current rating and the threshold.
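The reward rule reads naturally as a small function. The section defines the reward only for ratings equal to or above the threshold, so the value returned below the threshold (`below_value`) is a hypothetical placeholder, as are the example numbers.

```python
def reward(rating, threshold, base, below_value=0.0):
    """Reward for one recommended item.
    rating == threshold -> base; rating > threshold -> base + (rating - threshold).
    below_value is a hypothetical placeholder: the rule for rating < threshold is unspecified."""
    if rating > threshold:
        return base + (rating - threshold)
    if rating == threshold:
        return base
    return below_value

print(reward(4, threshold=4, base=1.0), reward(5, threshold=4, base=1.0))  # 1.0 2.0
```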

2.2.5

Joint training

Reinforcement learning is introduced to maximize returns, while changes in the embedding vectors of users and items are used to reflect the evolution of user preferences. The algorithm first trains the user and item representation layer; once this layer has converged, the whole system is launched, updating the user and item embedding vectors based on the recommendation results while adjusting the user and user-group parameters used in reinforcement learning, thereby realizing joint training.

Building on the classical mean squared error loss, a loss L₂ is introduced that compares the model's recommendation results with the user's actual click behavior. L₂ is given by Equation (14):

$$L_2 = \max\left(i - b_1 \log L_1\right) \tag{14}$$

where i is the output value of the reinforcement learning part. While minimizing L₁, the recommender system thus also tries to maximize the expected output value of the reinforcement learning part during the iterative process.

This optimization process not only focuses on the predictive accuracy of the model, but also considers the performance of the model in user interactions to provide more personalized and effective recommendations.

3

Personalized lending system design under the reinforcement learning framework

3.1

System Requirements Analysis

This subsection describes the requirements analysis of the personalized lending system from two aspects. On the one hand, it describes the analysis points that need to be taken into account in the business aspects of this research, while on the other hand, it focuses on the analysis of the development and management aspects of the whole system.

3.1.1

Analysis of business functional requirements

Reader management requirements. These cover the management and maintenance of readers' basic information, mainly adding, deleting, modifying and querying records. In addition, the maximum number of books that may be borrowed at the same time and the loan period are set according to the reader's educational level (i.e., the reader type), so that books deliver their full value: highly educated readers are not forced to return books in a hurry because a loan has expired, and other readers cannot monopolize book resources.

Book management requirements. These focus on the management and maintenance of the basic information of books; as above, the system must support adding, deleting, modifying and querying this information. Books are also classified according to the Chinese Library Classification: books of the same class share a general subject area and are relatively similar, so the system can recommend books on the principle of item similarity.

Book lending requirements. This part focuses on circulation. Book and reader information is relatively static and is generally organized and maintained only at particular times, whereas borrowing and returning is a dynamic process comprising several modular operations such as borrowing, returning, renewal and penalties, and the data it produces are very important in book management. The system must store the many-to-many relationship between readers and books and also record borrowing and return times to support book recommendation.

System setup requirements. This part covers maintenance of the whole system platform, including function maintenance, database maintenance and system tuning, as well as functions such as setting system parameters and computing recommendation sequences.

3.1.2

Management needs analysis

Data security. All basic data about readers, administrators and books used in this paper must be protected by a strict mechanism: users' private information and other confidential data are encrypted before being stored in the MySQL database, every database operation is checked for the corresponding operation privileges, and different modules are made available according to each user's role. This is a data security issue the system must address.
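As an illustrative sketch of two of these protections using only the Python standard library: salted PBKDF2 hashing for stored credentials and a role-to-operation permission check. PBKDF2 is a one-way hash suited to passwords; reversible encryption of other private fields would need a dedicated cryptography library, which this sketch does not cover. The role names, operation names and iteration count are hypothetical.

```python
import hashlib
import hmac
import os

# Hypothetical role -> permitted database operations
ROLE_PERMISSIONS = {
    "reader": {"search_book", "borrow", "return", "renew"},
    "librarian": {"search_book", "borrow", "return", "renew", "edit_book", "edit_reader"},
    "admin": {"search_book", "borrow", "return", "renew", "edit_book", "edit_reader",
              "view_statistics", "set_parameters"},
}

def hash_password(password, salt=None, iterations=100_000):
    """Derive a salted PBKDF2-HMAC-SHA256 digest for storage (one-way, not reversible)."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest

def verify_password(password, salt, digest, iterations=100_000):
    _, candidate = hash_password(password, salt, iterations)
    return hmac.compare_digest(candidate, digest)   # constant-time comparison

def check_permission(role, operation):
    """Return True if the role may perform the database operation."""
    return operation in ROLE_PERMISSIONS.get(role, set())

salt, stored = hash_password("secret")
print(verify_password("secret", salt, stored), check_permission("reader", "edit_book"))  # True False
```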

Statistical analysis of borrowing data. Using the large volume of accumulated historical data, the system administrator can perform large-scale correlation analysis of the collection and of reader information and generate statistics from the history, so that borrowing and reading patterns can be viewed intuitively in a visualization environment, which is convenient for supervisors to review and analyze. For example, popular books prompt a decision on whether to acquire more copies, rarely borrowed books prompt a decision on whether to withdraw them or to organize activities that make them better known, and overdue loans can be handled accordingly.
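Identifying popular and rarely borrowed books reduces to counting loan records. A minimal sketch with `collections.Counter`, where the popularity and dormancy thresholds are hypothetical values an administrator would choose:

```python
from collections import Counter

def borrowing_stats(loans, catalog, hot_min, cold_max):
    """Classify books by historical borrowing count.
    loans: iterable of (reader_id, book_id) records; catalog: all book ids."""
    counts = Counter(book for _, book in loans)
    hot = sorted(b for b in catalog if counts[b] >= hot_min)    # candidates for extra copies
    cold = sorted(b for b in catalog if counts[b] <= cold_max)  # candidates for withdrawal or promotion
    return counts, hot, cold

counts, hot, cold = borrowing_stats(
    loans=[(1, "A"), (2, "A"), (3, "A"), (1, "B")],
    catalog=["A", "B", "C"],
    hot_min=3,
    cold_max=0,
)
print(hot, cold)  # ['A'] ['C']
```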

3.2

Borrowing process analysis

Reader borrowing process: the reader first searches for the book. If the library does not hold the book, the search fails, and a record is sent to the database back end so that administrators can decide whether to acquire the book based on how often it is searched for. Once the book is found, the reader's identity is verified; if the reader has overdue books, these must be processed first. Once the identity is valid and the reader is within the borrowing limit, the loan is completed and the borrowing information is added to the database.

The reader's return process is simpler: the system looks up the borrowing record in the database table, combines it with the reader's identity to compute the due date, and determines whether the book is overdue. If it is returned within the loan period, the return succeeds; if it is overdue, the return goes through overdue processing.
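The borrowing and return flows above can be sketched as a small in-memory Python class. Everything here is hypothetical: the reader types and their loan limits and periods (placeholders for the per-type rules in the reader management requirements), the treatment of each book id as a single copy, and the status strings; a real system would persist these records in the MySQL database.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

# Hypothetical per-reader-type rules: (max concurrent loans, loan period in days)
LOAN_RULES = {"undergraduate": (10, 30), "postgraduate": (15, 60), "faculty": (20, 90)}

@dataclass
class Library:
    holdings: set                                    # book ids held (one copy each, assumed)
    loans: dict = field(default_factory=dict)        # book_id -> (reader_id, due_date)
    missed_searches: list = field(default_factory=list)

    def borrow(self, reader_id, reader_type, book_id, today):
        if book_id not in self.holdings:
            self.missed_searches.append(book_id)     # logged for acquisition decisions
            return "not_found"
        if book_id in self.loans:
            return "on_loan"
        if reader_type not in LOAN_RULES:
            return "invalid_reader"
        mine = [due for r, due in self.loans.values() if r == reader_id]
        if any(due < today for due in mine):
            return "overdue_pending"                 # overdue loans must be handled first
        max_loans, period = LOAN_RULES[reader_type]
        if len(mine) >= max_loans:
            return "limit_reached"
        self.loans[book_id] = (reader_id, today + timedelta(days=period))
        return "ok"

    def return_book(self, book_id, today):
        _, due = self.loans.pop(book_id)
        return "ok" if today <= due else "overdue"   # overdue returns go to overdue processing

demo = Library(holdings={"B1", "B2"})
print(demo.borrow("r1", "undergraduate", "B1", date(2024, 1, 1)))  # ok
```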

3.3

Functional module design of the system

The book information management submodule adds, deletes, modifies and queries the basic information of books and also performs regular backup and maintenance of the book database. The reader management module is similar, providing operations to add, delete, modify and maintain reader information. The lending management module mainly covers borrowing registration, return registration, renewal and overdue processing. Its most important part is the lending recommendation submodule, which contains the main research content of this paper: using historical data, it recommends books to readers both according to readers' needs and according to book information.

The architecture of the lending recommendation part resembles the MVC design pattern. The interface layer corresponds to the View layer and handles human-computer interaction between the system and the user: the reader enters a request, and the system computes and returns a list of results, which the interface layer displays. The personalized service layer resembles the Controller layer: it handles each type of user request with a different processing flow and, in addition to the automatically running information collection and system recommendation modules, contains the main implementation of the recommendation algorithm. The bottom layer, the basic database layer, resembles the Model layer and interacts directly with the database, mainly performing create, read, update and delete operations and providing data interfaces to the upper layers; database operation efficiency is also tuned in this layer.
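A minimal Python sketch of the three-layer split, with an in-memory dictionary standing in for the database and a placeholder callable standing in for the recommendation algorithm; all class and method names are hypothetical.

```python
class BookRepository:
    """Basic database layer (Model-like): data access only; in-memory stand-in for MySQL."""
    def __init__(self):
        self._loans = {}                        # reader_id -> list of borrowed book ids

    def add_loan(self, reader_id, book_id):
        self._loans.setdefault(reader_id, []).append(book_id)

    def loans_of(self, reader_id):
        return list(self._loans.get(reader_id, []))

class RecommendationService:
    """Personalized service layer (Controller-like): routes a request to the algorithm."""
    def __init__(self, repo, recommender):
        self.repo = repo
        self.recommender = recommender          # any callable(history) -> list of book ids

    def handle(self, reader_id):
        return self.recommender(self.repo.loans_of(reader_id))

def render(book_ids):
    """Interface layer (View-like): formats the result list for display."""
    return "\n".join(f"{rank}. {book}" for rank, book in enumerate(book_ids, start=1))

repo = BookRepository()
repo.add_loan("r1", "B1")
service = RecommendationService(repo, lambda history: [b + "-related" for b in history])
print(render(service.handle("r1")))  # 1. B1-related
```

Keeping the recommender as an injected callable means the algorithm can be swapped without touching the interface or data layers.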

3.4

System business processes

According to the functional module design of the book lending and recommending system, its administrator’s business operation flow is shown in Figure 2.

4

Calculation examples and analysis of borrowing behavior

4.1

Example analysis

This section reports experiments on the optimized components of the proposed reinforcement learning model and analyzes the results. The experiments on the joint training mechanism focus on two indicators, time cost and cumulative reward; the effectiveness of the parameter update method is verified by comparing loss-function descent curves; finally, the performance of the proposed model on two datasets is compared experimentally with that of mainstream models.

4.1.1

Experimental data set

We evaluate the model on the MovieLens dataset and on the NetEase RL4RS dataset, which is designed for reinforcement learning recommender systems. The MovieLens data originate from the website movielens.org and were collected and released by the GroupLens Research lab at the University of Minnesota; MovieLens is one of the most commonly used datasets in the recommender systems field. It includes user features, movie features and user-movie interaction information. This paper uses the ml-1m version, in which each user has no fewer than 40 movie ratings, ensuring sufficient rating information per user. The dataset contains 6050 users, 3891 items and 1000238 interaction records.

NetEase Fuxi Lab released the RL4RS dataset specifically for research on reinforcement learning-based recommender systems. Its data come from a product recommendation scenario in a NetEase game and comprise two sub-datasets, Dataset A and Dataset B. Dataset A considers only a single recommendation to the user, whereas Dataset B considers a sequential scenario in which multiple recommendations are made to the user. The RL4RS dataset contains 186,841 users, 298 items and 153,87489 interaction records.

4.1.2

Evaluation indicators

In this paper, precision and normalized discounted cumulative gain (NDCG) are used as evaluation metrics. Precision is the proportion of truly positive items among all items predicted to be positive, and is calculated as:

$$\mathrm{Precision@}k = \frac{\sum_{i=1}^{k} I(i)}{k} \tag{15}$$

where I(i) indicates whether the i-th recommended item is in the target list, returning 1 if it is and 0 otherwise.
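Eq. (15) in Python, assuming the recommendation list is ranked and the target list is any collection of relevant item ids:

```python
def precision_at_k(recommended, targets, k):
    """Eq. (15): fraction of the top-k recommended items that appear in the target list."""
    target_set = set(targets)
    hits = sum(1 for item in recommended[:k] if item in target_set)  # sum of I(i)
    return hits / k

print(precision_at_k(["a", "b", "c", "d"], ["b", "d", "x"], 4))  # 0.5
```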

The normalized discounted cumulative gain (NDCG) is calculated as:

$$\mathrm{NDCG@}k = \frac{\sum_{i=1}^{k} \frac{2^{s_i} - 1}{\log_2(i+1)}}{\mathrm{iDCG@}k} \tag{16}$$

The numerator is the discounted cumulative gain (DCG), where s_i is the relevance score of the item at position i in the recommendation sequence and a larger i means the item appears further down the sequence. iDCG@k is the DCG of the ideal ordering, in which the items in the recommendation sequence are sorted by decreasing relevance score.
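A sketch of Eq. (16). It assumes iDCG@k is computed by sorting the same relevance scores in decreasing order, matching the description above; some implementations instead build the ideal ranking from all relevant items in the test set.

```python
import math

def dcg_at_k(scores, k):
    """DCG: sum over positions i = 1..k of (2^s_i - 1) / log2(i + 1)."""
    return sum((2 ** s - 1) / math.log2(i + 1) for i, s in enumerate(scores[:k], start=1))

def ndcg_at_k(scores, k):
    """Eq. (16). scores: relevance s_i of the recommended items in ranked order."""
    ideal = dcg_at_k(sorted(scores, reverse=True), k)   # iDCG@k
    return dcg_at_k(scores, k) / ideal if ideal > 0 else 0.0

print(ndcg_at_k([3, 2, 1], 3), ndcg_at_k([0, 1], 2))  # 1.0 and 1 / log2(3)
```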

Precision@k does not consider the ordering of the item as long as it is in the recommendation list. NDCG@k, on the other hand, takes into account the ordering of the items, which is a factor to focus on when the recommendation sequence is important. Larger values of Precision@k and NDCG@k metrics mean better performance of the model.

4.1.3

Experimental effects of joint training mechanisms

This section analyzes the model before and after introducing the joint training mechanism, starting with time cost. Training the original D-RS model with the joint training mechanism yields DJT-RS, and Table 1 compares the time costs of the two. Although the raw data have already been represented in a form from which the user-item rating matrix is easy to obtain, adding joint training still substantially increases training time: the time per training episode rises by 31.6% on MovieLens and by 78.2% on RL4RS, so the larger the dataset, the greater the time overhead of the joint training mechanism. This is because joint training must update the user and item embedding vectors based on the recommendation results and tune the user and user-group parameters of the reinforcement learning part in real time.

Table 1.

Results of comparative experiments on time spent

Model	MovieLens time (s)	RL4RS time (s)
D-RS	95	308
DJT-RS	125	549
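The percentage increases quoted above follow directly from Table 1:

```python
def pct_increase(base, new):
    """Relative increase of `new` over `base`, in percent."""
    return (new - base) / base * 100

# Per-episode training time from Table 1, in seconds: (D-RS, DJT-RS)
table1 = {"MovieLens": (95, 125), "RL4RS": (308, 549)}
for dataset, (d_rs, djt_rs) in table1.items():
    print(f"{dataset}: +{pct_increase(d_rs, djt_rs):.1f}%")
# MovieLens: +31.6%
# RL4RS: +78.2%
```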

The effect of the joint training mechanism on cumulative reward is then investigated. Figure 3 shows the results, with episodes on the horizontal axis and overall reward on the vertical axis; JT denotes the joint training mechanism and T the normal training mechanism. Comparing the two curves, the reward of the jointly trained model fluctuates markedly less than that of the model before optimization, indicating that joint training makes the model's training more stable. From episode 65 onwards, the jointly trained model is able to learn (the agent receives rewards > 0), and as episodes increase its cumulative reward grows steadily and converges slightly faster than under the normal training mechanism.

The reason is that the joint training mechanism searches for the maximum output value in reinforcement learning while accounting for continual, emotion-driven changes in user preferences and behavior. This speeds up convergence, but the search itself is more time-consuming.

In summary, the joint training mechanism can somewhat improve convergence speed and reduce the number of episodes needed for training, but it lengthens each training episode. This trade-off should be weighed according to the recommendation task.

4.1.4

Experimental effects of parameter updating methods

This section verifies the effectiveness of the parameter-update method for user groups. The methods compared are Adam and RAdam, the optimization algorithms most commonly used in reinforcement learning, and the gradient descent algorithm used in this paper. Figure 4 shows how the loss of the policy network decreases under each algorithm. With Adam, the loss decreases unevenly and fluctuates considerably. RAdam adds to Adam a rectification term that corrects the variance of the adaptive learning rate early in training, and its loss curve is comparatively smooth. The loss curve of the gradient descent algorithm is very smooth and converges quickly.
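For reference, the standard update rules of the three optimizers are sketched below, with g_t the gradient of the loss at the previous parameters, learning rate η, decay rates β1 and β2, and a small constant ε. The paper does not give its exact update equations, so these are the generic textbook forms rather than the authors' implementation; the RAdam threshold ρ_t > 4 follows the original RAdam formulation, and some implementations use a slightly larger threshold.

```latex
\begin{aligned}
&\text{Gradient descent:} && \theta_t = \theta_{t-1} - \eta\, g_t,
   \qquad g_t = \nabla_\theta L(\theta_{t-1}) \\[4pt]
&\text{Adam:} && m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \quad
   v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2, \\
& && \hat m_t = \frac{m_t}{1-\beta_1^t}, \quad
   \hat v_t = \frac{v_t}{1-\beta_2^t}, \quad
   \theta_t = \theta_{t-1} - \eta\,\frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon} \\[4pt]
&\text{RAdam:} && \rho_\infty = \frac{2}{1-\beta_2} - 1, \quad
   \rho_t = \rho_\infty - \frac{2t\,\beta_2^t}{1-\beta_2^t}, \\
& && \theta_t =
   \begin{cases}
     \theta_{t-1} - \eta\, r_t \dfrac{\hat m_t}{\sqrt{\hat v_t} + \epsilon},
       \quad r_t = \sqrt{\dfrac{(\rho_t-4)(\rho_t-2)\,\rho_\infty}{(\rho_\infty-4)(\rho_\infty-2)\,\rho_t}},
       & \text{if } \rho_t > 4, \\[6pt]
     \theta_{t-1} - \eta\, \hat m_t, & \text{otherwise.}
   \end{cases}
\end{aligned}
```

The rectification factor r_t shrinks Adam's adaptive step while the second-moment estimate is still unreliable in early iterations, which is consistent with RAdam's smoother loss curve relative to Adam.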

With Adam, the loss eventually falls to 0.1157; with RAdam, to 0.0098; and with gradient descent, to 0.0032, the lowest of the three. This indicates that the gradient descent algorithm used in this paper is feasible and effective for recommender-system tasks.

4.1.5

Overall experimental results

DJT-RS was compared with classical methods to verify its effectiveness. The baselines are: a popularity-based recommendation algorithm (POP), Bayesian personalized ranking (BPR), a matrix factorization-based recommendation algorithm (FISM), a recommendation algorithm based on the DQN framework (DQN), a recommendation algorithm based on the Actor-Critic framework (RaCT), a recommendation algorithm based on graph neural networks (NGCF), a lightweight graph convolutional recommendation algorithm (LightGCF), a recommendation algorithm based on graph disentangling modules and intent graphs (DGCF), and a recommendation algorithm based on the VAE framework (RecVAE).

The results on the MovieLens and RL4RS datasets are shown in Tables 2 and 3, respectively, where the second-best results are marked in bold and the best results are marked with *. The proposed model performs well on both datasets across all four metrics. The only exception is NDCG@5 on RL4RS, where it is slightly inferior to RecVAE; on every other metric and dataset, it achieves the highest precision and normalized discounted cumulative gain (NDCG).
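The paper does not define its metrics. Assuming the common binary-relevance definitions (Precision@k is the fraction of the top k recommendations that are relevant; NDCG@k divides the discounted cumulative gain, the sum of rel_i / log2(i + 1) over positions i = 1..k, by its ideal value), they can be computed per user as below and then averaged over users. The function names and the toy lists are illustrative only.

```python
import math
from typing import Sequence, Set

def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k recommended items that are relevant."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k

def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Binary-relevance NDCG@k: DCG of the ranking divided by the best achievable DCG."""
    # Position i (1-based) is discounted by log2(i + 1); enumerate is 0-based, hence +2.
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, item in enumerate(ranked[:k]) if item in relevant)
    ideal_hits = min(len(relevant), k)  # best case: all relevant items ranked first
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

ranked = ["a", "b", "c", "d", "e"]  # hypothetical top-5 list for one user
relevant = {"a", "c", "f"}          # hypothetical held-out items for that user
print(precision_at_k(ranked, relevant, 5))       # 0.4
print(round(ndcg_at_k(ranked, relevant, 5), 4))  # 0.7039
```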

Table 2.

Results of comparative experiments on MovieLens dataset

Method	Precision@5	NDCG@5	Precision@10	NDCG@10
POP	0.1232	0.1397	0.1032	0.1304
BPR	0.1520	0.1647	0.1252	0.1421
FISM	0.1624	0.1834	0.1505	0.1718
DQN	0.1924	0.2029	0.1762	0.2139
RaCT	0.3224	0.1154	0.3747	0.1698
NGCF	0.4368	0.3292	0.4728	0.3706
LightGCF	0.4110	0.2793	0.4746	0.3281
DGCF	0.4423	0.3386	0.5040	0.3554
RecVAE	0.4731	0.3637	0.5400	0.3771
Ours	0.4763*	0.3706*	0.5471*	0.3863*

Table 3.

Results of comparative experiments on RL4RS dataset

Method	Precision@5	NDCG@5	Precision@10	NDCG@10
POP	0.0919	0.0532	0.1075	0.0600
BPR	0.1507	0.0793	0.1428	0.0645
FISM	0.1700	0.1036	0.1803	0.0945
DQN	0.2328	0.1083	0.2285	0.1199
RaCT	0.2876	0.1143	0.3165	0.1483
NGCF	0.3231	0.1490	0.3861	0.1902
LightGCF	0.3040	0.1333	0.3604	0.1675
DGCF	0.3302	0.1603	0.3849	0.1925
RecVAE	0.3479	0.1832*	0.4113	0.2218
Ours	0.3548*	0.1739	0.4224*	0.2321*
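To put the margins in Tables 2 and 3 in perspective, the sketch below computes the percentage change of the proposed model over RecVAE, which is the strongest baseline on every metric in both tables. The scores are copied from the tables; the helper name is ours.

```python
# Scores from Tables 2 and 3 for RecVAE (strongest baseline) and the proposed model.
METRICS = ("Precision@5", "NDCG@5", "Precision@10", "NDCG@10")
SCORES = {
    "MovieLens": {"RecVAE": (0.4731, 0.3637, 0.5400, 0.3771),
                  "Ours":   (0.4763, 0.3706, 0.5471, 0.3863)},
    "RL4RS":     {"RecVAE": (0.3479, 0.1832, 0.4113, 0.2218),
                  "Ours":   (0.3548, 0.1739, 0.4224, 0.2321)},
}

def relative_gains(scores):
    """Percentage change of the proposed model over RecVAE, per dataset and metric."""
    gains = {}
    for dataset, rows in scores.items():
        gains[dataset] = {
            metric: (ours - base) / base * 100.0
            for metric, ours, base in zip(METRICS, rows["Ours"], rows["RecVAE"])
        }
    return gains

for dataset, per_metric in relative_gains(SCORES).items():
    print(dataset, {m: f"{g:+.1f}%" for m, g in per_metric.items()})
```

The margin over RecVAE ranges from about +0.7% to +4.6%, with a single negative value of about -5.1% on RL4RS NDCG@5.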

4.2

Borrowing behavior analysis based on the personalized lending system

4.2.1

Effectiveness in meeting smart library management requirements

The system requirements analysis shows that designing a lending management system with the reinforcement learning framework to improve the borrowing experience first requires meeting a set of requirements, including reader management, book management, book lending, and statistical analysis of lending data. To verify that the system designed in this paper meets these borrowing-management requirements, we take the smart library of University G as an example, install the personalized borrowing system in its management system, and examine how well the system fulfils the requirements.

1)

Borrowing management data of readers in different faculties and departments

Figure 5 shows overall book borrowing by students from different faculties. Literature (I) has the largest borrowing volume, indicating that it is especially popular. Economics (F) ranks second, which is consistent with University G being an institution specializing in economics, so its readers borrow many economics books. Language and writing (H) ranks third.

Because readers belong to different faculties and majors, what they study differs greatly, and their borrowing shows clear faculty-specific characteristics. To illustrate this, categories with generally low borrowing volume were excluded, and the ten most-borrowed categories were kept to show the group characteristics of readers from different faculties. Table 4 gives the average number of books borrowed in each category by readers of each faculty, with outliers marked in bold. Below, the College of Information and the School of Mathematics and Statistics are used as examples of how the system manages borrowing data.

(1)

For the College of Information, the average borrowing in categories B (philosophy and religion), C (general social sciences), F (economics), I (literature), K (history and geography), and T (industrial technology) consists of outliers. The values in the table show that T is anomalous because it has the highest average borrowing, while the others are anomalous because their borrowing is low. The College of Information belongs to the engineering disciplines and its majors are mainly electronic information and computing, so it is unsurprising that T has the highest average borrowing. The low borrowing in other categories, however, also reflects readers' focus on specialized professional knowledge at the expense of well-rounded development.

(2)

The School of Mathematics and Statistics has outliers in the average borrowing of C (general social sciences) and O (mathematical and physical sciences and chemistry), and in both cases the values are unusually high. This is probably because the school's majors are mainly mathematics and statistics: O includes mathematics, and in the Chinese Library Classification, whose category letters Table 4 follows, statistics is classed under C. Average borrowing in the other categories is at a normal level, reflecting readers' broad interests and well-rounded development.

Table 4.

Average number of books borrowed per reader in each category, by school

School	B	C	D	F	G	H	I	J	K	O	T
Info.	0.78	0.40	0.17	0.67	0.32	1.28	4.01	0.39	0.69	0.59	9.95
Art	0.78	0.43	0.08	0.40	0.15	0.43	2.97	11.44	0.72	0.03	1.62
Hum.	1.54	0.72	0.23	1.21	2.14	2.01	16.32	1.65	2.22	0.06	1.29
Fore.	1.00	0.36	0.18	0.98	0.46	10.38	6.78	0.44	1.15	0.03	0.25
Law	1.47	0.53	6.85	0.85	0.40	1.43	5.35	0.45	1.00	0.06	0.26
Math	1.62	1.41	0.41	2.30	0.53	2.05	7.44	0.54	0.87	1.74	0.97
BA	1.62	1.14	0.31	4.50	0.56	1.84	6.95	0.59	1.43	0.35	0.73
Tour.	1.72	1.00	0.24	3.22	0.80	1.76	7.29	0.51	1.66	0.12	0.49
CS.	1.43	0.69	0.34	4.27	0.43	3.29	6.87	0.55	1.47	0.36	0.65
Fin.	1.22	0.54	0.35	5.08	0.39	1.67	6.19	0.47	1.02	0.34	0.59
Bio.	1.28	0.93	0.14	0.71	0.33	1.90	7.25	0.55	1.16	0.69	1.14
Man.	1.10	0.70	0.20	2.92	0.50	1.83	8.67	0.62	1.41	0.75	2.01
FTM	1.17	0.79	0.42	3.61	0.53	1.48	7.95	0.54	1.25	0.25	0.61
Acc.	1.22	0.54	0.45	3.62	0.40	1.78	6.19	0.46	1.11	0.28	0.49
PA	1.48	0.82	1.25	1.70	0.54	1.54	7.31	0.50	1.34	0.35	0.67
EM	1.22	0.61	0.24	2.20	0.38	1.40	5.82	0.41	1.22	0.16	0.47

In summary, the system compiles statistics on readers' overall borrowing and on each college's borrowing, and identifies each college's outliers relative to the average borrowing volume. These outliers give a direct view of each college's reading preferences, which can be used to personalize reading recommendations and improve the reading experience.
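The paper does not say how the outliers in Table 4 were identified. As one hypothetical rule, a value can be flagged when its z-score within its category column (across all 16 schools) has a magnitude of at least 2. The sketch below applies that rule to Table 4. Among other values, it flags T for the College of Information and C and O for the School of Mathematics and Statistics, but it is not the authors' method and need not reproduce every value they mark in bold; the threshold and function names are assumptions.

```python
from statistics import mean, pstdev

CATEGORIES = "B C D F G H I J K O T".split()

# Table 4, copied row by row: school label followed by 11 category averages.
TABLE4 = """\
Info. 0.78 0.40 0.17 0.67 0.32 1.28 4.01 0.39 0.69 0.59 9.95
Art 0.78 0.43 0.08 0.40 0.15 0.43 2.97 11.44 0.72 0.03 1.62
Hum. 1.54 0.72 0.23 1.21 2.14 2.01 16.32 1.65 2.22 0.06 1.29
Fore. 1.00 0.36 0.18 0.98 0.46 10.38 6.78 0.44 1.15 0.03 0.25
Law 1.47 0.53 6.85 0.85 0.40 1.43 5.35 0.45 1.00 0.06 0.26
Math 1.62 1.41 0.41 2.30 0.53 2.05 7.44 0.54 0.87 1.74 0.97
BA 1.62 1.14 0.31 4.50 0.56 1.84 6.95 0.59 1.43 0.35 0.73
Tour. 1.72 1.00 0.24 3.22 0.80 1.76 7.29 0.51 1.66 0.12 0.49
CS. 1.43 0.69 0.34 4.27 0.43 3.29 6.87 0.55 1.47 0.36 0.65
Fin. 1.22 0.54 0.35 5.08 0.39 1.67 6.19 0.47 1.02 0.34 0.59
Bio. 1.28 0.93 0.14 0.71 0.33 1.90 7.25 0.55 1.16 0.69 1.14
Man. 1.10 0.70 0.20 2.92 0.50 1.83 8.67 0.62 1.41 0.75 2.01
FTM 1.17 0.79 0.42 3.61 0.53 1.48 7.95 0.54 1.25 0.25 0.61
Acc. 1.22 0.54 0.45 3.62 0.40 1.78 6.19 0.46 1.11 0.28 0.49
PA 1.48 0.82 1.25 1.70 0.54 1.54 7.31 0.50 1.34 0.35 0.67
EM 1.22 0.61 0.24 2.20 0.38 1.40 5.82 0.41 1.22 0.16 0.47
"""

def parse_table(text: str) -> dict[str, list[float]]:
    """Map each school label to its list of category averages."""
    rows = {}
    for line in text.strip().splitlines():
        school, *values = line.split()
        rows[school] = [float(v) for v in values]
    return rows

def column_outliers(rows: dict[str, list[float]], threshold: float = 2.0):
    """Return (school, category, z) where the value's column z-score has |z| >= threshold."""
    flagged = []
    for j, category in enumerate(CATEGORIES):
        column = [values[j] for values in rows.values()]
        mu, sigma = mean(column), pstdev(column)
        if sigma == 0:
            continue  # a constant column has no outliers
        for school, values in rows.items():
            z = (values[j] - mu) / sigma
            if abs(z) >= threshold:
                flagged.append((school, category, round(z, 2)))
    return flagged

for school, category, z in column_outliers(parse_table(TABLE4)):
    print(f"{school:6} {category}  z = {z:+.2f}")
```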

2)

Borrowing management data of readers in different grades

Readers' borrowing behavior is also affected by their grade level, so we next explore the stage-specific characteristics of borrowing across grades. Because the cohorts in each grade during the observation period differ greatly in size, absolute borrowing volumes are not comparable across grades. We therefore characterize each grade by its average borrowing volume per reader; the results are shown in Figure 6.
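The normalization described above can be sketched as follows: count loans per (grade, category) and divide by the number of readers in that grade. The grade labels and numbers below are hypothetical and only show why per-reader averages can reverse the ordering given by raw totals.

```python
from collections import defaultdict

def per_reader_averages(loans, readers):
    """Average loans per reader for each (grade, category).

    loans:   iterable of (grade, category) pairs, one pair per loan
    readers: mapping from grade to the number of readers in that grade
    """
    counts = defaultdict(lambda: defaultdict(int))
    for grade, category in loans:
        counts[grade][category] += 1
    return {
        grade: {cat: n / readers[grade] for cat, n in cats.items()}
        for grade, cats in counts.items()
    }

# Hypothetical data: grade 1 borrows more literature (I) books in total,
# but grade 4 borrows more per reader because it has fewer readers.
loans = [("Grade 1", "I")] * 30 + [("Grade 4", "I")] * 20
readers = {"Grade 1": 100, "Grade 4": 40}
print(per_reader_averages(loans, readers))
# {'Grade 1': {'I': 0.3}, 'Grade 4': {'I': 0.5}}
```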

As the figure shows, the four grades borrow broadly the same types of books, concentrated in ten categories: Literature (I), Economics (F), Philosophy (B), Social Sciences (C), Politics and Law (D), Languages (H), Arts (J), History (K), Mathematical and Physical Sciences (O), and Industrial Technology (T). The amount borrowed in each of these categories, however, differs between grades. It therefore suffices to analyze the ten categories separately and compare each grade's average borrowing in them; the grade-level characteristics of borrowing then emerge from the differences in these averages.

3)

Borrowing traffic management data per unit time slot

Knowing how lending traffic is distributed over fixed periods of the day helps optimize smart-library management, develop personalized lending services, and improve the lending experience. The library of University G lends from 8:00 a.m. to 9:30 p.m., and readers can borrow on their own at any time in this window. Given these characteristics of the university library, the lending day is divided into 14 one-hour time slots.
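A minimal sketch of the slot assignment, assuming that the 14th slot is the final half hour from 21:00 to 21:30 (the text gives 14 one-hour slots for a 13.5-hour window) and that checkout times outside lending hours are ignored. The constants and function names are illustrative.

```python
from datetime import time
from typing import Iterable, Optional

OPEN = time(8, 0)     # lending starts at 8:00
CLOSE = time(21, 30)  # and ends at 21:30
N_SLOTS = 14          # 8:00-9:00, ..., 20:00-21:00, then 21:00-21:30 (assumed)

def slot_index(t: time) -> Optional[int]:
    """0-based hourly slot of a checkout time, or None outside lending hours."""
    if t < OPEN or t >= CLOSE:
        return None
    return t.hour - OPEN.hour

def slot_counts(checkouts: Iterable[time]) -> list[int]:
    """Number of checkouts falling in each of the 14 slots."""
    counts = [0] * N_SLOTS
    for t in checkouts:
        idx = slot_index(t)
        if idx is not None:
            counts[idx] += 1
    return counts

print(slot_counts([time(8, 0), time(11, 5), time(11, 59), time(21, 15), time(22, 0)]))
# [1, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
```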

Summing the number of books borrowed in each time slot gives Figure 7. Borrowing peaks every day at around 11:00 and 19:00. Because of work and teaching schedules and the library's opening time, readers usually do not come to borrow books very early. After the first two morning classes end, students with no further classes go to the library, so borrowing rises steadily, reaches a local maximum around 11:00, and then declines as lunchtime approaches. Borrowing peaks again at around 19:00, in a pattern similar to the morning. The evening volume, however, is larger and lasts longer than the morning one, because in their free evening time a considerable share of students go to the library to borrow and read books to build up their knowledge. The library can therefore arrange personalized lending services according to these traffic characteristics to improve the lending experience.

4.2.2

Effectiveness of personalized recommendation in the lending system

To test directly whether the reinforcement learning-based personalized borrowing service improves the borrowing experience and attracts more readers to borrow books, borrowing data for the week after the system was deployed in this university's library were compared with data from before deployment.

June 2024 was taken as the observation period. The new personalized lending system was introduced on June 15, and lending for every day of June was recorded, as shown in Figure 8. Daily lending volume varies considerably across the week but follows a fairly regular pattern: it rises on Saturday and Sunday, peaks on Sunday, and then generally declines from Monday to Friday, with Friday the lowest day of the week.

After the new personalized lending system began operating on June 15, the number of loans rose noticeably over the following days; the daily peak in late June approached 500 books, about 50 more than before the personalized lending service was introduced. This indicates that the smart-library personalized lending system based on the reinforcement learning framework can recommend reading materials likely to interest readers, which helps improve the reading experience.
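A minimal sketch of the before/after comparison, assuming the daily counts behind Figure 8 are available as a date-to-loans mapping (they are not given in the text, so the data below are synthetic). Because lending follows a strong weekly cycle, a plain difference of means is only a rough indicator; comparing weekday profiles before and after the launch is a simple first check. Names are illustrative.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
LAUNCH = date(2024, 6, 15)  # system introduced on June 15, 2024 (a Saturday)

def before_after_means(daily: dict[date, int], launch: date = LAUNCH) -> tuple[float, float]:
    """Mean daily loans before the launch date and from the launch date on."""
    before = [n for d, n in daily.items() if d < launch]
    after = [n for d, n in daily.items() if d >= launch]
    return mean(before), mean(after)

def weekday_profile(daily: dict[date, int]) -> dict[str, float]:
    """Mean loans for each weekday, to expose the weekly lending cycle."""
    by_day = defaultdict(list)
    for d, n in daily.items():
        by_day[WEEKDAYS[d.weekday()]].append(n)  # weekday(): Monday == 0
    return {day: mean(by_day[day]) for day in WEEKDAYS if by_day[day]}

# Synthetic June 2024 counts (not the paper's data): 400 loans/day before launch, 450 after.
daily = {date(2024, 6, 1) + timedelta(days=i): (400 if i < 14 else 450) for i in range(30)}
print(before_after_means(daily))
print(weekday_profile(daily))
```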

5

Conclusion

In this paper, we improve a reinforcement learning-based personalized recommendation algorithm through dynamic user-group clustering and use it to design a library management system aimed at enhancing the personalized borrowing experience. The joint training method increases training time, but the model's reward fluctuates less and converges faster than under the normal training mechanism. Updating user-group parameters with gradient descent yields a smaller final loss (0.0032) and faster convergence. The proposed model also leads in recommendation precision and normalized discounted cumulative gain on both the MovieLens and RL4RS datasets, trailing only RecVAE on NDCG@5 on RL4RS. The personalized lending management system built on the reinforcement learning framework successfully analyzes the reading preferences of readers from different faculties and grades and characterizes the distribution of lending traffic over fixed time periods. After the system was applied, the number of loans increased markedly. Together, these results show that a personalized lending service system based on the reinforcement learning framework can analyze lending data scientifically and thereby improve the lending experience.

Language: English

Publication frequency: once a year
Journal subjects: Life Sciences; Life Sciences, other; Mathematics; Applied Mathematics; General Mathematics; Physics; Physics, other

Haiying Sun

Mingzhi Fan

Published online: 26 Sep 2025

Received: 03 Feb 2025

Accepted: 10 May 2025

DOI: https://doi.org/10.2478/amns-2025-1045

Keywords: Reinforcement learning, Joint training, Personalized lending, Smart library, Dynamic user groups

© 2025 Haiying Sun and Mingzhi Fan, published by Sciendo.

This work is licensed under the Creative Commons Attribution 4.0 International License.
