Open access

Research on unmanned delivery path optimization strategy based on reinforcement learning in intelligent logistics system

26 September 2025

With the rapid development of the logistics industry, unmanned logistics fleets play an increasingly important role in regional goods distribution. This paper proposes deep reinforcement learning (DRL) algorithms for two delivery path planning problems of unmanned logistics fleets: the vehicle routing problem with time windows (VRPTW), and the VRPTW with regional congestion. An Actor policy network incorporating an attention mechanism and a Critic value network are designed. The Actor serves as a parameterized probability estimator: given the current state as input, it outputs a selection probability for each node. For the VRPTW, the paper specifies the reward function, state transition function, masking scheme, decoding strategy, objective and loss functions, and a training algorithm that combines the actor-critic (AC) approach with round-based updates to train both the policy network and the value network. Finally, a numerical example examines how different combinations of working hours per round and vehicle loading capacity affect the total cost. For each of the three customer-node scales tested, the lowest total cost is obtained with 8 working hours and a vehicle loading capacity of 200, and a well-planned assignment of different types of unmanned delivery fleets and working hours effectively reduces the total delivery cost.
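The abstract lists the ingredients of the method (masking scheme, decoding strategy, reward, actor-critic losses) without showing how they interact. The Python sketch below illustrates one common way these pieces fit together for a VRPTW; it is not the authors' implementation. Several assumptions are made: the attention-based Actor is replaced by placeholder logits (negative distance to each node), the learned Critic is replaced by a fixed baseline number, the reward is taken as the negative total travel distance (the paper's actual total-cost function is not given in the abstract), and each return to the depot dispatches a fresh vehicle with full capacity. All names (`Node`, `feasibility_mask`, `greedy_rollout`, `actor_critic_losses`) and the instance data are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    x: float
    y: float
    demand: float
    ready: float    # earliest service start (time-window opening)
    due: float      # latest arrival allowed (time-window closing)
    service: float  # service duration at the node


def dist(a: Node, b: Node) -> float:
    return math.hypot(a.x - b.x, a.y - b.y)


def feasibility_mask(nodes, visited, current, load, time):
    """True where node j may be chosen next (capacity + time-window masking)."""
    mask = []
    for j, node in enumerate(nodes):
        if j == 0:
            # Depot: allowed when away from it, or once every customer is served.
            mask.append(current != 0 or all(visited[1:]))
        else:
            arrival = time + dist(nodes[current], node)
            mask.append(not visited[j] and node.demand <= load and arrival <= node.due)
    return mask


def masked_softmax(logits, mask):
    """Softmax restricted to feasible nodes; masked nodes get probability 0."""
    feasible = [z for z, ok in zip(logits, mask) if ok]
    if not feasible:
        raise ValueError("no feasible node: instance cannot be served")
    top = max(feasible)
    exps = [math.exp(z - top) if ok else 0.0 for z, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_rollout(nodes, capacity):
    """Greedy decoding of one solution; returns (route, sum of log-probs, distance)."""
    visited = [False] * len(nodes)
    route, current, load, time = [0], 0, capacity, 0.0
    log_prob, total = 0.0, 0.0
    while not (current == 0 and all(visited[1:])):
        mask = feasibility_mask(nodes, visited, current, load, time)
        # Placeholder logits (nearer node = higher score), standing in for the
        # output of a trained attention-based Actor network (assumption).
        logits = [-dist(nodes[current], nd) for nd in nodes]
        probs = masked_softmax(logits, mask)
        nxt = max(range(len(nodes)), key=lambda j: probs[j])  # greedy choice
        log_prob += math.log(probs[nxt])
        step = dist(nodes[current], nodes[nxt])
        total += step
        if nxt == 0:
            load, time = capacity, 0.0  # assumed: a fresh vehicle leaves the depot
        else:
            visited[nxt] = True
            load -= nodes[nxt].demand
            # Wait if early, then serve the customer.
            time = max(time + step, nodes[nxt].ready) + nodes[nxt].service
        route.append(nxt)
        current = nxt
    return route, log_prob, total


def actor_critic_losses(log_prob, reward, baseline):
    """Policy-gradient loss with a critic baseline, plus the critic's squared error.

    In an autodiff framework the advantage would be detached for the actor loss.
    """
    advantage = reward - baseline
    actor_loss = -advantage * log_prob
    critic_loss = advantage ** 2
    return actor_loss, critic_loss


# Tiny made-up instance: depot + 3 customers, vehicle capacity 8.
nodes = [
    Node(0, 0, 0, 0, 100, 0),  # depot
    Node(3, 4, 4, 0, 20, 1),
    Node(6, 8, 4, 0, 20, 1),
    Node(0, 5, 4, 0, 20, 1),
]
route, log_prob, total = greedy_rollout(nodes, capacity=8)
reward = -total  # assumed reward: negative travel distance
actor_loss, critic_loss = actor_critic_losses(log_prob, reward, baseline=-30.0)
print(route)  # [0, 1, 3, 0, 2, 0]
```

Here the capacity mask forces a return to the depot after two customers, so the remaining customer is served by a second vehicle. In training, sampling from `probs` rather than taking the arg-max would replace greedy decoding, and the two losses would drive gradient updates of the Actor and Critic networks.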