Research on unmanned delivery path optimization strategy based on reinforcement learning in intelligent logistics system 
Published online: 26 Sept. 2025
Submitted: 13 Jan. 2025
Accepted: 29 Apr. 2025
DOI: https://doi.org/10.2478/amns-2025-1079
© 2025 Lijun Kao et al., published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.
With the rapid development of the logistics industry, unmanned logistics fleets play an increasingly important role in regional goods distribution. This paper proposes deep reinforcement learning algorithms for two problems: delivery path planning for an unmanned logistics fleet with time windows, and the same problem when regional congestion is also considered. An Actor policy network incorporating an attention mechanism and a Critic value network are designed. A parameterized probability estimator maps the input state to a probability estimate over the output nodes. For the vehicle routing problem (VRP) with time windows, the paper specifies the reward function, state transition function, masking scheme, decoding strategy, and objective and loss functions, together with a reinforcement learning algorithm that combines the actor-critic (AC) idea with round-based updating to train the policy network and the value network. Finally, a numerical example explores how different combinations of working hours and vehicle loading capacity within a round affect the total cost. For all three customer node sizes, the optimal solution occurs with 8 working hours and a vehicle loading capacity of 200, and a reasonable arrangement of unmanned delivery fleet types and working hours can effectively reduce the total delivery cost.
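The abstract does not spell out the masking scheme, so the following is only a minimal sketch of how such a mask is commonly built for a VRP with time windows, not the authors' implementation. It assumes that node 0 is the depot, that a customer is infeasible if it has already been served, if its demand exceeds the remaining load, or if the vehicle would arrive after the window closes, and that waiting for a window to open is not modeled. All names (`feasibility_mask`, `masked_softmax`, `tw_close`, `load_left`) and all numbers are hypothetical.

```python
import numpy as np

def feasibility_mask(demand, tw_close, travel_time, visited, current, load_left, now):
    """Return a boolean array that is True where a node may be visited next.

    Assumption of this sketch: node 0 is the depot and is always allowed.
    """
    arrival = now + travel_time[current]  # arrival time at every node
    mask = (~visited) & (demand <= load_left) & (arrival <= tw_close)
    mask[0] = True  # returning to the depot is always feasible
    return mask

def masked_softmax(scores, mask):
    """Softmax over feasible nodes only; infeasible nodes get probability 0."""
    masked = np.where(mask, scores, -np.inf)
    masked = masked - masked.max()  # subtract the max for numerical stability
    exp = np.exp(masked)
    return exp / exp.sum()

# Hypothetical instance: one depot and three customers, times in minutes.
demand = np.array([0.0, 50.0, 120.0, 30.0])
tw_close = np.array([480.0, 100.0, 200.0, 60.0])
travel_time = np.array([
    [0.0, 30.0, 40.0, 90.0],
    [30.0, 0.0, 20.0, 60.0],
    [40.0, 20.0, 0.0, 50.0],
    [90.0, 60.0, 50.0, 0.0],
])
visited = np.array([True, False, False, False])

mask = feasibility_mask(demand, tw_close, travel_time, visited,
                        current=0, load_left=100.0, now=0.0)
scores = np.array([0.0, 0.0, 5.0, 5.0])  # stand-in for attention scores from the Actor
probs = masked_softmax(scores, mask)
print(mask, probs)
```

In this instance customer 2 is excluded because its demand exceeds the remaining load and customer 3 because the vehicle would arrive after its window closes, so the probability mass is split between the depot and customer 1 even though the raw scores favor the infeasible nodes.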
