Research on unmanned delivery path optimization strategy based on reinforcement learning in intelligent logistics system
Published Online: Sep 26, 2025
Received: Jan 13, 2025
Accepted: Apr 29, 2025
DOI: https://doi.org/10.2478/amns-2025-1079
© 2025 Lijun Kao et al., published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.
With the rapid development of China's economy, the rise of e-commerce, and consumers' growing demand for logistics services, the logistics industry faces unprecedented challenges. Traditional logistics and distribution methods can no longer meet the needs of modern society in terms of efficiency, cost, and service quality [1-4]. The concept of intelligent logistics therefore emerged, aiming to make the distribution process intelligent, automated, and efficient through the introduction of advanced information technology, the Internet of Things, artificial intelligence, and other means [5-7]. As an unmanned, fast, and flexible distribution method, UAV delivery has unique advantages, but because UAV delivery path planning and optimization is a complex problem, how to plan UAV delivery paths effectively and maximize delivery efficiency has become a key issue [8-11].
The goal of path optimization is to find the shortest, safest, and most energy-efficient flight path for UAVs from origin to destination. Obstacles in the urban environment, such as buildings, power lines, and communication towers, pose a threat to UAV flight and must be avoided during path planning [12-15]. Under air traffic control regulations, the flight altitude and airspace of UAVs are also strictly limited. Weather conditions in different regions, such as wind strength and direction and rainfall, likewise affect flight efficiency and safety, so these factors should also be taken into account in path planning [16-19]. To achieve path optimization, advanced techniques and algorithms such as reinforcement learning can be applied. Reinforcement learning is a machine learning method in which an agent learns through interaction with its environment [20-23]: the agent selects actions, receives feedback from the environment, and thereby gradually learns an optimal path [24-25].
Literature [26] explored the application of UAVs in logistics operations, pointing out that datasets, constraint distributions, and problem features have a significant impact on the selection of optimal delivery schemes and methods. Literature [27] proposed an attention-based pointer network model to solve the UAV delivery trajectory optimization problem; the model adapts automatically to new UAV trajectory data and outperforms heuristic algorithms in UAV trajectory analysis and optimization. Literature [28] provides an overview of route planning concepts, logics, trends, and opportunities, noting that a careful examination of outstanding operations-management tasks can identify development strategies and possible system introductions, including methods and tools. Literature [29] designed a factory logistics route planning model based on a grid environment and the ant colony optimization algorithm, which enables accurate logistics route planning and provides a competitive solution for implementing smart logistics. Literature [30] proposed an intelligent logistics UAV that replaces the courier in delivering small goods and can detect surrounding obstacles in flight, with a ground control station receiving feedback information in order to prepare for emergency actions. Literature [31] investigated the effects of wind speed and direction on the UAV flight state, as well as the logistics UAV path planning problem with energy constraints, customer time windows, and wind conditions, and merged LNS with GA to form GA-LNS, showing that the algorithm produces superior solutions. Literature [32] analyzed the application of the Q-learning reinforcement learning algorithm to optimizing logistics distribution paths, showing that the algorithm performs well in path optimization and time saving, improves distribution efficiency, and reveals the potential of reinforcement learning in logistics distribution optimization.
Literature [33] proposed an unmanned vehicle path planning strategy based on the ant colony algorithm, introducing an optimization strategy after analyzing the advantages and disadvantages of the ant colony algorithm; the results provide an important reference for actual unmanned vehicle path planning. Literature [34] outlines the problems of wasteful logistics data supervision and high logistics and distribution costs in the development of the cold chain, and proposes a path optimization strategy for cold-chain unmanned vehicles. Literature [35] proposed a dynamic path planning strategy based on fuzzy logic and improved ant colony optimization, showing that the FLACO algorithm can identify the most cost-effective and efficient paths and emphasizing that it can find the most efficient and safe paths for unmanned vehicles. Literature [36] proposed a path planning method that combines the BAS and SA algorithms, showing that the method excels in path length, number of path points, and safe obstacle avoidance, and can satisfy the path planning needs of low-altitude UAV logistics. Literature [37] examined the multi-vehicle-UAV cooperative distribution problem considering congestion, proposed a heuristic column generation algorithm, designed a dynamic programming algorithm to solve the pricing problem, and conducted a sensitivity analysis to derive managerial insights. The above research explores path optimization methods for drones and unmanned vehicles based on attention models, ant colony optimization, reinforcement learning, and other approaches. It reveals the importance of path optimization for various industries, particularly in technology-intensive fields, and shows that there are many ways to realize path optimization, which is significant for logistics, drones, and related fields.
The study first introduces the principles of deep reinforcement learning algorithms and their applications in navigation, including the basic principles of reinforcement learning and of deep reinforcement learning based on the value function, the policy gradient, and the Actor-Critic (AC) framework, which lays the theoretical foundation for the design of the subsequent UAV algorithms and comparative experiments. Subsequently, an improved end-to-end deep reinforcement learning method incorporating an attention mechanism is proposed to solve the vehicle routing problem with time windows (VRPTW). The model combines the decoder with the attention mechanism to parameterize the stochastic policy for solving the VRPTW. Based on the Markov decision process formulation, a reward function is designed to inform the model how to adjust its parameters to improve the quality of the solution, and the model is trained with a REINFORCE algorithm that fuses the Actor-Critic idea with episode-based updating, within a reinforcement learning environment defined by a suitable state transition function. Finally, the trained deep reinforcement learning model is used to solve the problem, and the results are analyzed.
The feedback framework of the reinforcement learning algorithm is shown in Fig. 1. Each time an action is executed, the environment gives reward feedback to the unmanned delivery fleet, and the behavior of the fleet is influenced by these environmental rewards: when the unmanned delivery fleet obtains the state observation $s_t$ at time $t$, it selects an action $a_t$ according to the current policy, and the environment returns a reward $r_t$ together with the next state $s_{t+1}$.

Figure 1. Perception-action-reward feedback framework for reinforcement learning
Reinforcement learning is usually modeled as a Markov decision process (MDP), denoted by the quintuple $(S, A, P, R, \gamma)$, where $S$ is the state space, $A$ is the action space, $P$ is the state transition probability, $R$ is the reward function, and $\gamma \in [0,1]$ is the discount factor.
The most representative solution methods for reinforcement learning can be divided into three categories: value function methods, policy gradient methods, and Actor-Critic methods that combine the two. Before introducing the three categories of methods, some concepts or symbolic representations are first explained:
Value function: i.e., the action value function $Q(s,a)$, the expected cumulative discounted reward obtained by taking action $a$ in state $s$ and following the current policy thereafter.
State-value function: the expectation of the action value under the policy, used to reflect how good or bad the current state is, denoted as $V(s)=\mathbb{E}_{a\sim\pi(\cdot\mid s)}\left[Q(s,a)\right]$.
Optimal action: the action $a^{*}=\arg\max_{a}Q(s,a)$ that maximizes the value function in the current state.

Value function approach. Q-learning is the most representative value-function-based approach among traditional reinforcement learning algorithms. Its Q value is updated by
$$Q(s_t,a_t)\leftarrow Q(s_t,a_t)+\alpha\left[r_t+\gamma\max_{a}Q(s_{t+1},a)-Q(s_t,a_t)\right],$$
where $\alpha$ is the learning rate and $\gamma$ is the discount factor.
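For illustration, the tabular Q-learning update described above can be sketched in a few lines of Python; the environment size, learning rate, and exploration parameter below are arbitrary placeholders rather than values used in this paper.

```python
import numpy as np

# Minimal tabular Q-learning sketch (illustrative only; not the paper's implementation).
n_states, n_actions = 16, 4          # placeholder sizes for a small discrete task
alpha, gamma, epsilon = 0.1, 0.9, 0.1
Q = np.zeros((n_states, n_actions))

def epsilon_greedy(state: int) -> int:
    """Pick a random action with probability epsilon, otherwise the greedy action."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[state]))

def q_update(s: int, a: int, r: float, s_next: int) -> None:
    """One-step update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
```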
Since Q-learning cannot handle overly large state-action spaces, it is difficult to apply to complex tasks. By combining neural networks with the Q-learning algorithm, the well-known DQN algorithm was proposed. Its core is the use of a convolutional neural network to fit the value function: the network takes the state $s$ as input and outputs a value estimate $Q(s,a;\theta)$ for every action $a$, where $\theta$ denotes the network parameters. A target network with parameters $\theta^{-}$ is used to construct the target value $y=r+\gamma\max_{a'}Q(s',a';\theta^{-})$, where the function $\max_{a'}Q(s',a';\theta^{-})$ is the largest value estimate for the next state $s'$. The quadratic cost function is used to calculate the error, as in Eq. (4):
$$L(\theta)=\mathbb{E}\left[\left(y-Q(s,a;\theta)\right)^{2}\right],$$
where the expectation is taken over sampled transitions. Subsequently, the gradient of the objective function with respect to $\theta$ is computed, and the network parameters are updated with the back-propagation algorithm.
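As a rough sketch of how the DQN target and quadratic loss fit together, the following PyTorch fragment uses a small fully connected network in place of the convolutional network mentioned above; the network sizes and hyperparameters are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Illustrative DQN loss with a target network; the state is encoded by a small MLP
# here instead of the convolutional network used in the paper.
class QNet(nn.Module):
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

    def forward(self, s):
        return self.net(s)                      # Q(s, a; theta) for every action a

q_net, target_net = QNet(8, 4), QNet(8, 4)
target_net.load_state_dict(q_net.state_dict())  # theta^- initialized from theta
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def dqn_loss(s, a, r, s_next, done, gamma=0.99):
    """Quadratic cost between y = r + gamma * max_a' Q(s',a'; theta^-) and Q(s,a; theta)."""
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        y = r + gamma * (1.0 - done) * target_net(s_next).max(dim=1).values
    return nn.functional.mse_loss(q_sa, y)
```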
The training flow of the DQN algorithm is shown in Fig. 2.

Figure 2. DQN algorithm training flowchart
In order to alleviate the instability caused by using a single network for both estimation and bootstrapping, the target network introduced above is updated only periodically from the online network. The TD-error, i.e., the deviation between the target value and the current value estimate, is calculated according to Eq. (8):
$$\delta=r+\gamma\max_{a'}Q(s',a';\theta^{-})-Q(s,a;\theta),$$
where $r$ is the immediate reward, $s'$ is the next state, and $\theta^{-}$ denotes the target-network parameters.
Policy gradient. The value-function-based method uses the action value network to estimate the value of all actions in the current state and selects the optimal action based on the action-selection strategy and the estimates. In contrast, the policy-gradient-based method uses a neural network to directly approximate the policy function $\pi_{\theta}(a\mid s)$.
The unmanned delivery fleet obtains from the policy network the probability of executing each action in the current state; after the action is executed, the environment returns a reward, and the network is updated accordingly. In practice, the state value function is usually used to evaluate the current state and the policy network, as shown in Eq. (10):
$$V_{\pi_{\theta}}(s)=\sum_{a}\pi_{\theta}(a\mid s)\,Q_{\pi_{\theta}}(s,a).$$
As the state distribution depends on the policy itself, the objective function is defined as the expected return obtained by following the policy $\pi_{\theta}$. The goal is to maximize
$$J(\theta)=\mathbb{E}_{\tau\sim\pi_{\theta}}\left[\sum_{t}\gamma^{t}r_{t}\right],$$
where $\tau$ denotes a trajectory generated by the policy and $r_{t}$ is the reward at step $t$. The specific expression of the policy gradient can then be derived; using the log-derivative trick, the derivation yields
$$\nabla_{\theta}J(\theta)=\mathbb{E}_{\pi_{\theta}}\left[\nabla_{\theta}\log\pi_{\theta}(a\mid s)\,Q_{\pi_{\theta}}(s,a)\right],$$
where the expectation can be written in either a discrete (summation) or a continuous (integral) form over the states and actions encountered under the policy.
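A minimal sketch of the discrete policy-gradient estimator (REINFORCE) corresponding to the expression above is given below; the network architecture and dimensions are placeholders rather than the paper's settings.

```python
import torch
import torch.nn as nn

# Illustrative REINFORCE update: the gradient of -log pi(a|s) weighted by the
# discounted return G_t is a sample estimate of -grad J(theta).
policy = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 4))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def reinforce_step(states, actions, returns):
    """states: (T, 8) float, actions: (T,) long, returns: (T,) discounted returns G_t."""
    log_probs = torch.log_softmax(policy(states), dim=-1)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = -(chosen * returns).mean()          # minimizing -J(theta)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```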
Actor-Critic. Reinforcement learning methods based on the AC framework usually contain two networks: the Actor network approximates the policy function and is used for decision making, while the Critic network approximates the value function and is used to evaluate the actions taken in the current state. The framework of the Actor-Critic algorithm is shown in Fig. 3.

Figure 3. Actor-Critic algorithm framework
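The interaction between the two networks can be sketched as follows; this is an illustrative single-step Actor-Critic update in which the Critic's TD-error serves as the evaluation signal for the Actor (architectures and learning rates are assumptions, not the paper's settings).

```python
import torch
import torch.nn as nn

# Illustrative Actor-Critic step: the Critic is trained toward the TD target and
# its TD-error weights the Actor's log-probability.
actor = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 4))
critic = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 1))
opt_actor = torch.optim.Adam(actor.parameters(), lr=1e-3)
opt_critic = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ac_step(s, a, r, s_next, done, gamma=0.99):
    v_s = critic(s).squeeze(-1)
    with torch.no_grad():
        td_target = r + gamma * (1.0 - done) * critic(s_next).squeeze(-1)
    critic_loss = nn.functional.mse_loss(v_s, td_target)        # value evaluation
    opt_critic.zero_grad(); critic_loss.backward(); opt_critic.step()

    td_error = (td_target - v_s).detach()                        # how good the chosen action was
    log_probs = torch.log_softmax(actor(s), dim=-1)
    chosen = log_probs.gather(1, a.unsqueeze(1)).squeeze(1)
    actor_loss = -(chosen * td_error).mean()                     # policy improvement
    opt_actor.zero_grad(); actor_loss.backward(); opt_actor.step()
```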
The first part of the model maps the local and global state information from reinforcement learning into a high-dimensional vector space through a 1D convolution operation. For each customer node, its input features are thereby embedded into a fixed-dimensional vector.
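A sketch of this embedding step is shown below: each node's raw features (for example its coordinates, demand, and time-window bounds) are projected into a d-dimensional space by a 1D convolution with kernel size 1, which acts as a per-node shared linear map. The feature count and embedding dimension are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Node embedding via 1D convolution (kernel size 1 = shared linear layer per node).
n_nodes, n_features, d_model = 101, 5, 128            # assumed sizes
embed = nn.Conv1d(in_channels=n_features, out_channels=d_model, kernel_size=1)

raw = torch.rand(1, n_features, n_nodes)               # (batch, features, nodes), dummy data
node_emb = embed(raw)                                   # (1, d_model, n_nodes) high-dimensional embedding
```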
The attention mechanism can effectively cope with growth in the number of customer nodes: at each decoding step, the decoder state is compared with the embedding of every customer node to produce an alignment score. The calculation formula is as follows:
$$u_{i}=v^{\top}\tanh\left(W_{1}e_{i}+W_{2}d_{t}\right),$$
where $e_{i}$ is the embedding of customer node $i$, $d_{t}$ is the decoder state at step $t$, and $v$, $W_{1}$, $W_{2}$ are learnable parameters. Next, the probability estimate for each customer is computed by normalizing the scores with a softmax, $p_{i}=\operatorname{softmax}(u)_{i}$, where infeasible nodes are excluded by the masking function described below.
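The scoring and normalization step can be sketched as follows, in the style of a pointer-network attention layer; the exact parameterization in the paper may differ, so the layer shapes and names here are assumptions.

```python
import torch
import torch.nn as nn

# Pointer-style attention: compare the decoder state with every node embedding,
# mask infeasible nodes, and normalize with a softmax to get visit probabilities.
d_model = 128
W_node = nn.Linear(d_model, d_model, bias=False)
W_dec = nn.Linear(d_model, d_model, bias=False)
v = nn.Linear(d_model, 1, bias=False)

def node_probabilities(node_emb, dec_state, infeasible_mask):
    """node_emb: (n_nodes, d_model); dec_state: (d_model,); infeasible_mask: (n_nodes,) bool."""
    scores = v(torch.tanh(W_node(node_emb) + W_dec(dec_state))).squeeze(-1)
    scores = scores.masked_fill(infeasible_mask, float("-inf"))   # shielded by the masking function
    return torch.softmax(scores, dim=-1)
```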
The input to the long short-term memory (LSTM) decoder has three components: the hidden state output of the previous step, the cell state of the previous step, and the embedding of the customer node visited at the previous step.
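One decoder step can therefore be written as a single LSTM-cell call, as in the following sketch; the dimensions are placeholders.

```python
import torch
import torch.nn as nn

# One decoding step: previous hidden and cell states plus the embedding of the
# last visited node produce the decoder state consumed by the attention layer.
d_model = 128
decoder = nn.LSTMCell(input_size=d_model, hidden_size=d_model)

h_prev, c_prev = torch.zeros(1, d_model), torch.zeros(1, d_model)
last_node_emb = torch.rand(1, d_model)                # embedding of the node visited at step t-1
h_t, c_t = decoder(last_node_emb, (h_prev, c_prev))   # decoder state for step t
```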
A separate network is used to obtain the Critic value estimate that fits the trajectory reward; the input to the Critic value estimation network is the sample instance, and its output is a scalar estimate of the expected total reward of the corresponding trajectory.
A masking function is set up in the model to shield the nodes that do not satisfy the feasibility conditions, which also speeds up training. If, at decoding step $t$, a customer node violates a feasibility condition (for example, it has already been served or its remaining demand cannot be loaded), its selection probability is set to zero before the softmax normalization, so that it cannot be chosen.
At each decoding step, the next customer node to be visited is determined from the probability estimates of all customer nodes. The model uses two decoding strategies: the greedy decoding strategy always selects the customer node with the highest probability given by the model, while the stochastic decoding strategy samples the next customer node from the estimated probability distribution.
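The two strategies reduce to an argmax versus a categorical sample over the masked probability vector, as in the following sketch.

```python
import torch

# Greedy vs. stochastic decoding over the probability estimates of all customer nodes.
def greedy_decode(probs: torch.Tensor) -> int:
    """Always select the customer node with the highest probability."""
    return int(torch.argmax(probs))

def sample_decode(probs: torch.Tensor) -> int:
    """Sample the next customer node from the estimated probability distribution."""
    return int(torch.multinomial(probs, num_samples=1))
```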
The reward function is defined on the basis of the sequence of customer nodes visited: the negative total distribution cost of the resulting routes (including any time-window penalty) is taken as the reward, so that maximizing the reward is equivalent to minimizing the total cost. For the VRPTW reinforcement learning model, the system state records the node currently being visited, the vehicle's remaining loading capacity, and the remaining demand of each customer, and it is updated after every decoding step. When customer node $j$ is visited at step $t$, the vehicle's loading capacity is updated as
$$C_{t+1}=\max\left(0,\;C_{t}-d_{j,t}\right),$$
and the capacity is reset to the full value $C$ whenever the vehicle returns to the depot. With the max operation there is no need to consider the case where the remaining capacity becomes negative. Meanwhile, the update formula for the demand is as follows:
$$d_{j,t+1}=\max\left(0,\;d_{j,t}-C_{t}\right),\qquad d_{k,t+1}=d_{k,t}\quad(k\neq j).$$
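The bookkeeping implied by these update rules can be sketched as follows; the constants correspond to the 100-customer example used later (capacity 400, depot index 100), and the function name is illustrative.

```python
VEHICLE_CAPACITY = 400   # capacity of each vehicle in the 100-customer example
DEPOT = 100              # depot index in the 100-customer example

def serve_node(remaining_capacity: float, demands: list, j: int):
    """Update remaining load and customer demand after visiting node j."""
    if j == DEPOT:
        return VEHICLE_CAPACITY, demands             # returning to the depot reloads the vehicle
    delivered = min(demands[j], remaining_capacity)   # cannot deliver more than is on board
    demands[j] -= delivered                           # demand update: d <- max(0, d - C)
    remaining_capacity -= delivered                   # capacity update: C <- max(0, C - d)
    return remaining_capacity, demands
```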
The objective function guides the update of the parameters of the reinforcement learning algorithm and is expressed as a function of the policy parameters. In this model it is set to the negative expected total return of trajectory $\tau$, so that minimizing the objective is equivalent to maximizing the expected return:
$$L(\theta)=-\mathbb{E}_{\tau\sim\pi_{\theta}}\left[R(\tau)\right].$$
The gradient of the loss function with respect to $\theta$ is derived with the probability chain rule (the log-derivative trick), which gives
$$\nabla_{\theta}L(\theta)=-\mathbb{E}_{\tau\sim\pi_{\theta}}\left[\left(R(\tau)-b(x)\right)\nabla_{\theta}\log\pi_{\theta}(\tau)\right],$$
where $b(x)$ is the baseline produced by the Critic network for instance $x$.
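In implementation terms, this gradient is usually realized as the following loss, where the Critic's estimate is used as the baseline; the tensor names are assumptions, and the example treats the route cost as the negative return.

```python
import torch

# REINFORCE-with-baseline loss: log-probability of the sampled tour weighted by
# (cost - baseline); the Critic is regressed onto the observed tour cost.
def reinforce_loss(tour_log_prob, tour_cost, critic_estimate):
    """All arguments are 1-D tensors over a batch of sampled instances."""
    advantage = (tour_cost - critic_estimate).detach()
    actor_loss = (advantage * tour_log_prob).mean()                         # policy-gradient term
    critic_loss = torch.nn.functional.mse_loss(critic_estimate, tour_cost)  # baseline fitting
    return actor_loss, critic_loss
```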
In this section, experiments are conducted on a test instance with 100 customer points. During training, time-varying characteristics and customer time windows are not considered, i.e., the unmanned delivery fleet travels at a constant speed throughout the delivery process and customers can receive service at any time. In the test instance, however, time-varying characteristics and customer time windows affect the entire delivery scheme of the unmanned delivery fleet with drones: the speed of the fleet changes over time, and serving a customer outside its time window incurs a time-window penalty cost. The 100-customer instance consists of 101 nodes; the node information is shown in Table 1. Node 100 is the distribution center (Depot), nodes 0-9 are drone-delivery customer points (DC), nodes 90-99 are fleet-delivery customer points (TC), and nodes 10-89 are customer points that can be served jointly by the fleet and drones (FC). The node distribution of the 100-customer instance is shown in Figure 4, and the customer time-window distribution is shown in Figure 5.
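The node-index convention of this instance can be encoded directly, as in the following sketch.

```python
# Node types for the 100-customer instance: depot = 100, DC = 0-9 (drone only),
# TC = 90-99 (fleet only), FC = 10-89 (fleet and drone).
def node_type(i: int) -> str:
    if i == 100:
        return "Depot"
    if 0 <= i <= 9:
        return "DC"
    if 90 <= i <= 99:
        return "TC"
    return "FC"

node_types = {i: node_type(i) for i in range(101)}
```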
Table 1. Node information for the 100-customer instance
| No. | Coordinates | Demand | Node type | No. | Coordinates | Demand | Node type |
|---|---|---|---|---|---|---|---|
| 0 | (21,42) | 2 | DC | 51 | (34,14) | 14 | FC |
| 1 | (21,22) | 0 | DC | 52 | (25,40) | 19 | FC |
| 2 | (21,10) | 2 | DC | 53 | (58,29) | 18 | FC |
| 3 | (54,12) | 1 | DC | 54 | (30,71) | 24 | FC |
| 4 | (53,38) | 5 | DC | 55 | (64,56) | 18 | FC |
| 5 | (14,22) | 2 | DC | 56 | (43,53) | 16 | FC |
| 6 | (57,48) | 8 | DC | 57 | (54,58) | 23 | FC |
| 7 | (32,13) | 0 | DC | 58 | (3,61) | 20 | FC |
| 8 | (40,28) | 1 | DC | 59 | (43,60) | 20 | FC |
| 9 | (7,68) | 1 | DC | 60 | (19,23) | 7 | FC |
| 10 | (29,45) | 16 | FC | 61 | (54,42) | 23 | FC |
| 11 | (53,13) | 12 | FC | 62 | (46,47) | 18 | FC |
| 12 | (50,74) | 14 | FC | 63 | (19,34) | 20 | FC |
| 13 | (30,68) | 19 | FC | 64 | (31,41) | 7 | FC |
| 14 | (16,18) | 17 | FC | 65 | (19,72) | 4 | FC |
| 15 | (56,66) | 22 | FC | 66 | (19,21) | 24 | FC |
| 16 | (28,42) | 20 | FC | 67 | (35,69) | 17 | FC |
| 17 | (36,24) | 11 | FC | 68 | (46,9) | 15 | FC |
| 18 | (24,25) | 20 | FC | 69 | (46,53) | 24 | FC |
| 19 | (29,23) | 11 | FC | 70 | (17,24) | 16 | FC |
| 20 | (70,62) | 17 | FC | 71 | (14,56) | 12 | FC |
| 21 | (29,48) | 10 | FC | 72 | (32,2) | 14 | FC |
| 22 | (55,41) | 18 | FC | 73 | (24,31) | 18 | FC |
| 23 | (25,63) | 18 | FC | 74 | (11,27) | 12 | FC |
| 24 | (61,28) | 11 | FC | 75 | (20,30) | 16 | FC |
| 25 | (19,10) | 17 | FC | 76 | (44,27) | 17 | FC |
| 26 | (19,49) | 23 | FC | 77 | (60,38) | 20 | FC |
| 27 | (64,11) | 17 | FC | 78 | (23,23) | 15 | FC |
| 28 | (23,21) | 19 | FC | 79 | (40,13) | 18 | FC |
| 29 | (14,50) | 6 | FC | 80 | (43,37) | 14 | FC |
| 30 | (64,74) | 17 | FC | 81 | (45,34) | 12 | FC |
| 31 | (10,19) | 20 | FC | 82 | (3,4) | 8 | FC |
| 32 | (8,58) | 16 | FC | 83 | (3,56) | 15 | FC |
| 33 | (47,35) | 16 | FC | 84 | (42,16) | 15 | FC |
| 34 | (49,52) | 7 | FC | 85 | (5,18) | 22 | FC |
| 35 | (42,29) | 16 | FC | 86 | (17,11) | 17 | FC |
| 36 | (7,14) | 21 | FC | 87 | (18,25) | 19 | FC |
| 37 | (15,40) | 19 | FC | 88 | (49,10) | 16 | FC |
| 38 | (36,57) | 22 | FC | 89 | (6,18) | 10 | FC |
| 39 | (12,28) | 14 | FC | 90 | (5,47) | 8 | TC |
| 40 | (23,1) | 18 | FC | 91 | (53,9) | 31 | TC |
| 41 | (21,28) | 17 | FC | 92 | (19,47) | 19 | TC |
| 42 | (45,67) | 19 | FC | 93 | (65,17) | 23 | TC |
| 43 | (43,14) | 17 | FC | 94 | (68,48) | 32 | TC |
| 44 | (54,48) | 17 | FC | 95 | (54,52) | 30 | TC |
| 45 | (21,63) | 11 | FC | 96 | (41,42) | 21 | TC |
| 46 | (16,24) | 17 | FC | 97 | (69,32) | 26 | TC |
| 47 | (73,2) | 11 | FC | 98 | (31,12) | 32 | TC |
| 48 | (40,39) | 22 | FC | 99 | (60,66) | 18 | TC |
| 49 | (34,31) | 24 | FC | 100 | (32,38) | 0 | Distribution center |
| 50 | (14,41) | 2 | FC | | | | |

Figure 4. Node distribution of the 100-customer instance

Figure 5. Customer time-window distribution for the 100-customer instance
The model trained under ideal-speed, no-time-window conditions is combined with the random sampling decoding method to solve the instance. The total distribution cost of the UAV-assisted unmanned delivery fleet, considering customer service limitations at the 100-customer scale, is 2135; the path scheme for the 100-customer instance is shown in Table 2. As the table shows, the delivery task at this scale requires four unmanned delivery vehicles with a capacity of 400, each completing its deliveries with the assistance of a drone. The first vehicle and its drone provide logistics and distribution services to 26 customer points in total, and the loading rate of vehicle 1 is 99.2%; 23 of these customer points are served by the vehicle and 3 by UAV 1. Overall, the distribution tasks of the unmanned delivery fleet and the drones under this scheme are reasonably allocated and meet the vehicle loading-capacity requirements.
Table 2. Path scheme for the 100-customer instance
| Vehicle | Path | Total demand | Loading rate |
|---|---|---|---|
| Truck 1 | 100→62→51→72→47→89→90→34→94→31→30→52→72→24→69→9→67→60→37→42→26→16→18→64→100 | 386 | 99.2% |
| UAV 1 | 100→0→47, 38→83→27, 65→14→69 | 12 | |
| Truck 2 | 100→56→75→72→61→82→33→45→17→24→70→87→35→26→87→42→69→93→17→40→100 | 322 | 98.3% |
| UAV 2 | 100→21→54, 64→0→32, 24→70→16, 36→80→88, 84→0→39, 76→1→99 | 69 | |
| Truck 3 | 100→50→96→54→59→40→13→98→19→18→54→48→67→35→60→59→23→97→95→76→76→100 | 352 | 99.6% |
| UAV 3 | 11→35→18, 47→57→38, 57→6→13 | 50 | |
| Truck 4 | 100→49→16→22→47→39→86→85→38→74→7→89→88→19→92→29→93→53→81→35→100 | 322 | 97.5% |
| UAV 4 | 100→91→15, 54→8→34, 78→82→61, 88→27→25, 26→5→20 | 61 | |
The path diagram of the solution is shown in Fig. 6, which presents the customer points and access paths served by the unmanned delivery fleet and the UAVs during the delivery process; it can be seen that the fleet and the UAVs together serve all customer points in this delivery task.

Figure 6. Path diagram of the example
In the above instance, low vehicle loading rates often occur. In this section, the same instance is solved with a heterogeneous fleet: the single fleet adopts the traditional distribution path scheme, while the heterogeneous fleet adopts the distribution path scheme designed in this paper, so that the impact of different unmanned distribution schemes on the multi-trip path planning problem can be better understood. The results of the heterogeneous fleet model at different scales are shown below and compared with the single-fleet results. The path schemes for the 25-customer instance are shown in Table 3, which compares the single-fleet and heterogeneous-fleet multi-trip solutions at the 25-customer (26-node) scale. In the single-fleet case, the loading rates of trips 1-2 and 2-2 are both below 50%, while in the heterogeneous-fleet case the loading rate of every trip exceeds 85%. The total cost of this instance is 915, consisting of a transportation cost of 875 and a fixed cost of 40.
Table 3. Path schemes for the 25-customer instance (single fleet with C=150 vs. heterogeneous fleet)
| Single fleet, 25 customers (C=150) | | | | | Heterogeneous fleet | | | | |
|---|---|---|---|---|---|---|---|---|---|
| Trip | Total demand | Loading rate | Route length | Travel utilization | Trip | Total demand | Loading rate | Route length | Travel utilization |
| 1-1 | 142 | 97% | 221 | 98.5% | 1-1 | 195 | 98.5% | 202 | 87.6% |
| 1-2 | 65 | 44.6% | | | 2-1 | 132 | 93.6% | 213 | 92.6% |
| 2-1 | 132 | 87.6% | 192 | 82.6% | 2-2 | 80 | 86.7% | | |
| 2-2 | 45 | 29.6% | | | | | | | |
The path schemes for the 50-customer instance are shown in Table 4. In the single-fleet case, the utilization of vehicle loading capacity is relatively low. In contrast, in the heterogeneous-fleet case a total of four vehicles make deliveries, and the loading rate of every trip exceeds 80%, effectively saving vehicle transportation costs. The total cost of this instance is 1522, of which the transportation cost is 1432 and the fixed cost is 90.
Table 4. Path schemes for the 50-customer instance (single fleet with C=150 vs. heterogeneous fleet)
| Single fleet, 50 customers (C=150) | | | | | Heterogeneous fleet | | | | |
|---|---|---|---|---|---|---|---|---|---|
| Trip | Total demand | Loading rate | Route length | Travel utilization | Trip | Total demand | Loading rate | Route length | Travel utilization |
| 1-1 | 149 | 99.8% | 220 | 94.2% | 1-1(200) | 193 | 95.1% | 213 | 89.6% |
| 1-2 | 93 | 60.3% | | | 1-2 | 96 | 96.9% | | |
| 2-1 | 146 | 99.7% | 241 | 99.3% | 2-1 | 143 | 96.5% | 186 | 77.6% |
| 2-2 | 38 | 23.1% | | | 3-1 | 96 | 97.4% | 235 | 97.6% |
| 3-1 | 148 | 96.4% | 240 | 100% | 3-2 | 93 | 95.4% | | |
| 3-2 | 66 | 40.5% | | | 4-1 | 84 | 82.2% | 133 | 56% |
| 4-1 | 68 | 48.3% | 178 | 75.6% | | | | | |
The path schemes for the 100-customer instance are shown in Table 5. In the single-fleet case, the loading rates of trips 2-3, 5-2, and 6-2 are 46.5%, 46%, and 32.8%, respectively. In contrast, in the heterogeneous-fleet case the lowest loading rate is 75.8% (trip 7-2), which is much higher than the single-fleet minimum. The total cost is 3522, with a transportation cost of 3239 and a fixed cost of 283.
Table 5. Path schemes for the 100-customer instance (single fleet with C=150 vs. heterogeneous fleet)
| Single fleet, 100 customers (C=150) | | | | | Heterogeneous fleet | | | | |
|---|---|---|---|---|---|---|---|---|---|
| Trip | Total demand | Loading rate | Route length | Travel utilization | Trip | Total demand | Loading rate | Route length | Travel utilization |
| 1-1 | 144 | 93.8% | 230 | 97.9% | 1-1(200) | 191 | 97.2% | 233 | 95.5% |
| 1-2 | 150 | 99.1% | | | 2-1(200) | 184 | 95.8% | 230 | 75.8% |
| 2-1 | 147 | 98.6% | 215 | 88.6% | 3-1(200) | 199 | 99.6% | 239 | 96.5% |
| 2-2 | 130 | 91.2% | | | 4-1(200) | 196 | 97.3% | 229 | 99.6% |
| 2-3 | 80 | 46.5% | | | 5-1(200) | 198 | 85.5% | 245 | 100% |
| 3-1 | 150 | 95.7% | 239 | 99.6% | 6-1(200) | 187 | 98.2% | | |
| 3-2 | 136 | 88.3% | | | 7-1(200) | 138 | 92.2% | 222 | 93.6% |
| 4-1 | 149 | 95.2% | 235 | 96.2% | 7-2(200) | 138 | 97.2% | | |
| 4-2 | 131 | 87.5% | | | | | | | |
| 5-1 | 144 | 92.5% | 232 | 98.2% | | | | | |
| 5-2 | 66 | 46% | | | | | | | |
| 6-1 | 147 | 98.3% | 150 | 58.6% | | | | | |
| 6-2 | 57 | 32.8% | | | | | | | |
To understand more deeply how the combination of working hours and unmanned-delivery-fleet loading capacity affects the solution of multi-trip path planning problems at different node scales, this section randomly selects one case from the 10 test cases and solves it for different combinations of fleet loading capacity and working hours, where the working hours determine the distance the fleet can travel. The solution results are as follows:
For the 25+1-node case, the total cost reaches its minimum when the fleet loading capacity is 200 and the working duration is 7.5 hours, while the total cost under all three fleet types reaches its maximum when the working duration is 4 hours. For the 50+1-node case, among the different vehicle capacities, the cost at a capacity of 200 is consistently lower than that of the other capacity-working-hour combinations for every working duration. Specifically, the total cost is highest when the capacity is 100 and the working time is 11 hours, and lowest when the capacity is 110 and the working time is 6.5 hours. For the 100+1-node case, as the number of nodes increases, larger on-board capacity tends to yield lower cost. Specifically, the total cost is minimized when the on-board capacity is 200 and the working duration is 8 hours; when the capacity is 100, the cost for every working duration is higher than with a capacity of 200, peaking at 6.5 working hours. The node path planning results are shown in Fig. 7 (panels a-c correspond to the 25+1, 50+1, and 100+1 scales, respectively).

Figure 7. Node path planning results (a: 25+1 scale, b: 50+1 scale, c: 100+1 scale)
This paper focuses on the path optimization problem of the UAV-assisted unmanned delivery mode, considering customer service constraints and time windows. A deep reinforcement learning solution framework is designed according to the characteristics of the problem, and numerical experiments are designed to validate the effectiveness of the proposed method.
Solving the problem with the deep reinforcement learning model, the first group consisting of an unmanned delivery vehicle and its UAV provides logistics and distribution services to 26 customer points in total, and the loading rate of the vehicle is 99.2%. Overall, the distribution tasks of the unmanned delivery fleet and the UAVs under this scheme are reasonably allocated and meet the vehicle loading-capacity requirements.
In the comparison between the single fleet and the heterogeneous fleet, the loading rates of trips 1-2 and 2-2 are below 50% in the single-fleet case, whereas under the unmanned distribution path scheme designed in this paper the loading rate of every heterogeneous-fleet trip exceeds 85%. This demonstrates the superiority of the intelligent distribution logistics model designed in this paper.
