Research on Reinforcement Learning Based Regulation Scheme for Renewable Energy System in Green Buildings
Published online: 21 Mar 2025
Received: 22 Oct 2024
Accepted: 02 Feb 2025
DOI: https://doi.org/10.2478/amns-2025-0607
© 2025 Yin Li et al., published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
In today's society, as environmental problems and the energy crisis grow increasingly severe, the concept of green building is receiving more and more attention. Green building emphasizes maximizing resource conservation, protecting the environment, and reducing pollution throughout the whole life cycle of a building, while providing people with healthy, comfortable, and efficient usable space [1-4]. The utilization of renewable energy is a crucial part of green building: it not only reduces the building's energy consumption and its dependence on traditional energy sources, but also provides the building with a clean and sustainable energy supply [5-7].
Renewable energy refers to energy sources that can be continuously regenerated and perpetually utilized in nature, such as solar, wind, hydroelectric, and bioenergy. Utilizing renewable energy in green buildings is significant in several respects [8-9]. Reducing energy consumption and greenhouse gas emissions: traditional buildings often rely on non-renewable sources such as fossil fuels, whose combustion produces large amounts of carbon dioxide, sulfur dioxide, and other greenhouse gases and pollutants, causing serious environmental damage. Using renewable energy can effectively reduce a building's energy demand, cut greenhouse gas emissions, and relieve the pressure of global climate change [10-13]. Reducing building operating costs: although the initial investment in renewable energy may be relatively high, its operation and maintenance costs are low in the long run. As the technology advances and costs gradually fall, renewable energy can save building owners substantial energy costs and improve the economic efficiency of buildings [14-17]. In addition, renewable energy utilization can provide buildings with a more stable and comfortable indoor environment [18].
Literature [19] reveals the opportunities and challenges of renewable energy in green buildings, the most significant challenge being the high upfront cost of renewable energy technologies; it also points out that the reliability issues posed by renewable energy require effective energy storage solutions and grid integration strategies. Literature [20] reviews emerging practices for integrating renewable energy in the building sector and, based on a case study, notes that integrating renewable energy into buildings can meet their energy needs in different respects. Literature [21] presents a literature review on optimization for solving renewable energy problems in green building rating systems, reconstructing the variables in renewable energy optimization and implementing an appropriate CA; by proposing a framework consisting of renewable energy optimization, green innovations, and CA, it links this work to recent reviews of optimization in the literature. Literature [22] emphasizes the importance of energy savings through building energy efficiency by describing several key aspects of building energy efficiency and exploring their economic and environmental impacts, and realizes buildings with integrated renewable systems such as hot-water heating and solar photovoltaic electrification. Literature [23] develops a systematic analysis of HRES, stating that such a system can transform a facility into a green building and reduce dependence on conventional energy sources by generating clean energy with close to zero GHG emissions, and its effectiveness is verified in simulation experiments. Literature [24] introduces the latest green building evaluation standards of China, the United Kingdom, and the United States, compares them in terms of energy efficiency and indoor and outdoor environmental quality, outlines the characteristics of each standard system, and puts forward suggestions to improve China's green building evaluation standards. Literature [25] aims to develop a new methodology for designing and analyzing the effectiveness of RES-based energy supply strategies for green buildings, and determines the feasibility and advantages of the proposed hybrid system through a comparison with conventional energy supply systems. Literature [26] applies a powerful reinforcement learning control methodology to minimize energy and power losses in the distribution network, an optimization embodied in the BEEL system; the comparison shows that the proposed method outperforms other methods. Literature [27] examines the economic impact of increasing energy expenditure; based on economies-of-scale theory and model simulation, it concludes that there is a large gap between energy generation and use, recommends increasing energy production to reduce this gap, and emphasizes the urgency of investing in renewable energy projects.
In this paper, a green building renewable energy system comprising photovoltaics, the power grid, and other components is constructed, and the equipment models within the system, such as the fan coil unit, the ground source heat pump unit, and the battery, are studied. Based on reinforcement learning, the Deep Deterministic Policy Gradient (DDPG) algorithm is adopted to obtain a more effective and attractive regulation strategy for the green building renewable energy system. To handle the continuous state-space problem, the constraints satisfied by the PV power, user load, SOC, and real-time tariff are determined from realistic physical limits, including the maximum and minimum charging and discharging power of the battery. The reward function is set consistently with those of the DQN and Q-learning models, taking into account the user's comprehensive energy cost and battery operation. The winter months of January, November, and December 2021 in Changsha, Hunan Province, China, are set as the simulation scenario, and, after the necessary parameter settings, simulation experiments on the regulation and optimization of the green building renewable energy system are carried out to test the actual regulation effect of the DDPG-based strategy proposed in this paper.
As countries around the world become increasingly concerned about environmental issues, cleaner energy is becoming a common goal, and the use of renewable energy is one of the most important ways to achieve it. Currently, the carbon emissions associated with the energy consumption of the building industry account for about 22% of the global total, and developing renewable energy sources such as photovoltaic (PV) and wind power to meet the growing demand for building energy is an important way to realize the dual-carbon goal.
The renewable energy system for green buildings constructed in this chapter includes a supply side consisting of photovoltaic panels, wind turbines, and the power grid, and a demand side consisting of ground source heat pumps (GSHPs), fan coil units (FCUs), and various appliances. The system uses batteries as energy storage units, and the GSHP is connected to a water tank to ensure a steady supply of heat and cold.
Next, the modeling of the equipment within the system will be further discussed.
The system constructed in this chapter controls the room temperature primarily by varying the amount of cooling/heating provided by the FCU. The cooling/heating output of the FCU is determined by the supply airflow set at each moment, as given in Eqs. (1)-(2).
The power consumption of the FCU at each stage can be calculated from the set airflow at each moment together with the rated airflow and rated power of the FCU, as shown in Eq. (3).
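As a concrete illustration of this relationship, the following sketch computes the FCU power draw from the airflow ratio; the cubic exponent (the common fan affinity law) and the function name are assumptions of this sketch, since the exact form of Eq. (3) is not reproduced here.

```python
def fcu_power(airflow: float, rated_airflow: float, rated_power: float,
              exponent: float = 3.0) -> float:
    """Estimate FCU power consumption from the set airflow.

    The rated power is scaled by the airflow ratio; a cubic exponent
    (the usual fan affinity law) is assumed here and can be replaced by
    the exponent actually used in Eq. (3).
    """
    ratio = max(0.0, min(airflow / rated_airflow, 1.0))  # clamp the ratio to [0, 1]
    return rated_power * ratio ** exponent
```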
When the water in the tank enters the GSHP, the compressor starts when the water temperature is below the heat pump temperature set point (set to 40°C in this study). The water supply temperature to the GSHP can be calculated according to Eq. (4).
The GSHP power consumption can be further calculated based on the coefficient of performance (COP). The COP formula is obtained by fitting the actual operating data, as shown in Eqs. (5)-(6).
Based on the COP and the heating power of the GSHP, the power consumption of the GSHP can be calculated using Eq. (7); by the definition of the COP, the electrical power consumed equals the heating power divided by the COP.
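A minimal sketch of this step is given below; the COP coefficients are placeholders rather than the fitted values behind Eqs. (5)-(6), and only the relation P = Q/COP is taken as given.

```python
def cop_from_supply_temp(t_supply_c: float, a: float = 6.0, b: float = -0.05) -> float:
    """Illustrative linear COP fit, COP = a + b * T_supply; the coefficients a and b
    are placeholders, not the values fitted from the actual operating data."""
    return a + b * t_supply_c

def gshp_power(heating_power_kw: float, cop: float) -> float:
    """Electrical power drawn by the GSHP: by the definition of the coefficient of
    performance, P_el = Q_heat / COP (the relation expressed by Eq. (7))."""
    return heating_power_kw / cop
```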
In this study, the capacity state of the battery is represented by the State of Charge (SOC). The model divides a day into a number of equal time steps and updates the SOC at each step according to the charging/discharging power and the charging/discharging efficiency, as shown in Eq. (8).
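The following sketch illustrates one step of such an SOC update in the standard energy-balance form; the function and parameter names are illustrative, and the 90% charge/discharge efficiency matches the storage device parameters given later.

```python
def soc_update(soc: float, power_kw: float, dt_h: float, capacity_kwh: float,
               eta_ch: float = 0.9, eta_dis: float = 0.9) -> float:
    """One-step SOC update: power_kw > 0 means charging, power_kw < 0 discharging.

    The charging efficiency reduces the energy actually stored, while the
    discharging efficiency increases the energy drawn from the battery
    for a given delivered output.
    """
    if power_kw >= 0:
        delta = eta_ch * power_kw * dt_h / capacity_kwh
    else:
        delta = power_kw * dt_h / (eta_dis * capacity_kwh)
    return min(max(soc + delta, 0.0), 1.0)  # keep SOC within [0, 1]
```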
The charging and discharging process causes a loss in battery capacity. When the charge/discharge cycle depth of the battery is ΔSOC, the maximum number of charge/discharge cycles before failure is given by Eq. (9), which characterizes the cycle-life curve of the battery.
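A commonly used empirical form for this cycle-life relationship, given here only to illustrate the dependence on cycle depth (the fitted constants in Eq. (9) may differ), is the power law

$$N_{\max}(\Delta SOC)=N_{100}\cdot(\Delta SOC)^{-k_{p}},$$

where $N_{100}$ is the number of cycles to failure at full cycle depth and $k_{p}$ is a fitted exponent; the deeper each cycle, the fewer cycles the battery can sustain.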
The renewable energy generation data are actual measured data. The installed capacity of the wind turbine is 3 kW, the installed capacity of the photovoltaics is 3.9 kW, and the generation data are recorded at minute-level time steps.
Data-driven reinforcement learning algorithms are widely used in energy system regulation because of their strong adaptability and low requirements on model accuracy. In this chapter, a reinforcement learning algorithm is used to obtain a more effective and attractive regulation strategy for the green building renewable energy system, coordinating the building's renewable energy across different time periods so as to raise the level of renewable energy consumption, improve the balance between supply and demand in the energy system, and maintain the stability of the power grid.
Reinforcement Learning (RL) is a machine learning approach within artificial intelligence concerned with building programs that can solve problems requiring intelligence. RL is distinctive in that it learns by trial and error from feedback that is sequential, evaluative, and sampled, and it can employ powerful nonlinear function approximators. In other words, an RL program learns to perform tasks or solve problems better through repeated trial and error. Owing to its strong generality, reinforcement learning has a wide range of applications in many fields.
Reinforcement learning is in essence the process by which an agent learns to make optimal decisions while interacting with its environment. The rule governing how actions are chosen based on states and rewards is called the policy π.
The Markov Decision Process (MDP) is based mainly on the Markov process and dynamic programming theory; it provides a mathematical framework for sequential decision making that can represent and handle decision problems with uncertainty and delayed feedback [28]. In general, a Markov decision process is defined by a five-tuple consisting of the state space, the action space, the state transition probability, the reward function, and the discount factor.
An agent interacts with its environment (the MDP) at discrete time steps by executing a policy π: at each step it observes the current state, selects an action, and receives a reward together with the next state from the environment.
The learning objective of reinforcement learning in a Markov decision process is to find the optimal policy π* that maximizes the expected discounted cumulative reward.
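In standard notation (which may differ slightly from the symbols used elsewhere in this paper), this objective can be written as

$$\pi^{*}=\arg\max_{\pi}\;\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty}\gamma^{t}r_{t}\right],\qquad 0\le\gamma<1,$$

where $r_t$ is the reward at step $t$ and $\gamma$ is the discount factor.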
In a Markov decision process, each action lasts for one discrete time step, whereas in a semi-Markov decision process each temporally abstracted action lasts a finite number of time steps. A Markov decision process equipped with a set of options becomes a semi-Markov decision process, and the optimal option value function over the available options is used to select the optimal option, whose internal policy is then executed. Formally, the option value function is an estimate of the long-term cumulative return obtained by initiating the option in a given state and following the policy thereafter.
Q-Learning is a value-based reinforcement learning algorithm. The Q-value refers to the expected gain obtained by taking a given action in a given state at a certain moment, and the environment feeds back the corresponding reward according to the agent's action. The main idea of the algorithm is to store the gain Q of each state-action pair in a Q-table and then select the action with the largest Q-value. When updating the Q-function, Q-Learning usually adopts an ε-greedy strategy to balance exploration and exploitation.
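The standard tabular Q-Learning update, written here in common notation, is

$$Q(s_{t},a_{t})\leftarrow Q(s_{t},a_{t})+\alpha\left[r_{t}+\gamma\max_{a'}Q(s_{t+1},a')-Q(s_{t},a_{t})\right],$$

where $\alpha$ is the learning rate; the ε-greedy rule selects a random action with probability ε and the action with the largest Q-value otherwise.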
The Deep Q-Network (DQN) algorithm combines Deep Neural Networks (DNNs) with Q-Learning. Its target network mechanism refers to the use of two neural networks with the same structure but different parameters: the Q-eval network holds the most recent parameters, while the Q-target network uses parameters that are several steps old [30]. The Q-function parameterized by the deep neural network is denoted as Q(s, a; θ).
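The role of the two networks can be summarized by the standard DQN target and loss (given here in their usual form, not reproduced from the paper):

$$y_{t}=r_{t}+\gamma\max_{a'}Q(s_{t+1},a';\theta^{-}),\qquad L(\theta)=\mathbb{E}\left[\left(y_{t}-Q(s_{t},a_{t};\theta)\right)^{2}\right],$$

where $\theta$ are the Q-eval network parameters and $\theta^{-}$ the periodically copied Q-target parameters.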
Outputting actions directly from the current state leads to another very important class of algorithms in reinforcement learning, namely policy gradient methods, which parameterize the policy directly and update its parameters along the gradient of the expected return.
The way the trust region policy optimization algorithm enforces its policy constraint is cumbersome to implement. To further improve the efficiency of policy updates, the Proximal Policy Optimization (PPO) algorithm proposes a new way of constraining the difference between the old and new policies during the update, eliminating the effect of drastic policy changes by clipping the magnitude of the policy update.
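The clipping idea is captured by PPO's standard clipped surrogate objective, stated here in its usual form rather than the paper's exact notation:

$$L^{\mathrm{CLIP}}(\theta)=\mathbb{E}_{t}\left[\min\left(r_{t}(\theta)\hat{A}_{t},\;\mathrm{clip}\left(r_{t}(\theta),1-\epsilon,1+\epsilon\right)\hat{A}_{t}\right)\right],\qquad r_{t}(\theta)=\frac{\pi_{\theta}(a_{t}\mid s_{t})}{\pi_{\theta_{\mathrm{old}}}(a_{t}\mid s_{t})},$$

where $\hat{A}_t$ is the advantage estimate and $\epsilon$ the clipping range.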
Actor-Critic reinforcement learning algorithms, another class of policy-based methods, integrate value-function-based and policy-based approaches: the Actor directly optimizes the policy parameters, while the Critic estimates the value function used to evaluate the Actor's policy and reduce the variance of the policy gradient.
Unlike stochastic policy gradient algorithms, whose output policy is a probability distribution over the action space, the deterministic policy gradient (DPG) algorithm optimizes a deterministic policy that maps each state directly to a single action.
The output of the deterministic policy gradient algorithm is a deterministic action, so it performs better on problems in the continuous action space, and is well able to solve continuous control problems such as robotics.
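The update direction used by DPG is given by the deterministic policy gradient theorem, stated here in its standard form:

$$\nabla_{\theta}J(\mu_{\theta})=\mathbb{E}_{s\sim\rho^{\mu}}\left[\nabla_{\theta}\mu_{\theta}(s)\,\nabla_{a}Q^{\mu}(s,a)\big|_{a=\mu_{\theta}(s)}\right],$$

where $\mu_{\theta}$ is the deterministic policy and $\rho^{\mu}$ the state distribution induced by it.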
Photovoltaic generation cuts the daytime peak load of the zero-energy residence, so the net load shows a "duck curve" pattern, and the user load peaks shift to 3:00-6:00 and 19:00-22:00, i.e., the periods in which heat pump operation and heating power demand are concentrated. The basic rule for controlling the battery storage system is therefore as follows (see the sketch below): during the peak hours of 3:00-6:00 and 19:00-22:00 the battery discharges according to the load demand, during 6:00-19:00 it is charged with photovoltaic power, and in the remaining hours the battery stays idle, neither charging nor discharging.
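A minimal sketch of this time-of-day rule is given below; the half-open treatment of the hour boundaries is an assumption, as the rule above does not specify it.

```python
def rule_based_battery_action(hour: int) -> int:
    """Return the battery action for a given hour of day:
    -1 = discharge, +1 = charge from PV, 0 = idle."""
    if 3 <= hour < 6 or 19 <= hour < 22:
        return -1   # peak load periods: discharge according to load demand
    if 6 <= hour < 19:
        return 1    # daytime: charge with photovoltaic power
    return 0        # remaining hours: neither charge nor discharge
```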
Before DQN was proposed, it was widely held in the academic community that value function approximation was difficult to use; the introduction of the experience replay buffer and the dual-network structure was a major innovation, and the Deep Deterministic Policy Gradient (DDPG) algorithm retains the experience replay buffer and dual-network structure of the DQN algorithm [31]. The DDPG algorithm thus consists of four main neural networks: an actor network, a critic network, a target actor network, and a target critic network. The actor network outputs a deterministic action for the current state, the critic network evaluates that action, and the target networks are updated slowly toward their online counterparts; the discount factor γ satisfies 0 ≤ γ ≤ 1.
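The two mechanisms DDPG inherits from this structure, the bootstrapped critic target and the slow (soft) target-network update, are sketched below in PyTorch; the module names and the τ = 0.002 rate (quoted later as the network update rate) are assumptions of this sketch.

```python
import torch

def soft_update(target: torch.nn.Module, online: torch.nn.Module, tau: float = 0.002) -> None:
    """Soft target-network update: theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    for tp, op in zip(target.parameters(), online.parameters()):
        tp.data.copy_(tau * op.data + (1.0 - tau) * tp.data)

def critic_target(reward, next_state, done, target_actor, target_critic, gamma: float = 0.99):
    """Bootstrapped target y = r + gamma * Q'(s', mu'(s')) used to train the critic."""
    with torch.no_grad():
        next_action = target_actor(next_state)
        return reward + gamma * (1.0 - done) * target_critic(next_state, next_action)
```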
The DDPG algorithm can handle continuous state-space problems, so only the components of the state space S need to be specified. No additional constraints are imposed on the individual components; according to realistic physical constraints, the PV power, user load, SOC, and real-time tariff satisfy Eqs. (26)-(28), respectively.
In the DDPG model, the current state consists of the PV power, the user load, the battery SOC, and the real-time electricity price at the current time step.
The purpose of this paper is to examine the strengths and weaknesses of various reinforcement learning algorithms for ZEH energy system management, ultimately leading to a more efficient and appealing regulation strategy. In the DDPG model, the reward function mainly takes into account the user's comprehensive energy cost and battery operation, consistent with the reward functions in the DQN and Q-learning models, which makes it easy to compare and evaluate the regulation effect in the specific case analysis.
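A minimal sketch of a reward with this structure is shown below; the two cost terms and their equal weighting are assumptions, since the paper's exact reward expression is not reproduced here.

```python
def reward(energy_cost_yuan: float, battery_wear_yuan: float) -> float:
    """Negative of the user's comprehensive energy cost plus a battery-operation
    penalty, so that lower cost and gentler battery cycling yield higher reward."""
    return -(energy_cost_yuan + battery_wear_yuan)
```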
In this chapter, the effectiveness of the proposed regulation strategy for renewable energy systems in green buildings will be evaluated.
For the optimization of the regulation strategy of the green building renewable energy system under winter conditions, the simulation scenario in this chapter corresponds to the winter months of January, November, and December 2021 in Changsha City, Hunan Province, China. The meteorological data of Changsha are used, the area of the PV panels is set to 40 m², and the winter PV generation calculated from the meteorological data is shown in Fig. 1. The energy storage device is of model 6-GFMJ-200 with a capacity of 7 kW·h, a charging/discharging efficiency of 90%, a charging/discharging power of 1.44 kW, and maximum/minimum states of charge of 0.9/0.2.

Photovoltaic power generation in winter
To account for the influence of real-time electricity prices on the system strategy, real-time winter electricity price data under similar climatic conditions from the Australian energy market website are used; the peak and trough electricity prices are set to 0.8 and 0.3 yuan/(kW·h), respectively, and the surplus-electricity feed-in tariff is 0.4536 yuan/(kW·h). The parameters of the electric heat pump and the building are shown in Table 1, and the upper and lower limits of the indoor comfort temperature are set to 21°C and 17°C, respectively.
Related parameters
| Window size/m² | Building heat capacity/(J·K⁻¹) | Thermal resistance between indoor and environment/(K·W⁻¹) | ASHP heating power/kW |
| --- | --- | --- | --- |
| 10 | 7453000 | 5.28×10⁻³ | 2 |
For the green building renewable energy system in this paper, the following control strategy is proposed as a benchmark model. The electric heat pump and the energy storage system control the system's operation by adjusting the operating power and the charging/discharging state, respectively. The operating power of the heat pump is determined by the current indoor temperature and electricity price, while the charging/discharging state, kept within the specified battery charging range, is determined by the current electricity price and PV generation. The advantage of the benchmark model is thus that it gives a definite control strategy from the current environmental parameters and adjusts dynamically in time to cope with environmental changes, so as to meet the user's comfort and economic needs. The specific control strategies are shown in Table 2.
Operation strategy of heat pump and energy storage
Operation strategy of heat pump

| Room temperature | Electricity price: low | Electricity price: medium (0.35 < price) | Electricity price: high |
| --- | --- | --- | --- |
| Low | 1 | 1 | 0.75 |
| Medium | 0.75 | 0.75 | 0.5 |
| High | 0.25 | 0 | 0 |

Operation strategy of energy storage

| Photovoltaic power generation | Electricity price: low | Electricity price: medium (0.35 < price) | Electricity price: high |
| --- | --- | --- | --- |
| Low | -1 | -1 | -1 |
| Medium | 1 | 0 | 1 |
| High | 1 | 0 | 1 |
The minimum optimization step is set to 15 min and the optimization horizon to 31 days, i.e., there are 2976 optimization periods. The system model is trained on the November and December datasets for 1500 episodes, with 31 consecutive days selected at random during training, while the January dataset is used to validate the performance of the DDPG algorithm. The Q-network and the target network each contain three fully connected hidden layers with 128, 256, and 256 neurons, respectively; rectified linear units are used as the activation function of the hidden layers, and the Adam optimizer is used to update the network weights. The main hyperparameters include a learning rate of 0.0001, a discount factor of 0.99, a mini-batch size of 32, and a network update rate of 0.002.
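For reference, a sketch of a critic network and optimizer consistent with the settings listed above is given below (PyTorch assumed); the state and action dimensions are placeholders, with the state taken to contain the PV power, load, SOC, and tariff described earlier.

```python
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Q-network with three fully connected hidden layers (128, 256, 256) and ReLU."""
    def __init__(self, state_dim: int = 4, action_dim: int = 1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 128), nn.ReLU(),
            nn.Linear(128, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))

critic = Critic()
optimizer = torch.optim.Adam(critic.parameters(), lr=1e-4)  # learning rate 0.0001
GAMMA, BATCH_SIZE, TAU = 0.99, 32, 0.002  # discount factor, mini-batch size, update rate
```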
In order to evaluate the performance of the Deep Deterministic Policy Gradient (DDPG) algorithm in the proposed green building renewable energy system regulation strategy in this paper, the following three system regulation schemes are used for performance comparison.
1) Scheme I. This scheme does not use any energy storage system, but uses an ON/OFF strategy to control the heat input power to satisfy heating demands.
2) Scheme II. The electrical and thermal energy flows are scheduled without coordination in this scheme.
3) Scheme III. This scheme assumes that all energy storage systems are dispatched with full knowledge of all uncertain parameters. Although such information is difficult to obtain in practice because of the stochastic nature of the uncertain parameters, this scheme provides a lower bound on the achievable operating cost and is used here only as a reference for optimal performance.
The convergence process of the DDPG algorithm in this paper is shown in Fig. 2. Because of the exploration probability and the stochastic system parameters, the reward obtained in each episode fluctuates within a certain range. To show the trend of the episode rewards more clearly, the average reward of the preceding 200 episodes is plotted at each episode as the vertical coordinate, giving the average reward curve, i.e., the orange curve in the figure. As the number of episodes increases, the average reward gradually rises and becomes increasingly stable, which indicates the good convergence of the proposed algorithm.

The convergence process of DDPG algorithm
The operational effectiveness of each system regulation scheme is shown in Figure 3. Figure (a) shows the operating cost of each scheme, while Figure (b) shows the change in indoor temperature under each scheme. From the figure, it can be seen that the DDPG algorithm in this paper achieves lower operating costs than Schemes I and II; specifically, it reduces the operating cost by 24.63% and 5.07% compared to Schemes I and II, respectively. Although the proposed algorithm has a higher running cost than Scheme III, the relative difference between the two is less than 8.05%. Since it is not practical to obtain the perfect information assumed in Scheme III, the system regulation strategy proposed in this paper has the best practical utility and achieves near-optimal performance without needing to know an explicit building thermal dynamics model. In addition, according to the indoor temperature variation in Figure 3(b), the DDPG algorithm achieves smaller temperature deviations than Schemes I and II.

The operation effectiveness of each system control scheme
To further illustrate the effectiveness of the DDPG algorithm, more simulation results are given next. The electricity price curve together with the energy storage levels of the PV storage system and the electrical energy storage system under the DDPG-based regulation strategy is shown in Fig. 4, where both storage systems respond to price fluctuations and operate dynamically. Figure (a) is the electricity price curve, while Figures (b) and (c) show the energy storage levels of the PV storage system and the electrical energy storage system, respectively. Specifically, when the electricity price is high the PV storage system and the electrical energy storage system work in discharge mode, while when the electricity price is low they work in charge mode. The DDPG algorithm can therefore exploit the dynamic operation of the PV storage system and the electrical energy storage system to reduce the operating cost of the building's PV-containing multi-energy system.

Simulation results of system regulation
The thermal energy supply and the energy storage level of the thermal energy storage system under the DDPG-based regulation strategy are shown in Fig. 5. Combined with the PV storage system results above, when the energy storage level of the PV storage system decreases, the thermal energy supply used to satisfy the heating demand and the energy stored in the thermal energy storage system under the DDPG algorithm adjust in response. For example, at time slot 150 the thermal energy supply output of the DDPG algorithm drops to nearly 0 kWh, while the energy storage level of the thermal energy storage system rises markedly to about 21.2 kWh. This suggests that the thermal energy required for space heating is primarily supplied by PV, which effectively reduces the reliance of the building's PV-containing multi-energy system on natural gas boilers. In contrast, the thermal energy used to satisfy the heat demand in Schemes I and II cannot change with the PV output. The DDPG algorithm therefore achieves coordinated operation between the electrical energy flow and the thermal energy flow.

Heat supply and Thermal storage system energy storage level
The robustness of the DDPG algorithm is examined for the three cases of 0.9, 1.8, and 2.4°F, as shown in Table 3, where ATV denotes the average temperature deviation. The operating costs of the DDPG algorithm for the 0.9, 1.8, and 2.4°F cases are 2181.3 yuan, 2284.4 yuan, and 2284 yuan, respectively, while the corresponding average temperature deviations are 0.003°C, 0.005°C, and 0.075°C. Compared with Schemes I and II, the DDPG algorithm proposed in this paper achieves lower operating costs as well as smaller temperature deviations; compared with Scheme III, the DDPG algorithm can sometimes trade a slightly larger temperature deviation for a lower operating cost.
Robustness of DDPG
| Scheme | Operating cost (RMB), 0.9°F | Operating cost (RMB), 1.8°F | Operating cost (RMB), 2.4°F | ATV (°C), 0.9°F | ATV (°C), 1.8°F | ATV (°C), 2.4°F |
| --- | --- | --- | --- | --- | --- | --- |
| Scheme I | 3023 | 3034 | 3029 | 0.182 | 0.186 | 0.235 |
| Scheme II | 2392 | 2406 | 2407 | 0.182 | 0.186 | 0.235 |
| Scheme III | 2204.34 | 2263.91 | - | 0 | 0 | - |
| DDPG | 2181.3 | 2284.4 | 2284 | 0.003 | 0.005 | 0.075 |
Facing the industry trend toward green building popularization and clean energy in the construction industry, this paper sets up a green building renewable energy system and proposes a DDPG-based system regulation strategy grounded in reinforcement learning, providing ways and means for the fine management and real-time control of green buildings. To test the regulation effect of the proposed DDPG-based strategy, simulation experiments are carried out with the winter months of January, November, and December 2021 in Changsha, Hunan Province, China, as the simulation scenario. Scheme I, which is controlled only by an ON/OFF strategy, Scheme II, which is scheduled without coordination, and Scheme III, which is scheduled with all uncertain parameter information available, are set as comparison objects. As the number of episodes increases, the average reward of the DDPG algorithm gradually rises and stabilizes, demonstrating good convergence. In terms of effectiveness, the DDPG algorithm reduces the running cost by 24.63% and 5.07% compared with Schemes I and II, respectively, while the relative difference with Scheme III, which assumes perfect and therefore impractical information, is less than 8.05%, and it achieves smaller temperature deviations. In addition, the DDPG algorithm reduces the operating cost of the PV-containing building multi-energy system through the dynamic operation of the PV storage system and the electrical energy storage system, and realizes coordinated operation between the electrical energy flow and the thermal energy flow. The robustness of the proposed regulation strategy is examined for the 0.9, 1.8, and 2.4°F cases: compared with Schemes I and II it maintains the lowest operating cost, at 2181.3 yuan, 2284.4 yuan, and 2284 yuan for the three cases, respectively, and compared with Scheme III it can sometimes trade a slightly larger temperature deviation for a lower operating cost.