Open Access

Application of Monte Carlo-based predictive stochastic model for energy efficiency retrofit in building clusters

Sep 26, 2025


Introduction

With rising global energy consumption and increasingly serious environmental problems, the energy-saving renovation of building clusters has become an important topic for today's society [1-2]. The building stock is one of the main sources of energy consumption, so taking energy-saving measures to reduce consumption, curb environmental pollution, and improve economic efficiency is essential [3-6]. Energy-saving renovation of the building stock refers to retrofitting existing buildings with various technical means and measures to reduce energy consumption and improve the efficiency of energy utilization, thereby saving energy, protecting the environment, and reducing emissions [7-10]. Because it reduces both energy consumption and environmental pollution, such renovation is of great significance for sustainable development. Energy efficiency goals in public buildings can be achieved by improving building insulation, optimizing daylighting systems, upgrading air-conditioning systems, utilizing renewable energy sources, improving the energy efficiency of equipment and appliances, and conducting energy monitoring and management [11-14]. This not only provides a more comfortable environment for the public but also contributes to the sustainable development of society and the environment. Governments at all levels and relevant departments should therefore increase support for the energy-saving retrofit of building clusters and promote its popularization throughout society [15-18]. In the future, energy-saving retrofitting of building groups will become an important direction for the construction industry, pushing it toward green and sustainable development [19-20].
With the application and development of new energy-saving technologies, the effect of energy-saving retrofitting of building complexes will be further improved to make a greater contribution to the sustainable development of the construction industry [21-23].

Literature [24] noted that the energy consumption of the built environment has a serious impact on the world's ecology, introduced the concept of near-zero-energy buildings, and reviewed case studies of energy-saving and environmentally friendly methods using the retrofitting and reuse of the Atika energy-saving demonstration building as an example. Literature [25] presents a scientific methodology for determining the priorities of strategic investments, showing that the use of efficient materials reduced the energy and material intensity of the renovation project as well as construction time, labor costs, and energy consumption during construction. Literature [26] used a diversified policy segmentation approach to critically analyze representative initiatives in pilot cities, summarized barriers to building energy efficiency retrofits from government, business, and market perspectives, and proposed targeted strategies. Literature [27] analyzes and evaluates building energy retrofits aimed at improving energy performance, revealing that retrofit programs including LED lighting, high-efficiency equipment, and ventilation system upgrades yield energy savings. Literature [28] analyzed sustainable building retrofits based on a literature review, emphasized them as an important way to address energy and environmental issues, revealed that current research in this field focuses on building performance, modeling, and energy efficiency, and made recommendations for further research. Literature [29] points out that current research on the project delivery process for energy efficiency retrofits is limited, fills this knowledge gap through qualitative research, and emphasizes the importance of investing in human and technological solutions to achieve energy efficiency through building retrofits.
Literature [30] explored the retrofitting of older buildings and indicated that vertical extensions support the energy efficiency retrofitting of buildings, while the combination of low energy consumption and vertical extensions had the highest return on investment and the lowest environmental impact. Literature [31] introduced mixed-integer linear programming models for the design of building energy retrofits, considering energy supply systems and demand-reducing energy efficiency measures in retrofits approaching the zero-energy building standard, and providing design choices and operational strategies to meet the retrofit objectives. Literature [32] illustrates that incorporating energy efficiency measures (EEMs) in single-family home retrofits can unlock the potential of energy efficiency to combat climate change and explores the factors found to influence ER implementation. Literature [33] proposes a computational model based on the TNM decree to calculate and analyze the energy consumption of winter heating and domestic hot water supply before and after retrofitting old buildings, showing that the thermal insulation performance of the building envelope is a key factor affecting the energy consumption of old urban buildings. Literature [34] analyzed the application of DSFs with integrated solar active systems, aiming to find the best solution for the energy efficiency retrofitting of the building stock, and revealed that the use of DSFs in multi-storey existing buildings is sustainable because it not only saves energy but also requires no additional insulation.

In this study, the concept of a building energy consumption information model is first proposed, and its main characteristics and the categories of feature parameters it contains are summarized. To reduce the difficulty of data collection and guarantee data quality, the classification of feature parameters and quality assurance for the collection process are introduced. The related techniques are then described in detail: Bayesian statistical inference theory is introduced into the calibration estimation method, with the sample entry probability treated as an unknown parameter. Following the idea of Bayesian statistical inference, a prior distribution is set for the unknown parameter, its posterior distribution is derived through the Bayesian formula, and posterior samples are simulated from the sample observations and the prior distribution using the Gibbs sampling method of the MCMC algorithm. Based on the statistical data of a city, the Monte Carlo method is used to simulate the whole life cycle cost of eco-homes and non-eco-homes, obtain the expectation and confidence interval of the cost, and calculate the average building energy efficiency payback period of eco-homes. Finally, the predictive stochastic model proposed in this paper is applied in a predictive validation experiment for the energy efficiency retrofit of the building stock.

Building energy information model
Building Energy Consumption Information Model

The Building Energy Information Model (BEIM) is an information model consisting of data parameters related to building energy consumption, with standardized data formats and data relationships. These parameters include real parameters determined in the design phase and those determined in the operation phase. The model specifies both the collected parameters characterizing a building and the forms for interpreting and investigating those parameters. Based on the building energy consumption information model, a large amount of building energy consumption information can be collected to form an energy consumption database. The components of the building energy information model are shown in Fig. 1. To describe the model more fully, this study analyzes it in terms of design parameters, operation parameters, data format, and collection [35]. Characteristic parameters are used to describe a building and can be categorized into design parameters and operation parameters according to the stage they belong to. For each characteristic parameter, this study gives a definition that focuses on the operability of data collection. For operational parameters, the focus is on human behavior, building operation mode, operational energy consumption, and user evaluation. For data format, the focus is on adapting to the needs of building diversity and data analysis. For data collection, the study emphasizes grading the characteristic parameters to reduce the difficulty of collection.

Figure 1.

The composition of the building energy information model

Data collection
Hierarchy of data

In reality, building energy consumption data is characterized by large volume and complexity. Like an organism, the objects in a building, especially those related to people, are complex and variable. In addition, many parameters relate to building energy consumption; for example, a detailed energy simulation requires engineers to enter 2000~4000 parameters. In short, completely collecting all the information about a building is time-consuming and laborious.

Grading the characteristic parameters is an effective means of reducing the difficulty of collection. Grading data is also a common methodology in building energy audits, such as the ASHRAE energy audit standard, which categorizes energy audits into three levels. The first level (Level 1) is the most basic review, requiring a simple survey of the building to analyze low-cost energy efficiency measures and to identify what additional analysis will be needed. The second level (Level 2) is an intermediate review, requiring a more detailed study of the building to itemize energy consumption, analyze the energy-saving potential and cost of all energy efficiency measures, and identify elements that need further analysis. The third level (Level 3) is the most demanding audit, requiring a focused analysis of energy efficiency measures for buildings with large investments, more detailed on-site research, and a more accurate analysis process.

Quality assurance of data

High-quality data are essential for subsequent high-quality data analysis. There are a variety of factors that can affect the quality of the data collected, notably:

1. Lack of complete historical information on design and operation. For historic buildings, design information may have been discarded, and building owners may lack a tradition of collecting, organizing, and managing past energy consumption information.
2. Limited knowledge and practice of data collectors. Different collectors may obtain different results because their conceptual understanding differs.

High-quality data is the basis for data-driven building energy analysis. Poor communication between the building design team, energy analysis engineers, and the owner’s team may lead to degradation of data quality when performing building energy analysis.

Building design data

Design data refers to characteristic parameters that are determined at the design stage; from a macroscopic point of view, it is information that does not change over time. For example, the construction of a building façade is fixed and its physical properties remain essentially unchanged; although the energy-saving performance of the façade may attenuate over time, such attenuation is considered minor and has little impact on the overall energy consumption of the building.

Building operational data

There are a large number of parameters related to energy consumption during the operational phase of a building, and they often vary over time; many of them are related to human behavior. Human behavior has a significant impact on the energy consumption of a building, and there is a wealth of research in this area. In the field of building energy analysis, research on human behavior has focused on behavior monitoring and data collection, modeling of human behavior, and the integration of human behavior models into existing building energy simulation software. The traditional building energy simulation process uses fixed standard schedules; however, human behavior is complex and variable. Different regions, building types, and occupants behave differently at different times, so fixed schedules cannot effectively describe human behavior.

A building in use constantly generates operation data, which can be divided into data generated actively by human behavior and data generated passively as a consequence of it: the former includes the usage patterns of lamps and plug-in devices, room temperature settings, and air-conditioning system settings, while the latter includes electricity usage and indoor temperature changes. The operating patterns of a building can be obtained in two ways, through passive observation or by using sensors. In terms of frequency, the operating data changes in steps of minutes.

Data format and storage

Data needs a defined format to be stored, transmitted, displayed, and exchanged. Tim Berners-Lee proposed a systematic approach to evaluating the format of data on the Internet, categorizing data formats into five levels. Level 1 data can be in any format, as long as the information can be displayed on the Internet under open protocols. Level 2 data uses structured formats, such as Excel spreadsheets. Level 3 data uses non-proprietary formats. Level 4 data uses URLs (uniform resource locators) to link to the data so that users can locate it from other places. Level 5 data is linked to other data to provide context.

The quality of the data within the database has a very important impact on the results of the analysis. Data quality refers to whether the data meets a number of criteria; the literature lists six such criteria: completeness, uniqueness, currency, validity, accuracy, and consistency. Some simple validation measures can be set up to ensure the validity of data input.

Monte Carlo-based predictive stochastic models
Basic Theory
Introduction to Markov chain theory

A Markov chain is a mathematical system that undergoes transitions from one state to another according to certain probability rules. Let x0, x1, x2, …, xn be a sequence of random variables taking values in the state space $$\left\{ {{S_1},{S_2}, \ldots ,{S_n}} \right\}$$; the sequence is a Markov chain (Markov process) if the state at a given moment depends only on the state at the previous moment and not on any earlier states. The conditional probability of the variable is then expressed as $$p\left( {{x_{i + 1}}|{x_1},{x_2}, \ldots ,{x_{i - 1}},{x_i}} \right) = p\left( {{x_{i + 1}}|{x_i}} \right)$$, where xi+1 denotes the pixel point to be predicted and xi is the current pixel point, both belonging to a Markov chain X whose states xi lie in the state set S.

A transfer matrix is the set of probabilities of transitioning from one state to another, arranged as a matrix. If a Markov chain has a finite state space, the single-step transition probabilities of all states can be arranged in a matrix to obtain the transfer matrix: $${P_{n,n + 1}} = \left( {{p_{{i_n},{i_{n + 1}}}}} \right) = \left[ {\begin{array}{*{20}{c}} {{p_{0 \to 0}}}&{{p_{0 \to 1}}}& \cdots \\ {{p_{1 \to 0}}}&{{p_{1 \to 1}}}& \cdots \\ \cdots & \cdots & \cdots \end{array}} \right]$$

Row in of the matrix represents the discrete distribution of Xn+1 over all possible states given $${X_n} = {S_{{i_n}}}$$, while a single element pi,j, with value p(j|i), represents the probability of moving from state i to state j. Each element is non-negative and the elements of each row sum to 1, i.e., pi,j ≥ 0 and $$\sum\limits_j {{p_{i,j}}} = 1$$.
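These two properties (non-negative entries, rows summing to 1) can be checked numerically. A minimal sketch in NumPy, with an illustrative 2-state matrix that is not taken from the paper:

```python
import numpy as np

# Illustrative 2-state transition matrix: rows index the current state,
# columns the next state.
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])

def is_row_stochastic(P, tol=1e-12):
    """Return True if all entries are non-negative and every row sums to 1."""
    return bool(np.all(P >= 0) and np.allclose(P.sum(axis=1), 1.0, atol=tol))

print(is_row_stochastic(P))  # prints True for the matrix above
```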

Some non-Markovian processes can be handled by expanding the notion of the "present" state to include the previous state as well, which yields a second-order Markov process; higher-order Markov processes can be constructed by analogy. For a rock image, the state set is the set of all possible values of each pixel. In this paper the rock is divided into two parts, skeleton and pore, and the CT scan image of the rock is binarized accordingly, with state value 0 for the skeleton and 1 for the pore. The state set of the rock therefore contains only the two elements 0 and 1, i.e., S = {0, 1}, and the transfer matrix of the rock image contains only four transition probabilities, 0 → 0, 0 → 1, 1 → 0, and 1 → 1, as shown in Eq. (2): $$P = \left[ {\begin{array}{*{20}{c}} {{p_{0 \to 0}}}&{{p_{0 \to 1}}} \\ {{p_{1 \to 0}}}&{{p_{1 \to 1}}} \end{array}} \right]$$
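For a binarized image, the four transition probabilities of Eq. (2) can be estimated by counting consecutive pixel pairs. A minimal sketch (the scanline data is made up for illustration):

```python
import numpy as np

def estimate_transition_matrix(seq):
    """Estimate the 2x2 matrix P[i][j] = p(i -> j) from counts of
    consecutive pairs in a 0/1 sequence; rows are normalized to sum to 1."""
    counts = np.zeros((2, 2))
    for a, b in zip(seq[:-1], seq[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # leave unobserved states as zero rows
    return counts / row_sums

# Made-up scanline of a binarized core image: 0 = skeleton, 1 = pore
scanline = [0, 0, 1, 1, 1, 0, 0, 0, 1, 0]
P = estimate_transition_matrix(scanline)
# P[0] = [0.6, 0.4] and P[1] = [0.5, 0.5] for this scanline
```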

Let x(i) take one of s discrete state values; the stochastic process is called a Markov chain if it satisfies: $$p\left( {{x^{(i)}}|{x^{(i - 1)}}, \ldots ,{x^{(1)}}} \right) = T\left( {{x^{(i)}}|{x^{(i - 1)}}} \right)$$

If T does not vary with i, Eq. (3) is called a time-homogeneous Markov chain, and the evolution of x is governed by a constant transfer matrix. As an example, consider a Markov chain with 3 state values and transfer matrix: $$T = \left[ {\begin{array}{*{20}{c}} {0.25}&{0.5}&{0.25} \\ 0&{0.13}&{0.87} \\ {0.2}&{0.05}&{0.75} \end{array}} \right]$$
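A random walk driven by this transfer matrix can be simulated directly: at each step the next state is drawn from the row of T indexed by the current state. A minimal sketch using the 3-state matrix above:

```python
import random

# Transfer matrix from the text: T[i][j] = p(state i -> state j)
T = [[0.25, 0.50, 0.25],
     [0.00, 0.13, 0.87],
     [0.20, 0.05, 0.75]]

def simulate_chain(T, start, n_steps, seed=42):
    """Simulate n_steps transitions of the chain starting from `start`."""
    rng = random.Random(seed)
    state, path = start, [start]
    for _ in range(n_steps):
        state = rng.choices(range(len(T)), weights=T[state])[0]
        path.append(state)
    return path

path = simulate_chain(T, start=0, n_steps=10)
# Note the transition 1 -> 0 never occurs, since T[1][0] = 0.
```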

Irreducibility of the stochastic transfer matrix ensures that the chain can reach every state rather than becoming trapped in a cycle or a subset of states. For a stationary distribution, p(x) needs to satisfy the detailed balance condition: $$p\left( {{x^{(i)}}} \right)T\left( {{x^{(i - 1)}}|{x^{(i)}}} \right) = p\left( {{x^{(i - 1)}}} \right)T\left( {{x^{(i)}}|{x^{(i - 1)}}} \right)$$

Summing both sides over x(i−1) yields: $$p\left( {{x^{(i)}}} \right) = \sum\limits_{{x^{(i - 1)}}} {p\left( {{x^{(i - 1)}}} \right)T\left( {{x^{(i)}}|{x^{(i - 1)}}} \right)}$$
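Iterating this balance equation numerically from any starting distribution recovers the stationary p(x). A minimal sketch for the 3-state matrix above:

```python
import numpy as np

# The 3-state transfer matrix from the text
T = np.array([[0.25, 0.50, 0.25],
              [0.00, 0.13, 0.87],
              [0.20, 0.05, 0.75]])

def stationary_distribution(T, n_iter=1000):
    """Iterate p <- p T from a uniform start; for an irreducible,
    aperiodic chain this converges to the stationary distribution."""
    p = np.full(T.shape[0], 1.0 / T.shape[0])
    for _ in range(n_iter):
        p = p @ T
    return p

p = stationary_distribution(T)
# p now satisfies p = p T up to floating-point precision
```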

Introduction to Monte Carlo Theory

The Monte Carlo method is a numerical simulation method that takes probabilistic phenomena as its object of study: it infers unknown characteristic quantities from statistical values obtained by sampling. Monte Carlo methods are often more efficient than traditional numerical methods, particularly for high-dimensional problems. The basic idea is to establish a probabilistic model or stochastic process whose parameters or numerical characteristics equal the solution of the problem, compute these through observation or sampling tests, and finally give an approximate result for the solution. The precision of the solution is expressed as the standard error of the estimate.

When making predictions or estimates under severe uncertainty, some methods replace each uncertain variable with a single mean value; Monte Carlo simulation instead uses many sampled values and averages the results. Because random variables interfere with one another, the effect of each on the outcome cannot be determined directly, so Monte Carlo simulation relies on repeated random sampling: it assigns random values to the uncertain variables, runs the model, records the result, and repeats this process until the required precision is met. In other words, the Monte Carlo method is built on repeated experimentation. A probability distribution is a statistical function representing the set of values, discrete or continuous, that a variable may take between its limits, and is often used to describe an uncertain variable. In the MCMC method, the probability distribution used for Monte Carlo simulation is the transfer probability matrix of the Markov chain. For rock images, the value of a requested pixel can be regarded as a random variable: a large number of random samples of that point are drawn, and its value is estimated from the values of those samples.
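The repeated-sampling idea, together with the standard error used to express precision, can be sketched in a few lines (the integrand and distribution are illustrative, not from the paper):

```python
import math
import random

def monte_carlo_mean(g, sample, n, seed=0):
    """Estimate E[g(X)] by averaging g over n random draws of X,
    and report the standard error of the estimate."""
    rng = random.Random(seed)
    values = [g(sample(rng)) for _ in range(n)]
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(var / n)

# Example: E[X^2] for X ~ Uniform(0, 1); the exact value is 1/3.
est, se = monte_carlo_mean(lambda x: x * x, lambda rng: rng.random(), 100_000)
```

Increasing n shrinks the standard error at the usual 1/sqrt(n) rate.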

Markov chain Monte Carlo method

The traditional Monte Carlo method is no longer applicable when the random variables are multivariate or the probability density has a non-standard form. Monte Carlo methods require a large number of samples, and a Markov chain can supply them, since each step of the random walk is equivalent to drawing a sample; combining the two makes complex problems tractable. The MCMC method can sample from essentially any probability distribution, and is mostly used to sample from intractable posterior distributions for inference. A two-dimensional image is essentially a matrix, whose rows and columns correspond to the height and width of the image, respectively [36]. In general, a binarized image is used to represent a core that includes only pore and skeleton information, i.e., rock pores and the rock skeleton are represented by 1 and 0, respectively, so that each pixel of this image matrix has only two state values, 1 and 0. The structural features of the image can be characterized by a probability distribution function. Let the number of pixels of a two-dimensional image be n and $$x = \left( {{x_1},{x_2}, \ldots ,{x_n}} \right)$$ represent the pixel values, where xi can only be 0 (skeleton) or 1 (pore) in a binarized core image. To reconstruct a 2D porous-medium image with properties similar to the original using the MCMC method, one theoretically needs the complete probability distribution function of x, i.e., p(x).

Bayesian Inference Modeling

The calibration estimation method based on Bayesian inference corrects the sample entry probability θk (or the initial design weight dk) through a Bayesian model to realize weighted estimation from the sample. Random sampling gives the entry probability θk of each stratum's sample; the Bayesian model then yields the calibration estimate $${\hat \theta_k}$$, and the new calibration weight is $${\omega_k} = 1/{\hat \theta_k}$$. Weighting the sample gives estimates of the population parameters, such as the population total Y and the population mean $$\bar Y$$ [37].

From the Bayesian theoretical point of view, the probability of entry θk has some prior distribution π(θ), and the generation of samples X=(x1,x2,,xm)$$X = \left( {{x_1},{x_2}, \ldots ,{x_m}} \right)$$ proceeds in two steps:

First, a value θ0 is assumed to be generated from the prior distribution π(θ). Second, a set of samples is generated from $$p\left( {X|{\theta_0}} \right)$$. The joint conditional probability function of the sample $$X = \left( {{x_1},{x_2}, \ldots ,{x_m}} \right)$$ is obtained accordingly. Combining the random variable θk with π(θ) gives the joint distribution h(X, θ) of the sample X and the parameter θk, which synthesizes the sample information, the overall information, and the prior information [38]. The Bayesian formula is then applied: $$\pi (\theta |X) = \frac{{h(X,\theta )}}{{m(x)}} = \frac{{p(X|\theta )\pi (\theta )}}{{\int_\Theta p (X|\theta )\pi (\theta )d\theta }}$$
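The normalizing integral in the denominator is what makes this formula hard to evaluate in general, but in one dimension it can be approximated on a grid. A minimal sketch with a flat prior and a made-up binomial-type likelihood (7 "successes" in 20 trials):

```python
import numpy as np

# Grid of candidate values for an unknown probability theta
theta = np.linspace(1e-6, 1 - 1e-6, 1000)
dtheta = theta[1] - theta[0]

prior = np.ones_like(theta)                     # flat prior pi(theta)
n, k = 20, 7                                    # made-up data
likelihood = theta**k * (1 - theta)**(n - k)    # p(X | theta)

# Bayes formula: posterior = likelihood * prior / (integral of the numerator),
# with the integral replaced by a Riemann sum over the grid.
unnorm = likelihood * prior
posterior = unnorm / (unnorm.sum() * dtheta)

posterior_mean = (theta * posterior).sum() * dtheta
# With a flat prior this posterior is Beta(k+1, n-k+1), mean (k+1)/(n+2)
```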

Model-based calibration estimator
A priori setting

In this paper, the prior distribution is a stratified (multilayer) prior, which to some extent keeps the model from over-relying on the prior distribution and enhances the robustness of the estimates. The population is divided into m strata according to a certain characteristic, with the population size of stratum k denoted Nk. A sample of size n is randomly selected from the target population s, in which the sample size of stratum k is nk and the entry probability is θk. Treating the entry probability θk as an unknown parameter, it is given in the form of a density function, i.e., $${\theta_k}\sim {\pi_1}\left( {{\theta_k}|\lambda } \right)$$ serves as the first layer of the prior distribution, where λ is the hyperparameter and Λ is its range of values. A second-layer prior (hyperprior) π2(λ) is then given for the hyperparameter λ. The multilayer prior for the unknown parameter θk is expressed as: $$\pi \left( {{\theta_k}} \right) = \int_\Lambda {{\pi_1}} \left( {{\theta_k}|\lambda } \right){\pi_2}(\lambda )d\lambda$$

The following Bayesian inference procedure is carried out based on π(θk)$$\pi \left( {{\theta_k}} \right)$$.

Assume that the prior distribution of the parameter to be estimated θk is a Gamma distribution with parameters α, β, i.e., θk ~ Gamma(α, β), k = 1, 2, …, m. The prior density of θk is then: $$p\left( {{\theta_k}|\alpha ,\beta } \right) = \frac{{{\beta^\alpha }\theta_k^{\alpha - 1}{e^{ - \beta {\theta_k}}}}}{{\Gamma (\alpha )}},\quad {\theta_k} > 0$$

where α and β are hyperparameters, both unknown. The second-level prior assumes that the hyperparameter α obeys an exponential distribution, i.e., α ~ Exponential(λ), and that the hyperparameter β obeys a Gamma distribution, i.e., β ~ Gamma(a, λ): $$\begin{array}{l} p(\alpha ) = \lambda {e^{ - \lambda \alpha }},\quad \alpha > 0,\lambda > 0 \\ p(\beta ) = \frac{{{\lambda^a}{\beta^{a - 1}}{e^{ - \lambda \beta }}}}{{\Gamma (a)}},\quad \beta > 0,a > 0,\lambda > 0 \\ \end{array}$$ where a is the shape parameter and λ the rate parameter of the Gamma distribution.
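A draw of θk from this two-layer prior can be simulated by ancestral sampling: draw the hyperparameters first, then θk. A minimal sketch with illustrative values of λ and a; here Gamma(α, β) means shape α and rate β, matching the densities above:

```python
import numpy as np

def sample_hierarchical_prior(lam=1.0, a=2.0, size=10_000, seed=0):
    """Ancestral sampling from the two-layer prior described above:
    alpha ~ Exponential(lam), beta ~ Gamma(a, lam), theta ~ Gamma(alpha, beta).
    NumPy's gamma() takes (shape, scale), so rate r becomes scale 1/r."""
    rng = np.random.default_rng(seed)
    alpha = rng.exponential(scale=1.0 / lam, size=size)
    beta = rng.gamma(shape=a, scale=1.0 / lam, size=size)
    theta = rng.gamma(shape=alpha, scale=1.0 / beta)
    return theta

theta = sample_hierarchical_prior()  # draws of theta_k from the marginal prior
```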

A posteriori derivation

In Bayesian inference, the prior distributions of the parameters to be estimated are the basis for modeling, while the posterior distributions are the basis for statistical inference. From the prior distributions of the parameter θk and the hyperparameters α, β, the joint posterior distribution of θk, α, β is obtained up to a normalizing constant: $$p\left( {\alpha ,\beta ,{\theta_k}|data} \right) \propto p(\alpha )p(\beta )p\left( {{\theta_k}|\alpha ,\beta } \right)p\left( {{n_k}|{\theta_k}} \right)$$

When the hyperparameters α, β are fixed, the entry probabilities θk of the strata are mutually independent and, by conjugacy, the posterior distribution of each entry probability θk is $$Gamma\left( {{n_k} + \alpha ,{N_k} + \beta } \right)$$. The joint posterior distribution function of the entry probabilities θk is therefore: $$p\left( {{\theta_k}|\alpha ,\beta ,{\text{ }}data} \right) = \prod\limits_{k = 1}^m {\frac{{{{\left( {{N_k} + \beta } \right)}^{{n_k} + \alpha }}\theta_k^{{n_k} + \alpha - 1}{e^{ - \left( {{N_k} + \beta } \right){\theta_k}}}}}{{\Gamma \left( {{n_k} + \alpha } \right)}}}$$
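Because the Gamma prior is conjugate here, the calibration estimate can be taken as the posterior mean of each θk, and the calibration weight as its reciprocal. A minimal numeric sketch with made-up stratum sizes and assumed hyperparameter values:

```python
# Conjugate update: theta_k | data ~ Gamma(n_k + alpha, N_k + beta).
# The posterior mean (n_k + alpha) / (N_k + beta) serves as the calibration
# estimate theta_hat_k, and omega_k = 1 / theta_hat_k is the new weight.
alpha, beta = 1.0, 1.0             # assumed hyperparameter values
N = [5000, 8000, 12000]            # made-up stratum population sizes N_k
n = [50, 96, 132]                  # made-up stratum sample sizes n_k

theta_hat = [(nk + alpha) / (Nk + beta) for nk, Nk in zip(n, N)]
weights = [1.0 / t for t in theta_hat]   # calibration weights omega_k
```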

Gibbs simulation sampling

Suppose a random sequence θn of the parameter θ is obtained by an MCMC algorithm; when n is large enough, θn+1, θn+2, θn+3⋯ can be regarded as a sample from the posterior distribution. The first elements θ1, θ2, …, θn−1 of the sequence are discarded to eliminate the effect of the initial values on the results. The most commonly used MCMC methods are the Metropolis-Hastings algorithm (M-H algorithm) and Gibbs sampling. The M-H algorithm is relatively simple and effective, but it is demanding in its choice of proposal density, and its effectiveness relies on the chosen density being close to the true posterior distribution. In contrast, Gibbs sampling has better simulation performance and specifies, for each element of the parameter vector, the conditional distribution from which it is sampled.

Suppose the stochastic process $$\left\{ {{X_k},k \in T} \right\}$$, T = {0, 1, 2, …}, where Xk takes values in $$\left\{ {{x_0},{x_1},{x_2}, \ldots } \right\}$$, satisfies for any k ∈ T the conditional probability: $$\begin{array}{l} p\left\{ {{X_k} = {x_k}|{X_0} = {x_0},{X_1} = {x_1}, \ldots ,{X_{k - 1}} = {x_{k - 1}}} \right\} \\ = p\left\{ {{X_k} = {x_k}|{X_{k - 1}} = {x_{k - 1}}} \right\} \\ \end{array}$$

Then {Xk,kT}$$\left\{ {{X_k},k \in T} \right\}$$ is called a Markov chain.

Thus it can be obtained: p{X0=x0,X1=x1,,XT=xT} =p{X0=x0}i=1Tp{Xi=xi|Xi1=xi1}$$\begin{array}{l} p\left\{ {{X_0} = {x_0},{X_1} = {x_1}, \ldots ,{X_T} = {x_T}} \right\} \\ = p\left\{ {{X_0} = {x_0}} \right\}\prod\limits_{i = 1}^T p \left\{ {{X_i} = {x_i}|{X_{i - 1}} = {x_{i - 1}}} \right\} \\ \end{array}$$

Gibbs sampling also draws samples along a Markov chain; when the number of simulations is sufficiently large, the sampling distribution converges to the joint distribution. It is suitable when the target distribution is multidimensional, for example sampling from the posterior distribution f(θ|x), where $$\theta = \left( {{\theta_1},{\theta_2}, \ldots ,{\theta_q}} \right)$$. Gibbs sampling of the parameter θ is shown in Fig. 2.

Figure 2.

Gibbs sampling

Gibbs sampling is performed from the posterior distribution $$p\left( {{\theta_k}|\alpha ,\beta ,data} \right)$$ of the parameter θk to simulate a Markov chain $$\theta_k^{(0)},\theta_k^{(1)}, \ldots$$ for θk. Gibbs sampling is a type of MCMC algorithm that involves the marginal, conditional, and joint distributions of the random variables θ1, θ2, …, θn. The complete conditional distributions are sampled in turn, so that the sequentially drawn samples form a stationary Markov chain. Assume that, given θ−k, the joint density of θ1, θ2, …, θn depends only on the conditional density function $$f\left( {{\theta_k}|{\theta_{ - k}}} \right)$$, where $${\theta_{ - k}} = {\left( {{\theta_1}, \ldots ,{\theta_{k - 1}},{\theta_{k + 1}}, \ldots ,{\theta_n}} \right)^T}$$. Gibbs sampling entails generating samples from the conditional density $$f\left( {{\theta_k}|{\theta_{ - k}}} \right)$$. First, a set of random numbers $$\theta_1^{(0)},\theta_2^{(0)}, \ldots ,\theta_n^{(0)}$$ is chosen as initial values; $$\theta_1^{(1)}$$ is then generated from the conditional density $$f\left( {{\theta_1}|\theta_{ - 1}^{(0)}} \right)$$. By analogy, $$\theta_2^{(1)},\theta_3^{(1)}, \ldots ,\theta_n^{(1)}$$ are generated in sequence from $$f\left( {{\theta_2}|\theta_1^{(1)},\theta_3^{(0)}, \ldots ,\theta_n^{(0)}} \right), \ldots ,f\left( {{\theta_n}|\theta_{ - n}^{(1)}} \right)$$, respectively.
Thus, the first set of iterated values θ1(1),θ2(1),,θn(1)$$\theta_1^{(1)},\theta_2^{(1)}, \ldots ,\theta_n^{(1)}$$ is obtained, and after t iterations of this kind, the distribution of (θ1(t),θ2(t),,θn(t))$$\left( {\theta_1^{(t)},\theta_2^{(t)}, \ldots ,\theta_n^{(t)}} \right)$$ converges to the distribution of (θ1,θ2,,θn)$$\left( {{\theta_1},{\theta_2}, \ldots ,{\theta_n}} \right)$$ under certain conditions.
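The update scheme above can be sketched for a two-dimensional target. Below is a minimal Gibbs sampler for a bivariate standard normal with correlation rho, where each full conditional is itself normal so the update formulas are exact; the function name, correlation value, and iteration counts are illustrative assumptions, not taken from the paper.

```python
# Minimal Gibbs sampler sketch for a bivariate normal with correlation rho.
# Each full conditional f(theta_k | theta_-k) is N(rho * theta_-k, 1 - rho^2),
# so the two draws below implement the scheme in the text exactly.
import random

def gibbs_bivariate_normal(rho=0.8, n_iter=5000, seed=1):
    rng = random.Random(seed)
    t1, t2 = 0.0, 0.0           # initial values theta^(0)
    sd = (1.0 - rho**2) ** 0.5  # conditional standard deviation
    chain = []
    for _ in range(n_iter):
        t1 = rng.gauss(rho * t2, sd)  # draw theta1 ~ f(theta1 | theta2)
        t2 = rng.gauss(rho * t1, sd)  # draw theta2 ~ f(theta2 | theta1)
        chain.append((t1, t2))
    return chain

chain = gibbs_bivariate_normal()
# After convergence the draws approximate the joint distribution; the
# post-burn-in mean of theta1 should be close to the true mean 0.
mean1 = sum(t[0] for t in chain[1000:]) / len(chain[1000:])
```

Each sweep updates one coordinate at a time from its full conditional, which is exactly the n-dimensional procedure described above specialized to n = 2.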

The prior distributions of parameter θk and hyperparameters α, β are represented as a directed relationship graph by the Bayesian graphical modeling method, called the Doodle model in WinBUGS, consisting of nodes, arrows, and plates, where each parameter, the population values, and the sample observations are treated as model nodes connected by arrows. Here α is represented by alpha, β by beta, θk by theta[k], the scale parameter λ by lambda[k], and the population values Nk and sample observations nk of each stratum are represented by NN[k] and nn[k]. Parameter estimates are generated sequentially from the conditional distribution matching each parameter after setting initial values for each parameter. The Doodle model is shown in Figure 3.

Figure 3.

Doodle model

Step 1: Build and test the Doodle model

A Doodle model is built based on the distribution of each parameter α, β, θk, and λ and ensures that each node, arrow, and plate passes the model test.

Step 2: Data Loading, Model Integration and Initial Value Generation and Assignment

The population totals and sample values Nk and nk of each stratum of the variable are defined as arrays, loaded into the model, and integrated. Once the integrity and consistency of the data have been checked, the initial values of the parameters $$\left( {\theta_1^{(0)},\theta_2^{(0)}, \ldots ,\theta_m^{(0)}} \right)$$ are set; these values need not closely approximate the actual expected values of the parameters.

Step 3: Model Iteration and Posterior Sample Generation

Gibbs sampling values are generated sequentially from the full conditional distribution of each parameter. After t simulated sampling steps, a Markov chain $$\theta_k^{\left( 0 \right)},\theta_k^{\left( 1 \right)}, \cdots ,\theta_k^{\left( {{t_0}} \right)}, \cdots$$ is generated. When the number of simulated samples is sufficiently high, more accurate estimation results are obtained. Meanwhile, to ensure the accuracy of the estimation, the first t0 initial iterations are discarded as burn-in, and the last n = t − t0 sampling values are used to calculate the marginal posterior densities of each parameter; in this paper the posterior mean is chosen as the estimate of the target parameter θk. The estimate $$\widehat {{\theta_k}}$$ of the target variable θk of each stratum is obtained, the estimation weight of each stratum is obtained from $${d_k} = 1/\widehat {{\theta_k}}$$, and the next calibration-estimation step is carried out.
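A minimal sketch of this step, with a synthetic chain standing in for the sampler output; the burn-in length t0, the stand-in draws, and their center (0.25) are illustrative assumptions.

```python
# Sketch of Step 3: discard the first t0 burn-in draws, estimate theta_k by
# the posterior mean of the remaining n = t - t0 draws, and form the
# calibration weight d_k = 1 / theta_hat_k for the stratum.
import random

rng = random.Random(0)
t0 = 500                                                   # burn-in length
chain = [0.25 + rng.gauss(0, 0.02) for _ in range(3000)]   # synthetic draws of theta_k

post = chain[t0:]                      # keep only the last n = t - t0 draws
theta_hat = sum(post) / len(post)      # posterior-mean estimate of theta_k
d_k = 1.0 / theta_hat                  # estimation weight for the stratum
```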

Step 4: Convergence discrimination of MCMC algorithm

Convergence discrimination means that, when estimating each model parameter with the Gibbs sampling algorithm, convergence is judged from the trajectory (trace) plot of the parameter over a finite number of iterations: when the plot clearly stabilizes as the number of iterations increases, the Markov chain of the parameter can be judged to have converged and the posterior estimate based on that parameter is valid.
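As a numerical complement to the visual trace-plot judgment described above, a split-R-hat-style statistic can compare within- and between-segment variance of a single chain; values near 1.0 suggest stability, clearly larger values suggest drift. This is a simplified sketch (splitting one chain in half), not the paper's procedure, and the example chains are synthetic.

```python
# Split-Rhat sketch: split one chain in half, compare between-half variance
# B with within-half variance W; Rhat near 1 indicates the trace has
# stabilised, Rhat well above 1 indicates non-convergence (e.g. drift).
import random

def split_rhat(chain):
    m = len(chain) // 2
    halves = [chain[:m], chain[m:2 * m]]
    means = [sum(h) / m for h in halves]
    grand = sum(means) / 2
    b = m * sum((mu - grand) ** 2 for mu in means)   # between-half variance
    w = sum(sum((x - mu) ** 2 for x in h) / (m - 1)  # within-half variance
            for h, mu in zip(halves, means)) / 2
    var_plus = (m - 1) / m * w + b / m
    return (var_plus / w) ** 0.5

rng = random.Random(2)
stationary = [rng.gauss(0, 1) for _ in range(2000)]            # converged-looking trace
drifting = [i / 500 + rng.gauss(0, 1) for i in range(2000)]    # trending trace
```

On the stationary chain the statistic stays very close to 1, while the drifting chain's two halves have different means, inflating it well above 1.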

Empirical studies
Methodology and steps for simulation using the Monte Carlo method
Selection of data for simulation basis

We assume that the construction cost of a traditional home is 5000 yuan/m2 and simulate incremental costs of 100, 200, and 300 yuan. Empirical studies conclude that the service life of homes built in the 1960s is 25.5 years, in the 1970s 35.7 years, and in the 1980s 40.4 years; with the development of building technology and ecological design concepts, the service life of an eco-home should exceed 40 years, and this paper uses a conservative estimate of 40 years.

The operating costs of a home consist primarily of energy costs, i.e., the costs incurred by the energy consumed in heating, ventilating, cooling, and lighting the home. In this paper these are measured by the water and electricity consumed per unit area (m2) of the residence per year. Domestic water consumption, residential electricity consumption, and residential floor area of a city from 2000 to 2023 are selected as the basis of the simulation data. The data show that electricity costs have risen in recent years, while water costs are relatively stable and decrease year by year. Based on the analysis of the actual data, the standard deviation of the electricity cost is 22% of its expected value, because the cost of electricity varies greatly over time, and the longer the period, the more factors may affect residential electricity consumption. In contrast, the cost of water fluctuates less, and its variance is fixed at 0.7. According to Housing Security and Housing Administration statistics, the average annual maintenance cost of a home in 2023 was 3.75 yuan per square meter. Since the maintenance cost tends to increase over time and its volatility grows, the expectation of the annual maintenance cost Mt per unit area is set from the actual data, with a standard deviation equal to 21% of the expectation.

The most important feature of an eco-house is energy saving, and the energy-saving effect is generally reflected by the energy-saving rate. According to China’s ecological building design standards, the energy-saving rate of some existing ecological buildings in China exceeds 65%, the electricity-saving rate of typical ecological houses exceeds 50%, and the water-saving rate is about 20%. Therefore, the electricity-saving rate is assumed to be uniformly distributed between 50% and 60%, and the water-saving rate uniformly distributed between 15% and 25%.
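The stochastic inputs above can be drawn as follows for one Monte Carlo trial. The distribution families follow the text (uniform saving rates, normal maintenance cost with a standard deviation of 21% of its mean); the function name and dictionary keys are illustrative conventions of this sketch.

```python
# Draw one set of stochastic simulation inputs, following the distribution
# assumptions stated in the text; numeric bounds come from the text, the
# structure of the sketch is an assumption.
import random

rng = random.Random(7)

def draw_inputs():
    return {
        "power_saving": rng.uniform(0.50, 0.60),        # electricity-saving rate
        "water_saving": rng.uniform(0.15, 0.25),        # water-saving rate
        "maintenance": rng.gauss(3.75, 0.21 * 3.75),    # yuan per m2 per year
    }

sample = draw_inputs()
```

In a full simulation this draw is repeated once per trial, so each life-cycle cost realization uses its own random parameter set.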

Full life cycle cost simulation analysis

In this paper, we simulate the 40-year total cost of a 100 m2 eco-house and a conventional house of the same size in a city, together with the energy-saving payback period of the eco-house. First, the end-of-period total costs of an eco-house with a construction cost of 5100 yuan/m2 and a traditional house at 5000 yuan/m2 are simulated. The full life-cycle cost of the conventional house is shown in Figure 4. The total cost of the ecological residence has a 95% confidence interval of 540,802 to 547,936 yuan, significantly smaller than the 585,904 to 605,372 yuan of the traditional residence. This indicates that eco-homes are economically feasible from a full life-cycle perspective. However, in the current construction market, owing to differences in design, construction, and choice of materials, the construction cost per unit area of many eco-homes exceeds that of traditional homes by 200 to 300 yuan. We therefore also simulate the total end-of-period costs of eco-homes with construction costs of 5200 yuan/m2 and 5300 yuan/m2. The full life-cycle costs of the eco-homes at 5100, 5200, and 5300 yuan/m2 are shown in Fig. 5, Fig. 6, and Fig. 7, respectively, and the comparison of the full life-cycle cost of ecological and traditional houses is shown in Table 1. Even when the construction cost of the eco-home is 5300 yuan/m2, the total cost of the traditional house (595,582 yuan) is significantly higher than that of the eco-house (564,251 yuan); the eco-house can save almost 50% of the cost incurred during use.

Figure 4.

Traditional residential life cycle cost

Figure 5.

Ecological residential full life cycle cost

Figure 6.

Ecological residential full life cycle cost

Figure 7.

Ecological residential full life cycle cost

Cost comparison

Housing type Construction cost (yuan/m2) End-of-period total cost (yuan) Standard deviation Confidence interval (95%)
Ecology 5100 544322 1989 (540802,547936)
Ecology 5200 554158 1936 (550815,557855)
Ecology 5300 564251 1922 (560933,567811)
Tradition 5000 595582 4969 (585904,605372)
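An illustrative Monte Carlo life-cycle cost simulation in the spirit of Table 1 is sketched below: total cost = construction cost + 40 years of stochastic energy and maintenance costs for a 100 m2 dwelling, with the 95% confidence interval taken from sample percentiles. The annual energy-cost levels (18 and 30 yuan/m2) are assumptions of this sketch and are not the paper's calibrated inputs, so the absolute numbers will not reproduce Table 1; only the structure of the comparison is shown.

```python
# Monte Carlo full life-cycle cost sketch: construction cost plus 40 years
# of random annual energy and maintenance costs, per m2, for a 100 m2 home.
# Energy-cost means are illustrative assumptions (eco home uses less energy).
import random

AREA, YEARS, N = 100, 40, 2000
rng = random.Random(42)

def total_cost(build_cost_per_m2, energy_mu, energy_cv=0.22):
    costs = []
    for _ in range(N):
        c = build_cost_per_m2 * AREA
        for _ in range(YEARS):
            c += AREA * max(0.0, rng.gauss(energy_mu, energy_cv * energy_mu))
            c += AREA * max(0.0, rng.gauss(3.75, 0.21 * 3.75))  # maintenance
        costs.append(c)
    costs.sort()
    lo, hi = costs[int(0.025 * N)], costs[int(0.975 * N)]       # 95% CI
    return sum(costs) / N, (lo, hi)

eco_mean, eco_ci = total_cost(5100, 18.0)    # eco home: assumed lower energy cost
trad_mean, trad_ci = total_cost(5000, 30.0)  # conventional home
```

Even with a higher construction cost, the eco home's lower operating cost dominates over the 40-year horizon, which is the qualitative conclusion of Table 1.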

Next we simulate the energy-saving payback period of the eco-house. The total end-of-period costs of eco-homes are simulated for each year from year 1 to year 40 under construction costs of 5100, 5200, and 5300 yuan/m2, respectively, and the energy-saving payback period is calculated by comparison with a conventional house over the same period. The energy-saving payback periods of eco-homes with different construction costs are shown in Table 2. The average payback period lengthens, and its volatility increases, as the unit construction cost rises. When the construction cost is 5100 yuan/m2, it takes at most 10 years to recover the energy-saving investment, while at 5300 yuan/m2 it takes about 25 years. The lower the initial construction cost, the shorter the energy-saving payback period and the more obvious the advantages of eco-housing. However, it is difficult to reduce the incremental cost of eco-house construction below 100 yuan/m2 because of constraints of technology, design, materials, construction, and environment. Based on this paper, construction units can consider controlling the incremental cost between 100 and 200 yuan/m2, at which the energy-saving payback period is 9 to 13 years, still economically attractive relative to the 40-year life cycle. The payback period would be even shorter if social and environmental benefits that cannot be monetized were taken into account.

Energy saving investment recovery period

Construction cost (yuan/m2) Average energy-saving payback period 90th-percentile payback period
5100 9 years 10 years
5200 12 years 13 years
5300 25 years 26 years
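The payback-period logic can be sketched as follows: the extra up-front cost of the eco home is recovered in the first year in which cumulative operating savings exceed it. The annual-saving range drawn per trial (10 to 14 yuan per m2 per year) is an illustrative assumption of this sketch, chosen only to show how payback scales with incremental cost.

```python
# Payback-period sketch: count the years until cumulative annual operating
# savings cover the incremental construction cost, averaged over trials.
# The annual-saving distribution is an illustrative assumption.
import random

rng = random.Random(3)
AREA = 100  # m2

def payback_years(incremental_cost_per_m2, n_trials=2000):
    years = []
    for _ in range(n_trials):
        gap = incremental_cost_per_m2 * AREA              # extra up-front cost, yuan
        annual_saving = AREA * rng.uniform(10.0, 14.0)    # yuan saved per year
        t = 0
        while gap > 0 and t < 40:                         # cap at the 40-year life
            gap -= annual_saving
            t += 1
        years.append(t)
    return sum(years) / len(years)

avg_100 = payback_years(100)  # incremental cost 100 yuan/m2
avg_300 = payback_years(300)  # incremental cost 300 yuan/m2
```

The simulation reproduces the qualitative pattern of Table 2: tripling the incremental cost roughly triples the average payback period.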
Predictive analysis of energy efficiency retrofit measures based on posterior distribution

For large-scale energy-saving renovation of building groups, statistical errors, measurement errors, and incomplete sample coverage mean that the prior information fed into the energy-consumption prediction model, whether drawn from design information, literature research, or sampled parameters, often biases the predicted energy savings, which in turn affects retrofit decision-making.

After the Bayesian model is used to correct the parameters, the posterior distribution of the energy parameters is closer to the actual parameter values. Using the posterior distributions obtained from the study case, the predicted energy savings before and after the Bayesian correction can be simulated; the posterior distribution of the lighting density is taken at a fixed energy cutoff according to the Bayesian estimate. The analysis of energy-saving measures based on the Bayesian energy model is shown in Fig. 8 (panels a~d show the energy savings from air-conditioning cold-source measures, lighting measures, window measures, and the combined cold-source + lighting + window measures, respectively). The results show:

Figure 8.

Analysis of energy saving measures based on Bayesian energy model

Predicted energy savings of the air-conditioning cold-source retrofit. The prior normal distribution of the refrigeration coefficient (COP) has mean 5.0, while the posterior mean is 5.5. If energy-saving measures such as high-efficiency chillers raise the cold-source COP to 6, the energy-consumption prediction model overpredicts the energy-saving effect of the cold source: the prediction model gives savings of 46.3 TJ versus a Bayesian estimate of 36.2 TJ, an overestimate of 24.2%.

Predicted energy savings from improving the thermal performance of external windows. The prior normal distribution of the heat-transfer coefficient has mean 4.0, while the posterior mean is 4.6. If energy-saving measures such as insulating glass, film, and shading reduce the heat-transfer coefficient of the exterior windows to 2.3, the energy-consumption prediction model underpredicts the energy-saving effect: the prediction model gives savings of 3.15 TJ versus a Bayesian estimate of 8.22 TJ, a difference of about 3 times.

Predicted energy savings of the high-efficiency lighting retrofit. The prior triangular distribution of the lighting power density has a most likely value of 10, while the posterior most likely value is 9. If high-efficiency luminaires such as LEDs reduce the lighting power density by 2 W/m2, the energy-consumption prediction model overpredicts the energy-saving effect: the prediction model gives savings of 61.3 TJ versus a Bayesian estimate of 45.0 TJ, an overestimate of 26.6%.

Predicted energy savings for the three combined measures of air-conditioning cold source, exterior-window insulation, and high-efficiency lighting. With the prior information and the posterior distributions of the energy parameters of the three measures used as input conditions, respectively, the prediction model gives savings of 89.6 TJ, while the Bayesian model gives 65.2 TJ; the Bayesian prediction of the combined energy savings is 72.8% of the prediction model’s simulated value.
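The direction of the cold-source bias above can be checked with a one-line energy balance: cooling energy scales as load / COP, so the predicted saving from raising the COP to 6 depends strongly on the assumed baseline COP. The cooling load below (1000 TJ) is a hypothetical value used only to illustrate the direction of the bias, not the paper's case data.

```python
# Worked check: savings from a COP retrofit = load/COP_before - load/COP_after.
# A lower assumed baseline COP inflates the predicted saving, matching the
# finding that the prior-based model overestimates the cold-source retrofit.
load = 1000.0  # annual cooling load, TJ (hypothetical)

def savings(cop_before, cop_after=6.0):
    return load / cop_before - load / cop_after

prior_savings = savings(5.0)      # using the prior mean COP 5.0
posterior_savings = savings(5.5)  # using the Bayesian posterior mean COP 5.5
```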

Validation of the predictive model for energy efficiency retrofits in the building stock
Sample information on the energy efficiency retrofit prediction validation building stock

To verify the accuracy of the stochastic model’s energy-efficiency retrofit predictions and the consistency of the database’s retrofit-potential data, this paper evaluates the energy-consumption data of 15 building clusters within the study area that have completed energy-efficiency retrofits, with a floor area accounting for 24% of the buildings in the entire region. The accuracy of the model’s retrofit predictions at large regional scale is demonstrated by observing the distribution of the building energy-saving rates obtained from these retrofitted building samples after the adoption of a comprehensive package of energy-efficiency measures. Since the overall operating conditions and environments of these retrofitted buildings change little within the study area, the monthly energy-billing method is used to compare results with the prediction-model output. The building information and total energy-consumption information for the sample pool of 15 building clusters compared against the energy prediction model are shown in Table 3. The total pre-retrofit energy consumption of the 15 observed buildings is 510 TJ, 20% of the total regional building energy consumption. A total of 93 energy-saving measures were taken, reducing energy consumption by 93 TJ, for an overall observed energy-saving rate of 17.2%.

Fixed information input parameters table of 15 retrofit buildings

Pre-retrofit energy consumption Post-retrofit energy consumption Floor area (m2) Layer height h0(m) Ka(m) a(m) (k+1)/k h0/a h0/h
Building 1 8.56E+13 6.85E+13 113973 54 4 92 22 1.2 0.16 0.02
Building 2 6.33E+13 5.2E+13 68119 32 4 115 17 1.18 0.15 0.06
Building 3 4.89E+13 4.36E+13 53417 28 4 302 61 1.21 0.04 0.03
Building 4 3.8E+13 2.95E+13 44806 9 4 159 22 1.14 0.14 0.08
Building 5 1.33E+13 1E+13 20007 22 4 42 22 1.5 0.15 0.04
Building 6 1.05E+13 7.95E+13 20690 21 4 51 20 1.4 0.16 0.05
Building 7 3.96E+13 3.12E+13 103368 32 4 82 39 1.53 0.06 0.02
Building 8 2.95E+13 2.65E+13 55414 43 4 59 22 1.3 0.16 0.01
Building 9 2.01E+13 1.45E+13 31809 22 4 66 24 1.3 0.17 0.05
Building 10 1.88E+13 1.52E+13 53983 39 4 82 17 1.25 0.14 0.04
Building 11 1.32E+13 1.11E+13 26881 20 4 52 19 1.37 0.16 0.03
Building 12 9.9E+13 6.95E+13 23695 25 4 47 23 1.46 0.16 0.02
Building 13 9.23E+13 7.58E+13 30833 32 4 50 21 1.4 0.14 0.03
Building 14 6.21E+13 5.22E+13 17629 22 4 44 22 1.47 0.13 0.06
Building 15 9.55E+13 8.13E+13 284633 30 4 398 299 1.75 0 0.03
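The per-building energy-saving rate implied by Table 3 is simply (pre − post) / pre. A quick computation for the first three buildings, using the consumption columns as printed in the table:

```python
# Per-building energy-saving rates from the pre- and post-retrofit
# consumption columns of Table 3 (Buildings 1-3, values as printed).
pre = [8.56e13, 6.33e13, 4.89e13]   # energy consumption before retrofit
post = [6.85e13, 5.20e13, 4.36e13]  # energy consumption after retrofit

rates = [(p - q) / p for p, q in zip(pre, post)]
# Building 1 saves (8.56 - 6.85) / 8.56, i.e. about 20% of its energy use.
```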
Changes in energy consumption before and after energy-saving retrofits in the actual retrofitted building stock

The results of the energy consumption comparison of the 15 buildings before and after the energy efficiency retrofit are shown in Figure 9.

Figure 9.

Energy consumption comparison results

A comparison of the LSBESR database with the actual energy-saving rates for the 15-building retrofit sample is shown in Figure 10. The two groups of energy-saving-rate data are very close to the measure-level savings in the database; comparing the arithmetic means of the two data sets, the deviation between them is 15%. Taking building number 5 as an example, four energy-saving technical measures were implemented in its retrofit. In the energy-saving retrofit prediction model, the corresponding prediction factors were: the lighting density decreased by 5.5 W/m2, the winter heating-efficiency index increased from 0.85 to 3, and the summer cold-source COP index increased by 22%.

Figure 10.

Comparison of LSBESR database and the actual energy saving rate

The monthly electricity consumption data (in 10,000 kWh) of a building are shown in Table 4. The month-by-month energy consumption before and after the energy-saving retrofit of this case is shown in Fig. 11, with panels (a) and (b) showing the building’s energy consumption before and after the retrofit, respectively. According to the comparison of observed energy-consumption data, the building’s energy consumption decreased by 22.1% after the retrofit, notably because the energy-intensive electric boiler was replaced with a high-efficiency air-cooled heat pump, which increased heating efficiency by 200% in winter under the same heating load.

Monthly electricity consumption of a building (10,000 kWh)

Pre-retrofit Post-retrofit
Month 2018 2019 2020 2021 2022 2023
January 148 118 162 121 111 96
February 98 135 98 107 98 88
March 109 172 91 86 91 77
April 20 55 61 51 60 51
May 75 43 73 57 54 51
June 87 80 79 74 60 62
July 108 99 105 88 73 82
August 135 103 111 79 87 83
September 80 80 70 83 68 61
October 67 61 63 58 53 42
November 60 56 58 41 43 51
December 130 110 106 117 91 67
Total 1117 1112 1077 962 889 811
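The monthly energy-billing comparison underlying Fig. 11 amounts to summing the monthly columns of Table 4 and comparing annual totals. A quick check on the first pre-retrofit year (2018) and the last post-retrofit year (2023), with values copied from the table:

```python
# Annual totals from the monthly electricity columns of Table 4 (10,000 kWh);
# the totals should reproduce the table's "Total" row, and their ratio gives
# one simple before/after reduction figure for these two years.
pre_2018 = [148, 98, 109, 20, 75, 87, 108, 135, 80, 67, 60, 130]
post_2023 = [96, 88, 77, 51, 51, 62, 82, 83, 61, 42, 51, 67]

total_2018 = sum(pre_2018)   # Table 4 Total row: 1117
total_2023 = sum(post_2023)  # Table 4 Total row: 811
reduction = (total_2018 - total_2023) / total_2018
```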
Figure 11.

Energy consumption and energy consumption

Validation Comparison of Predictive Model Outputs with Actual Before and After Retrofit Data

The simulation model of this paper yields a comparison of energy consumption before and after the renovation; the comparison between simulated values and actual results is shown in Figure 12 (panel a: distribution of total annual energy consumption of the building complex before the retrofit; panel b: frequency distribution of total energy consumption before the retrofit; panel c: distribution of total annual energy consumption after the retrofit; panel d: frequency distribution of total annual energy consumption after the retrofit). The following results are obtained:

Figure 12.

Comparison of simulation results with actual observations

The average energy consumption simulated by the prediction model before the retrofit is 526 TJ, while the observed actual energy consumption is 511 TJ, a deviation of 2.9%; the actual value falls within the interval μ ± δ of the simulated value.

After adjusting the retrofit energy parameters, the average total annual energy consumption simulated by the prediction model is 405 TJ, while the observed actual energy consumption is 412 TJ, a deviation of -1.7%; the actual value falls within the interval μ ± δ of the simulated value.

The simulated average energy saving before and after the retrofit is 121 TJ, an energy-saving rate of 23%, while the actual observed saving is 99 TJ, an observed energy-saving rate of 19.4%; the deviation is 22.22%, larger than the deviation of the total energy consumption.
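The deviations quoted in the three validation points above follow directly from the simulated and observed totals; the arithmetic can be checked in a few lines:

```python
# Arithmetic check of the validation deviations, using the simulated and
# observed totals quoted in the text (all values in TJ).
sim_pre, obs_pre = 526.0, 511.0
sim_post, obs_post = 405.0, 412.0

dev_pre = (sim_pre - obs_pre) / obs_pre        # deviation before retrofit
dev_post = (sim_post - obs_post) / obs_post    # deviation after retrofit

sim_saving = sim_pre - sim_post                # simulated saving: 121 TJ
obs_saving = obs_pre - obs_post                # observed saving: 99 TJ
dev_saving = (sim_saving - obs_saving) / obs_saving
```

The saving deviation is larger than either total-consumption deviation because the two smaller biases point in opposite directions and compound when differenced.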

Conclusion

In this paper, the established predictive stochastic model based on the Monte Carlo method is used to carry out full life-cycle cost simulation of the building complex and prediction of energy-saving renovation, and the predicted results are validated. The results are as follows:

Through the simulation analysis, the total cost of the eco-home has a 95% confidence interval of 540,802 to 547,936 yuan, significantly smaller than the total cost of the traditional home. Eco-homes thus have an obvious cost advantage over traditional buildings across the whole life cycle.

The energy-saving effects of 15 buildings in the regional building group that have implemented energy-saving renovation are compared, and the changes in building energy-consumption data and energy parameters before and after the renovation are collected. After adjusting the retrofit energy parameters, the average total yearly energy consumption simulated by the prediction model is 405 TJ, a deviation of -1.7% from the observed actual energy consumption, further verifying the accuracy of the model.
