Dynamic channel model estimation based on gradient descent method and its optimization in massive MIMO

With the advent of the information age, wireless communication networks play an extremely important strategic role in national social and economic development and have penetrated into many fields of society. Their development shows a trend toward high-speed, broadband, heterogeneous and ubiquitous operation, ultimately forming a global network that integrates air and ground in three dimensions with seamless links [1-4]. At the same time, wireless communication networks face serious challenges, mainly the relative “shortage” and “waste” of spectrum resources, the coexistence of multiple heterogeneous network standards, ubiquitous access and services, and the complexity of network management [5-8]. The root cause of these problems is that static wireless networks have difficulty adapting to dynamic changes in the environment, and cognitive radio technology is one effective way to address them.

The study of multiple-input multiple-output (MIMO) channels has become a hot research topic in cognitive radio networks. Most current studies model the MIMO channel for a static scenario with a fixed propagation environment to describe radio wave propagation in the channel, which leaves the user terminal unable to perceive dynamic changes in the propagation characteristics [9-13]. Wireless MIMO systems exploit the spatial degrees of freedom of multi-antenna transceivers and significantly improve channel capacity and link reliability while reducing channel interference, but these properties cannot all be optimized at the same time in practical applications [14-16]. Therefore, observable MIMO channel models are developed using suitable channel parameter estimation methods, so that user terminals can better capture the multidimensional and multiscale characteristics of radio wave propagation in dynamic, stochastic wireless environments [17-18].

In this paper, the Stein Variational Gradient Descent (SVGD) method is firstly applied to the channel simulation of large-scale MIMO in dynamic environments, which utilizes the maximum likelihood function established by the SVGD sampling model to improve the accuracy of channel estimation with the help of the deterministic updating direction of particles. Then, the performance of SVGD-based channel estimation algorithm is optimized in large-scale MIMO scenarios, and a dynamic channel simulation technique based on SVGD is proposed, which supports static channel, dynamic channel and birth/death channel simulation, and can also support large-scale MIMO fading types. Finally, the channel estimation problem is transformed into a variational inference optimization problem by taking advantage of the low-rank nature of massive MIMO channels, and the performance of the new method is simulated and experimented.

2

Methodology

2.1

Gradient descent method

When exact inference is used to solve the dynamic channel model, the accuracy of the model computation results is improved, but at the same time, a high computational complexity is introduced. Therefore, to strike a balance between computational complexity and accuracy, this paper uses approximate inference to solve the dynamic channel model. There are two common methods for approximate inference. The first one is to complete the approximation by using a randomized sampling algorithm, and the commonly used method is MCMC. The second one is to complete the inference by using deterministic particle updating method, and the commonly used method is Stein’s variational gradient descent method.

2.1.1

Markov chain Monte Carlo

Markov chain Monte Carlo (MCMC) is a method for solving integrals by simulation, which is widely used in various fields. MCMC can now handle many difficult high-dimensional complex models and non-Bayesian problems [19]. The main idea of MCMC is to construct a Markov chain that reaches a stationary distribution after many iterations, at which point the samples drawn from the chain approximate the target distribution. The main sampling method of MCMC is the Metropolis-Hastings algorithm.

The Metropolis-Hastings algorithm is a classical MCMC sampling method based on the idea of “rejection sampling”: in each iteration the candidate sample is accepted with probability α, so that the chain eventually approximates the target distribution. Before introducing the Metropolis-Hastings algorithm, we first introduce the Metropolis algorithm. Assume that the target distribution to be sampled is $f(x) = g(x)/Z$, where the normalization constant Z is unknown and difficult to compute. The Metropolis method samples the target distribution according to the following procedure:

1) Initialize the starting value x₀ such that f(x₀) > 0.

2) Sample a candidate point x^* from a jump distribution q(x₁,x₂) based on the current value of x. This distribution is also known as the proposal distribution, and the Metropolis algorithm requires it to be symmetric, i.e.: (1) \[q(x_{1},x_{2})=q(x_{2},x_{1})\]

3) Given the candidate point x^*, calculate the ratio of the density function at x^* to that at the current point x_t–1: (2) \[\alpha =\frac{p(x^{*})}{p(x_{t-1})}=\frac{f(x^{*})}{f(x_{t-1})}=\frac{g(x^{*})}{g(x_{t-1})}\]

From this, it can be found that the normalization constant Z can be eliminated by calculating the ratio of the density functions at the two points.

4) When α ≥ 1, accept the candidate, set x_t = x^*, and return to step 2. When α < 1, generate a random number uniformly distributed on [0, 1]. If this number is less than α, accept the candidate and set x_t = x^*; otherwise keep x_t = x_t–1. In either case, return to step 2.

In other words, the Metropolis sampling process first calculates the acceptance probability: (3) \[\alpha =\min \left( \frac{f(x^{*})}{f(x_{t-1})},1 \right)\]

A random number is then generated and compared with the acceptance probability α to decide whether to accept the candidate point. Since the transition probability from x_t to x_t+1 depends only on x_t and is independent of (x₀,⋯,x_t–1), Metropolis sampling generates a Markov chain (x₀,x₁,⋯,x_t,⋯). After a sufficiently long burn-in period of k steps, the Markov chain approaches its stationary distribution, and the samples (x_k+1,⋯,x_k+n) obtained afterwards can be viewed as an approximation to the target distribution f(x).

Hastings generalized the Metropolis sampling method and proposed the Metropolis-Hastings method, which sets the acceptance probability α to: (4) \[\alpha =\min \left( \frac{f(x^{*})q(x^{*},x_{t-1})}{f(x_{t-1})q(x_{t-1},x^{*})},1 \right)\]

The Metropolis-Hastings method can use an asymmetric transition probability function: (5) \[q(x_{1},x_{2})=\Pr (x_{1}\to x_{2})\]

Thus, the proposal distribution no longer needs to be strictly symmetric.
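The sampling procedure above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in this paper: the function name `metropolis_hastings` and its arguments `log_g`, `propose` and `log_q` are hypothetical, and the computation is done in the log domain for numerical stability. Only the unnormalized density g(x) is needed, since Z cancels in the ratio of Eq. (4).

```python
import numpy as np

def metropolis_hastings(log_g, propose, log_q, x0, n_samples, burn_in, rng):
    """Sample from f(x) = g(x)/Z using only the unnormalized log-density log_g.

    propose(x, rng) draws a candidate x* from q(x, .).
    log_q(a, b) returns log q(a, b) = log Pr(a -> b), up to an additive constant.
    """
    x = x0
    samples = []
    for t in range(burn_in + n_samples):
        x_star = propose(x, rng)
        # Logarithm of the ratio in Eq. (4); Z never appears.
        log_ratio = (log_g(x_star) + log_q(x_star, x)) - (log_g(x) + log_q(x, x_star))
        # Accept the candidate with probability min(ratio, 1).
        if np.log(rng.uniform()) < min(log_ratio, 0.0):
            x = x_star
        # Keep samples only after the burn-in period.
        if t >= burn_in:
            samples.append(x)
    return np.array(samples)
```

With a symmetric proposal the two `log_q` terms cancel and the sketch reduces to the Metropolis acceptance rule of Eq. (3).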

2.1.2

Stein variational gradient descent method

Stein Variational Gradient Descent (SVGD) is a variational inference method that uses deterministic particle updates to approximate the target distribution, avoiding the need to compute the normalization constant of the target distribution [20]. SVGD finds the steepest descent direction differently from traditional variational inference algorithms: traditional variational inference approximates first and then optimizes, whereas SVGD optimizes first and then approximates.

Suppose that, by minimizing the KL divergence, a simpler distribution q^*(x) is found from the predefined set Q = {q(x)} to approximate the target distribution p(x): (6) \[q^{*}=\arg \min_{q\in Q}\left\{ \text{KL}(q\,\|\,p)\equiv \text{E}_{q}[\log q(x)]-\text{E}_{q}[\log \bar{p}(x)]+\log Z \right\}\]

where $\bar{p}(x)$ is the unnormalized target density, i.e., $p(x)=\bar{p}(x)/Z$.

Suppose that T is a smooth invertible mapping X → X and that x is drawn from a tractable reference distribution q(x). By the change of variables formula, the density function of z = T(x) is: (7) \[q_{[T]}(z)=q(T^{-1}(z))\cdot |\det (\nabla_{z}T^{-1}(z))|\]

where T⁻¹ denotes the inverse transformation of T and ∇_zT⁻¹ represents the Jacobian matrix of T⁻¹.

Let T(x) = x + ϵϕ(x) with x ~ q(x), and let q_[T](z) denote the density function of z = T(x). Then: (8) \[\nabla_{\epsilon }\text{KL}(q_{[T]}\,\|\,p)\,|_{\epsilon =0}=-\text{E}_{x\sim q}[\text{trace}(\text{A}_{p}\phi (x))]\]

where A_pϕ(x) = ∇_xlog p(x)ϕ(x)^T + ∇_xϕ(x) is the Stein operator.

Let $\text{B}=\{\phi \in \text{H}^{d}:\left\| \phi \right\|_{\text{H}^{d}}^{2}\le \text{S}(q,p)\}$ be a ball in the vector-valued reproducing kernel Hilbert space H^d with kernel k(x,·), where S(q,p) denotes the kernelized Stein discrepancy (KSD) between q and p. Then the optimal perturbation direction is: (9) \[\phi_{q,p}^{*}(\cdot )=\text{E}_{x\sim q}[k(x,\cdot )\nabla_{x}\log p(x)+\nabla_{x}k(x,\cdot )]\]

It can then be shown that, along the direction in Eq. (9), the gradient in Eq. (8) equals the negative KSD, i.e.: (10) \[\nabla_{\epsilon }\text{KL}(q_{[T]}\,\|\,p)\,|_{\epsilon =0}=-\text{S}(q,p)\]

Eqs. (9) and (10) suggest an iterative procedure that transforms the initial distribution step by step until it approximates the target distribution. First, the initial particles $\{x_{i}^{0}\}_{i=1}^{m}$ are obtained by sampling from the density q₀(x), and the target is set to p(x) = f(x). The optimal perturbation direction is then estimated from the particles as: (11) \[\hat{\phi }_{q_{0},p}^{*}(x)=\frac{1}{m}\sum\limits_{j=1}^{m}\left[ \nabla_{x_{j}^{0}}\log p(x_{j}^{0})k(x_{j}^{0},x)+\nabla_{x_{j}^{0}}k(x_{j}^{0},x) \right]\]

Applying the transformation $T_{0}(x)=x+\epsilon \hat{\phi }_{q_{0},p}^{*}(x)$ to each particle gives the particles updated in the first iteration, $x_{i}^{1}=T_{0}(x_{i}^{0})$ for i = 1,⋯,m. The optimal perturbation direction for the second iteration is then obtained in the same way as in Eq. (11): (12) \[\hat{\phi }_{q_{1},p}^{*}(x)=\frac{1}{m}\sum\limits_{j=1}^{m}\left[ \nabla_{x_{j}^{1}}\log p(x_{j}^{1})k(x_{j}^{1},x)+\nabla_{x_{j}^{1}}k(x_{j}^{1},x) \right]\]

Applying the transformation $T_{1}(x)=x+\epsilon \hat{\phi }_{q_{1},p}^{*}(x)$ yields the particles updated in the second iteration. The procedure continues in this way, and when $\phi_{q,p}^{*}(\cdot )\equiv 0$, the final particles approximate the target distribution.
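The particle update of Eqs. (11) and (12) can be illustrated with the following minimal NumPy sketch. It assumes an RBF kernel k(x, x') = exp(−‖x − x'‖²/h) with a fixed bandwidth h and a fixed step size ϵ; the kernel choice, the parameter values and the function names (`rbf_kernel`, `svgd_step`, `run_svgd`) are assumptions made for illustration and are not taken from this paper.

```python
import numpy as np

def rbf_kernel(x, h):
    """Return K[j, i] = k(x_j, x_i) and grad_K[j, i] = gradient of k(x_j, x_i) w.r.t. x_j."""
    diff = x[:, None, :] - x[None, :, :]        # diff[j, i] = x_j - x_i, shape (m, m, d)
    sq_dist = np.sum(diff ** 2, axis=-1)
    K = np.exp(-sq_dist / h)
    grad_K = (-2.0 / h) * diff * K[:, :, None]  # derivative with respect to the first argument
    return K, grad_K

def svgd_step(x, grad_log_p, eps, h):
    """One update x_i <- x_i + eps * phi(x_i), with phi estimated as in Eq. (11)."""
    m = x.shape[0]
    K, grad_K = rbf_kernel(x, h)
    score = grad_log_p(x)                       # row j holds the gradient of log p at x_j
    # phi[i] = (1/m) * sum_j [ k(x_j, x_i) * score_j + grad_{x_j} k(x_j, x_i) ]
    phi = (K.T @ score + grad_K.sum(axis=0)) / m
    return x + eps * phi

def run_svgd(grad_log_p, x0, n_iter=500, eps=0.1, h=1.0):
    """Apply the transformations T_0, T_1, ... in sequence to the initial particles x0."""
    x = x0.copy()
    for _ in range(n_iter):
        x = svgd_step(x, grad_log_p, eps, h)
    return x
```

The first term of `phi` drives the particles toward high-density regions of p(x), while the kernel-gradient term pushes particles apart so that they do not collapse onto a single mode.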

In summary, Stein variational gradient descent is a particle-based variational inference algorithm. It combines the advantages of variational inference and deterministic particle updates and uses gradient information effectively to approximate the target distribution, thus achieving a faster convergence rate. Unlike traditional variational inference, which constructs a parametric approximation of the target distribution by minimizing the KL divergence, SVGD directly approximates the target distribution with a set of particles that are iteratively updated to reduce the KL divergence as fast as possible among all velocity fields in the reproducing kernel Hilbert space of a positive definite kernel. In theory, SVGD can be applied directly to high-dimensional models. However, standard SVGD evaluates kernel functions to update the particles at each iteration, and these kernel functions are defined over all variable dimensions; using such global kernels over all variables degrades the algorithm’s performance in higher dimensions. As a result, SVGD does not produce distributed message passing like the belief propagation (BP) algorithm. In this paper, we mainly use SVGD to estimate the mean of the variables of a high-dimensional model and do not consider variance estimation, so the performance deficiency of SVGD in high dimensions does not affect the accuracy of the experimental results in this paper.

2.2

SVGD-based dynamic channel modeling

2.2.1

Dynamic channel model

In dynamic scenarios, the transceiver antennas of the communication device are moving, the propagation environment changes continuously, and channel parameters such as delay, maximum Doppler shift, and path loss are random. At the same time, the continuous change of the channel scene causes the delay, maximum Doppler shift, and path loss to vary regularly and continuously. The theoretical model of the channel impulse response at moment t can be expressed as: (13) \[\tilde{h}(t,\tau )=\sum\limits_{l=1}^{L}a_{l}(t)\tilde{\beta }_{l}(t)\,\delta (\tau -\tau_{l}(t))\]

where t denotes time, L is the number of paths, and $a_{l}(t)$, $\tilde{\beta }_{l}(t)$ and $\tau_{l}(t)$ denote the time-varying path loss, channel fading, and multipath delay of the l-th path, respectively. The time-varying impulse response of the dynamic channel is shown in Fig. 1. The dynamic channel can be categorized into two types: the mobile channel and the birth/death channel. In the mobile channel, the path delay varies with time. In the birth/death channel, two resolvable paths alternate between “birth” and “death”, and the delay position at which a path reappears is random.

2.2.2

Multi-channel dynamic channel simulation platform

A multi-channel wireless channel simulation platform is built according to the microwave signal transmission path, as shown in Figure 2. The platform contains a user parameter configuration unit, an RF input unit, a channel parameter storage unit, a channel simulation unit, an RF output unit, and other modules. In operation, the channel mode is first selected and the channel parameters are configured through the user software interface. The channel parameters are then passed to the FPGA through the data interface, and the FPGA stores them according to the user-selected channel mode. The RF input module downmixes the input RF signal to IF. The FPGA then downconverts the IF signal to baseband, simulates channel fading, delay, and loss, and superimposes channel noise on the baseband signal according to the user-configured parameters. Finally, the signal is upconverted to IF and passed to the RF output unit, which upmixes it to RF.

The user parameter configuration unit is the core of the dynamic channel parameter configuration of the simulation platform. It is implemented by the user parameter configuration software on the PC, and its flow is shown in Figure 3. Parameter function configuration, parameter calculation, and parameter transmission are needed to form the dynamic channel parameters into the frame structure shown in Figure 4. In parameter function configuration, the user first selects the mode of each channel from three modes: static channel, mobile channel, and birth/death channel. The user then configures the channel parameters. In static mode, the number of fading paths, the delay of each path, the loss, the fading type, the moving speed, the communication frequency, and the signal-to-noise ratio need to be configured. In mobile mode, the number of fading paths, the fading type, the starting speed, the acceleration, the communication frequency, the path loss, the basic delay, the delay variation range, the delay variation rate, the signal-to-noise ratio, and so on need to be configured. In birth/death mode, the moving speed, communication frequency, path loss, basic delay, delay variation range, delay interval, number of birth/death positions, birth/death period, signal-to-noise ratio, etc. need to be configured.

The parameter calculation uses the Stein Variational Gradient Descent (SVGD) method to compute the channel parameters and then converts them to fixed point, as follows:

1) Calculate the Doppler frequency according to SVGD

For the static channel, the Doppler frequency is calculated as: (14) \[f_{d}=\frac{f_{c}v}{c}\]

where f_c is the communication frequency, v is the moving speed, and c is the speed of light.

For the mobile channel, the moving speed at moment t_k is first calculated from the user-set starting speed v₀ and moving acceleration a according to Eq. (15), and the Doppler frequency at this moment is then obtained from Eq. (14): (15) \[v=v_{0}+at_{k}\]

2) Calculate the delay of each path according to the parameters set by the user

In the mobile channel mode, the delay of each path at moment t_k is calculated from the user-set basic delay τ_l,0 of each path, the delay variation range (τ_min,τ_max), and the delay variation rate Δτ: (16) \[\tau_{l,k}=\tau_{l,0}+\Delta \tau \cdot t_{k}\]

subject to 0 ≤ τ_min ≤ Δτ·t_k ≤ τ_max.

3) In the birth/death channel mode, the delay of each path at moment t_k is calculated from the user-set basic delay τ_l,0, the delay variation range (τ_min,τ_max), the delay interval Δτ, and the number of birth/death positions M: (17) \[\tau_{l,k}=\tau_{l,k-1}+\Delta \tau \cdot R\]

where τ_l,k–1 denotes the delay of the l-th path at moment t_k–1, and R is a random number that obeys a uniform distribution on the interval [1, M] and satisfies: (18) \[0\le \tau_{\min }\le \Delta \tau \cdot R\le \tau_{\max };\quad \tau_{\max }=\tau_{\min }+\Delta \tau \cdot (M-1)\]

4) Calculate the fading factor, discrete Doppler, and phase of each path according to the user setting of each path fading type and spectrum.

5) Fixed-point processing of calculated channel parameters
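Steps 1) to 3) above amount to a few closed-form updates, which the following Python sketch illustrates for Eqs. (14) to (17). The function names are hypothetical, units are assumed to be Hz, m/s and seconds, and R is assumed to take integer values in {1, …, M}; none of these choices is specified in this paper.

```python
import numpy as np

C = 299_792_458.0  # speed of light in m/s

def doppler_frequency(f_c, v):
    """Eq. (14): Doppler frequency for communication frequency f_c (Hz) and speed v (m/s)."""
    return f_c * v / C

def mobile_speed(v0, a, t_k):
    """Eq. (15): speed at moment t_k for starting speed v0 and acceleration a."""
    return v0 + a * t_k

def mobile_delay(tau_0, d_tau, t_k, tau_min, tau_max):
    """Eq. (16): path delay at moment t_k, with the range condition stated after Eq. (16)."""
    offset = d_tau * t_k
    if not (0.0 <= tau_min <= offset <= tau_max):
        raise ValueError("delay offset is outside the configured variation range")
    return tau_0 + offset

def birth_death_delay(tau_prev, d_tau, M, rng):
    """Eq. (17): add d_tau * R to the previous delay, with R uniform on {1, ..., M}."""
    R = int(rng.integers(1, M + 1))  # assumption: R is an integer position index
    return tau_prev + d_tau * R, R
```

For the mobile channel, the Doppler frequency at moment t_k would then be obtained as `doppler_frequency(f_c, mobile_speed(v0, a, t_k))`.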

In parameter transmission, the fixed-point channel parameters are assembled into frames in a fixed order according to the user-selected channel mode. Header information, including the channel mode, channel dynamic update rate, and signal-to-noise ratio, is added, and the frames are transmitted to the FPGA through the data interface.

2.3

Channel Modeling for Massive MIMO Systems

Compared with the MIMO system in traditional 4G mobile communications, the massive MIMO system in 5G communications deploys a dense array of antennas at the base station end, which results in an increase in the transmission capacity of the channel and an increase in the spectrum utilization of the system, as well as the ability to satisfy more low-latency and high-reliability services. However, the large number of antennas makes the channel estimation techniques for massive MIMO systems face unprecedented challenges, so it is crucial to investigate high-performance channel estimation methods [21].

2.3.1

Large-scale MIMO systems

The principle of massive MIMO is to configure hundreds or thousands of antennas at the base station to simultaneously serve multiple single-antenna users sharing the same time-frequency resources; the framework of the massive MIMO system is shown in Fig. 5 [22]. Consider the single-cell multi-user massive MIMO case in which the cell base station is equipped with M antennas and there are K users in the cell, where M ≫ K. In the uplink, the signal vector received by the base station from the K users is: (19) \[y=Gx+n\]

where G denotes the M×K-dimensional channel matrix between the base station and the users, x denotes the K×1-dimensional signal vector sent by the users, y denotes the M×1-dimensional signal vector received at the base station, and n denotes the M×1-dimensional channel noise at the receiver.

By definition, element (m,k) of the channel matrix G can be expressed as: (20) \[g_{mk}=h_{mk}\beta_{mk}^{1/2}\]

where g_mk represents the channel coefficient between the m-th base station antenna and the k-th user, h_mk represents the small-scale (multipath) fading coefficient between the m-th base station antenna and the k-th user, and β_mk represents the corresponding large-scale fading coefficient. The channel matrix G can be expressed as: (21) \[G=\left[ \begin{matrix} g_{11} & \cdots & g_{1K} \\ \cdots & \cdots & \cdots \\ g_{M1} & \cdots & g_{MK} \\ \end{matrix} \right]=HD\]

where: (22) \[H=\left[ \begin{matrix} h_{11} & \cdots & h_{1K} \\ \cdots & \cdots & \cdots \\ h_{M1} & \cdots & h_{MK} \\ \end{matrix} \right],\quad D=\left[ \begin{matrix} \beta_{11} & \cdots & 0 \\ \cdots & \cdots & \cdots \\ 0 & \cdots & \beta_{MK} \\ \end{matrix} \right]\]

Extending the above channel model to multi-antenna users, consider a cell base station equipped with M antennas and K users in the cell, each with N antennas. In the downlink, the signal received by the i-th user (i = 1,2,3,⋯,K) from the base station can be expressed as: (23) \[y_{i}=G_{i}x+n_{i}\]

where G_i denotes the downlink channel matrix between the i-th user and the base station, and x denotes the transmit signal, which can be expressed as: (24) \[x(t)=\left[ \begin{matrix} x_{1}(t),x_{2}(t),\cdots ,x_{M}(t) \\ \end{matrix} \right]\]

y_i denotes the signal at the receiving end, which can be expressed as: (25) \[y_{i}(t)=\left[ \begin{matrix} y_{1}(t),y_{2}(t),y_{3}(t),\cdots ,y_{M}(t) \\ \end{matrix} \right]\]

and n_i denotes the additive white Gaussian noise in the channel.

Considering that the channel undergoes flat fading in massive MIMO systems, Eq. (19) and Eq. (23) can be simplified and written as: (26) \[y=Hx+n\] (27) \[y_{i}=H_{i}x+n_{i}\]
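As an illustration of the uplink model in Eqs. (19) and (20), the following NumPy sketch draws one received vector y = Gx + n. It is a sketch under stated assumptions rather than the simulation used in this paper: the small-scale coefficients h_mk are assumed to be i.i.d. complex Gaussian (Rayleigh fading), the users are assumed to send unit-power QPSK symbols, and the function name `uplink_receive` and its arguments are hypothetical.

```python
import numpy as np

def uplink_receive(M, K, beta, snr_db, rng):
    """Generate y = G x + n for M base station antennas and K single-antenna users.

    beta holds the large-scale fading coefficients beta_mk, shape (M, K).
    """
    # Small-scale fading h_mk ~ CN(0, 1) (assumption: Rayleigh fading).
    H = (rng.normal(size=(M, K)) + 1j * rng.normal(size=(M, K))) / np.sqrt(2)
    # Eq. (20): g_mk = h_mk * beta_mk^(1/2), applied element by element.
    G = H * np.sqrt(beta)
    # Unit-power QPSK symbols, one per user (assumption).
    x = (rng.choice([-1, 1], size=K) + 1j * rng.choice([-1, 1], size=K)) / np.sqrt(2)
    # Complex white Gaussian noise with variance set by the SNR in dB.
    noise_var = 10.0 ** (-snr_db / 10.0)
    n = np.sqrt(noise_var / 2) * (rng.normal(size=M) + 1j * rng.normal(size=M))
    # Eq. (19): received signal vector at the base station.
    y = G @ x + n
    return y, G, x
```

When M ≫ K and the noise is weak, the transmitted symbols can be recovered from y by least squares, for example with `np.linalg.pinv(G) @ y`, which is a quick sanity check of the model.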

2.3.2

Channel Estimation Optimization

In this section, the channel estimation problem is modeled as a variational inference optimization problem by exploiting the low-rank nature of the massive MIMO channel matrix, and the Stein variational gradient descent (SVGD) algorithm is used to recover the users’ channel state information with the log function as the matrix penalty function. The log function induces sparsity better than the nuclear norm, which penalizes all elements in a uniform way. Relative to their magnitude, the log function imposes a larger penalty on small elements of the matrix than on elements with larger values, a property that makes the log function a closer approximation to the rank of the matrix than the nuclear norm and results in more accurate channel estimation.

In a propagation environment with a finite number of scatterers, the massive MIMO channel matrix exhibits a low-rank property. To accurately estimate the channel state information at the receiver, the channel estimation problem is transformed into a rank minimization problem according to the SVGD-based dynamic channel model: (28) \[\underset{H}{\mathop{\min }}\,\text{rank}(H)\] (29) \[\text{s.t.}\;y=Hx\]

In massive MIMO operating in time-division duplex mode, the commonly used channel estimation method is non-blind channel estimation, i.e., sending pilot sequences. In the pilot training phase, all users simultaneously send pilot sequences of length L in the same frequency band. Assuming that Φ = [ϕ(1),ϕ(2),…,ϕ(K)] is the K×L-dimensional pilot matrix, the pilot observation signal Y at the receiving end can be expressed as: (30) \[Y=H\Phi +N\]

where $Y=\left[ y_{1}^{\text{T}},y_{2}^{\text{T}},\ldots ,y_{k}^{\text{T}} \right]\in \mathbb{C}^{M\times L}$ denotes the received pilot matrix, $H=\left[ h_{1}^{\text{T}},h_{2}^{\text{T}},\ldots ,h_{k}^{\text{T}} \right]\in \mathbb{C}^{M\times K}$ denotes the channel matrix to be estimated, and N denotes the noise matrix. Vectorizing Eq. (30) gives: (31) \[y=\Psi h+n\]

where: (32) \[\left\{ \begin{aligned} & y=\text{vec}(Y) \\ & \Psi =\Phi^{\text{T}}\otimes I_{M} \\ & h=\text{vec}(H) \\ & n=\text{vec}(N) \\ \end{aligned} \right.\]

The rank minimization problem is then relaxed to: (33) \[\underset{H}{\mathop{\min }}\,\|H\|_{*}=\sum\limits_{i=1}^{r}\sigma_{i}\] (34) \[\text{s.t.}\;y=\Psi h\]

Since the above rank minimization problem is NP-hard, it is computationally difficult to solve. However, it has been shown that the matrix rank minimization problem can be converted to a nuclear norm minimization problem, because the nuclear norm minimization problem is the optimal convex approximation of the rank minimization problem. In Eq. (33), σ_i denotes the i-th singular value of the channel matrix H and r denotes the expected rank of the channel matrix.
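The vectorization step from Eq. (30) to Eq. (31) can be checked numerically. The sketch below assumes the column-stacking definition of vec(·) and uses the standard identity vec(HΦ) = (Φᵀ ⊗ I_M) vec(H); the function name `vectorize_pilot_model` is hypothetical.

```python
import numpy as np

def vectorize_pilot_model(H, Phi):
    """Return (Psi, h) such that Psi @ h equals vec(H @ Phi) with column-stacking vec."""
    M = H.shape[0]
    # Kronecker product of the transposed pilot matrix with the M x M identity.
    Psi = np.kron(Phi.T, np.eye(M))
    # Column-stacking vectorization of the channel matrix.
    h = H.reshape(-1, order="F")
    return Psi, h
```

Note that Ψ has dimension ML×MK, so for large arrays it is usually preferable to work with the matrix form Y = HΦ + N directly and to use Ψ only for analysis.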

To solve Eq. (33), the rank must be known in advance, and the optimization problem is then computed using the iterative hard thresholding method. However, obtaining a priori information about the rank of the matrix is very difficult, so the low-rank matrix recovery problem can be transformed into an unconstrained nuclear norm convex optimization problem: (35) \[\underset{H}{\mathop{\min }}\,\frac{1}{2}\left\| y-\Psi h \right\|_{2}^{2}+\lambda \left\| H \right\|_{*}\]

where λ is the regularization parameter.

Eq. (35) is further transformed into a weighted nuclear norm minimization problem: (36) \[\underset{H}{\mathop{\min }}\,\frac{1}{2}\left\| y-\Psi h \right\|_{2}^{2}+\lambda \left\| H \right\|_{*w}\]

where: (37) \[\left\| H \right\|_{*w}=\sum\limits_{i=1}^{r}\frac{\sigma_{i}\left( H \right)}{\omega_{i}}\]

where $\left\| H \right\|_{*w}$ is the weighted nuclear norm, σ_i(H) is the i-th singular value of matrix H, and ω_i denotes the i-th weight factor. In this paper, the problem is solved using the majorization-minimization (MM) method. Although the weighted nuclear norm is widely used, it still approximates the rank of the matrix with a bias. To better approximate the rank of the matrix and obtain better channel estimation, we transform the rank minimization problem of the channel matrix into a nonconvex relaxation optimization problem, whose mathematical model (LOG function model) is expressed as follows: (38) \[\underset{H}{\mathop{\min }}\,\frac{1}{2}\left\| Y-H\Phi \right\|_{F}^{2}+\lambda \sum\limits_{i=1}^{l}\log \left( 1+\frac{|\sigma_{i}(H)|_{1}}{a} \right)\]

where a > 0 is the scale parameter, which is taken as a = 0.45 in this paper, $\left\| \cdot \right\|_{F}$ is the Frobenius norm of the matrix, and σ_i(H) is the i-th singular value of the matrix H.
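To make the difference between the two penalties concrete, the sketch below evaluates the nuclear norm and the log penalty of Eq. (38) from the singular values of H. It assumes that the data-fidelity term is computed with the pilot matrix Φ of Eq. (30) and that the sum runs over all singular values; the function names are hypothetical.

```python
import numpy as np

def nuclear_norm(H):
    """Sum of the singular values of H."""
    return np.linalg.svd(H, compute_uv=False).sum()

def log_penalty(H, a=0.45):
    """Log penalty of Eq. (38): sum over i of log(1 + sigma_i(H) / a)."""
    sigma = np.linalg.svd(H, compute_uv=False)
    return np.sum(np.log1p(sigma / a))

def log_objective(Y, H, Phi, lam, a=0.45):
    """Data-fidelity term plus the weighted log penalty (assumed form of Eq. (38))."""
    residual = Y - H @ Phi
    return 0.5 * np.linalg.norm(residual, "fro") ** 2 + lam * log_penalty(H, a)
```

Doubling a large singular value doubles its contribution to the nuclear norm but increases the log penalty by less than a factor of two, which is why the log penalty behaves more like the rank, which only counts nonzero singular values.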

3

Results and discussion

3.1

Simulation environment and training parameters

3.1.1

Simulation environment

The simulation data used in this paper are generated by DeepMIMO, a multi-channel dynamic channel simulation platform. To test the ability of the SVGD-based dynamic channel model to adapt to different environments, two types of large-scale MIMO datasets are generated in this paper, LOS and NLOS. The LOS scenario uses the outdoor scenario O1_60 ray-tracing dataset, and the test area is shown in Fig. 6. In the dynamic channel scenario, the test area is divided into three stages from east to west, from near to far from the base station, with user numbers R551-R654, R655-R789, and R790-R920, respectively. The specific simulation parameters are shown in Table 1.

Table 1.

Simulation parameters of LOS scenarios

Parameter	Value
Carrier frequency	50 GHz
System bandwidth	0.5 GHz
Number of subcarriers	1
Multipath quantity	10
Base station number	1
Base station antenna array	(1,64,32)
Antenna spacing	0.5
Phase shifter resolution	5 bit
User area number	R551-R654, R655-R789, R790-R920

The NLOS scenario uses the outdoor scenario O1_28B. Its overall layout is similar to that of O1_60; the only difference is the presence of a blockage and reflective surfaces near base station 3. The NLOS scenario is shown in Fig. 7, and its specific parameter settings are shown in Table 2. The NLOS condition is realized by adding a blockage at base station 3, and two reflective surfaces are set up on both sides to create more NLOS paths. In the dynamic scenario, the test area is divided into three stages from east to west, from far to near to the base station, with user numbers R551-R650, R651-R750, and R751-R850, respectively.

Table 2.

Simulation parameters of NLOS scenarios

Parameter	Value
Carrier frequency	30 GHz
System bandwidth	0.5 GHz
Number of subcarriers	1
Multipath quantity	5
Base station number	2
Base station antenna array	(1,32,2)
Antenna spacing	0.5
Phase shifter resolution	5 bit
User area number	R551-R650, R651-R750, R751-R850

3.1.2

Training parameters

After the simulation data are generated, they have to be pre-processed before being fed into the network for training. First, the data are normalized as shown in Eq. (39): (39) \[\bar{h}_{j}=\frac{h_{j}}{\max (\left| h_{j} \right|)}\]

where h_j denotes the channel corresponding to the j-th transmitting antenna, and the denominator is the maximum modulus over all elements of h_j. In addition, the real and imaginary parts of the channel matrix need to be concatenated after normalization, because current deep learning frameworks cannot handle complex numbers well directly.
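The pre-processing described above can be sketched as follows. The sketch assumes that the channels h_j are stored as the columns of a complex matrix, that no column is all zeros, and that the real and imaginary parts are stacked along the first axis; the function name `preprocess_channel` and this storage layout are assumptions for illustration.

```python
import numpy as np

def preprocess_channel(H):
    """Normalize each column h_j by its largest modulus (Eq. (39)) and stack real and imaginary parts."""
    # Largest modulus of each column, kept as a row vector for broadcasting.
    scale = np.max(np.abs(H), axis=0, keepdims=True)
    H_norm = H / scale
    # Real-valued array: real parts on top, imaginary parts below.
    return np.concatenate([H_norm.real, H_norm.imag], axis=0)
```

After this step every entry lies in [−1, 1] and the largest modulus in each column equals 1, which keeps the network inputs on a comparable scale across users.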

The remaining hyper-parameters of the dynamic channel model based on the Stein variational gradient descent (SVGD) algorithm (hereinafter referred to as the SVGD model) are shown in Table 3. The SVGD model consists of two networks, an Actor and a Critic. The input to the Actor network is the state t_s, i.e., the phases of all phase shifters at the transmitter side, with dimension N_T. The Actor network has two hidden layers, each consisting of 16×N_T neurons, with a global kernel function as the activation function. It outputs a predicted action with the same dimension as the state, using a log function as the output activation. For the Critic network, the input is the concatenation of the state and the action, with dimension 2×N_T. It has two hidden layers, each with 16×N_T neurons, and the activation function is a log function. The output of the Critic network is the predicted Q-value of the input state-action pair, a real scalar with dimension 1.
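The layer dimensions described above can be written down as a short sketch. It assumes fully connected layers with bias terms, which the text does not state explicitly; the helper names and the choice N_T = 32 are illustrative only, and the activation functions are left out.

```python
def actor_layer_sizes(n_t):
    """State (N_T phases) -> two hidden layers of 16*N_T -> action (N_T)."""
    return [n_t, 16 * n_t, 16 * n_t, n_t]


def critic_layer_sizes(n_t):
    """State-action concatenation (2*N_T) -> two hidden layers -> scalar Q-value."""
    return [2 * n_t, 16 * n_t, 16 * n_t, 1]


def count_parameters(sizes):
    """Weights plus biases of a fully connected network (an assumption here)."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


n_t = 32  # hypothetical number of transmit phase shifters
actor_params = count_parameters(actor_layer_sizes(n_t))
critic_params = count_parameters(critic_layer_sizes(n_t))
```

Because the hidden width scales with N_T, the parameter count grows roughly quadratically with the number of phase shifters.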

Table 3.

Network parameters of SVGD model

Parameter	Value
Learning rate	0.001
Weight decay	0.02
Replay buffer size	10000
Batch size	1036

3.2

Analysis of simulation results

3.2.1

Beamforming capability analysis

In this subsection, the beamforming capability of the SVGD model is tested by feeding the model the channel of the user at the middle position of the LOS scenario; the training curve records the best beamforming gain achieved by the SVGD model during the iterations. The training curve is shown in Fig. 8, where it is compared with a DFT codebook of 32 beams and with the equal gain combination (EGC) performance upper bound, which assumes that the phase shifters are continuously adjustable, that the number of beams in the codebook equals the number of users, and that the channel information is known. The training curve shows that the beam obtained by the SVGD model matches the best beam found in the DFT codebook after about 3200 iterations. It reaches 90% of the EGC upper bound at 28500 iterations and finally stabilizes at a gain of 35.5 at 35200 iterations.
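To make the two baselines concrete, the sketch below computes the beamforming gain |w^H h|^2 for continuous-phase EGC weights, for the same weights quantized to 5-bit phase shifters, and for the best beam of a 32-beam DFT codebook. The two-path channel on a half-wavelength uniform linear array and all function names are assumptions for illustration; the paper's channels come from the simulated scenarios instead.

```python
import cmath
import math


def beam_gain(w, h):
    """Beamforming gain |w^H h|^2."""
    return abs(sum(wn.conjugate() * hn for wn, hn in zip(w, h))) ** 2


def egc_weights(h):
    """Continuous-phase equal-gain weights matched to the channel phases."""
    scale = 1 / math.sqrt(len(h))
    return [scale * cmath.exp(1j * cmath.phase(hn)) for hn in h]


def quantize_phases(w, bits):
    """Snap each weight's phase to the nearest of 2**bits uniform levels."""
    step = 2 * math.pi / 2 ** bits
    scale = 1 / math.sqrt(len(w))
    return [scale * cmath.exp(1j * step * round(cmath.phase(wn) / step))
            for wn in w]


def dft_codebook(n_ant, n_beams):
    """DFT beams with unit-norm, constant-modulus weights."""
    scale = 1 / math.sqrt(n_ant)
    return [[scale * cmath.exp(2j * math.pi * k * n / n_beams)
             for n in range(n_ant)] for k in range(n_beams)]


n_ant = 32
# Hypothetical two-path channel: a dominant path plus a weaker reflection.
h = [cmath.exp(1j * math.pi * n * math.sin(0.3))
     + 0.3 * cmath.exp(1j * (math.pi * n * math.sin(-0.7) + 1.0))
     for n in range(n_ant)]

gain_egc = beam_gain(egc_weights(h), h)
gain_quantized = beam_gain(quantize_phases(egc_weights(h), bits=5), h)
gain_dft = max(beam_gain(w, h) for w in dft_codebook(n_ant, 32))
```

For any constant-modulus weight vector the gain cannot exceed the EGC value, which is why EGC serves as the upper bound in the comparison.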

The beams obtained from the SVGD model trained in the LOS scene are shown in Fig. 9 and compared with the EGC optimal beam. Fig. 9(a) shows the beam learned by the SVGD model, and Fig. 9(b) shows the optimal beam after the fourth root is taken, which magnifies the sidelobe details for easier observation. The figure shows that the beam learned by the SVGD model points accurately at the user and closely approximates the sidelobe details of the optimal beam pattern, which is why SVGD beamforming can outperform the traditional DFT codebook.

3.2.2

Channel Optimization Analysis

For channel optimization of large-scale MIMO systems, the key issue is how to reproduce the spatial-domain characteristics of the target channel, in both the horizontal and vertical dimensions, in the test area (TV) using a 3D spherical multi-probe setup that accounts for both the horizontal azimuth and the downtilt angles. The main goal of the SVGD-based dynamic channel model is to obtain, by gradient descent, the power weight allocation that replicates the spatial-domain characteristics of the target channel.
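For orientation, the generic SVGD particle update can be sketched in one dimension. This is not the paper's channel model: the Gaussian target with mean 2 and unit variance, the fixed-bandwidth RBF kernel, the step size, and the particle count are all assumptions chosen to keep the example small.

```python
import math
import random


def rbf(x, y, bandwidth):
    """Gaussian (RBF) kernel with a fixed bandwidth."""
    return math.exp(-((x - y) ** 2) / (2 * bandwidth ** 2))


def svgd_step(particles, grad_log_p, step_size, bandwidth):
    """One SVGD update: kernel-weighted gradient plus a repulsive term."""
    n = len(particles)
    updated = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            k = rbf(xj, xi, bandwidth)
            attraction = k * grad_log_p(xj)             # pulls toward high density
            repulsion = k * (xi - xj) / bandwidth ** 2  # d k(xj, xi) / d xj
            phi += attraction + repulsion
        updated.append(xi + step_size * phi / n)
    return updated


def grad_log_gaussian(x):
    """Gradient of log N(x; mean=2, variance=1), the assumed target."""
    return -(x - 2.0)


rng = random.Random(0)
particles = [rng.uniform(-3.0, 3.0) for _ in range(20)]
for _ in range(500):
    particles = svgd_step(particles, grad_log_gaussian, 0.1, 1.0)

mean = sum(particles) / len(particles)
std = math.sqrt(sum((x - mean) ** 2 for x in particles) / len(particles))
```

The repulsive term is what keeps the particles from collapsing onto the mode, so the particle set approximates the spread of the target as well as its location.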

1)

Channel optimization results for LOS scenarios generated with DeepMIMO

If the probe weights are optimized using the Min-Sum objective function, the simulation results of the LOS scenario for different horizontal arrival angles ϕ and vertical arrival angles θ are shown in Fig. 10. Most values of |ρ̂–ρ| are below 0.05, where ρ̂ is the simulated spatial correlation and ρ is the target spatial correlation; the lower |ρ̂–ρ| is, the smaller the error. This means that the Min-Sum objective function reconstructs the spatial-domain characteristics with a small error for the vast majority of the spatial correlations corresponding to the spherical power spectrum. However, for individual angle combinations |ρ̂–ρ| exceeds 0.06 and even reaches 0.1, which is a large error. In summary, using Min-Sum as the objective function in the SVGD dynamic channel model gives good channel optimization for large-scale MIMO systems overall, but with local outliers.

2)

Channel optimization results for NLOS scenarios generated with DeepMIMO

The NLOS scenario minimizes the maximum spatial correlation simulation error (Min-Max), with the following objective function: (40) \[\left\{ \begin{aligned} & \min_{\varpi}\,\max_{i}\,\left| \hat{\rho}_{i}(\varpi) - \rho_{i} \right| \\ & \text{s.t. } 0 \le \varpi_{m} \le 1,\ \forall m \in [1, M] \end{aligned} \right.\]

where i denotes the i-th cluster. This objective avoids an excessive simulation error for any particular cluster, but results in a larger total simulation error across clusters. If the Min-Max objective function is used for the SVGD dynamic channel model, the simulation results of the NLOS scenario are shown in Fig. 11. Overall, |ρ̂–ρ| is below 0.05 and no angle combination gives extremely poor results. Thus, although the overall reconstruction with the Min-Max objective function is worse than with Min-Sum, the method produces no local extreme errors, and the reconstruction is smoother overall.
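The trade-off between the Min-Sum and Min-Max objectives can be illustrated with a small sketch. The correlation values below are made up for illustration and are not taken from Fig. 10 or Fig. 11; the function names are hypothetical, and the box projection only mirrors the constraint 0 ≤ ϖ_m ≤ 1 in Eq. (40).

```python
def min_sum_objective(rho_sim, rho_target):
    """Total absolute spatial-correlation error over all sample points."""
    return sum(abs(s - t) for s, t in zip(rho_sim, rho_target))


def min_max_objective(rho_sim, rho_target):
    """Worst-case absolute spatial-correlation error."""
    return max(abs(s - t) for s, t in zip(rho_sim, rho_target))


def project_weights(weights):
    """Clip probe power weights to the feasible box [0, 1]."""
    return [min(1.0, max(0.0, w)) for w in weights]


rho_target = [0.90, 0.50, 0.20, 0.10]       # illustrative targets
candidate_a = [0.90, 0.50, 0.20, 0.20]      # accurate except one outlier
candidate_b = [0.86, 0.54, 0.24, 0.14]      # uniformly small errors

sum_a = min_sum_objective(candidate_a, rho_target)
sum_b = min_sum_objective(candidate_b, rho_target)
max_a = min_max_objective(candidate_a, rho_target)
max_b = min_max_objective(candidate_b, rho_target)
feasible = project_weights([-0.2, 0.4, 1.3])
```

Candidate A has the lower total error but one large deviation, whereas candidate B has the lower worst-case error at the cost of a higher total. Min-Sum would prefer A and Min-Max would prefer B.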

For the channel estimation problem of massive MIMO systems, this paper exploits the low-rank nature of massive MIMO channel matrices to model channel estimation as an SVGD-based variational inference optimization problem. The virtual antenna pair (VAP) selection scheme for the spherical anechoic chamber in the massive MIMO scenario is shown in Fig. 12. The test region is sampled by selecting the positions of antenna pairs u and v from three orthogonal line segments inside the spherical chamber. Sampling VAPs in this way biases the weight optimization results towards the three axes, so it cannot provide optimal simulation results for all sampled points in the test region. To avoid this, this paper instead uses sample points on the surface of an oblate ellipsoidal test region; an ellipsoid is chosen because the number of probes in the azimuth plane differs from the number of probes in the elevation plane.

The sample points of the virtual antenna pair u and v should be located symmetrically about the center of the sphere, so the test area can be sampled by scanning the ellipsoid surface for point pairs. The optimized power weights of the antenna probes are obtained by solving the model with a convex optimization tool. M antenna probes are used to reconstruct the spatial-domain characteristics of the multipath channel model through signal synthesis in the central ellipsoidal test region. The signal radiated by each antenna probe is pre-faded according to the target multipath dynamic channel, and the fading of the probes within the same cluster is independent and identically distributed. As a result, the statistical characteristics of the superimposed signals in massive MIMO are consistent with the fading of each probe, which ensures the accuracy of channel estimation.
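The sampling of centrally symmetric point pairs on the ellipsoid surface can be sketched as follows. The semi-axes, the angular grid, and the function names are assumptions for illustration; the paper does not give the dimensions of the test region in this section.

```python
import math


def ellipsoid_point(a, c, azimuth, polar):
    """Point on the spheroid (x/a)^2 + (y/a)^2 + (z/c)^2 = 1."""
    return (a * math.sin(polar) * math.cos(azimuth),
            a * math.sin(polar) * math.sin(azimuth),
            c * math.cos(polar))


def symmetric_pairs(a, c, n_azimuth, n_polar):
    """Scan the upper half of the surface and mirror each point through the center."""
    pairs = []
    for i in range(n_azimuth):
        azimuth = 2 * math.pi * i / n_azimuth
        for j in range(n_polar):
            # Polar angles strictly inside the upper half, so no pair repeats.
            polar = (math.pi / 2) * (j + 0.5) / n_polar
            u = ellipsoid_point(a, c, azimuth, polar)
            v = tuple(-x for x in u)  # mirror image of u about the center
            pairs.append((u, v))
    return pairs


# Hypothetical oblate test region: horizontal semi-axis 1.0, vertical 0.5.
pairs = symmetric_pairs(a=1.0, c=0.5, n_azimuth=8, n_polar=4)
```

Restricting u to one half of the surface avoids counting the same pair twice with u and v swapped.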

3.2.3

Robustness analysis

The environment of a microwave anechoic chamber is difficult to keep stable during long measurements, which particularly affects phase measurements. As a result, the wave propagation phase is unstable, which adversely affects conventional spatial deconvolution methods that require phase information.

Figure 13 shows 400 Markov chain Monte Carlo (MCMC) simulations carried out with the reference algorithm and with the SVGD algorithm proposed in this paper, where a uniformly distributed phase deviation between -3.5° and 3.5° is added in each simulation. The simulation parameters are D = 50λ, φ₁ = –26°, φ₂ = –24°, and 10 probe positions. The reference deconvolution method performs excellently in the ideal case, but its performance deteriorates drastically under small phase deviations. In contrast, the SVGD algorithm proposed in this paper is insensitive to phase perturbations: the fluctuations all lie within the 95% confidence intervals of the actual AUT radiation pattern, especially in the main beam and first sidelobe regions. In summary, the SVGD-based dynamic channel model proposed in this paper is highly robust.
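The perturbation setup of this experiment can be sketched as below. Only the number of trials (400), the number of probe positions (10), and the ±3.5° uniform phase deviation come from the text. The ideal measurement is replaced by a stand-in of unit phasors, and neither the reference deconvolution nor the SVGD estimator is reproduced; the function names and the seed are hypothetical.

```python
import cmath
import math
import random


def perturbed_sum(n_probes, max_dev_deg, rng):
    """Magnitude of a coherent sum of unit phasors with uniform phase errors."""
    max_dev = math.radians(max_dev_deg)
    total = sum(cmath.exp(1j * rng.uniform(-max_dev, max_dev))
                for _ in range(n_probes))
    return abs(total)


def run_trials(n_trials=400, n_probes=10, max_dev_deg=3.5, seed=0):
    """Repeat the perturbed measurement and return the sorted magnitudes."""
    rng = random.Random(seed)
    return sorted(perturbed_sum(n_probes, max_dev_deg, rng)
                  for _ in range(n_trials))


magnitudes = run_trials()
# Empirical central 95% interval taken from the sorted trial results.
lower = magnitudes[int(0.025 * len(magnitudes))]
upper = magnitudes[int(0.975 * len(magnitudes)) - 1]
```

In a full study, each perturbed data set would be passed to both estimators and the spread of the recovered patterns compared against such an interval.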

4

Conclusion

Massive MIMO technology is becoming increasingly important in 5G and in next-generation mobile communications. Compared with traditional MIMO technology, the hundreds of antennas in the arrays of a massive MIMO base station greatly improve the energy and spectral efficiency of the system and will provide higher-quality and richer means of communication. However, because of the huge antenna arrays, especially when massive MIMO is applied in frequency division duplex mode, downlink channel estimation suffers from severe pilot contamination, so a high-performance channel estimation technique plays a pivotal role. Many challenges remain in the application of large-scale MIMO technology: current channel estimation techniques suffer from low estimation accuracy, large pilot overhead, and high computational complexity. To address these shortcomings, this paper proposes a channel optimization scheme based on Stein variational gradient descent that exploits the inherent low-rank characteristics of large-scale MIMO channels, and establishes a dynamic channel model by applying low-rank matrix recovery theory and the SVGD algorithm to the channel estimation problem. Finally, experimental data on fading are provided, and the simulation error between the experimental results and the theoretical values is below 0.05 overall, which demonstrates the feasibility and practicality of the method.

Funding:

This research was sponsored by the Beijing Nova Program (No.20240484645).



Jinhui Chen

Zhan Xu

Ruxin Zhi

Published online: 21 March 2025

Received: 26 Oct. 2024

Accepted: 10 Feb. 2025

DOI: https://doi.org/10.2478/amns-2025-0653

Keywords: Stein variational splitting, Gradient descent method, Dynamic channel model, Birth-death channel, FPGA

© 2025 Jinhui Chen et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
