Open Access

Signal Processing and Transmission Quality Improvement Strategies in Artificial Intelligence-Assisted Software Radio Systems

Mar 17, 2025


Introduction

Wireless communication equipment has natural advantages in mobility and resistance to destruction, and it occupies an important position in military communication. However, the field of wireless communications has long faced a series of problems: multiple communication systems coexist, interconnection between systems is difficult, the radio frequency bands are becoming increasingly congested, and the anti-jamming capability of communication systems urgently needs to be strengthened. These shortcomings seriously hinder the development of wireless communications, and the concept of the software radio system arose in response [1-5]. A software radio system takes hardware as the basic platform for radio communication and, on that basis, implements as many wireless, personal, and other communication services as possible in software. It abandons the idea of designing a system for one specific purpose. Through a modular, universal hardware platform, the services provided by the system are freed from their long-term dependence on fixed hardware characteristics, and the development of new products for different systems shifts largely to software development. As a result, improving and upgrading a system becomes convenient and low-cost, and different systems can be interconnected and made compatible. This is another major breakthrough in wireless communication, following the transitions from analog to digital and from fixed to mobile communication [6-10]. Signal processing and transmission form the key part of the whole software radio system: on the software radio hardware platform, functions are realized by signal processing technology.
Signal processing technology mainly includes the detection of communication signals, the sorting and identification of multiple broadband signals, automatic modulation classification, information decryption, and the efficient implementation of processing algorithms [11-15]. With the rapid development of artificial intelligence (AI), signal processing and transmission have gradually become a hot topic in the AI field. The current application of AI in software radio systems faces many privacy and security problems, such as data privacy leakage, security loopholes in information transmission, and adversarial attacks. It is therefore necessary to study signal processing and transmission quality improvement strategies that address these problems, so as to ensure the security and integrity of data during transmission, maintain communication quality, and advance the signal processing technology of software radio systems [16-20].

This paper focuses on three basic problems for software radio systems in complex electromagnetic environments: signal classification, noise suppression, and channel estimation. First, the functional structure of the software radio system is introduced, and then the convolutional neural network and the reinforcement learning model are described. The trained reinforcement learning model is used to learn the prior distribution of the signal, which is used to calculate the gradient of the MAP objective function in the classification algorithm; the Q-learning algorithm is then used to solve the objective function, estimate the target signal, and realize signal classification. Finally, simulation experiments on the software radio system USRP RIO verify the performance of the artificial intelligence algorithm in terms of signal decoding, BER reduction, and anti-jamming ability.

Methodology
Software radio systems

Software radio refers to Software Defined Radio (SDR), a radio communication technology whose design concept is to construct a universal hardware platform that is open, standardized, and modular. On such universal hardware, the communication functions are realized by software-defined wireless communication protocols, covering communication frequency bands, signal modulation and demodulation, data frame structure, encryption, coding, air interface protocols, and so on. This design has several advantages:

First, when it is necessary to switch between different communication modes to accomplish different communication tasks, it is only necessary to replace the corresponding software-defined communication function module to meet the requirements, which greatly reduces the communication cost and deployment difficulties.

Second, when the communication system needs new communication functions, such as new modulation methods or signal coding schemes, the new functional modules can be designed purely through software programming, which makes updating the system simpler.

Figure 1 shows the software architecture of the software radio system, which is divided into four layers from bottom to top: hardware resources, operating environment, wireless applications, and higher-level communication protocols.

Figure 1.

Software architecture of software radio system

The hardware resource layer includes programmable modules and analog RF modules. The former include FPGAs, DSPs, and microprocessors, and the latter mainly consist of the antenna. These hardware devices work together to complete the physical-layer deployment of the communication system.

The operating environment layer mainly implements hardware resource scheduling and management, memory allocation, interrupt service, and similar functions.

The wireless application layer mainly consists of link layer protocols and modem modules.

The higher-level communication protocol layer includes WAP, TCP/IP, and other communication protocols.

The four layers work together to accomplish specific communication tasks.

Convolutional Neural Networks

Convolutional neural networks (CNNs) are feed-forward neural networks originally designed for analyzing visual images. They are efficient in forward propagation and significantly reduce the number of parameters in the network. A CNN recognizes input data by transforming it, layer by layer, into class outputs, and the neuron is the smallest processing unit of the network [21]. A CNN mainly consists of an input layer, convolutional layers, pooling layers, fully connected layers, and an output layer. The output of a convolutional layer is usually used as the input to a pooling layer. The network mainly serves to extract features: a convolutional layer computes its result through local connections between the local receptive field and the input, which is equivalent to performing a convolution over the feature map.

Input layer

The number of neurons in the input layer determines the size of the input variable. The network parameters are learned by gradient descent.

The parameter update of the neural network can be expressed as: \[\omega =\omega -\eta f(\omega )\] where ω denotes the parameters (weights) to be updated, η denotes the learning rate, and f(ω) denotes the gradient $\frac{\partial L}{\partial \omega }$ of the loss function L(ω) with respect to ω.

In practice, the basic batch gradient descent (BGD) algorithm computes the gradient over the entire input dataset but performs only one update per pass. The gradient of the loss function with respect to ω is calculated as: \[f(\omega )=\frac{1}{n}{{\nabla }_{\omega }}\sum\limits_{i=1}^{n}{L}(f({{x}^{(i)}};\omega ),{{y}^{(i)}})\] where L denotes the loss function, n denotes the number of training samples in the dataset, and $x^{(i)}$ is the i-th sample with target $y^{(i)}$. Each iteration differentiates the loss function with respect to the parameters and updates ω in the direction opposite to the gradient.
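As an illustration of the update rule above, the following is a minimal sketch of batch gradient descent in Python. The linear model, the mean squared error loss, the learning rate, and the synthetic data are assumptions chosen for the example and are not taken from the paper.

```python
import numpy as np

def batch_gradient_descent(X, y, eta=0.1, epochs=200):
    """Fit a linear model y ~ X @ w by BGD on the mean squared error (illustrative choice)."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        residual = X @ w - y                 # f(x_i; w) - y_i for every training sample
        grad = (2.0 / n) * X.T @ residual    # gradient averaged over the whole dataset
        w = w - eta * grad                   # omega <- omega - eta * f(omega)
    return w

# Synthetic, noise-free data (hypothetical) to check that the update converges.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
w_true = np.array([1.5, -0.5])
y = X @ w_true
w_hat = batch_gradient_descent(X, y)
```

Each pass uses all n samples to form a single gradient and performs exactly one parameter update, which is the defining property of BGD described above.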

In general, the input data must be standardized before the model is trained, especially if the network contains a fully connected layer, so that all inputs share a unified scale. The main purpose of this step is to let the convolutional neural network work more efficiently and accurately; without it, the performance of the entire network may suffer. If the input features have different scales, the network converges slowly and takes too long to train. If the ranges of the input data differ, data with a relatively large range dominate the training process, the model tends to ignore the contribution of the other data, and generalization suffers. In addition, the value range of the input data should be determined only after preprocessing, to avoid losing data information.

Convolutional layers

The convolutional layer is the core building block of a convolutional neural network, and the convolution kernel is central to its computation. The convolutional layer extracts features from the input data by performing convolution operations as the kernel slides over the input feature map. The more layers there are, the more complex the features that can be extracted: a deeper network is more expensive to compute but has stronger learning ability. However, networks that are too deep are more prone to overfitting. The sliding stride of the convolution kernel is another important parameter. In general, the kernel, moving with the specified stride, should cover the entire feature map exactly; otherwise, additional processing of the feature map is required.

The output of each convolutional layer is given by: \[X_{i}^{n}=AF(U_{i}^{n})\] \[U_{i}^{n}=\sum\limits_{i\in {{M}_{i}}}{X_{i}^{n-1}}*k_{ij}^{i}+b_{i}^{n}\] where AF(·) denotes the activation function used in the neural network, $X_{i}^{n}$ denotes the output feature map of the i-th channel of convolutional layer n, and $U_{i}^{n}$ denotes the net activation of that channel. $U_{i}^{n}$ is obtained by convolving the output feature maps $X_{i}^{n-1}$ of the previous layer (which need not be a convolutional layer) with the convolution kernel weight matrix $k_{ij}^{i}$, summing the results, and adding a bias $b_{i}^{n}$. $M_i$ denotes the subset of input feature maps used to compute the net activation, and the symbol “*” denotes the convolution operation.
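The net activation and output of one convolutional channel can be sketched as follows. This is a one-dimensional toy example under stated assumptions: ReLU is used as the activation function AF, the sliding product is computed with `np.correlate` (the cross-correlation form commonly used in deep learning), and the feature maps, kernels, and bias are hypothetical values.

```python
import numpy as np

def relu(u):
    """Activation function AF, here chosen as ReLU."""
    return np.maximum(u, 0.0)

def conv_layer_output(prev_maps, kernels, bias):
    """Output of one channel: AF(sum_i X_i^{n-1} * k_i + b).

    prev_maps: array (channels, length), feature maps of the previous layer.
    kernels:   array (channels, kernel_size), one kernel per input map.
    """
    u = sum(np.correlate(x, k, mode="valid") for x, k in zip(prev_maps, kernels)) + bias
    return relu(u)

prev_maps = np.array([[1.0, 2.0, 3.0, 4.0],
                      [0.0, 1.0, 0.0, 1.0]])
kernels = np.array([[1.0, -1.0],
                    [1.0, 1.0]])
out = conv_layer_output(prev_maps, kernels, bias=0.5)   # -> [0.5, 0.5, 0.5]
```

The first map contributes -1 at every position and the second contributes +1, so the net activation equals the bias of 0.5 at all three positions.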

Pooling layer

The main role of the pooling layer is to perform secondary feature extraction without losing important information, reducing the network complexity by decreasing the data dimensions and giving the network some robustness to distortion. The output of the pooling layer is still passed through an activation function, and the net activation of the i-th channel of pooling layer n can be expressed as: \[U_{i}^{n}=\beta _{i}^{n}PF(X_{i}^{n-1})+b_{i}^{n}\] where $\beta _{i}^{n}$ represents the weight coefficient of the i-th channel of pooling layer n, $b_{i}^{n}$ represents the bias of the pooling layer, and PF(·) represents the pooling method used.
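A minimal sketch of the pooling net activation is given below. Max pooling with a window of 2 is assumed for PF, and the weight coefficient, bias, and input values are hypothetical.

```python
import numpy as np

def max_pool_1d(x, size=2):
    """PF: non-overlapping max pooling; trailing samples that do not fill a window are dropped."""
    trimmed = x[: len(x) // size * size]
    return trimmed.reshape(-1, size).max(axis=1)

def pooling_net_activation(x_prev, beta, bias, size=2):
    """U = beta * PF(X^{n-1}) + b for a single channel."""
    return beta * max_pool_1d(x_prev, size) + bias

x_prev = np.array([1.0, 3.0, 2.0, 0.0, 5.0, 4.0])
u = pooling_net_activation(x_prev, beta=0.5, bias=1.0)   # -> [2.5, 2.0, 3.5]
```

The six input samples are reduced to three, which shows how pooling halves the data dimension while keeping the dominant value in each window.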

Fully connected layers

The fully connected layer connects all the features and sends the output values to the classifier. It is generally located at the end of the convolutional neural network, and there may be one or more such layers. It integrates the features previously extracted and mapped by the convolutional and pooling layers, for example for regression. An activation function is also applied in the fully connected layer, and the ReLU function is generally used. The output $X^n$ of fully connected layer n can be expressed by the following two equations: \[{{X}^{n}}=AF({{U}^{n}})\] \[{{U}^{n}}={{\omega }^{n}}{{X}^{n-1}}+{{b}^{n}}\] where $U^n$ is the net activation of fully connected layer n, computed by weighting the output feature map $X^{n-1}$ of the previous layer with $\omega^n$ and adding the bias $b^n$. $\omega^n$ represents the weight coefficients of fully connected layer n, and $b^n$ represents its bias.

After processing by the fully connected layer, the input feature data is converted in dimension and the output becomes a one-dimensional vector to perform advanced decision making.

At the end of the last fully connected layer in the convolutional neural network, its output features are passed to the output layer.

Reinforcement learning

A reinforcement learning model involves five components: an agent (also called an intelligent body), environment, state, action, and reward. During the operation of a reinforcement learning algorithm, the agent continuously perceives the environment and, according to its own state, chooses an action that influences the environment. In the complex electromagnetic environment, the agent receives the reward brought by executing the action and then, according to the changes in the environment and in its own state, selects the next action to obtain further reward; this process repeats in a cycle. The agent accumulates experience and learns a policy through continuous trial and error until the termination condition is met.

Markov decision-making

The Markov decision process (MDP) is the basis of reinforcement learning. In a Markov decision process, the state of the system at the next moment depends only on the current state and the action taken at the current moment, and is independent of earlier actions and states.

A Markov decision process can be expressed as a four-tuple (S,A,R,P), where S denotes the state space over which the agent can move, A denotes the action space from which the agent chooses its actions, R denotes the reward that the agent obtains by performing an action, and P denotes the probability of the agent transferring between states. In a Markov decision process, the states and actions of the system before the current moment do not affect the transition probability P or the reward R obtained by executing the corresponding action at this moment. The transition probability P and the reward R can therefore be expressed as: \[P_{s{{s}^{\prime }}}^{a}=P[{{S}_{t+1}}={{s}^{\prime }}|{{S}_{t}}=s,{{A}_{t}}=a]\] \[R_{s}^{a}=\text{E}[{{R}_{t+1}}|{{S}_{t}}=s,{{A}_{t}}=a]\]

The policy (strategy) of an agent is generally denoted by π. The likelihood of the agent performing action a in state s is represented by the conditional probability function π(a|s): \[\pi (a|s)=P[{{A}_{t}}=a|{{S}_{t}}=s]\]

Define the cumulative reward obtained by the agent from moment t, when it starts executing policy π, until the end of the episode as $G_t$. The cumulative reward can be expressed as: \[{{G}_{t}}=\sum\limits_{k=0}^{\infty }{{{\gamma }^{k}}}{{r}_{t+k+1}},\quad \gamma \in (0,1)\] where γ is the discount factor, which weights the rewards obtained at different moments.
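For a finite reward sequence, the cumulative discounted reward can be computed with a short backward recursion. The sketch below is illustrative; the reward values and discount factor are hypothetical.

```python
def discounted_return(rewards, gamma):
    """G_t = sum_k gamma^k * r_{t+k+1}, evaluated backwards as G = r + gamma * G_next."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Rewards r_{t+1}, r_{t+2}, r_{t+3}: 1 + 0.5 * 0 + 0.25 * 2 = 1.5
g0 = discounted_return([1.0, 0.0, 2.0], gamma=0.5)
```

Because γ < 1, rewards that arrive later contribute less to the return, which is how the discount factor balances short-term and long-term gains.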

Since the goodness of a strategy cannot be expressed through instantaneous rewards, a form of value function that expresses the long-term impact is derived to indicate the goodness of the strategy.

Markov decision process value functions can be generally categorized into the following two types:

1) The state value function $V_\pi(s)$, which can be written in the form of the Bellman equation: \[{{V}_{\pi }}(s)=\text{E}[{{G}_{t}}|{{S}_{t}}=s]={{\text{E}}_{\pi }}\left[ \sum\limits_{k=0}^{\infty }{{{\gamma }^{k}}}{{r}_{t+k+1}}|{{S}_{t}}=s \right]\] \[{{V}_{\pi }}(s)=\text{E}({{r}_{t+1}}+\gamma {{V}_{\pi }}({{S}_{t+1}})|{{S}_{t}}=s)\]

2) The state-action value function $Q_\pi(s,a)$, which can likewise be written in the form of the Bellman equation: \[{{Q}_{\pi }}(s,a)=\text{E}[{{G}_{t}}|{{S}_{t}}=s,{{A}_{t}}=a]={{\text{E}}_{\pi }}\left[ \sum\limits_{k=0}^{\infty }{{{\gamma }^{k}}}{{r}_{t+k+1}}|{{S}_{t}}=s,{{A}_{t}}=a \right]\] \[{{Q}_{\pi }}(s,a)=\text{E}({{r}_{t+1}}+\gamma {{Q}_{\pi }}({{S}_{t+1}},{{A}_{t+1}})|{{S}_{t}}=s,{{A}_{t}}=a)\]

Q-learning algorithm

The Q-learning algorithm is a temporal-difference algorithm based on value iteration and is widely used in non-deterministic Markov decision processes. It does not depend on an environment model: the optimal policy π* can still be learned, and the algorithm can still converge, when the environment model is unknown.

In the Q-learning algorithm, the agent reaches an optimal policy by continuously improving its value estimate in each state, and its ultimate goal is to find a mapping from state-action pairs to Q-values [22]. The result can be represented by a matrix of size $N_s \times N_a$, where $N_s$ denotes the number of states s and $N_a$ denotes the number of actions a that can be taken. In the Q-learning algorithm, the Bellman equation is replaced by an iterative process: in each time slot, the reward for the action a taken in state s is obtained and the corresponding Q(s,a) is updated as follows: \[{{Q}_{n+1}}({{s}_{n}},{{a}_{n}})\leftarrow {{Q}_{n}}({{s}_{n}},{{a}_{n}})+\alpha \left( {{r}_{n}}+\gamma {{\max }_{a}}{{Q}_{n}}({{s}_{n+1}},a)-{{Q}_{n}}({{s}_{n}},{{a}_{n}}) \right)\]

It can also be expressed as: \[{{Q}_{n+1}}({{s}_{n}},{{a}_{n}})\leftarrow (1-\alpha ){{Q}_{n}}({{s}_{n}},{{a}_{n}})+\alpha \left( {{r}_{n}}+\gamma {{\max }_{a}}{{Q}_{n}}({{s}_{n+1}},a) \right)\] where α ∈ [0,1] is the learning rate; the larger α is, the less of the previously learned value is retained. γ ∈ [0,1] is the discount factor, which balances the weights of short-term and long-term gains. $Q_n(s_n,a_n)$ denotes the Q value at the current moment, $s_{n+1}$ denotes the next state reached by performing action $a_n$ in state $s_n$, and $r_n$ denotes the reward obtained by performing action $a_n$ in the current state. $r_n+\gamma \max_a Q_n(s_{n+1},a)$ is the estimated target value for the Q function, and $Q_{n+1}(s_n,a_n)$ denotes the updated Q value.

The update is repeated until the agent reaches the target state, at which point the current training episode ends.
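The update rule and stopping condition can be illustrated with a small tabular example. The sketch below is not the paper's experiment: the five-state chain environment, the reward of 1 at the goal, the ε-greedy exploration, and all hyperparameters are assumptions made for illustration.

```python
import numpy as np

def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain whose last state is the goal."""
    rng = np.random.default_rng(seed)
    q = np.zeros((n_states, 2))        # N_s x N_a table; action 0 = left, 1 = right
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:               # one training episode ends at the target state
            if rng.random() < epsilon:
                a = int(rng.integers(2))       # explore
            else:
                a = int(np.argmax(q[s]))       # exploit the current estimate
            s_next = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0
            target = r + gamma * np.max(q[s_next])   # r_n + gamma * max_a Q(s_{n+1}, a)
            q[s, a] += alpha * (target - q[s, a])    # Q-learning update
            s = s_next
    return q

q_table = q_learning_chain()
greedy_policy = np.argmax(q_table[:-1], axis=1)   # best action in each non-goal state
```

No transition model is used anywhere in the update: the agent only needs the observed reward and next state, which is what makes the algorithm model-free. In this toy environment the learned greedy policy moves right in every non-goal state.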

Modeling of complex electromagnetic environments

In wireless communication, the signal sent by the transmitter travels through a complex electromagnetic environment to the receiver. Along the way it is affected by many factors: in addition to terrain and moving speed, communication signals may interfere with one another, and the enemy may even apply malicious interference. The communication system therefore needs stronger resistance to multipath and interference, and the design of many parameters of a wireless communication system also stems from its actual application scenario. This section describes the models considered for the software radio (SDR) communication system in this paper, namely the fading channel model and the interference model.

Fading channels

A fading channel is a channel model that simulates the multipath effects experienced by wireless communication signals during propagation. The signal travels from the transmitter to the receiver over multiple paths, which may differ in propagation distance and undergo reflection, scattering, and diffraction. The resulting variation in the received signal is the multipath effect.

The channel considered in this paper is a three-path channel. The main path is a Rice (Rician) channel, characterized by a strong direct component, and the remaining paths follow the Rayleigh distribution. The Rayleigh distribution describes the modulus of a complex Gaussian variable X+iY whose real and imaginary parts are zero-mean, independent, and identically distributed. Its probability density function is: \[{{f}_{Z}}(z)=\frac{z}{{{\sigma }^{2}}}{{e}^{-\frac{{{z}^{2}}}{2{{\sigma }^{2}}}}}\] where Z is the modulus of the complex Gaussian variable X+iY and $\sigma^2$ is the variance of its real and imaginary parts. The Rice distribution generalizes the Rayleigh distribution: the means of X and Y are no longer zero but $m_1$ and $m_2$, respectively, so the received signal is a superposition of the complex Gaussian signal and the direct component. The probability density function of the Rice distribution is: \[{{f}_{X}}(x)=\frac{x}{{{\sigma }^{2}}}{{I}_{0}}\left( \frac{sx}{{{\sigma }^{2}}} \right){{e}^{-\frac{{{x}^{2}}+{{s}^{2}}}{2{{\sigma }^{2}}}}}\] where $s=\sqrt{m_{1}^{2}+m_{2}^{2}}$ and $I_0(\cdot)$ is the modified Bessel function of the first kind of order zero. The Rice factor is: \[K=\frac{{{s}^{2}}}{2{{\sigma }^{2}}}\]
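The relationship between the Rice factor and the two components can be checked numerically. The following sketch draws Rician channel coefficients for a given K and recovers K from the samples; placing the direct component on the real axis ($m_1=s$, $m_2=0$), the value of σ, and the sample count are assumptions made for the example.

```python
import numpy as np

def rician_samples(k_factor, sigma, size, seed=0):
    """Complex fading coefficients: direct component s plus complex Gaussian scatter."""
    rng = np.random.default_rng(seed)
    s = np.sqrt(2.0 * k_factor) * sigma    # from K = s^2 / (2 * sigma^2)
    scatter = sigma * (rng.normal(size=size) + 1j * rng.normal(size=size))
    return s + scatter                     # direct component on the real axis (m1 = s, m2 = 0)

h = rician_samples(k_factor=50.0, sigma=1.0, size=100_000)
direct_power = np.abs(h.mean()) ** 2                  # estimate of s^2
scatter_power = np.mean(np.abs(h - h.mean()) ** 2)    # estimate of 2 * sigma^2
k_estimate = direct_power / scatter_power             # close to 50
```

Setting `k_factor=0.0` removes the direct component, in which case the modulus of the samples follows the Rayleigh distribution.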

The signal passing through the Rice channel can be represented as: \[r(t)=h(t)\circ s(t)+n(t)\] where r(t) is the received signal, h(t) is the channel coefficient, s(t) is the transmitter output signal, and n(t) is the noise. The specific parameters used for subsequent simulations are shown in Table 1.

Table 1.

Fading channel simulation parameters

Channel parameter | Parameter value
Rice factor | 50
Maximum path delay (μs) | [0, 2.5, 5]
Maximum Doppler shift (kHz) | 20
Average path gain (dB) | [0, -10, -10]

Signal interference

The interference that wireless communication signals suffer during transmission can be divided into intentional and unintentional interference. Unintentional interference is mainly of two kinds: interference caused by natural phenomena during transmission, and interference unintentionally generated by other electronic equipment. Natural sources of electromagnetic interference mainly include lightning, solar activity, and cosmic rays. Interference from other electronic equipment arises mainly when multiple devices operate simultaneously in the same environment and in the same frequency band, so that they interfere with one another. This type of interference can usually be anticipated when the application scenario is designed and handled in advance to reduce the mutual impact.

Intentional interference refers to interference signals generated by the enemy specifically to disrupt normal communication. Deceptive jamming imitates the real signal used in the other party's communication so that its receiver accepts wrong information, which disrupts normal reception. Suppression jamming transmits high-power jamming signals capable of covering the target signal. If the time at which the other party transmits and the frequency band of the transmission are known, a jamming signal can be generated in the time or frequency domain so that the target signal is submerged in the jamming and normal reception is disrupted.

The five common types of signal interference considered in this paper are described in detail below.

Single-tone interference

Single-tone interference is a single-frequency sinusoidal signal in the time domain; in the frequency domain, all of its power is concentrated at a single frequency point. Its mathematical model is: \[J(t)=A{{e}^{j(2\pi {{f}_{J}}t+\varphi )}}\] where A denotes the amplitude of the single-tone interference, $f_J$ denotes the frequency point at which the single-tone interference is located, and φ is the initial phase.

Multi-tone interference

Multi-tone interference is the superposition of several single-tone interference signals. In the time domain it is a superposition of sinusoidal waveforms at several different frequencies, and in the frequency domain all of its power is concentrated at a few distinct frequency points. Its mathematical model is: \[J(t)=\sum\limits_{i=1}^{N}{A}(i){{e}^{j\left[ 2\pi {{f}_{J}}(i)t+\varphi (i) \right]}}\] where N is the number of single-tone components in the multi-tone interference, and A(i), $f_J(i)$, and φ(i) are the amplitude, center frequency point, and initial phase of the i-th single-tone component, respectively.

Linear Frequency Sweep Interference

The frequency of linear frequency-sweep interference varies linearly with time. In the time domain it is a sinusoidal wave whose frequency sweeps periodically, and in the frequency domain its power is concentrated in a continuous frequency band. Its mathematical model is: \[J(t)=A{{e}^{j(2\pi {{f}_{0}}t+\pi \mu {{t}^{2}}+\varphi )}},\quad 0\le t\le {{T}_{L}}\] where A denotes the amplitude of the interference, $f_0$ denotes the starting frequency, μ denotes the sweep rate, φ denotes the initial phase, and $T_L$ denotes the sweep period. The end frequency of the sweep is $f_0+\mu T_L$, and $\mu T_L$ is the sweep bandwidth.

Partial Band Interference

Partial band interference is also known as partial band noise interference. In the time domain it appears as continuous noise, and in the frequency domain almost all of its power is concentrated in a certain frequency band. Its mathematical model can be expressed as: \[J(t)=U(t){{e}^{j(2\pi {{f}_{J}}t+\varphi )}}\] where U(t) is band-limited complex Gaussian noise, $f_J$ denotes the center frequency of the partial band noise interference, and φ denotes the initial phase.
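The interference models above can be generated with a few lines of NumPy. The sketch below is illustrative only: the sampling rate, frequencies, and bandwidth are hypothetical values, and the band-limited noise U(t) is produced by zeroing FFT bins of white Gaussian noise, which is one simple way to band-limit among several.

```python
import numpy as np

def single_tone(t, amp, f_j, phase=0.0):
    """J(t) = A * exp(j(2*pi*f_J*t + phi))."""
    return amp * np.exp(1j * (2 * np.pi * f_j * t + phase))

def multi_tone(t, amps, freqs, phases):
    """Sum of N single-tone components."""
    return sum(single_tone(t, a, f, p) for a, f, p in zip(amps, freqs, phases))

def linear_sweep(t, amp, f0, mu, phase=0.0):
    """J(t) = A * exp(j(2*pi*f0*t + pi*mu*t^2 + phi)); instantaneous frequency f0 + mu*t."""
    return amp * np.exp(1j * (2 * np.pi * f0 * t + np.pi * mu * t**2 + phase))

def partial_band(t, bandwidth, f_j, fs, phase=0.0, seed=0):
    """Band-limited complex Gaussian noise U(t) shifted to centre frequency f_J."""
    rng = np.random.default_rng(seed)
    n = len(t)
    noise = rng.normal(size=n) + 1j * rng.normal(size=n)
    spectrum = np.fft.fft(noise)
    freqs = np.fft.fftfreq(n, d=1.0 / fs)
    spectrum[np.abs(freqs) > bandwidth / 2] = 0.0    # ideal low-pass as a simple band limiter
    u = np.fft.ifft(spectrum)
    return u * np.exp(1j * (2 * np.pi * f_j * t + phase))

fs = 1000.0                         # hypothetical sampling rate in Hz
t = np.arange(1000) / fs
tone = single_tone(t, amp=1.0, f_j=100.0)
peak_bin = int(np.argmax(np.abs(np.fft.fft(tone))))   # 100: all power at one frequency point
pb = partial_band(t, bandwidth=100.0, f_j=200.0, fs=fs)
```

With these values the single tone occupies exactly one FFT bin, while the partial band interference occupies roughly 150 Hz to 250 Hz, matching the frequency-domain descriptions in the text.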

Artificial Intelligence Based Signal Processing Methods
Signal classification

In complex electromagnetic environments where signals are subjected to strong interference, signal classification techniques are required to extract the target signal from the mixed signals. Single-channel mixed signal classification is a key problem in wireless communications: it involves recovering multiple independent signal sources from a single received signal. Bayesian models are widely used for this task because they provide a natural probabilistic framework for characterizing the uncertainty and complexity of signals. Using the minimum mean square error (MMSE) or maximum a posteriori (MAP) criterion, the posterior probability distribution of the unknown variables can be estimated to classify the signals effectively.

In this paper, through artificial intelligence techniques, the trained reinforcement learning model is used to learn the prior distribution of the signal, which is used to calculate the gradient of the MAP objective function in the classification algorithm, and the Q-learning algorithm is used to solve the objective function, estimate the target signal, and realize the signal classification.

Noise suppression

In order to make full use of the features of the modulated signal in different transform domains to enhance the ability of the convolutional neural network model to eliminate noise, this paper proposes a noise reduction algorithm that fuses the features of the signal in the time domain and time-frequency domain, using two convolutional neural networks to extract the deeper features of the time-domain signal and time-frequency signal respectively, and achieve more effective noise reduction through feature fusion.

First, the noisy time-domain signal is preprocessed. The time-domain IQ signal is split into its real and imaginary parts to form a dual-channel real time-domain signal. The one-dimensional time-domain signal is also transformed into a two-dimensional time-frequency map by the discrete short-time Fourier transform (DSTFT), and the complex time-frequency map is split into its real and imaginary parts to form a dual-channel real time-frequency map.

Next, the preprocessed data are fed into two different branches: the time-domain encoder branch, which takes the IQ signal as input, and the time-frequency-domain encoder branch, which takes the time-frequency map obtained from the IQ signal by the DSTFT as input.

Finally, the fused features are fed into the convolutional layers of the two branches in the time domain and time-frequency domain, respectively, to recover the signal after noise removal.
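The preprocessing step of this pipeline can be sketched as follows. This is a minimal NumPy-only illustration: the frame length, hop size, Hann window, and random test signal are assumptions, and the paper does not specify these settings.

```python
import numpy as np

def iq_to_two_channel(iq):
    """Stack the real and imaginary parts of a complex array as two real channels."""
    return np.stack([iq.real, iq.imag])

def stft(iq, frame_len=64, hop=32):
    """Discrete short-time Fourier transform with a Hann window (illustrative settings)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(iq) - frame_len) // hop
    frames = np.stack([iq[i * hop: i * hop + frame_len] * window for i in range(n_frames)])
    return np.fft.fft(frames, axis=1).T     # shape: (frequency bins, frames)

rng = np.random.default_rng(0)
iq = rng.normal(size=256) + 1j * rng.normal(size=256)   # stand-in for a noisy IQ signal
time_input = iq_to_two_channel(iq)     # input of the time-domain branch, shape (2, 256)
tf_map = stft(iq)                      # complex time-frequency map, shape (64, 7)
tf_input = iq_to_two_channel(tf_map)   # input of the time-frequency branch, shape (2, 64, 7)
```

The two arrays `time_input` and `tf_input` correspond to the inputs of the two encoder branches; the encoders and the feature fusion themselves are not shown here.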

The noise suppression performance is measured using the signal-to-noise ratio gain (SNRG) and the error vector magnitude (EVM). SNRG is defined as the SNR after noise reduction minus the SNR before noise reduction, which gives a direct measure of the improvement in signal quality. It is calculated as: \[SNR{{G}_{dB}}=10\lg \left( \frac{\frac{1}{N}\sum\limits_{i=1}^{N}{{{\left| {{s}_{i}} \right|}^{2}}}}{\frac{1}{N}\sum\limits_{i=1}^{N}{{{\left| {{s}_{i}}-s_{i}^{\prime } \right|}^{2}}}} \right)-10\lg \left( \frac{\frac{1}{N}\sum\limits_{i=1}^{N}{{{\left| {{s}_{i}} \right|}^{2}}}}{\frac{1}{N}\sum\limits_{i=1}^{N}{{{\left| {{s}_{i}}-s_{i}^{n} \right|}^{2}}}} \right)\] where $s_i$ denotes the noise-free modulated signal, $s_{i}^{\prime}$ denotes the modulated signal after noise reduction, and $s_{i}^{n}$ denotes the modulated signal before noise reduction.

The error vector magnitude is a comprehensive measure of the amplitude error and phase error of the modulated signal. It is defined as the ratio of the root mean square of the error vector to the root mean square of the ideal signal, expressed as a percentage. For an ideal signal normalized to unit average power, it is calculated as: \[EVM=\sqrt{\frac{1}{N}\sum\limits_{i=1}^{N}{{{\left| {{s}_{i}}-s_{i}^{\prime } \right|}^{2}}}}\times 100\%\]
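Both metrics are straightforward to compute from sample arrays. In the sketch below, `clean`, `noisy`, and `denoised` are hypothetical names for the noise-free signal, the signal before noise reduction, and the signal after noise reduction; SNRG is computed as the SNR after noise reduction minus the SNR before it (both in dB), and the EVM assumes a unit-power ideal signal.

```python
import numpy as np

def snr_db(reference, observed):
    """SNR in dB of `observed`, measured against the noise-free `reference`."""
    signal_power = np.mean(np.abs(reference) ** 2)
    error_power = np.mean(np.abs(reference - observed) ** 2)
    return 10.0 * np.log10(signal_power / error_power)

def snr_gain_db(clean, noisy, denoised):
    """SNRG: SNR after noise reduction minus SNR before noise reduction."""
    return snr_db(clean, denoised) - snr_db(clean, noisy)

def evm_percent(clean, denoised):
    """EVM in percent, assuming the ideal signal has unit average power."""
    return 100.0 * np.sqrt(np.mean(np.abs(clean - denoised) ** 2))

clean = np.exp(1j * np.pi / 4 * np.array([1, 3, 5, 7]))   # unit-power QPSK symbols
noisy = clean + 0.2        # toy error before noise reduction
denoised = clean + 0.02    # toy residual error after noise reduction
gain = snr_gain_db(clean, noisy, denoised)   # 20 dB: error amplitude reduced tenfold
evm = evm_percent(clean, denoised)           # 2 %
```

A tenfold reduction of the error amplitude corresponds to a hundredfold reduction of the error power, hence the 20 dB gain in this toy case.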

Channel estimation

In the presence of complex electromagnetic interference in software radio communication systems, estimation methods that only consider signal and noise are not directly applicable, but data-assisted artificial intelligence-based models can be applied to channel parameter estimation in interference environments.

In the environment with complex electromagnetic interference, the sequence model of the signal at the receiving end of the software radio system after sampling can be expressed as: r(n)=as(n)+w(n)+J(n) \[r(n)=a\cdot s(n)+w(n)+J(n)\] where r(n) denotes the received signal, s(n) is the data signal, a is the attenuation caused by passing through the channel, w(n) is the noise, and J(n) is the interference signal. When the data signal is a known sequence of mode 1, the following equation can be obtained: E(r(n)s(n))=a+(E(w(n))+E(J(n)))E(s(n)) \[E(r(n)s(n))=a+(E(w(n))+E(J(n)))E(s(n))\] E(r(n))=aE(s(n))+(E(w(n))+E(J(n))) \[E(r(n))=aE(s(n))+(E(w(n))+E(J(n)))\]

Solving these two equations simultaneously yields an estimate of the attenuation a: \[\hat{a}=\frac{E(r(n)s(n))-E(r(n))E(s(n))}{1-{{E}^{2}}(s(n))}\]
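
As a quick numerical check of this estimator, the sketch below replaces each expectation with a sample mean over a known pilot sequence. The signal parameters (a ±1 pilot, a single-tone interferer with a DC offset, the noise level, and the true attenuation of 0.7) are illustrative assumptions, not values from the experiments.

```python
import numpy as np

def estimate_attenuation(r, s):
    """Moment-based estimate (E[rs] - E[r]E[s]) / (1 - E[s]^2) for a unit-modulus s(n).

    Expectations are approximated by sample means over the observed sequence.
    """
    mean_rs = np.mean(r * s)
    mean_r = np.mean(r)
    mean_s = np.mean(s)
    return (mean_rs - mean_r * mean_s) / (1.0 - mean_s ** 2)

# Hypothetical scenario: real-valued +/-1 pilot, white noise, tone interference.
rng = np.random.default_rng(1)
n = np.arange(20_000)
s = rng.choice([-1.0, 1.0], size=n.size)        # known data sequence, s(n)^2 = 1
w = 0.2 * rng.normal(size=n.size)                # Gaussian white noise w(n)
J = 0.8 * np.cos(2 * np.pi * 0.13 * n) + 0.5     # interference J(n) with a DC offset
a_true = 0.7
r = a_true * s + w + J                           # received sequence r(n)

a_hat = estimate_attenuation(r, s)
print(f"true a = {a_true}, estimated a = {a_hat:.3f}")
```

The DC offset in J(n) is included on purpose: subtracting E(r(n))E(s(n)) in the numerator is what removes the contribution of a non-zero interference mean.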

Once the attenuation estimate is obtained, the data signal can be removed from the received sequence: \[x(n)=r(n)-\hat{a}s(n)\] where x(n) contains only the noise and interference signals. The sequence X(k) is obtained by applying the FFT to x(n) and sorting the result by amplitude. X(k) is then divided into m segments of length l_i, i = 1,2,…,m, and the power of each segment is calculated as: \[{{P}_{i}}=\frac{\sum\limits_{j=0}^{{{l}_{i}}-1}{{{\left| X(j) \right|}^{2}}}}{{{l}_{i}}},\quad i=1,2,...,m\] where j indexes the samples within the i-th segment.

Because Gaussian white noise is independent across frequency bands and has a flat spectrum, the power of an undisturbed band is smaller than that of a disturbed band. The k segments with the smallest power are therefore selected from the m segment powers and averaged to give the noise power, and the interference power is the power of x(n) minus the noise power. The number of selected segments affects the noise power estimate: too few segments makes the estimate low, and too many makes it high. Since the typical interference bandwidth considered in this paper does not exceed half of the total bandwidth, k = m/2 segments are used to estimate the noise power.
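
The segment-power procedure can be sketched as follows. This is an illustrative reading rather than the authors' code: the function name, the FFT scaling (chosen so that the mean of |X|² equals the mean power of x(n)), and the test signal are assumptions. The `sort_by_amplitude` flag switches between sorting the spectrum before segmentation, as described above, and segmenting it in frequency order. With equal-length segments, the sorted variant can never give a larger noise estimate than the frequency-ordered one, because it averages the weakest individual bins.

```python
import numpy as np

def estimate_noise_and_interference(x, m, k=None, sort_by_amplitude=True):
    """Split the power spectrum of x into m segments and average the k weakest.

    Returns (noise_power, interference_power); k defaults to m // 2.
    """
    if k is None:
        k = m // 2
    n = len(x)
    # Scale so that the mean of bin_power equals the mean power of x (Parseval).
    bin_power = np.abs(np.fft.fft(x)) ** 2 / n
    if sort_by_amplitude:
        bin_power = np.sort(bin_power)
    segments = np.array_split(bin_power, m)
    segment_power = np.array([seg.sum() / len(seg) for seg in segments])  # P_i
    noise_power = np.sort(segment_power)[:k].mean()
    total_power = np.mean(np.abs(x) ** 2)
    return noise_power, total_power - noise_power

# Hypothetical residual x(n): white noise (power 0.04) plus one tone (power 0.32).
rng = np.random.default_rng(2)
N = 4096
n = np.arange(N)
w = np.sqrt(0.04) * rng.normal(size=N)
J = 0.8 * np.cos(2 * np.pi * 600 * n / N)   # tone placed exactly on FFT bin 600
x = w + J

noise_seg, interf_seg = estimate_noise_and_interference(x, m=16, sort_by_amplitude=False)
noise_sorted, interf_sorted = estimate_noise_and_interference(x, m=16, sort_by_amplitude=True)
print(f"frequency-ordered segments: noise={noise_seg:.4f}, interference={interf_seg:.4f}")
print(f"amplitude-sorted segments:  noise={noise_sorted:.4f}, interference={interf_sorted:.4f}")
```

Comparing the two printed lines shows how strongly the choice of segmentation affects the noise estimate, which is the sensitivity to the number of selected segments discussed above.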

In summary, the overall flow of channel parameter estimation in a complex electromagnetic interference environment is shown in Figure 2.

Figure 2.

Channel estimation process under complex electromagnetic interference environment

Results and discussion
Experimental platform and environment setup
Experimental platforms

In the experiments in this chapter, the main equipment is the USRP RIO software radio system and a host PC. The basic frequency conversion, analog-to-digital/digital-to-analog conversion, and RF driver modules use the toolkit supplied with the USRP RIO platform, and the remaining functional modules are implemented according to the design in this chapter.

The communication system in this experiment operates over an 800 MHz spectrum range, which is divided into eight bands of 100 MHz each. To demonstrate the ability of the designed system to learn a good band selection strategy, single-tone, multi-tone, and swept interference signals are set up in the experiment, where the swept interference signal scans the full 800 MHz spectrum within 40 seconds. The software for the experiment was designed and written in LabVIEW 2015, with separate PC-side and FPGA-side code for the user, the jammer, the information processing center, and the Q-learning cognitive engine. The test band used for the test system is 2.15 GHz-2.85 GHz. The transmit gain and receive gain at the user side are 0 dBm, and the transmit gain at the jammer side is 10 dBm.
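
The band plan and the timing of the swept jammer can be made concrete with a little arithmetic. The sketch assumes a linear sweep that takes exactly 40 seconds and then wraps around, so the jammer dwells 40 / 8 = 5 seconds in each band. Band edges are given as offsets from a hypothetical start frequency, and all names are illustrative.

```python
BAND_WIDTH_MHZ = 100.0
NUM_BANDS = 8
SWEEP_PERIOD_S = 40.0   # assumed: one full linear sweep takes exactly 40 s

def band_edges_mhz(start_mhz=0.0):
    """(lower, upper) edges of the eight 100 MHz bands, offset from start_mhz."""
    return [(start_mhz + i * BAND_WIDTH_MHZ, start_mhz + (i + 1) * BAND_WIDTH_MHZ)
            for i in range(NUM_BANDS)]

def swept_jammer_band(t_seconds):
    """Index of the band the swept jammer occupies at time t (linear, wrapping sweep)."""
    dwell_s = SWEEP_PERIOD_S / NUM_BANDS          # 5 s per band
    return int((t_seconds % SWEEP_PERIOD_S) // dwell_s)

print(band_edges_mhz()[:2])        # [(0.0, 100.0), (100.0, 200.0)]
print(swept_jammer_band(12.0))     # 2
```

Under these assumptions a band selection strategy has about 5 seconds to vacate a band before the sweep reaches it.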

Environmental settings

The deployment of the experimental system is shown in Figure 3. Because laboratory equipment is limited, the experiments in this chapter use one user (consisting of a transmitter and a receiver), one jammer, and one Q-learning cognitive engine (containing the convolutional neural network and reinforcement learning modules, the signal classification, noise suppression, and channel estimation modules, information processing nodes, and a number of perception nodes). During the experiment, the jammer program is adjusted so that it interferes with the user’s communication according to a given jamming strategy, in order to test the signal decoding, BER reduction, and anti-jamming capabilities of the software radio system.

Figure 3.

Experimental deployment of anti-jamming system based on Q learning

Analysis of experimental results
Complex electromagnetic interference analysis

For the electromagnetic interference model described in Section 2.4, the interference signals above are added in turn, at given signal-to-noise ratios, to the Gaussian channel in the 18.5 kbps rate mode at frequency points within the basic frequency hopping band. This is used to analyze the hard anti-interference capability of the test system, i.e., its ability to withstand interference without countermeasures, and the improvement in link performance once interference detection and suppression are added. The false alarm probability of the link under several thresholds is shown in Table 2. Taking a false alarm rate below 10^-5 as the criterion, Th = 12 is chosen as the threshold for the subsequent experimental simulation.
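
The threshold choice can be reproduced from the values in Table 2. The sketch assumes that the selection rule is "the smallest threshold whose average false alarm probability is below the criterion", which is one plausible reading of why Th = 12 is used rather than 13 or 14. The function name is illustrative.

```python
# Average link false alarm probability for each threshold Th (values from Table 2).
FALSE_ALARM_BY_THRESHOLD = {
    9: 4.5e-5,
    10: 3.1e-5,
    11: 1.6e-5,
    12: 6.2e-6,
    13: 2.7e-6,
    14: 7.6e-7,
}

def smallest_threshold_below(criterion, table):
    """Return the lowest threshold whose false alarm probability is below criterion."""
    for threshold in sorted(table):
        if table[threshold] < criterion:
            return threshold
    raise ValueError("no threshold satisfies the false alarm criterion")

print(smallest_threshold_below(1e-5, FALSE_ALARM_BY_THRESHOLD))  # 12
```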

Table 2.

Link false alarm probabilities with different threshold values

| Threshold value Th | Average false alarm probability of links |
| --- | --- |
| 9 | 4.5 × 10^-5 |
| 10 | 3.1 × 10^-5 |
| 11 | 1.6 × 10^-5 |
| 12 | 6.2 × 10^-6 |
| 13 | 2.7 × 10^-6 |
| 14 | 7.6 × 10^-7 |

Single-tone interference

To evaluate the anti-interference performance of the system, one frequency point is randomly selected from the 50 frequency hopping points and jammed during the simulation, and 1000 frames are simulated at each signal-to-noise ratio (SNR) test point in frequency hopping mode. The resulting performance curves are shown in Fig. 4. The performance of the non-smooth waveform communication link degrades considerably when single-tone interference is present in the communication band. When the single-tone interference exceeds 20 dB, the BER rises to around 0.11, but it does not deteriorate significantly further as the interference power increases. This is because single-tone interference occupies only a limited set of frequencies: when the signal hops onto the interfered frequency it is completely covered and link performance deteriorates, but when it hops to an uninterfered frequency the interference is removed by filtering and has little impact on the communication link.

Figure 4.

Single-tone interference resistance of the system in the 18.5 kbps rate mode

Multi-tone interference

In the multi-tone interference mode, 5 frequency points are randomly selected from the 50 frequency hopping points of the communication link and jammed during the simulation, and 1000 frames are simulated at each SNR test point in frequency hopping mode. The resulting performance curves are shown in Fig. 5. When multi-tone interference is present in the communication band, the performance of the non-smooth waveform communication link degrades considerably. With 10 dB of multi-tone interference and an SNR of -14 dB, the performance deteriorates by 1.91 dB. When the multi-tone interference exceeds 20 dB, performance deteriorates severely and normal reception and demodulation become almost impossible. Because the applied multi-tone interference falls on the frequency hopping points themselves, the system can still withstand it through “hard resistance” when the interference power is low, but once the power rises to 20 dB the system can no longer receive normally.

Figure 5.

Hard resistance of the system to multi-tone interference in the 18.5 kbps rate mode

Partial band interference

In the partial-band interference mode, partial-band interference with a bandwidth of 500 kHz, centered on frequency points randomly selected from 65 MHz, 80 MHz, and 95 MHz, is applied within the basic frequency hopping band of 60 MHz-100 MHz of the communication link during the simulation. The maximum allowable passband attenuation of the filter in the partial band is 0.25 dB, and the minimum allowable stopband attenuation is […]. In frequency hopping mode, 1000 frames are simulated at each SNR test point, and the anti-interference performance curve of the system is shown in Fig. 6. When the interference signal is within 10 dB, the link can “resist” the partial-band interference to a certain extent, with a performance loss of about 4.3 dB. When the partial-band interference exceeds 20 dB, link performance deteriorates further, and the system has difficulty receiving and demodulating normally.

Figure 6.

Hard resistance of the system to partial-band interference in the 18.5 kbps rate mode

Anti-jamming decision analysis

In order to better validate the utility of the algorithm for real-time communication, we have conducted real-world communication anti-jamming decision-making experiments on the software radio platform USRP RIO.

The constellation diagrams of the software radio system with and without interference are shown in Fig. 7. Fig. 7(a) shows the receiver constellation without interference: the data points are concentrated in two clusters, consistent with the BPSK modulation used by the system at this time. Fig. 7(b) shows the receiver constellation under interference: the data points are widely scattered, the user’s communication quality drops sharply, and stuttering, garbled display, and even communication interruption begin to appear. After the user is interfered with by the jammer, the transmission frequency of the system hops from 2.2 GHz to 2.1 GHz, i.e., from one channel to another.

Figure 7.

Constellation diagram of a software radio system in undisturbed and disturbed conditions

Fig. 8 compares the BER of systems using different frequency hopping band selection schemes. Fig. 8(a) shows the BER of the system using the predetermined frequency hopping band selection scheme. A system using this scheme cannot change its transmission band in time when it encounters interference, which increases the BER: the average BER reaches […], reducing the quality of communication. Fig. 8(b) shows the BER of the system using the frequency hopping band selection scheme based on convolutional neural network signal classification and the Q-learning algorithm. The Q-learning reinforcement learning algorithm can learn a model of the complex electromagnetic environment and, after encountering interference, select the correct frequency hopping band according to the Q-table, effectively suppressing the interfering frequency.
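
The Q-table mechanism can be illustrated with a toy tabular Q-learning loop. This is a deliberately simplified sketch and not the cognitive engine used in the experiments. It assumes that the state is the band the jammer currently occupies (in the real system this would come from sensing and CNN-based classification), that the jammer sweeps one band per step, and that the reward is +1 for an unjammed transmission and -1 otherwise. The hyperparameters and names are illustrative.

```python
import numpy as np

NUM_BANDS = 8

def train_q_table(steps=20_000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning against a toy jammer that sweeps one band per step."""
    rng = np.random.default_rng(seed)
    q = np.zeros((NUM_BANDS, NUM_BANDS))   # q[state, action]
    jammed = 0                              # state: band the jammer occupies now
    for _ in range(steps):
        # Epsilon-greedy choice of the band to transmit in during the next step.
        if rng.random() < epsilon:
            action = int(rng.integers(NUM_BANDS))
        else:
            action = int(np.argmax(q[jammed]))
        next_jammed = (jammed + 1) % NUM_BANDS          # toy sweep pattern
        reward = -1.0 if action == next_jammed else 1.0
        # Standard Q-learning update toward the bootstrapped target.
        td_target = reward + gamma * np.max(q[next_jammed])
        q[jammed, action] += alpha * (td_target - q[jammed, action])
        jammed = next_jammed
    return q

q_table = train_q_table()
policy = np.argmax(q_table, axis=1)
for state, band in enumerate(policy):
    print(f"jammer in band {state} -> transmit next in band {band}")
```

After training, the greedy policy read from the Q-table avoids the band the jammer is about to enter, which is the behavior the band selection scheme in Fig. 8(b) relies on.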

Figure 8.

Comparison of bit error rates of different frequency band selection schemes

Conclusion

In the field of communication confrontation, it is increasingly difficult to achieve a good jamming effect with traditional communication jamming technology. With the continuing development of computer and artificial intelligence technology, communication jamming technology has emerged as a new research direction. On this basis, this paper designs an anti-jamming system framework on the USRP RIO software radio platform and further studies signal processing methods based on machine learning models such as reinforcement learning and convolutional neural networks. The analysis of the experimental results verifies that a communication system based on the proposed scheme can learn the jammer’s interference pattern and optimize its frequency hopping band selection to reduce the probability of being jammed. The test results confirm that the proposed method meets expectations and the project requirements.

Because the research period for the project was relatively short, the work has a number of shortcomings, and this paper therefore outlines some directions for future research. The reinforcement learning algorithm used in this paper is Q-learning. In future work, machine learning algorithms better suited to communication interference scenarios could be sought or designed, to accelerate the learning process and to cope with more complex communication confrontation scenarios. In addition, this paper uses convolutional neural networks for spectrum prediction, and other deep learning methods could be considered to improve the efficiency and accuracy of prediction.
