
A study of data visualisation methods for reconstructing the symbolic transmission of Aboriginal oral traditions

  
Feb 03, 2025


Introduction

Data visualization refers to the use of charts, maps, dashboards, and other visual elements to transform data into a form that is easy to understand and analyze. In today’s era of information explosion, data visualization has become an indispensable tool in the field of data analysis [1–4]. Through data visualization, people can grasp the meaning of data more intuitively and discover the connections between data so as to make more informed decisions, and its use in reconstructing the symbolic transmission of Aboriginal oral traditions is of great significance [5–8].

Aboriginal oral tradition symbols refer to the history, myths, legends, and living habits passed down in many Aboriginal communities, usually transmitted from elders to the next generation through oral tradition. For example, the oral traditions of the Aboriginal people of Australia include myths and legends about the origin of the universe, the origin of humankind, and land rights [9–12]. These traditions are very important to Aboriginal communities: they not only transmit history and culture but also serve as a vital link for maintaining community cohesion and identity, and as an important way for Aboriginal peoples to sustain and develop themselves. Together with written history, Aboriginal oral traditions form one of the important components of human culture [13–16]. Through oral tradition, people pass down historical events, beliefs, and cultural traditions that have survived and developed across different cultures and eras. These stories not only enrich human culture but also provide an important way to understand ancient societies, beliefs, and cultures [17–20].

This paper analyzes the connotations and characteristics of Aboriginal peoples, traditional cultural expressions, and Aboriginal oral tradition symbols. The impact of Aboriginal oral narratives on the transmission of traditional symbols is discussed in three aspects: symbolic tools, national cultural reproduction, and the cohesion of the national spirit. Digital technology for the intelligent translation and recognition of Aboriginal oral materials is proposed, namely a multilingual real-time end-to-end speech translation model, with interactive attention and gating mechanisms added to the model design in order to handle the multiple language types of Aboriginal oral narratives. Digital projection technology is then proposed to visualize the Aboriginal oral data; the phenomenon of uneven brightness in laser dot-matrix images is optimized to obtain the optical centre parameter coordinates. Combined with an evaluation system for oral digital communication, a comprehensive score for the digital transformation of oral tradition symbols is obtained.

Meaning and role of oral modes of communication
Traditional Aboriginal Cultural Expressions
Aboriginal people

In a narrow sense, “indigenous” is used to describe only minorities living in a country or region dominated by an alien majority. In a broader sense, the United Nations Sub-Commission on Prevention of Discrimination and Protection of Minorities describes Indigenous communities, peoples and nations as “those who, in historical continuity with the societies that developed in their territories prior to occupation or colonization, regard themselves as distinct from the other groups of people who now dominate the territory of the State or part of the State”.

According to these definitions, Indigenous communities can be understood as marginalized people and minorities that were historically colonized but still have their original territories and social systems. Their cultural traditions are thus distinct from what the public generally refers to as traditional or folk culture. The main indigenous communities at present include the Indians of the Americas, the Maori of Oceania, the Sami of Northern Europe, and the Inuit near the Arctic Circle.

Traditional cultural expressions

After years of research and investigation, the World Intellectual Property Organization (WIPO) has established a series of legal terms for the definition and protection of traditional tribal folk culture and art, a major theoretical achievement in the international intellectual property legal system. The three main terms used are “traditional knowledge”, “traditional cultural expressions” (TCEs), and “genetic resources”. Among them, “traditional cultural expressions” are also known as “expressions of folk culture and art”. WIPO defines them as “the product of creations reflecting the characteristics of a traditional cultural heritage developed and held by a community or ethnic group in a country, or by certain individuals representing the traditional cultural expectations of that group”, including such forms as oral expressions, musical expressions, and behavioral expressions. These terms are now widely used in the international legal context of intellectual property.

Aboriginal oral tradition symbols

Traditional Folk Dance

Oral history is an emerging approach in educational research in recent years; it effectively crosses historical divides and cultural boundaries, fully demonstrates humanistic care, and provides reliable conditions and methods for the inheritance and education of folk dance. Through oral history, Jiangsu folk dance enhances inheritors’ knowledge and understanding of intangible cultural heritage and accelerates the dissemination and promotion of folk dance culture.

Cultural content is explained through the oral history method. The inheritance and education of intangible cultural heritage constitute a kind of “group memory”: inheritors spread the cultural content to the next generation through “oral transmission”, thereby reproducing the cultural content, increasing its richness and wholeness, and enhancing the cultural value of intangible cultural heritage and folk cultural heritage.

Oral literary and artistic works

Oral works, in the legal sense, refer to art forms expressed orally through language. They are mainly an auditory, temporal, and fluid art that uses language as its means and sound as its carrier; they are inspired, created on the spot, improvised, and finished in an instant, with orality as the original form of expression of the work.

Combining the legal definition of works of folk literature and art with the meaning of oral works, oral literary and artistic works of ethnic minorities refer to works created by unidentified members of a specific ethnic group that reflect the cultural characteristics and values of that group. Such works, oral in form and passed down from generation to generation within an ethnic group, are one means of expression of folk art works.

Aboriginal oral works have a wealth of forms of expression, ranging from a riddle to a short story. They contain great truths of life, anthropological background, perspectives of thinking, and methods of observation. Although they have no fixed form of expression, they are rich in connotation.

Influence of Aboriginal oral narratives on the transmission of traditional symbols
Symbolic tools

From the standpoint of semiotics, cultural transmission is based on symbols as the most basic means of communication and exchange. Among them, language is a symbol that is commonly agreed upon and shared by members of the nation. Language stores the memory of the nation and unites the will and centripetal force of the nation. Language serves as the record of national development and progress, the most significant cultural component of a nation, and the expression of national identity. Those oral creations widely circulated in folk culture can reflect the cultural characteristics of a nation from different aspects. Cultural inheritance can be divided into language inheritance, behavioral inheritance, artifact inheritance, psychological inheritance, and other forms of inheritance according to the form of cultural composition. One of the major signs of difference among nationalities is language. Oral archives, which transmit national culture through language, have naturally become symbolic tools.

Reproduction of national culture

Oral archives are responsible for producing and reproducing national culture. As a means of preserving national culture, oral archives are an important witness to national culture and a means of accumulating and disseminating history and culture. For example, the Guoyu and Zuozhuan of the pre-Qin period were written down on silk and bamboo slips after a process of oral transmission, during which they were continuously processed and embellished. This type of oral archive has become a part of cultural accumulation, inheritance, and protection that cannot be ignored.

With the progress and popularization of science and technology, audio-visual and other new media forms of oral archives have become a new force in the inheritance of national culture. Thanks to their flexible recording methods and diverse means of expression, today’s oral archives have advanced cultural transmission from purely traditional narrative to a form that combines text and digital images. The vividness of these images has enriched the methods of human expression and altered the operation of traditional cultural transmission. Oral archives have therefore provided a new platform for the inheritance and development of the cultural symbols of Indigenous peoples.

Cohesion of the national spirit

Oral archives are the source of precipitation and cohesion for the national psyche and national spirit. Oral archives are used to enhance traditional national cultural resources that rely on human memory. These national cultures, whether recorded in words or images, often express a strong national sentiment. The contents of oral archives include historical oral archives, literary oral archives, religious oral archives, ethical oral archives, and folklore oral archives. Many of them embody the psychological consciousness, values, and even emotional tendencies of a people, and together they contribute to the formation of a sense of national identity and cohesion.

Digital dissemination of Aboriginal oral tradition symbols
Changes in the development of oral history brought about by digital technology

Digital technology diversifies oral history collection methods

Digital technology provides more possibilities for the collection of oral history data. Through online video calls, both parties in an oral history interview can be in different spaces at the same time. This greatly facilitates the oral history researcher and saves interview time.

In addition, with the power of digital technology, researchers can better organize oral history interviews. Using artificial intelligence to recognise a wide range of dialects, interviewers can quickly convert audio recordings into text with a high degree of accuracy, significantly reducing the workload of oral history researchers. With the emergence of video and audio noise reduction, the fidelity of oral history materials has increased, and more accurate information can be preserved for oral history research.

Digital technology makes oral history dissemination visualised

The rapid development of oral history cannot be dissociated from advanced digital communication methods that rely on fast and effective communication technology. The influence of oral history content has been growing rapidly: originally monotonous oral history data can now be digitally presented in engaging forms, interest in oral history has been enhanced, and more and more people are beginning to pay attention to this kind of historical record.

The majority of traditional oral history materials are based on written records, and their contents circulate mostly among oral history researchers. With the development of digital technology, it has become possible to process oral history materials further. Dry, abstract textual historical materials are visualised to restore historical scenes; with the help of a unique narrative perspective and emotional, artistic expression, a vivid and graphic historical scroll is presented to the audience. The audience is immersed in a scene that fuses the real and the virtual, participating in the process of oral history as if in the real world, which enhances their sense of involvement.

Digitisation techniques
Intelligent translation of oral information

Basics of speech conversion

Speech conversion refers to converting the source speaker’s voice, which contains personalized features, to sound like the target speaker’s voice through a conversion model without changing the information of the speech content.

Cross-language speech conversion allows individuals from two or more language communities to understand what each other is saying in different languages without altering the semantics. In this paper, cross-language speech conversion is investigated for oral materials in multiple languages, so that the oral materials can be understood by more people while accelerating the development of communication with Aboriginal people.

Mel Frequency Cepstrum Coefficients (MFCCs) are the most common acoustic features in speech signal processing; the mel scale does not correspond linearly with frequency: $\mathrm{Mel}(f)=2595\log_{10}\left(1+\frac{f}{700}\right)$ (1)

In equation (1), f refers to the frequency.

The purpose of pre-emphasis is to compensate for the loss of energy in the high-frequency part by boosting it. In the pre-emphasis process, a high-pass filter is used: $H(z)=1-\mu z^{-1}$ (2)

In Eq. (2), μ ∈ (0.9, 1.0). The input-output equation of the pre-emphasis network is: $x'[t]=x[t]-a\,x[t-1]$ (3)

Where x[t] represents the t-th sample point of the audio data, and a usually takes a value in (0.95, 0.99). The speech signal is then windowed; the commonly used window functions are the rectangular window, the Hamming window, and the Hanning window, defined as:

Rectangular window: $w(n)=\begin{cases}1, & 0\le n\le N-1\\ 0, & \text{otherwise}\end{cases}$ (4)

Hamming window: $w(n)=\begin{cases}0.54-0.46\cos\big(2\pi n/(N-1)\big), & 0\le n\le N-1\\ 0, & \text{otherwise}\end{cases}$ (5)

Hanning window: $w(n)=\begin{cases}0.5\big[1-\cos\big(2\pi n/(N-1)\big)\big], & 0\le n\le N-1\\ 0, & \text{otherwise}\end{cases}$ (6)

Where N denotes the window length and n denotes the n-th sampling point.
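To make Eqs. (3) and (5) concrete, the following is a minimal Python/NumPy sketch of pre-emphasis and Hamming-window framing. The frame length (400 samples), hop size (160 samples), and coefficient a = 0.97 are illustrative choices, not values taken from the paper.

```python
import numpy as np

def pre_emphasis(x, a=0.97):
    # Eq. (3): x'[t] = x[t] - a * x[t-1]
    return np.append(x[0], x[1:] - a * x[:-1])

def frame_and_window(x, frame_len=400, hop=160):
    # Split the signal into overlapping frames and apply a Hamming window (Eq. (5)).
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    window = 0.54 - 0.46 * np.cos(2 * np.pi * np.arange(frame_len) / (frame_len - 1))
    return x[idx] * window   # shape: (n_frames, frame_len)
```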

Frequency-domain information is extracted from each windowed frame: a fast Fourier transform is applied to each frame of the speech signal to compute the discrete Fourier transform (DFT), and the power spectrum is then calculated. Namely: $X_a(k)=\sum_{n=0}^{N-1}x(n)e^{-j2\pi kn/N},\quad k=0,1,2,\dots,N-1$ (7)

In Eq. (7), $X_a(k)$ denotes the transformed data, x(n) the input speech signal, and N the number of points of the DFT. The logarithmic energy within each filter is obtained using triangular band-pass filters: $H_m(k)=\begin{cases}0, & k<f(m-1)\\ \dfrac{2\left(k-f(m-1)\right)}{\left(f(m+1)-f(m-1)\right)\left(f(m)-f(m-1)\right)}, & f(m-1)\le k\le f(m)\\ \dfrac{2\left(f(m+1)-k\right)}{\left(f(m+1)-f(m-1)\right)\left(f(m+1)-f(m)\right)}, & f(m)\le k\le f(m+1)\\ 0, & k\ge f(m+1)\end{cases}$ (8)

In Eq. (8), $\sum_{m=0}^{M-1}H_m(k)=1$. A discrete cosine transform (DCT) is applied to the logarithmic energies output by the filter bank, giving the mel-frequency cepstrum coefficients: $c(n)=\sqrt{\frac{2}{M}}\sum_{m=0}^{M-1}S(m)\cos\left(\frac{\pi n(m+0.5)}{M}\right)$ (9)

In Eq. (9), n = 1, 2, 3, …, L and 0 ≤ m < M, where M is the number of triangular filters and L is the number of mel cepstral parameters. The first-order differential MFCC coefficients are: $\Delta c(n)=\frac{\sum_{k=-K}^{K}k\,c(n+k)}{\sum_{k=-K}^{K}k^{2}}$ (10)
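The full chain of Eqs. (1) and (7)–(10) can be sketched in Python with NumPy and SciPy as follows. This is an illustrative implementation, not the paper’s code: the filter-bank construction follows common practice, an orthonormal DCT is used in place of the explicit √(2/M) factor of Eq. (9), and all parameter values are placeholders.

```python
import numpy as np
from scipy.fftpack import dct

def mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)          # Eq. (1)

def inv_mel(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(frames, sr=16000, n_fft=512, n_filters=26, n_ceps=13):
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft        # Eq. (7) + power spectrum
    pts = inv_mel(np.linspace(mel(0.0), mel(sr / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)            # FFT bin of each mel point
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):                              # Eq. (8): triangular filters
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    log_energy = np.log(power @ fbank.T + 1e-10)                   # log energy per filter
    return dct(log_energy, type=2, axis=1, norm='ortho')[:, :n_ceps]  # Eq. (9)

def delta(ceps, K=2):
    # Eq. (10): first-order differential MFCC over a +/-K frame context.
    pad = np.pad(ceps, ((K, K), (0, 0)), mode='edge')
    num = sum(k * (pad[K + k:K + k + len(ceps)] - pad[K - k:K - k + len(ceps)])
              for k in range(1, K + 1))
    return num / (2.0 * sum(k * k for k in range(1, K + 1)))
```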

Multilingual real-time end-to-end speech translation

End-to-end real-time speech translation is a streaming speech translation system implemented on top of end-to-end speech translation. It is applied in certain scenarios with high real-time performance and ensures that the decoding of the translation starts before the input has been completed, and it aims to generate high-quality and low-latency translations. Compared to the offline end-to-end speech translation model, the main additions are a pre-decision module on the encoding side and a read/write module on the decoding side.

The use of simultaneous training in multiple languages to learn common knowledge among multiple languages has been proven effective in machine translation, i.e., multilingual machine translation. Multilingual end-to-end speech translation is based on an end-to-end speech translation model, which can be achieved on a single model by training multiple language pairs. The multilingual end-to-end speech translation model is mainly divided into three kinds: one-to-many translation, many-to-one translation, and many-to-many translation.

The multilingual speech translation in this paper targets real-time scenarios. Its training method differs from that of offline multilingual speech translation and is highly challenging, as it adopts a multi-way training method under real-time constraints.

Multilingual real-time speech translation model design

In order to explore the individual characteristics of multilingual real-time translation, two different multilingual real-time speech translation models are designed in this paper: a multilingual multi-decoder structure and a multilingual single-decoder structure. The read/write model is set up in a wait-k manner, i.e., the encoding side first reads in k units packed from speech frames, and thereafter, for every further unit received, each target language decodes one word; a sketch of this policy is given below.
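The following Python sketch illustrates only the wait-k scheduling logic described above; `encode_unit` and `decode_step` are hypothetical toy stubs standing in for the model’s actual encoder and decoder calls.

```python
def encode_unit(unit, states):        # toy stub: identity "encoding"
    return unit

def decode_step(dec, states, prev):   # toy stub: emit a placeholder word
    return "<eos>" if len(prev) >= len(states) else f"{dec}-w{len(prev)}"

def wait_k_translate(speech_units, k, decoders, max_len=200):
    """Wait-k policy: READ k units first, then alternate one READ with one
    WRITE per target language (multilingual simultaneous decoding)."""
    encoder_states = []
    outputs = {lang: [] for lang in decoders}
    for i, unit in enumerate(speech_units):
        encoder_states.append(encode_unit(unit, encoder_states))   # READ one unit
        if i + 1 >= k:                                             # then WRITE one word
            for lang, dec in decoders.items():                     # per target language
                outputs[lang].append(decode_step(dec, encoder_states, outputs[lang]))
    for lang, dec in decoders.items():                             # input exhausted: finish
        while len(outputs[lang]) < max_len and (not outputs[lang] or outputs[lang][-1] != "<eos>"):
            outputs[lang].append(decode_step(dec, encoder_states, outputs[lang]))
    return outputs

# Toy usage: 6 speech units, k = 3, two target "languages".
print(wait_k_translate(list(range(6)), k=3, decoders={"de": "de", "fr": "fr"}))
```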

The multilingual multi-decoder architecture shares a common encoder across several language pairs and uses one decoder per target language. The acoustic encoding side contains several convolutional neural network layers with a multilayer Transformer encoder stacked on top; this set of acoustic encoder parameters is shared by all languages, while the decoding side has a separate set of parameters for each language.

In the multilingual single decoder structure, there is a common encoder and a common decoder. The acoustic encoder part is as above, and in order to take full advantage of the multilingualism and to reduce the number of parameters of the model, a common set of parameters is used for all target languages at the decoding end.

In order to explore multilingual interaction, we also added interactive attention and gating mechanisms to the experiment.

The interactive attention module is introduced in multilingual real-time speech translation to obtain information about the words already generated in another language when decoding the current word. Under the wait-k approach, if the current language has generated three words, the decoding of the other language will also have generated three words without having finished. In this case, when decoding the fourth word, the interactive attention module can attend to the information of the first three words generated by the other language.

When decoding each word, the attention paid to the words already generated in the current language through the self-attention module differs from the attention paid to the words already generated in the other language through the interactive attention module. The purpose of introducing the gating mechanism is to regulate the weighting of the self-attention part and the interactive attention part when decoding the current word, calculated as follows: $H=\lambda H_{\mathrm{self}}+(1-\lambda)H_{\mathrm{inter}}$ (11), $\lambda=\sigma\left(W_{s}H_{\mathrm{self}}+W_{i}H_{\mathrm{inter}}\right)$ (12)

Here σ is implemented via a sigmoid function.
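A minimal NumPy rendering of Eqs. (11)–(12) follows; the weight matrices W_s and W_i are assumed learned parameters of the decoder layer, and the vectors in the usage example are random placeholders.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(h_self, h_inter, W_s, W_i):
    # Eq. (12): the gate lambda is a sigmoid of linear maps of both attention outputs.
    lam = sigmoid(W_s @ h_self + W_i @ h_inter)
    # Eq. (11): convex combination of self-attention and interactive attention.
    return lam * h_self + (1.0 - lam) * h_inter

# Toy usage with random vectors and small random weight matrices.
rng = np.random.default_rng(0)
d = 8
h = gated_fusion(rng.normal(size=d), rng.normal(size=d),
                 0.1 * rng.normal(size=(d, d)), 0.1 * rng.normal(size=(d, d)))
```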

Pre-training is performed offline using the source speech and the corresponding transcribed text; the acoustic encoder parameters of the multilingual real-time model are initialised with the parameters obtained at the acoustic encoder after a speech recognition model has been trained to convergence. Consistent with the previous section, the speed perturbation technique and the SpecAugment strategy are also used here to further reduce overfitting of the model.

Multilingual Real-Time Speech Translation Decoder

The real-time speech translation system is trained by extracting features uniformly from all the speech, and the acoustic encoding side simulates a real-time scene during computation, implemented with one-way (unidirectional) self-attention, i.e., the current speech frame can only attend to itself and previous speech frames. Decoding tests generally involve real-time speech in real scenes, so a decoding test tool is needed that supports multilingual real-time speech translation, achieving real-time speech extraction in multilingual real-time scenarios as well as communication between different target languages.

The overall structure of the real-time speech translation decoder is shown in Figure 1. The multilingual real-time speech translation decoder designed in this paper is based on a client-server architecture: the server sends the source speech to the client, receives the words returned from the client’s decoding, and evaluates the decoding results of each sentence as well as the results of the entire test set after all decoding is completed. The client follows the given policy: under the read policy, it continues to receive input; under the write policy, it updates the decoding state and emits the generated word together with the delay at that moment.

Figure 1.

Real-time speech translation decoder overall structure

Scene visualisation of oral history

Visual design presentation

The types of information in oral history are varied and can be encoded into various forms for transmission and interpretation. The types of information are mostly texts, exhibits, images, installations, and so on. Information visualisation design starts from the visual elements; by recording and combining information and giving it a unique form of presentation, it can enhance the user’s emotional experience while improving the efficiency of information transmission.

The visual representation of oral information is mainly based on two elements: text data and images. With the development of the times, it has gradually taken on the characteristics of multi-dimensional information and diversified forms. Information visualisation can be two-dimensional, three-dimensional, or interactive, and can combine text data and images; the main forms of presentation are timelines, maps, statistical charts, graphic illustrations, and digital media.

Digitization refers to the conversion of complex information into a series of binary codes, usually consisting of “bits”, using computer techniques. Unlike traditional media such as optical film-based videos and photographs, digital media is a reinvention: media that were not digital in the past are digitally translated through computer technology. The advantage of digital data is that it can be easily stored and transmitted and, combined with computer technology, can be transformed and optimized, as in video restoration and editing or photo cutting and splicing, all of which are realized on the basis of digital technology.

Digital projection technology composition and working principle

To meet the need for high-precision projection over a wide three-dimensional space, the laser projection system must first be calibrated. Since the laser projection system cannot actively acquire external three-dimensional information, this paper uses monocular vision to obtain the three-dimensional information of the projected image, with a galvanometer mirror assisting the calibration.

Aiming at the influence of various types of noise in the laser spot image on the extraction accuracy of the laser spot, this paper proposes a local adaptive image enhancement algorithm based on morphology to enhance the features of the laser spot image. For the phenomenon of uneven brightness in laser dot-matrix images, this paper adopts the watershed algorithm based on local extremes to extract the laser spot features and obtains the coordinates of the centre point of the laser spot in the coordinate system of the camera image by means of the ellipse fitting algorithm.

In image morphology, expansion (dilation) replaces the grey value of the target pixel with the local maximum of the region covered by the convolution kernel, increasing the coverage of the target feature and expanding the feature boundary, so that multiple discrete image features are merged to achieve an image-filling effect. Its mathematical definition is: $A\oplus B=\left\{(x,y)\mid B_{(x,y)}\cap A\neq\varnothing\right\}$ (13)

Where A denotes the source image and B denotes the structuring element (convolution kernel).

Corresponding to expansion, erosion is the opposite operation. Its mathematical definition is: $A\ominus B=\left\{(x,y)\mid B_{(x,y)}\subseteq A\right\}$ (14)

If there is noise in the image, expansion will amplify the noise. The difference P1 between the original image and the result of erosion, i.e., the bright spots removed by erosion, is defined as follows: $P_{1}=I-E(I)$ (15)

Where I denotes the source image and E(I) the result of the erosion operation. Similarly, the difference P2 is defined as follows: $P_{2}=D(I)-I$ (16)

Where D(I) represents the result of the expansion operation. The enhanced result P is as follows: $P=I+\alpha P_{1}-\beta P_{2}$ (17)

In this formula, the larger the values of α and β, the greater the contrast of the image, but greater contrast is not always better. The local maxima and local minima of the image are computed by the expansion and erosion operations, respectively. The image bias variance is therefore defined as follows: $V(x)=\frac{1}{n}\sum_{i=1}^{n}(x_{i}-e)^{2}$ (18)

Where $x_i$ denotes the pixel value at point i and e is a local extreme value of the image: when solving for α, e denotes the local minimum (erosion result); when solving for β, e denotes the local maximum (dilation result). Then α and β are defined as follows: $\alpha=\frac{A_{e}}{V_{e}},\quad \beta=\frac{A_{d}}{V_{d}}$ (19)

Where $A_e$ indicates the amount of highlighting: increasing $A_e$ increases the bright areas of the image. $A_d$ represents the amount of darkening: increasing $A_d$ increases the dark parts of the image. The final image enhancement formula is as follows: $\begin{cases}P=I+\alpha\left(I-E(I)\right)-\beta\left(D(I)-I\right)\\ \alpha=A_{e}/\min\left(top,\max\left(floor,V_{e}\right)\right)\\ \beta=A_{d}/\min\left(top,\max\left(floor,V_{d}\right)\right)\end{cases}$ (20)

Due to the high brightness of the spots in the laser array, the main requirement is to attenuate the influence of the blank jump lines rather than to darken the image further, so the value of $A_d$ should not be too large; otherwise the luminance of the interference information would be enhanced. The value of $A_d$ is therefore taken to be 0, and the value of $A_e$ is taken to be 30 in order to attenuate the influence of the blank jump-line interference.
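Assuming SciPy’s greyscale morphology operators, the enhancement of Eqs. (15)–(20) can be sketched as follows; the structuring-element size and the clamp bounds floor and top are illustrative, while A_e = 30 and A_d = 0 follow the values given above.

```python
import numpy as np
from scipy.ndimage import grey_dilation, grey_erosion

def enhance(I, size=3, A_e=30.0, A_d=0.0, floor=1e-3, top=10.0):
    I = I.astype(np.float64)
    E = grey_erosion(I, size=(size, size))    # local minima (Eq. (14))
    D = grey_dilation(I, size=(size, size))   # local maxima (Eq. (13))
    V_e = np.mean((I - E) ** 2)               # bias variance vs. local minima (Eq. (18))
    V_d = np.mean((D - I) ** 2)               # bias variance vs. local maxima
    alpha = A_e / min(top, max(floor, V_e))   # Eq. (20)
    beta = A_d / min(top, max(floor, V_d))
    return I + alpha * (I - E) - beta * (D - I)
```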

The input image first needs to be segmented into a collection of binary images under a series of successive thresholds, from which the connected domains of the binary images at the different thresholds are extracted. The following equation shows the principle of dual-threshold segmentation: $g(x,y)=\begin{cases}1, & f(x,y)\in[t_{1},t_{2}]\\ 0, & f(x,y)\in[0,t_{1})\cup(t_{2},255]\end{cases}$ (21)

Where g(x, y) is the grey value of the input image at point p(x, y) after threshold segmentation, f(x, y) is the grey value of the input image at point p(x, y), t1 is the low threshold, and t2 is the high threshold.
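Eq. (21) translates directly into NumPy as below; the threshold step and band width used to generate the stack of successive thresholds are illustrative choices.

```python
import numpy as np

def dual_threshold(f, t1, t2):
    # Eq. (21): 1 where the grey value lies in [t1, t2], 0 elsewhere.
    return ((f >= t1) & (f <= t2)).astype(np.uint8)

def threshold_stack(f, start=0, stop=255, step=10, band=10):
    # Successive thresholds yield the stack of binary images from which
    # connected domains are extracted at each level.
    return [dual_threshold(f, t, t + band) for t in range(start, stop, step)]
```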

In the field of image processing, the grey value of an image can be regarded as a two-dimensional density distribution function, so the characteristics of a connected domain can be described by its moments. The image moments are defined as follows: $m_{p,q}=\sum_{i=1}^{N}I(x_{i},y_{i})\,x_{i}^{p}y_{i}^{q}$ (22)

Where $m_{p,q}$ is the sum over all pixels of the image, with each pixel value $I(x_i,y_i)$ multiplied by the factor $x_i^{p}y_i^{q}$. For the zero-order moment $m_{00}$ this factor equals 1, so $m_{00}$ sums all non-zero pixel values of the region; the zero-order moment therefore represents the area of the connected domain. The centre-of-mass coordinates $(\bar{x},\bar{y})$ of the connected domain are then: $\bar{x}=m_{10}/m_{00},\quad \bar{y}=m_{01}/m_{00}$ (23)
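A short NumPy sketch of Eqs. (22)–(23); pixel coordinates are taken from the array indices, with x as the column index and y as the row index.

```python
import numpy as np

def raw_moment(I, p, q):
    # Eq. (22): m_pq = sum_i I(x_i, y_i) * x_i^p * y_i^q
    ys, xs = np.indices(I.shape)   # y = row index, x = column index
    return np.sum(I * xs ** p * ys ** q)

def centroid(I):
    # Eq. (23): centre of mass of the connected domain.
    m00 = raw_moment(I, 0, 0)
    return raw_moment(I, 1, 0) / m00, raw_moment(I, 0, 1) / m00
```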

Assuming that the connected domain is longer in one direction and shorter in the orthogonal direction, the longer direction indicates the orientation of the domain. The moment of inertia E of the connected domain is defined as follows: $E=\sum_{i=1}^{N}I(x_{i},y_{i})\,r_{i}^{2}$ (24)

Where $r_i$ denotes the distance from the i-th pixel point of the connected domain to the rotation axis. By the parallel axis theorem, the axis of minimum rotational inertia passes through the centre of mass $(\bar{x},\bar{y})$. Let the angle between the rotation axis and the x-axis be θ; the linear equation of the rotation axis is then: $(x-\bar{x})\sin\theta-(y-\bar{y})\cos\theta=0$ (25)

The distance $r_i$ from the i-th pixel point of the connected domain to the rotation axis can then be expressed as: $r_{i}=(x_{i}-\bar{x})\sin\theta-(y_{i}-\bar{y})\cos\theta$ (26)

The moment of inertia E can then be expressed as: $E=\sum_{i=1}^{N}I(x_{i},y_{i})\left[(x_{i}-\bar{x})\sin\theta-(y_{i}-\bar{y})\cos\theta\right]^{2}$ (27)

Where the central moment $\mu_{p,q}$ of the image is defined as follows: $\mu_{p,q}=\sum_{i=1}^{N}I(x_{i},y_{i})(x_{i}-\bar{x})^{p}(y_{i}-\bar{y})^{q}$ (28)

Then, substituting Eq. (28) into Eq. (27) gives: $E=\mu_{20}\sin^{2}\theta-2\mu_{11}\sin\theta\cos\theta+\mu_{02}\cos^{2}\theta$ (29)

Therefore, solving for the minimum rotational inertia of the connected domain can be regarded as a quadratic optimisation problem with constraints, whose optimisation model is: $\begin{cases}\text{Find } s=(\sin\theta,\cos\theta)^{T}\\ \min E=s^{T}As\\ \text{s.t. } 1-s^{T}Is=0\end{cases}$ (30)

Where $A=\begin{bmatrix}\mu_{20}&\mu_{11}\\ \mu_{11}&\mu_{02}\end{bmatrix}$ and I is the second-order identity matrix. This constrained quadratic optimisation problem can be solved by the Lagrange multiplier method, first constructing the Lagrange function with multiplier λ: $L=s^{T}As+\lambda\left(1-s^{T}Is\right)$ (31)

Setting the partial derivative of L with respect to s to zero gives: $\frac{\partial L}{\partial s}=2As-2\lambda Is=0$ (32)

The above equation reduces to: $As=\lambda s$ (33)

Substituting Eq. (33) back gives the moment of inertia E as: $E=s^{T}As=\lambda s^{T}Is=\lambda$ (34)

Where λ is an eigenvalue of matrix A. Therefore, solving for the rotational inertia of the connected domain is equivalent to solving for the eigenvalues of the matrix A, and the corresponding eigenvector of A gives the direction of the rotation axis of the connected domain.
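Following Eqs. (27)–(34), the axis of minimum rotational inertia is the eigenvector of the central second-moment matrix A with the smallest eigenvalue; the sketch below uses np.linalg.eigh, which returns eigenvalues in ascending order, so the first column is the desired direction.

```python
import numpy as np

def central_moment(I, p, q, cx, cy):
    # Eq. (28): mu_pq = sum_i I(x_i, y_i) (x_i - x_bar)^p (y_i - y_bar)^q
    ys, xs = np.indices(I.shape)
    return np.sum(I * (xs - cx) ** p * (ys - cy) ** q)

def min_inertia_axis(I):
    m00 = I.sum()
    ys, xs = np.indices(I.shape)
    cx, cy = (I * xs).sum() / m00, (I * ys).sum() / m00   # centroid (Eq. (23))
    mu20 = central_moment(I, 2, 0, cx, cy)
    mu11 = central_moment(I, 1, 1, cx, cy)
    mu02 = central_moment(I, 0, 2, cx, cy)
    A = np.array([[mu20, mu11], [mu11, mu02]])
    eigvals, eigvecs = np.linalg.eigh(A)   # ascending eigenvalues (Eqs. (30)-(34))
    return eigvecs[:, 0], eigvals[0]       # axis direction s, inertia E = lambda_min
```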

In order to achieve the calibration of the laser projection system, it is also necessary to match the light spots in the laser dot matrix collected by the camera with the control points of the laser dot matrix in the laser projection plane. In this paper, image matching of the laser spot array is carried out using the laser spot topology. Furthermore, the spot topology information can be used to further filter out any interference information that remains.

Feasibility validation of digital communication methods
Spoken Speech Recognition Translation

To verify the effectiveness of the proposed method, an oral-interpreter speech recognition platform is constructed. The hardware of the oral-interpreter speech recognition system is built using end-to-end speech feature analysis and a network-centric storage model structure analysis method.

On this experimental platform, the performance of the proposed method in Aboriginal speech recognition is verified through simulation experiments. The transmission delay should not exceed 500 ms, the input signal-to-noise ratio of the speech array sensor is -22 dB, the frequency deviation is 0–1500 kHz, the number of snapshots is 1500, the length of the sampling sequence is 2200, and the carrier frequency is 16 kHz. The start frequency of the 1st speech sensing sequence is 100 Hz and its cutoff frequency is 150 Hz. The 2nd speech signal is $x_2(t)=\sin\left[2\pi\left(100t+100t^{2}\right)\right]$, with a start frequency of 100 Hz and a cutoff frequency of 300 Hz. With the above parameter settings, the detected translator speech sequence is obtained as shown in Fig. 2; the duration of the speech signal is 0–1 s.

Figure 2.

Speech sequence

Taking the signal sequence of Fig. 2 as the research object, the designed method is compared with (A) system recognition based on acoustic signals, (B) recognition based on an LM-BP neural network, and (C) an HMM-based system recognition method, to test the accuracy of the different methods for speech recognition. The comparison results are shown in Fig. 3.

Figure 3.

Comparison result

Each of the four speech recognition methods was tested over 500 trials. As the figure shows, the method in this paper has the highest speech recognition accuracy, and the accuracy remains stable as the number of trials increases. The mean speech recognition accuracies of the acoustic-signal-based system recognition, the LM-BP neural network recognition, the HMM-based recognition method, and the method in this paper are 0.78, 0.81, 0.84, and 0.92, respectively.

Projection system visualisation simulation

The oral history materials exhibition’s designers used a range of digital devices, including giant screen projection, display screens, and touch devices, to enable visitors to immerse themselves in the richness of folk culture. In the digital art exhibition “Qingming Riverside Drawing 3.0”, in the part of “Long Scroll of the Sheng Shi”, the designers made thousands of static elements in the original painting into dynamic images, which showed the details of the painting completely and vividly in front of people. Combined with the oral history data of “Qingming Riverside Drawing”, the visualization design expands the information presentation of the exhibition.

The laser 3D projection calibration system mainly consists of a laser 3D projection system (referred to as a projector), a multi-parameter calibration system, and a laser tracker. The calibration wall in the multi-parameter calibration system fixes multiple reflective target heads and the target sphere of the laser tracker. The positions of each reflective target head and target sphere have been calibrated. At the same time, the multi-parameter projection frame is equipped with an iGPS receiver, whose vertex sphere seat and laser tracker target sphere seat have the same function and are also used to calibrate the center parameters of the projector.

For the whole giant-screen projection, the projector’s projection range needs to cover the entire calibration wall. The projector’s outgoing laser is located on the reflective target heads of the calibration wall, and the laser tracker locates the calibration wall and the target seats on the projector. By solving these positional relationships, the pose of the projector coordinate system, i.e., the centre of the projector, is obtained, yielding the optical centre parameters of the laser 3D projector and the calibration accuracy.

In this paper, MATLAB is used to simulate and analyze the accuracy and stability of this calibration model method. First, following the actual working conditions of the laser 3D projector, a calibration wall of about 2 m × 3 m is arranged in space, with 10 × 10 reflective target heads distributed on it. The laser projector is placed at distances of about 3 m, 4.5 m, and 5 m, with the positions distributed over a cube, and 50 simulation experiments are performed.

In the simulation, station 1 of the 50 runs, i.e., position S1, is selected as typical data, and 6 typical target points out of the 10×10 are presented. The position parameters of the theoretical target points at station S1 before applying noise are shown in Table 1, where H and V are the horizontal and pitch angles of the target point in the coordinate system of the laser 3D projector. The world coordinates ${}^{W}P\left(x_{P}^{W},y_{P}^{W},z_{P}^{W}\right)$ of target point P3 are (0.0001, 0.0005, 0.0001).

Table 1. Position parameters of the theoretical target points at station S1 before applying noise

Target point H/(°) V/(°) xPW/(mm) yPW/(mm) zPW/(mm)
P1 -23.6511 22.5402 2153.4266 2342.5112 -7.5422
P2 17.0778 22.1314 -84.2114 2355.0778 -0.0896
P3 15.4503 -19.7563 0.0001 0.0005 0.0001
P4 -22.9044 -17.0423 2201.6022 79.6621 -9.5030
P5 -23.0506 -19.6048 2173.7051 1104.0503 -8.5255
P6 15.5207 4.0035 -45.9354 1296.3561 0.4236

After setting the projector position state, the H and V angles can be solved for; random measurement errors obeying a normal distribution with mean 0 and standard deviation σ = 1″ are applied to the H and V values of these 6 groups of target points to serve as the true H and V angles observed by the projector. ${}^{W}P\left(x_{P}^{W},y_{P}^{W},z_{P}^{W}\right)$ are the 3D coordinates of the target points in the world coordinate system {W}. The position parameters of the theoretical target points at station S1 after applying noise are shown in Table 2. Taking target point P3 as an example again, its position parameters after applying noise are (0.0001, 0.0005, 0.0001), consistent with those before applying noise.

Table 2. Position parameters of the theoretical target points at station S1 after applying noise

Target point H/(°) V/(°) xPW/(mm) yPW/(mm) zPW/(mm)
P1 -23.6351 22.5432 2153.4265 2342.5103 -7.5421
P2 17.0241 22.1114 -84.2101 2355.0775 -0.0892
P3 15.4113 -19.7243 0.0001 0.0005 0.0001
P4 -22.9120 -17.0423 2201.6013 79.6612 -9.5029
P5 -23.0174 -19.4120 2173.7002 1104.0501 -8.5254
P6 15.5365 4.0075 -45.9299 1296.3559 0.4233
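A small NumPy sketch of the noise model described above: zero-mean Gaussian errors with σ = 1″ added to the observed H and V angles. This vectorised form is an illustration, not the paper’s simulation code.

```python
import numpy as np

ARCSEC_IN_DEG = 1.0 / 3600.0   # 1 arc-second expressed in degrees

def add_angle_noise(H_deg, V_deg, sigma_arcsec=1.0, rng=None):
    # Zero-mean Gaussian measurement error with sigma = 1" applied to the
    # horizontal (H) and pitch (V) angles of each target point.
    rng = np.random.default_rng() if rng is None else rng
    H = np.asarray(H_deg, dtype=float)
    V = np.asarray(V_deg, dtype=float)
    s = sigma_arcsec * ARCSEC_IN_DEG
    return H + rng.normal(0.0, s, H.shape), V + rng.normal(0.0, s, V.shape)

# e.g. noisy_H, noisy_V = add_angle_noise([-23.6511, 17.0778], [22.5402, 22.1314])
```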

In the simulation, the Newton iterative algorithm, the Levenberg-Marquardt algorithm (L-M algorithm for short), and the hybrid particle swarm (BreedPSO) algorithm are used to solve for the observations, and the results of the three solving algorithms are shown in Table 3. In terms of computational speed, the Newton iterative algorithm is the fastest, followed by the L-M algorithm and then the BreedPSO algorithm. The observed values of target point P1 under the Newton iterative algorithm are xPp/(mm) = -1365.6521, yPp/(mm) = 996.852, and zPp/(mm) = 3215.0231.

Table 3. Results of the three solving algorithms

Method Target point xPp/(mm) yPp/(mm) zPp/(mm)
Newton P1 -1365.6521 996.852 3215.0231
P2 968.5905 1253.2011 3126.4504
P3 996.3712 -1124.5044 3124.5699
P4 -1352.4241 -1135.0020 3221.2448
P5 -1342.5005 -1.0033 3386.7504
P6 966.9867 301.5066 3259.5698
L-M P1 -1365.2311 996.7960 3215.0135
P2 968.5652 1253.2536 3126.4321
P3 996.3286 -1124.5521 3124.5124
P4 -1352.4469 -1135.0124 3221.2321
P5 -1342.5125 -1.1254 3386.6562
P6 966.9164 301.5698 3259.5569
BreedPSO P1 -1365.1221 996.6563 3215.0031
P2 968.5600 1253.5466 3126.4564
P3 996.0251 -1124.0001 3124.0674
P4 -1352.4006 -1135.4569 3221.0304
P5 -1342.4511 -1.0586 3386.5897
P6 966.1253 301.5044 3259.3289

According to the simulation design, 50 simulations are carried out, and the hybrid particle swarm method is applied to solve the calibration parameters and calibration errors in each simulation. The calibration errors of the individual simulations lie in the range 0.00458 to 0.00926, while the fused RMS result is 0.00315, so the method can eliminate gross errors and reduce random errors. The fused coordinates are shown in Table 4, again taking the six groups of target points above as an example. The optical centre calibration parameter of station S1 is: $S_1=\begin{bmatrix}0.98691 & -0.03596 & 0.019864 & 963.5219\\ 0.05203 & 0.95238 & -0.00091 & -1024.5633\\ 0.02494 & -0.00186 & -0.99865 & 3125.0426\\ 0 & 0 & 0 & 1\end{bmatrix}$

Table 4. Fused target point coordinates

Target point x/mm y/mm z/mm
P1 2135.6211 2404.6200 -8.7695
P2 -93.7895 2299.0687 -0.1236
P3 -0.00264 -0.00354 -0.0038
P4 2235.7968 84.6359 -10.6591
P5 2178.7942 1124.3628 -7.2305
P6 -52.3629 1293.2354 0.4269

The arithmetic average of the 50 groups of optical centre parameters and of the three coordinates of the calibration points in their respective coordinate systems gives the final optical centre parameter coordinate ${}^{V}P\left(x_{P}^{V},y_{P}^{V},z_{P}^{V}\right)$, which is the centre parameter coordinate of the giant-screen projection scene.

Effectiveness of oral digital communication

Oral narratives are an important vehicle for cultural heritage and exhibit a rich variety of forms of expression. For example, interview materials from inheritors of cultural heritage can be fused with the heritage itself to create an immersive experience. The secondary creation of cultural heritage is not simply an accumulation of heritage symbols: good secondary creation requires a certain level of artistic literacy, and creating an excellent work that resonates strongly and is recognized by the market requires a certain understanding and recognition of Chinese cultural heritage, symbols, and images. The dissemination effect of oral materials can be expanded through intelligent translation and presentation via projection technology.

According to the communication effect evaluation model, the collected oral data are cleaned and organized, and the raw data are normalized. The weight of each indicator is then determined to complete the indicator assignment, yielding the communication effect evaluation system for the digital transformation of the Aboriginal oral communication mode, shown in Table 5. The communication content dimension (24.47%) encompasses time, place, people, examples, and concepts. The communication subject dimension (21.57%) includes the number of disseminators, the popularity of dissemination, and the number of disseminations. The communication object dimension (53.96%) includes the sentiment index, number of comments, number of views, number of retweets and recommendations, number of favorites, and willingness to pay for viewing.

Table 5. Communication effect evaluation system for the digital transformation of the oral mode

Communication content dimension (24.47%): Time 7.35%; Place 7.04%; People 4.13%; Example 1.96%; Concept 3.99%
Communication subject dimension (21.57%): Number of disseminators 10.20%; Popularity of dissemination 6.33%; Number of disseminations 5.04%
Communication object dimension (53.96%): Sentiment index 8.31%; Number of comments 9.07%; Number of views 9.46%; Retweets and recommendations 15.45%; Favorites 8.27%; Pay to watch 3.40%
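To make the scoring procedure concrete, the sketch below computes a weighted composite score from the Table 5 weights; the indicator keys are shorthand and the input values are placeholders, not data from the paper.

```python
# Weights from Table 5, expressed as fractions of 1.
WEIGHTS = {
    "time": 0.0735, "place": 0.0704, "people": 0.0413, "example": 0.0196,
    "concept": 0.0399,                                                     # content, 24.47%
    "disseminators": 0.1020, "popularity": 0.0633, "disseminations": 0.0504,  # subject, 21.57%
    "sentiment": 0.0831, "comments": 0.0907, "views": 0.0946,
    "retweets": 0.1545, "favorites": 0.0827, "pay_to_watch": 0.0340,       # object, 53.96%
}

def composite_score(indicators: dict) -> float:
    """Weighted sum of normalized (0-100) indicator scores."""
    return sum(WEIGHTS[k] * v for k, v in indicators.items())

# With uniform placeholder scores of 73.5, the composite is also 73.5,
# since the weights sum to 1:
print(round(composite_score({k: 73.5 for k in WEIGHTS}), 1))
```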

Using “Aboriginal oral communication methods” as the search term, 183 entries were retrieved from a platform as of September 5, 2023. After removing data unrelated to Aboriginal oral communication methods, 168 entries remained. The collected data were then crawled along the three dimensions of communication subject, communication content, and communication object to provide data support for the subsequent analysis.

The digital communication effect of the oral communication method is shown in Figure 4. The data are distributed over the range [71.5, 75.0], and the mean value of the oral digital communication effect is 73.5 points. Compared with traditional Aboriginal oral communication, presentation through digital technology promotes the multi-form development of oral communication methods.

Figure 4.

The digital propagation effect of oral communication method

Conclusion

This paper analyzes the impact of the oral narratives of Aboriginal people on the transmission of traditional symbols. It discusses recent developments in digital technology that have affected oral narratives and uses digital technology to provide intelligent translation and scene visualization of Aboriginal oral materials. The effect of the digital transmission of oral narratives is then retrieved and evaluated in conjunction with digital transformation methods for Aboriginal oral transmission. The communication subject, communication object, and communication content dimensions are used to evaluate the oral digital communication effect comprehensively; the resulting score is 73.5, which is at a good level. Digital technology provides more ways of expressing Aboriginal oral communication and can be used to further develop Aboriginal oral materials and pass on traditional symbols.
