A Practical Study of Creative Stage Performance in the Integration of Vocal Singing and Artificial Intelligence
Published online: March 17, 2025
Received: October 31, 2024
Accepted: February 17, 2025
DOI: https://doi.org/10.2478/amns-2025-0220
© 2025 Yu Yu, published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.
With the continuous progress of science and technology, artificial intelligence (AI) has penetrated all aspects of our lives, including the field of art. In the creation, teaching, and performance of vocal music in particular, AI is being applied ever more widely, bringing new changes to the vocal music industry that are reflected on the performance stage [1-4].
In vocal music composition, the application of AI is mainly reflected in composition and arrangement. Using algorithms, AI can analyze a large number of vocal compositions, learn their styles and structures, and then create brand-new vocal works [5-6]. For example, AI composition software can automatically generate vocal fragments, or even entire pieces, based on specified styles, rhythms, and melodies. This not only provides inspiration for composers, but also makes vocal composition possible for people without a vocal background [7-9]. The application of AI in vocal music education is also increasing. AI can not only customize personalized learning plans according to students’ learning progress and abilities, but also evaluate students’ performances and suggest improvements through intelligent analysis. In addition, some AI applications can recognize users’ singing or playing mistakes and provide targeted practice materials to help users improve their skills quickly [10-13]. In vocal performance, AI also shows its unique value. Through machine learning, AI can mimic the style of famous vocalists, even to the point of being hard to distinguish from the real thing [14-16]. AI can also analyze performance data in real time and provide instant feedback that helps performers improve their skills. In some large-scale vocal concerts, AI has also been used to control stage lighting and sound effects to create a more immersive viewing experience [17-19].
This paper provides an in-depth study of digital media design and, building on that foundation, of the realization of the vocal performance stage. An innovative practice of applying virtual reality and augmented reality technology to the vocal performance stage is proposed. A Unity3D simulation system is constructed, with 3ds Max used to build the scenes, characters, props, cameras, lighting, and other surrounding elements; the built scene content is then revised and optimized to complete the simulation system. Taking the timeline as the organizing thread, the development of vocal stage performance is traced. This virtual reality creative stage is applied to vocal performance, and the implementation of the simulation system is analyzed in depth to examine the stability and effectiveness of its performance.
With the continuous development of Internet technology, online concerts have become increasingly popular, providing a brand-new platform for musical performances. Artists and bands can connect directly with audiences around the world through online platforms, breaking the limitations of geography and physical space. Today, online concerts have become an important way for musicians to interact with their audiences: viewers can enjoy high-quality performances from the comfort of their own homes, while artists gain a way to maintain their visibility and income.
As an important research topic in the field of artificial intelligence, virtual reality technology brings with it reproducibility, recomposition, computability, and transmissibility; interactivity, immersion, virtuality, and intelligence in the communication process; diversity in forms of artistic creation; non-linearity in artistic narrative; real-time information transmission; and convenient creation tools. Virtual reality art in the era of artificial intelligence therefore takes rich and varied forms. The relationship among these technologies is shown in Figure 1: virtual reality and augmented reality are bringing music performance into a whole new dimension. Virtual reality can simulate realistic three-dimensional performance scenes, and through VR headsets listeners can be immersed in an all-encompassing musical experience, as if they were actually standing in front of the stage or inside a fictional musical world. AR technology allows virtual elements to be integrated into real-world scenes, adding extra visual effects to live concerts. With these two technologies, the boundaries of musical performance are expanded, allowing audiences to experience forms of performance in the virtual world that are not possible in real life.

New technical schematic diagram
The rise of live-streaming platforms has provided another new platform for music performances. Platforms such as Twitch, Instagram Live, and Facebook Live allow artists to interact with their audience in real time, not just by performing music, but also by answering questions, sharing the creative process, and even improvising. In this interactive experience, musical performance practice takes on a distinctly contemporary character.
In contemporary musical theater, the integration of virtual reality technology has brought revolutionary changes to the construction of stage space. The virtual reality stage not only expands the boundaries of traditional stage art, but also creates an unprecedented interactive and immersive experience for audiences and actors. In this process of technology-driven innovation, three core elements – “virtual scenes”, “action spaces”, and “viewing spaces” – are the key to building a virtual reality stage space.
Virtual reality technology makes “virtual scenes” possible, reshaping the audience’s perception of the stage through highly realistic three-dimensional environments. The technology can not only accurately reproduce scenes from the real world, but also create fictional worlds that completely transcend reality according to the creator’s imagination. This spatial flexibility and expandability greatly enriches the expressive power of the musical and the depth of the audience’s experience.
The concept of “action space” has been reinterpreted on the virtual reality stage. In a virtual reality environment, the actor’s performance is no longer constrained by physical space, and the action space is greatly expanded. Actors can move and perform freely in a wider virtual space while interacting with virtual elements, bringing more possibilities and dynamism to the performance.
The “viewing space” occupies a crucial position in the construction of virtual reality stage space. This concept reshapes the traditional viewing experience, enabling audience members to freely choose their viewing position and angle in the virtual environment, and even to interact with the performance. This personalized viewing experience not only enhances immersion, but also lets each audience member enjoy a unique performance.
Through an in-depth discussion of these three core elements, we can more fully understand how virtual reality technology has redefined stage space in musical theater and opened up a new field of artistic exploration. This demonstrates the great possibilities of integrating technology and art, and points to a new direction for the future development of musical theater.
Unity3D, a virtual reality engine from Unity Technologies, can be said to overturn the traditional mode of presentation: it increases expressive power and, above all, benefits the members of the production team in practical use, saving time while letting them see realistic effects immediately. Under this demand, we use modern technology in combination with the choreography. This also challenges the choreographer, who must consider every small detail very carefully, but at the same time it is an opportunity for the work to shine.
3ds Max is used for scene construction, modeling, and character modeling, as well as for cameras and lighting, so the required scene content can be shown more realistically. (1) Simulation model construction: modeling splines are processed with the “Extrude” option, which turns a line into a solid (a minimal sketch of this operation follows below). (2) Prop construction: grass, trees, balconies, chairs, and similar props are built as a collection and then refined; the TreeStorm plug-in for 3ds Max is used to simulate vegetation. (3) Character construction: starting from the skeleton structure, points are connected into lines and then built into surfaces using the appropriate commands. (4) In the 3ds Max creation panel, lights and cameras are then placed and the surfaces are edited.
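As a minimal illustration of step (1), the sketch below shows conceptually what the “Extrude” operation does: it turns a closed 2D outline into a solid by connecting a bottom and a top ring of vertices. This is plain Python for illustration, not the 3ds Max API; all names are ours.

```python
# Minimal sketch: extruding a closed 2D polyline into a prism mesh.
# Plain Python for illustration only, not the 3ds Max API.

def extrude_polyline(polyline, height):
    """Turn a closed 2D outline into the vertices and side faces of a prism."""
    n = len(polyline)
    # Bottom ring (z = 0) followed by top ring (z = height).
    vertices = [(x, y, 0.0) for x, y in polyline] + \
               [(x, y, height) for x, y in polyline]
    # One quad per edge of the outline, connecting bottom ring to top ring.
    faces = [(i, (i + 1) % n, n + (i + 1) % n, n + i) for i in range(n)]
    return vertices, faces

# Example: extrude a unit square outline into a box 3 units tall.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
verts, faces = extrude_polyline(square, 3.0)
print(len(verts), "vertices,", len(faces), "side faces")  # 8 vertices, 4 side faces
```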
Besides the three-dimensional model of the stage, texture data are also needed. By establishing a one-to-one correspondence between the spatial coordinates of a feature and the coordinates of its texture, a two-dimensional texture image is mapped onto the surface of the three-dimensional model. The essence is to use a 2D image to stand in for details of the model that are impractical or impossible to model geometrically. Typically, 2D texture data can be defined as an object with a gray value at each point of texture space, either analytically through a mathematical function or discretely through a digitized image. In most cases, the correspondence between 2D texture data and 3D object space is realized through an affine transformation.
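A hedged sketch of that affine correspondence: given three matched point pairs between a planar face of the model and texture space, the 2×3 affine matrix that maps surface coordinates to texel coordinates can be solved exactly. The point values here are illustrative, not from the paper’s models.

```python
import numpy as np

# Sketch: recover the affine map between a planar model face and texture
# space from three matched point pairs (illustrative values).

def fit_affine(src_pts, dst_pts):
    """Solve [u, v] = M @ [x, y, 1] from three (x, y) -> (u, v) pairs."""
    src = np.hstack([np.asarray(src_pts, float), np.ones((3, 1))])  # 3x3
    dst = np.asarray(dst_pts, float)                                # 3x2
    # Exact solve for three pairs; M is the 2x3 affine matrix.
    return np.linalg.solve(src, dst).T

# Three corners of a face (model space) and their texture coordinates.
face_pts  = [(0.0, 0.0), (4.0, 0.0), (0.0, 2.0)]
texel_pts = [(0.0, 1.0), (1.0, 1.0), (0.0, 0.0)]
M = fit_affine(face_pts, texel_pts)

# Map an arbitrary surface point into texture space.
x, y = 2.0, 1.0
u, v = M @ np.array([x, y, 1.0])
print(f"({x}, {y}) -> ({u:.3f}, {v:.3f})")  # (2.0, 1.0) -> (0.500, 0.500)
```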
For parts with many details and complex structure, which would otherwise inflate the number of required faces, a texture map can be used instead. This reduces the overall face count and the complexity of the scene model, and improves display speed when rendering. To keep the face count and the chance of errors low, Boolean operations and cutting tools should be used sparingly when building fine models. When a Boolean operation is applied, make sure the normal directions of the two face models agree, to prevent bad or wasted faces. To reduce the size of the output file, use straight lines and flat faces as much as possible and minimize the number of vertices and faces on curves and curved surfaces. To prevent problems with translations and rotations, the modifier stack should be collapsed as far as possible before exporting the 3D model. Delete redundant, unused faces and invisible back faces, minimize redundant polygons, and reduce the number of model vertices and the overall polygon count of the 3D scene to optimize the scene model. Using 3ds Max’s associated copies (Instances) makes it possible to increase the number of similar objects without increasing the scene’s running overhead or the overall number of face models. Note, however, that if you change the shared attributes of one instance, such as its geometry, material, or texture mapping, the other instances change as well. Used carefully, associative copying can greatly reduce the size of the file.
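The behavior of associated copies can be pictured as many scene objects referencing one shared mesh record: adding placements costs almost nothing, but editing the shared record changes every instance at once, which explains both the memory savings and the shared-attribute side effect described above. The sketch below uses our own illustrative names, not 3ds Max internals.

```python
# Sketch of instance semantics: many placed objects share one mesh record,
# so adding objects does not duplicate geometry, but editing the shared
# record (as with 3ds Max Instances) changes every placement at once.

class Mesh:
    def __init__(self, name, face_count):
        self.name = name
        self.face_count = face_count

class Placement:
    """A scene object: a reference to shared mesh data plus its own transform."""
    def __init__(self, mesh, position):
        self.mesh = mesh          # shared, not copied
        self.position = position  # per-placement attribute

chair = Mesh("chair", face_count=1200)
rows = [Placement(chair, (x, 0, 0)) for x in range(50)]

faces_stored = chair.face_count             # geometry stored once
faces_drawn = chair.face_count * len(rows)  # still rendered per placement
print(faces_stored, faces_drawn)            # 1200 60000

chair.face_count = 800  # e.g. an optimization pass on the shared mesh
print(rows[0].mesh.face_count, rows[49].mesh.face_count)  # 800 800: all change
```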
The human-computer interaction stage design process is shown in Figure 2. The theater uses visual images to create different sensory experiences, and can also combine infrared human-computer interaction, mobile interaction, and augmented reality technology, so that when a viewer communicates with the screen, the viewer drives the imagery and a randomized digital art effect is formed. Introducing new media technology to the opera performance stage mainly broadens how opera performance is presented. For example, with the help of new media technology, the surround screen outside the theater building lets the characters in the play interact in real time with the viewer’s movements. When a passing viewer stops in front of the screen and moves, the new media equipment captures those movements and feeds them back to the screen promptly: a 3D somatosensory camera senses and captures the viewer’s body movements and gestures, and the visual imagery responds through the new technology’s rendering and data analysis, realizing the human-computer interaction mode.

Human-computer interaction presentation technology flowchart
Inside the theater, a large LED screen combined with projection shows performance effects that blend the real and the unreal. Through multimedia, three-dimensional modeling, real-time video transmission and display control, multi-sensor fusion, and related technologies, the scenes of the plot are virtualized and superimposed on the real theater space. Using new media AR technology with computer vision methods, a mapping is established between the real world and the multiple screens: the drawn scenes of the play are modeled in 3D and attached to the large screen, 3D images are projected on the 2D screen, and virtual scenery or characters are positioned within the three-dimensional stage space.
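A hedged sketch of the capture-and-feedback loop behind Figure 2: sense the viewer’s pose, map it to a change in the on-screen imagery, and repeat in real time. The sensor read is stubbed out here, since no particular depth-camera SDK is named in the text; every function below is a hypothetical stand-in showing the shape of the pipeline, not a production implementation.

```python
import random
import time

# Sketch of the interaction loop: sense the viewer's pose, update the
# on-screen image, repeat. read_skeleton() is a stand-in for a real
# 3D somatosensory camera SDK; update_screen() stands in for the renderer.

def read_skeleton():
    """Hypothetical sensor read: the viewer's hand position, or None."""
    if random.random() < 0.8:
        return {"hand_x": random.uniform(-1, 1), "hand_y": random.uniform(-1, 1)}
    return None  # no viewer in front of the screen

def update_screen(gesture):
    """Stand-in for the renderer: move a character toward the viewer's hand."""
    print(f"character target -> ({gesture['hand_x']:+.2f}, {gesture['hand_y']:+.2f})")

for _ in range(5):            # a few iterations of the real-time loop
    skeleton = read_skeleton()
    if skeleton is not None:  # only react when a viewer is detected
        update_screen(skeleton)
    time.sleep(1 / 30)        # ~30 Hz update, matching typical camera rates
```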
In recent years, creators of musical stage art have been constantly exploring new forms of performance, committed to finding new modes of interpretation so that the immersiveness of a performance can be fully developed. The development of stage performance forms is shown in Table 1. “Thunderstorm” is one of the classics of Chinese drama; it premiered on December 2, 1934 in Zhejiang Province, has been staged many times since, and is in the standing repertoire of the Beijing People’s Art Theatre. “The Phantom of the Opera” premiered at Her Majesty’s Theatre in London on October 9, 1986 and won seven Tony Awards in 1988, making it one of the most successful musicals of all time; its touring productions remain box-office hits to this day, with the realism of the stage design, the vivid performances of the actors, and the beautiful music keeping audiences applauding. Since its premiere in London on May 11, 1981, the musical “Cats” has been translated into more than 20 languages and performed all over the world. In 2019 it was restaged in Kunming, where strong ticket sales led to extra performance days; the actors’ performance area was extended from the stage into the auditorium so the audience could watch up close, and the Kunming tour closed to many favorable reviews.
Table 1. Overview of stage performance development
| Performance time | Title | Form of performance |
|---|---|---|
| 1986/10/9 | Phantom of the Opera | Musical |
| 1934/12/2 | Thunderstorm | Drama |
| 1981/5/11 | Cats | Musical |
| 2016/12/14 | Sleepless Nights | VR musical |
| 2015/10 | Alice's Adventures in Wonderland | VR musical |
| 2019/4/21 | Three Body | 3D musical |
The immersive musical “Alice’s Adventures in Wonderland” was first officially staged in October 2015 at The Vaults theatre in Waterloo, London, UK, and played to sell-out crowds. The appearance of the immersive show “Sleepless Nights” on December 14, 2016 explored a new narrative form for stage performance. Immersive musical performances allow the audience to watch the actors up close and to see different branches of the plot by following different actors. This approach resembles VR film viewing, where the user chooses what to watch in an open world and follows different branching routes to different plot developments. It gives the audience more room for choice, transforming passive viewing into the active gathering of information, which greatly stimulates the audience’s curiosity and desire to explore. The 3D sci-fi stage play “Three Body II” premiered in Shanghai on April 21, 2019; the show extends the stage space by means of digital media and light-and-shadow technology, and for some scenes the audience must wear special 3D glasses. The 3D sci-fi stage play is another exploration of stage performance forms: although highly innovative, the placement of stage props in the space, given the constraints of theater viewing, affects the viewing and immersion of audiences in the back rows. Despite the mixed reception, the development of performance forms has become more inclusive and open toward the audience. Stage performance grows more open as technology advances, and stage creators keep integrating new technologies and new platforms in search of more marketable forms of interpretation.
To verify the effect of this paper’s 3ds Max method for designing 3D virtual stage scenes, the method is used to design a stage set. To highlight the quality of the resulting virtual stage scene, a Vega-based stage virtual design method and a Web3D-based stage virtual design method are compared against it. The three methods were used for virtual simulation of the five sub-models of the stage model, and the fidelity of the virtual stage scene designed with each method was compared. The results are shown in Table 2: the average fidelity of the stage virtual design images produced by this paper’s method is about 24.8 percentage points higher than that of the Vega-based method, and about 26.8 percentage points higher than that of the Web3D-based method. The fidelity of every sub-model designed by this paper’s method is no less than 90%, showing stable performance; comparatively, the 3D virtual stage scenes designed by this paper’s method are more realistic.
Table 2. Fidelity comparison of stage virtual designs (%)
| Stage sub-model | This method | Vega-based stage virtual design method | Web3D-based stage virtual design method |
|---|---|---|---|
| Master stage model | 96 | 64 | 53 |
| Light model | 93 | 73 | 74 |
| Curtain model | 97 | 77 | 75 |
| Prop model | 95 | 70 | 70 |
| Replacement support model | 91 | 64 | 66 |
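For concreteness, the cited averages follow directly from the Table 2 scores; the short Python check below simply reproduces that arithmetic and is not part of the original experiment.

```python
# Reproduce the averages cited in the text from the Table 2 fidelity scores.
this_method = [96, 93, 97, 95, 91]
vega        = [64, 73, 77, 70, 64]
web3d       = [53, 74, 75, 70, 66]

avg = lambda xs: sum(xs) / len(xs)
print(round(avg(this_method), 1))              # 94.4
print(round(avg(this_method) - avg(vega), 1))  # 24.8 points above Vega
print(round(avg(this_method) - avg(web3d), 1)) # 26.8 points above Web3D
```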
Since the 2012 Spring Festival Gala, three-dimensional simulation has been applied to every aspect of choreography production. Simulating the stage machinery’s movement and playing back the stage background video within one virtual scene makes it possible to view the overall running effect of the stage and to check the accuracy of the stage video production. In this paper, the proposed virtual reality stage space technology is put into practice in a large vocal singing program. The further added elements of camera control, OB simulation, and actor rehearsal simulation optimize the choreography production process while achieving a better preview effect.
The average processing times and simulation platform test results are shown in Table 3. In the virtual stage technology test, the input is 27 Maya files, each about 30 MB in size, with animations about 310 seconds long. The required final output is two data files: the static stage data at the initial moment and the stage model motion data. Average CPU utilization while processing each file is 48%. The virtual scene simulation models the entire CCTV Studio 1 hall; the scene contains 312 stage lifting platforms, 5 large mechanical arms, 102 aerial suspension rigs, and other motion units, for more than 800 controllable units in total, most of which are driven programmatically according to motion profiles.
Table 3. System test results
| Category | Duration (seconds) | Category | Value |
|---|---|---|---|
| Model reading | 2.147 | Average frame rate | 54 fps |
| Model check | 0.896 | Mean memory usage | 1312 MB |
| Data extraction | 21.63 | Average CPU usage | 48% |
| Data processing | 14.796 | Scene loading duration | 10 s |
| Data output | 0.294 | | |
| Total length | 40.597 | | |
For the LED simulation part, the textures of the stage motion modules are UV-unwrapped following the wiring of the real stage, so that after unwrapping the maps correspond directly to the uncut video files. Each program requires 21 video files, most about 4 minutes long. The original video files for each program total about 40 GB; after processing, the video used for playback in the 3D environment is about 800 MB, in WebM format. Video parsing is carried out by a video plug-in developed specifically for CryEngine, and the video is applied to the 22 maps. For efficiency, the 22 maps are spread over 410 models. The average frame rate reached 54 fps.
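One plausible way to realize the compression step described above is to transcode each program video into a VP9 WebM with the standard ffmpeg CLI. The sketch below shows that approach under our own assumptions: the bitrate, resolution, and file names are illustrative, not the production pipeline’s actual settings.

```python
import subprocess

# Sketch: compress a raw program video into a WebM suitable for real-time
# playback on in-engine textures. Bitrate, scale, and paths are assumptions.

src = "program_01_raw.mov"  # hypothetical multi-GB source file
dst = "program_01.webm"     # target: a few hundred MB for 3D playback

subprocess.run([
    "ffmpeg", "-i", src,
    "-c:v", "libvpx-vp9",    # VP9 codec inside a WebM container
    "-b:v", "6M",            # bitrate cap to shrink the file
    "-vf", "scale=1920:-2",  # downscale; -2 keeps the height even
    "-an",                   # stage playback here needs no audio track
    dst,
], check=True)
```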
After the stage movement production and stage background video are completed, the background video must be cut and re-organized frame by frame before it can be handed to the broadcast control equipment for display on the real stage. In general, cutting and rendering a 3-minute vocal program takes 4 to 8 hours, and since vocal programs are modified many times, checking the video on the real stage after every cut would cost a great deal of time. To check whether the background video harmonizes with the stage motion without interruption, the standard stage motion data and the stage background video can instead be fed into the 3D simulation module as inputs, previewing the video before cutting and generating error lists and change suggestions for the parts that are wrong (a sketch of such a check follows below).
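One way to picture that pre-cut check: align the stage motion timeline with the background video’s cue track frame by frame and collect an error list wherever they disagree beyond a tolerance. The data layout, values, and threshold below are entirely hypothetical, chosen only to illustrate the idea of an automatically generated error list.

```python
# Sketch of the pre-cut preview check: walk the stage motion data and the
# background video cue track frame by frame and report mismatches.
# Data layout and tolerance are hypothetical, for illustration only.

FPS = 25
TOLERANCE = 0.05  # max allowed platform-height error, in meters

def check_sync(platform_heights, video_cue_heights):
    """Compare motion data against the height implied by the video cues."""
    errors = []
    for frame, (actual, expected) in enumerate(zip(platform_heights,
                                                   video_cue_heights)):
        if abs(actual - expected) > TOLERANCE:
            errors.append({
                "time_s": frame / FPS,
                "actual": actual,
                "expected": expected,
                "suggestion": "re-time lift cue or shift video segment",
            })
    return errors

motion = [0.0, 0.2, 0.4, 0.6, 0.8]  # platform height per frame (motion data)
cues   = [0.0, 0.2, 0.5, 0.6, 0.8]  # height implied by the background video
for e in check_sync(motion, cues):
    print(e)  # one mismatch reported, at frame 2 (0.08 s)
```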
Applying virtual scenes across the stage production cycle not only accelerates production and reduces rework, but also forms a basis for image review that becomes the validation standard at each stage. The optimization of the process is shown in Figure 3. The virtual scene can play a validating role in all seven stages and is an important basis for evaluating each one. In the model animation production stage, the virtual scene can show the stage movement and stage effects in advance; together with the stage background video played in the virtual scene, a full preview of the stage can be completed. In complex choreography production, displaying the results of each step in the virtual scene lets production be checked layer by layer: at the end of each step, any defect, whether caused by a production error or by insufficient creative rigor, can be found and corrected promptly, rather than surfacing only in rehearsal after the entire choreography is finished and forcing a wholesale rework.

The process optimization based on 3d simulation
In this paper, to realize the creative stage design of vocal singing and meet the requirements of realism and intelligence in its scene design, 3ds Max is used to complete the modeling and virtual design of the virtual stage scenes. The technology is applied to actual musicals and vocal singing programs, and the performance of the system and its degree of realism are compared and analyzed. Compared with the Vega-based and Web3D-based stage virtual design methods, the 3ds Max simulation system in this paper shows clear advantages, with the fidelity of every sub-model reaching at least 90%. Average CPU utilization during processing was 48%. The virtual scene can serve as a layer-by-layer check on each step of production, with a good preview effect.