Open access

Study of terpene biosynthetic pathways in medicinal plants based on big data analysis

23 Sep 2025


Introduction

Terpenoids are one of the most structurally diverse classes of natural products. They are widely distributed in nature, can be obtained from plants, microorganisms and marine animals, and more than 100,000 of them have been identified to date. Based on the number of isoprene units that make up the terpene skeleton, terpenoids can be categorized as hemiterpenes, monoterpenes, sesquiterpenes, diterpenes, sesterterpenes, triterpenes and polyterpenes [1,2]. Terpenoids play vital roles in living organisms, serving as components of cell membranes and taking part in information exchange and defense against pests and diseases [3]. They also have considerable economic value and are widely used in the food, pharmaceutical and household chemical industries [4]. Notably, terpenoids are also important sources of medicines with antimalarial, antitumor and anti-inflammatory effects [5].

The rich and diverse functions of terpenoids derive mainly from their complex and diverse structures, so elucidating terpenoid biosynthetic pathways is important for expanding their applications. Despite this structural diversity, terpenoid biosynthesis starts from two simple C5 precursors: isopentenyl pyrophosphate (IPP) and its double-bond isomer dimethylallyl pyrophosphate (DMAPP) [6,7]. Terpene skeletons are then formed by two classes of terpene synthases that activate their substrates by different mechanisms: class I terpene synthases ionize the allylic diphosphate, whereas class II terpene synthases initiate the reaction by protonation [8,9]. A series of modifying enzymes then converts these skeletons into terpenoids with rich structural diversity and multiple biological functions.

Big data analytics is a major breakthrough for new drug discovery because it facilitates the integration and mining of valuable research data. By simulating the drug-like properties of small-molecule compounds, big data analysis can select the most promising candidates for synthesis and testing in a shorter time, greatly accelerate the design of chemical synthesis routes and effectively reduce operating costs [10-12]. With the rapid development of modern molecular biology disciplines such as genomics, proteomics and bioinformatics, and the emergence of technologies such as high-throughput and high-content screening, big data and artificial intelligence, the synthesis and discovery of novel compounds have advanced at an unprecedented rate [13-15].

Tetko, I. V. et al. discussed the challenges of applying big data technologies to polypharmacology prediction and phenotypic screening, which require mining millions of compounds in chemical and biological data with advanced machine learning algorithms [16]. Lusher, S. J. et al. showed that big data technologies offer great opportunities for medicinal chemistry research and that data-driven approaches can help discover novel drugs and support research programs that were previously impossible [17]. Bhattacharjee, A. K. et al. designed a pharmacophore-trained machine learning model that uses intelligent augmentation techniques to enable virtual screening from a small number of compounds, complementing high-throughput screening against specific protein targets [18]. Zeng, T. et al. proposed a bio-inspired strategy (TeroGen) based on physical simulation and deep learning models, validated its accuracy and efficiency in modeling the cyclization and decoration phases of terpenoid biosynthesis, and used it to estimate synthetic accessibility and provide chemical interpretation [19]. Balkrishna, A. et al. stated that high-throughput, large-scale analytical techniques can help plant biologists overcome the limited genomic resources available for plants and make it possible to elucidate the biosynthetic pathways of important plant-derived medicinal compounds [20]. Leferink, N. G. et al. reviewed genome mining, computational modeling, high-throughput screening and machine learning for the predictive engineering of terpene synthases (TSs) based on sequence-function relationships; owing to their high functional plasticity, TSs can generate diverse natural terpene structures [21]. Ye, J. et al. used high-throughput sequencing and bioinformatics to predict miRNA targets among the structural genes of the biosynthetic pathway and thereby screened and identified miRNAs involved in terpene trilactone (TTL) metabolism [22].

This study took the ecologically adaptable and widely cultivated Salvia miltiorrhiza as the research object and, using high performance liquid chromatography (HPLC) and transcriptome sequencing, took the biosynthesis mechanism of jujube pentacyclic triterpenoids as the entry point. Using physiological and molecular biological methods including HPLC, quantitative real-time PCR (qRT-PCR), GC-MS, stable genetic transformation, GUS staining, dual-luciferase reporter assays and yeast one-hybrid assays, we clarified the spatiotemporal pattern of triterpenoid metabolism in Salvia miltiorrhiza, identified key genes (ZjFPS and ZjSQS) and transcription factors (ZjMYB39 and ZjMYB4) for triterpenoid biosynthesis, verified their functions, and elucidated how these transcription factors regulate triterpenoid synthesis in Salvia miltiorrhiza.

Materials and methods
Experimental principles
Terpenoid synthesis pathways in medicinal plants

Terpenoids are a class of natural products that are widely distributed in plant tissues and have complex skeletons and diverse structures. As active ingredients of medicinal plants, they have important biological functions and pharmacological activities. More than 70,000 terpenoids have been discovered and identified. According to the number of isoprene units they contain, terpenoids can be categorized as monoterpenes, sesquiterpenes, diterpenes, triterpenes, tetraterpenes and polyterpenes [23].
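Because each isoprene unit contributes five carbons, this classification (which also includes hemiterpenes and sesterterpenes) reduces to arithmetic on the skeleton's carbon count. The Python sketch below is illustrative only: the helper name `classify_terpene_skeleton` is hypothetical, and it assumes a regular C5n skeleton, which real terpenoids may not retain after tailoring reactions add or remove carbons.

```python
# Hypothetical helper: classify a regular terpene skeleton by its carbon count.
# Each isoprene (C5) unit contributes five carbons.
TERPENE_CLASSES = {
    1: "hemiterpene",    # C5
    2: "monoterpene",    # C10
    3: "sesquiterpene",  # C15
    4: "diterpene",      # C20
    5: "sesterterpene",  # C25
    6: "triterpene",     # C30
    8: "tetraterpene",   # C40
}


def classify_terpene_skeleton(carbon_count: int) -> str:
    """Return the class for a C5n skeleton (ignores carbons gained or lost in tailoring)."""
    if carbon_count <= 0 or carbon_count % 5 != 0:
        raise ValueError("expected a positive multiple of 5 carbons")
    units = carbon_count // 5
    if units > 8:
        return "polyterpene"
    if units not in TERPENE_CLASSES:
        raise ValueError(f"{units} isoprene units is not covered by this simple scheme")
    return TERPENE_CLASSES[units]


print(classify_terpene_skeleton(10))  # monoterpene
print(classify_terpene_skeleton(30))  # triterpene
```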

Terpenoid biosynthesis in medicinal plants proceeds mainly through the mevalonate (MVA) pathway and the 2-C-methyl-D-erythritol 4-phosphate (MEP) pathway and can be divided into three phases: generation of the common precursors isopentenyl pyrophosphate and dimethylallyl pyrophosphate, synthesis of the direct precursors, and formation and modification of the terpenoids (Figure 1). Terpenoid synthesis is governed mainly by the activities of the enzymes in the pathway and the expression of the corresponding genes, which are in turn regulated by upstream transcription factors, forming an interconnected, dynamic regulatory network [24].

Figure 1.

Terpenoid biosynthetic pathways in plants
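The second phase, synthesis of the direct precursors, can be illustrated by counting carbons during chain elongation. The sketch below encodes standard textbook biochemistry that this section does not spell out (GPP, FPP and GGPP as the C10, C15 and C20 prenyl diphosphates, and squalene and phytoene as the head-to-head products of FPP and GGPP); the function names are hypothetical.

```python
# Sketch of prenyl diphosphate chain elongation: DMAPP (C5) is the starter unit,
# and each head-to-tail condensation with IPP adds five carbons.
from typing import Tuple

PRENYL_DIPHOSPHATES = {5: "DMAPP", 10: "GPP", 15: "FPP", 20: "GGPP"}
HEAD_TO_HEAD_PRODUCTS = {"FPP": (30, "squalene"), "GGPP": (40, "phytoene")}


def elongate(ipp_additions: int) -> Tuple[int, str]:
    """Carbon count and name of the product after n IPP condensations onto DMAPP."""
    carbons = 5 + 5 * ipp_additions
    return carbons, PRENYL_DIPHOSPHATES.get(carbons, f"C{carbons} prenyl diphosphate")


def head_to_head(precursor: str) -> Tuple[int, str]:
    """Two FPP give squalene (triterpene route); two GGPP give phytoene (tetraterpene route)."""
    return HEAD_TO_HEAD_PRODUCTS[precursor]


for n in range(4):
    print(n, elongate(n))
print(head_to_head("FPP"))  # (30, 'squalene')
```

GPP, FPP and GGPP feed monoterpene, sesquiterpene and diterpene synthases respectively, while triterpenes and tetraterpenes arise from the dimers.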

Transcriptome sequencing technologies

In the broad sense, the transcriptome is the complete set of RNAs transcribed in a particular tissue or cell at a given developmental stage or functional state, including protein-coding messenger RNAs (mRNAs) and non-coding RNAs such as ribosomal RNAs (rRNAs) and transfer RNAs (tRNAs); in the narrow sense, it refers only to the mRNAs. Transcriptome research studies gene function and structure at the genome-wide level, reveals the transcriptional regulation of genes, and helps elucidate the molecular mechanisms of complex biological pathways and trait regulatory networks [25]. As a key technology for transcriptome research, transcriptome sequencing (RNA-Seq) is developing rapidly and advancing the field. Because most medicinal plants lack genomic information, and RNA-Seq can profile the overall transcriptional activity of any species, the technology has been widely used in research on medicinal plant resources.

Application of transcriptome sequencing technology in the study of terpenoids in medicinal plants

RNA-Seq has become an important technology for studying medicinal plants because of its high sequencing depth and sensitivity. By sequencing the transcriptomes of different tissues, or of plants under different induction conditions, key regulatory genes of the related metabolic pathways can be efficiently discovered and identified.

Experimental materials

In April 2022, Salvia miltiorrhiza plants that had been grown in the same environment for 6 years and propagated from root segments were selected in the Salvia miltiorrhiza experimental field of the University of Traditional Chinese Medicine. Fresh Danshen roots 3-6 mm in diameter were sampled at the end of flowering (early stage of tanshinone accumulation) and 65 days after the end of flowering (late stage of tanshinone accumulation), immediately frozen in liquid nitrogen and stored at -80 °C in an ultra-low-temperature freezer until use.

Experimental apparatus and reagents
Experimental apparatus

The instruments used in this study are listed in Table 1.

Experimental instruments

Instrument | Model
Agilent 1290 ultra-high performance liquid chromatograph | 1290 Infinity
Electronic balance | Sartorius BT125D
Ultrasonic cleaner | KQ5200
Ultrapure water system | ULUP-IV-10T
Liquid nitrogen grinder | JXFSTPRP-64
Automatic autoclave | SANYO MLS-3020
Electrophoresis apparatus | DYY-III 33B
Gel imaging system | Gel Doc EQ
PCR thermal cycler | T100™ Thermal Cycler
High-speed centrifuge | D-37520
Micropipette | Research plus
NanoPhotometer N60 micro-volume spectrophotometer | N60 Touch

Experimental reagents

The reagents used in this study are listed in Table 2.

Experimental reagents

Reagent | Grade or catalog no.
Acetonitrile | Chromatographic grade
Phosphoric acid | Chromatographic grade
Methanol | Analytical grade
Tiangen total RNA extraction kit for polysaccharide- and polyphenol-rich plants | DP441
NEBNext Ultra RNA Library Prep Kit | #E7530
DNA 1000 Assay Kit | 5067-1504
Agencourt AMPure XP | 63881
Agarose | TSJ001

Experimental methods
Total RNA extraction and detection

Total RNA was extracted from Salvia miltiorrhiza roots and leaves using TRIzol® reagent according to the manufacturer's instructions. RNA concentration was measured with a NanoDrop 2000, and purity was assessed from the OD260/OD280 ratio. RNA integrity was checked by 1.2% agarose gel electrophoresis based on the brightness of the 18S and 28S rRNA bands.
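The concentration and purity checks above are simple arithmetic on absorbance readings. A minimal Python sketch, assuming the conventional factor of about 40 ng/µL of RNA per A260 unit (10 mm path) and an acceptance window of 1.8-2.2 for OD260/OD280; neither value is reported in the study, and `rna_qc` is a hypothetical helper.

```python
# Spectrophotometric RNA QC sketch. Assumptions (not from the study): A260 = 1.0
# corresponds to ~40 ng/uL RNA at a 10 mm path, and 1.8-2.2 counts as acceptable purity.
from typing import Tuple

RNA_NG_PER_UL_PER_A260 = 40.0


def rna_qc(a260: float, a280: float, dilution: float = 1.0,
           purity_window: Tuple[float, float] = (1.8, 2.2)) -> dict:
    concentration = a260 * RNA_NG_PER_UL_PER_A260 * dilution  # ng/uL in the undiluted sample
    ratio = a260 / a280                                        # protein contamination lowers it
    low, high = purity_window
    return {"ng_per_ul": concentration, "a260_a280": ratio, "pure": low <= ratio <= high}


print(rna_qc(a260=0.5, a280=0.25, dilution=10))
# {'ng_per_ul': 200.0, 'a260_a280': 2.0, 'pure': True}
```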

Construction of cDNA libraries

cDNA libraries were constructed with the NEBNext® Ultra™ RNA Library Prep Kit for Illumina®, using 1 μg of RNA per sample. Eukaryotic mRNA was first enriched with oligo(dT) magnetic beads and then randomly fragmented in fragmentation buffer. First- and second-strand cDNA were synthesized using the fragmented mRNA as template, and the purified double-stranded cDNA was subjected to end repair, A-tailing and ligation of sequencing adapters. Fragments of the desired size were then selected with the AMPure XP system, and the library was enriched by PCR. After construction, the library was first quantified with a Qubit 3.0 fluorometer (the effective concentration had to exceed 1 ng/μL), and its insert size was checked with a Qsep400 high-throughput analysis system. Once the insert size met expectations, the effective library concentration was accurately quantified by qPCR (required to exceed 2 nM) to ensure library quality.
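Qubit reports a mass concentration, whereas the 2 nM requirement is molar; qPCR measures molarity directly, but the usual conversion between the two helps relate the thresholds. A sketch, assuming 660 g/mol per base pair of double-stranded DNA and a hypothetical mean fragment size (in practice this would come from the Qsep400 trace):

```python
# Convert a Qubit mass concentration (ng/uL) to molarity (nM) for a dsDNA library
# and apply the two thresholds quoted above. The 660 g/mol/bp figure is an assumption.
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA


def library_nM(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    # 1 ng/uL = 1e-3 g/L; divide by molar mass (g/mol) for mol/L, multiply by 1e9 for nM.
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS * mean_fragment_bp)


def passes_library_qc(conc_ng_per_ul: float, mean_fragment_bp: float) -> bool:
    return conc_ng_per_ul > 1.0 and library_nM(conc_ng_per_ul, mean_fragment_bp) > 2.0


print(round(library_nM(3.3, 500), 2))  # 10.0
print(passes_library_qc(0.8, 300))     # False: below the 1 ng/uL Qubit threshold
```

Longer fragments give fewer molecules per nanogram, which is why a library can pass the mass threshold yet fall below 2 nM.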

Transcriptome sequencing

The Salvia miltiorrhiza cDNA libraries were sequenced on an Illumina HiSeq 2000 platform in PE150 mode (2 × 150 bp). The resulting output, called raw data, contains the reads from both ends of all cDNA fragments and must undergo quality control before analysis. Raw data in FASTQ format were first processed with a customized Perl script to remove low-quality and adapter-containing reads, yielding high-quality clean data. The Q20, Q30 and GC content of the clean data were calculated and used as the basis for downstream analysis.
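Q20 and Q30 are the percentages of bases with a Phred quality of at least 20 (1% error probability) and 30 (0.1%). The sketch below computes them, together with GC content, from FASTQ records. It assumes Phred+33 quality encoding and is an illustrative stand-in, not the customized Perl script used in the study; the function names are hypothetical.

```python
# Per-base Q20/Q30/GC statistics over FASTQ records, assuming Phred+33 encoding.
from typing import Iterable, Iterator, Tuple


def parse_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (sequence, quality) pairs from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue                  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)                      # '+' separator line
        qual = next(it).strip()
        yield seq, qual


def read_stats(records: Iterable[Tuple[str, str]]) -> dict:
    bases = q20 = q30 = gc = 0
    for seq, qual in records:
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        for base, q_char in zip(seq.upper(), qual):
            phred = ord(q_char) - 33
            bases += 1
            q20 += phred >= 20
            q30 += phred >= 30
            gc += base in "GC"
    if bases == 0:
        raise ValueError("no bases")
    return {"Q20%": 100 * q20 / bases, "Q30%": 100 * q30 / bases, "GC%": 100 * gc / bases}


fastq = ["@read1", "ACGT", "+", "II5+"]
print(read_stats(parse_fastq(fastq)))  # {'Q20%': 75.0, 'Q30%': 50.0, 'GC%': 50.0}
```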

Transcriptome data assembly and annotation

Unigene sequences were obtained by assembling the clean data with Trinity, with min_kmer_cov set to 2 and all other parameters left at their default values. Trinity, a program designed for assembling high-throughput transcriptome sequencing data, breaks the clean reads into short fragments (k-mers), extends these fragments using the overlaps between them to obtain a set of longer fragments, and finally obtains the Unigene sequences by homology clustering and assembly based on de Bruijn graphs and the sequencing reads.
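To make the assembly idea concrete, the toy example below builds a de Bruijn graph from two short reads and walks it into a contig. It is a teaching sketch under simplifying assumptions (error-free reads, a single path through the graph), not how Trinity is implemented; the `min_kmer_cov` argument only mimics the role of the Trinity parameter of that name.

```python
# Toy de Bruijn graph: nodes are (k-1)-mers and each k-mer observed in the reads is an edge.
from collections import Counter, defaultdict
from typing import Dict, List


def de_bruijn(reads: List[str], k: int, min_kmer_cov: int = 1) -> Dict[str, List[str]]:
    counts = Counter(read[i:i + k] for read in reads for i in range(len(read) - k + 1))
    graph: Dict[str, List[str]] = defaultdict(list)
    for kmer, cov in counts.items():
        if cov >= min_kmer_cov:                # discard k-mers seen too few times
            graph[kmer[:-1]].append(kmer[1:])  # edge: prefix (k-1)-mer -> suffix (k-1)-mer
    return dict(graph)


def walk(graph: Dict[str, List[str]], start: str) -> str:
    """Greedily follow the first out-edge until a dead end or a revisited node."""
    contig, node, seen = start, start, {start}
    while graph.get(node):
        node = graph[node][0]
        if node in seen:
            break
        seen.add(node)
        contig += node[-1]
    return contig


reads = ["ATGGCGT", "GGCGTGC"]
print(walk(de_bruijn(reads, k=4), "ATG"))                  # ATGGCGTGC
print(walk(de_bruijn(reads, k=4, min_kmer_cov=2), "GGC"))  # GGCGT
```

Raising the coverage cutoff removes k-mers supported by a single read, which suppresses sequencing errors at the cost of shortening low-coverage contigs, as the second call shows.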

The Unigene sequences were aligned with BLAST against eight major databases (NR, KEGG, Pfam, GO, Swiss-Prot, KOG, COG and eggNOG) with an E-value threshold of < 1 × 10⁻⁵ to obtain annotations of homologous genes and proteins, which were then used to infer the functions of the Unigenes.
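The E-value filter can be applied to BLAST tabular output as follows. This sketch assumes the default 12-column `-outfmt 6` layout of BLAST+ (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and keeps the highest-scoring hit per Unigene; the subject IDs in the example are hypothetical.

```python
import csv
import io
from typing import Dict, Tuple

EVALUE_CUTOFF = 1e-5  # threshold used in the study


def best_hits(outfmt6_text: str, cutoff: float = EVALUE_CUTOFF) -> Dict[str, Tuple[str, float, float]]:
    """Best (highest bit score) hit per query among hits with E-value below the cutoff."""
    best: Dict[str, Tuple[str, float, float]] = {}
    for row in csv.reader(io.StringIO(outfmt6_text), delimiter="\t"):
        if not row:
            continue
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])  # columns 11 and 12 of outfmt 6
        if evalue >= cutoff:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


example = "\n".join("\t".join(r) for r in [
    ["Unigene1", "hitA", "90.0", "100", "10", "0", "1", "100", "1", "100", "1e-30", "200"],
    ["Unigene1", "hitB", "80.0", "100", "20", "0", "1", "100", "1", "100", "1e-10", "150"],
    ["Unigene2", "hitC", "50.0", "50", "25", "0", "1", "50", "1", "50", "0.01", "30"],
])
print(best_hits(example))  # {'Unigene1': ('hitA', 1e-30, 200.0)}
```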

Determination of saponin and terpenoid content
Determination of Salvia miltiorrhiza saponin content

Following a previously described method, we determined the saponin content of Salvia miltiorrhiza by high performance liquid chromatography. Fresh Salvia miltiorrhiza root (1.000 g) was crushed, added to 10 mL of 70% aqueous ethanol, sonicated in cold water for 40 min and filtered through a 0.45 μm membrane. The filtrate was collected, and the residue was extracted twice more in the same way. The three filtrates were combined, concentrated by rotary evaporation at 40 °C and blown to dryness under nitrogen. The dried extract was dissolved in 5 mL of methanol and filtered through a 0.22 μm membrane. The filtrate was analyzed on a Shimadzu HPLC system (Japan) with a SIL-20ACTH autosampler. The analytical mobile phase was 0.1% (v/v), and the gradient elution program was 0-13 min, 19% B; 12-25 min, 23-31% B; 30-60 min, 29-31% B; 55-65 min, 30-31% B; 70-80 min, 31-32% B; 100-110 min, 32-35% B; 90-110 min, 35-55% B; 110-120 min, 55-60% B; 110-120 min, 60-70% B; 120-135 min, 60-100% B. The detection wavelength was 180 nm, the flow rate was 1.0 mL/min and the column temperature was 25 °C. The saponin standards were purchased from the State Food and Drug Administration and stored at -20 °C. Standard solutions were prepared in methanol at the following concentrations: Salvia diol-type saponins Rg1 1 mg/mL, Re 0.70 mg/mL, Rf 0.2 mg/mL, F1 1.1 mg/mL and Rg2 0.1 mg/mL; Salvia triol-type saponins Rb1 0.2 mg/mL, Rc 0.21 mg/mL, Rb2 0.21 mg/mL, Rb3 0.2 mg/mL, Rd 0.3 mg/mL and Rh2 0.23 mg/mL. The correlation coefficient of the calibration curve of each standard was greater than 0.985. The HPLC chromatogram of the standards is shown in Figure 2.
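Quantification against the external standards follows a linear calibration. The sketch below fits peak area against concentration by least squares, checks the curve against the r > 0.985 criterion, and converts a sample's peak area into mg per g using the 5 mL redissolution volume and 1.000 g sample mass from the method. The function names, concentrations and peak areas are hypothetical.

```python
# External-standard quantification sketch: least-squares calibration line, acceptance
# check on r, and back-calculation of sample content. All numeric inputs are hypothetical.
import math
from typing import List, Tuple


def fit_calibration(conc: List[float], area: List[float]) -> Tuple[float, float, float]:
    n = len(conc)
    mx, my = sum(conc) / n, sum(area) / n
    sxx = sum((x - mx) ** 2 for x in conc)
    syy = sum((y - my) ** 2 for y in area)
    sxy = sum((x - mx) * (y - my) for x, y in zip(conc, area))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient of the curve
    return slope, intercept, r


def content_mg_per_g(peak_area: float, slope: float, intercept: float,
                     volume_ml: float = 5.0, sample_g: float = 1.000) -> float:
    conc_mg_per_ml = (peak_area - intercept) / slope  # invert the calibration line
    return conc_mg_per_ml * volume_ml / sample_g


slope, intercept, r = fit_calibration([0.1, 0.2, 0.4, 0.8], [105.0, 205.0, 405.0, 805.0])
assert r > 0.985, "calibration curve fails the acceptance criterion"
print(round(content_mg_per_g(305.0, slope, intercept), 3))  # 1.5
```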

Figure 2.

HPLC chromatogram of the saponin standards

Determination of terpenoid content

Determination of abscisic acid content

Fresh Salvia miltiorrhiza (1.00 g) was washed and ground with liquid nitrogen and quartz sand, and the powder was added to 3 mL of extraction solvent (acetone:water:acetic acid, 50:10:1, v/v/v) and sonicated for 20 min. The mixture was then centrifuged at 3,000 rpm for 10 min at 4 °C. The supernatant was collected, the pellet was re-extracted twice in the same way, and the combined supernatants were diluted to 8 mL with extraction solvent. Abscisic acid in the solution was determined by enzyme immunoassay.

Determination of gibberellin content

Fresh Salvia miltiorrhiza root (0.50 g) was washed and ground with liquid nitrogen and quartz sand, and the powder was added to 4 mL of water and sonicated for 20 min. The mixture was centrifuged at 4,000 rpm for 10 min at 4 °C, the supernatant was collected, and the pellet was re-extracted twice in the same way. The three supernatants were combined and diluted to 15 mL with extraction solvent. After the pH was adjusted to 2.8 with 15% acetic acid, the aqueous solution was extracted with an equal volume of ether. The upper organic layer was collected, evaporated to dryness at room temperature and redissolved in 1 mL of 10% aqueous methanol. Gibberellin in the solution was determined by enzyme immunoassay.
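Both hormone assays end with the same conversion from an immunoassay reading to content per gram of fresh weight. A sketch, assuming the kit's standard curve already yields ng/mL and that recovery through extraction (and, for gibberellin, the ether partition) is complete; the helper name and readings are hypothetical, while the volumes and masses follow the two protocols.

```python
# Convert an immunoassay reading into content per gram fresh weight (FW).
def content_ng_per_g_fw(measured_ng_per_ml: float, final_volume_ml: float,
                        fresh_weight_g: float, assay_dilution: float = 1.0) -> float:
    return measured_ng_per_ml * assay_dilution * final_volume_ml / fresh_weight_g


aba = content_ng_per_g_fw(12.5, final_volume_ml=8.0, fresh_weight_g=1.00)  # ABA: 8 mL from 1.00 g
ga = content_ng_per_g_fw(3.0, final_volume_ml=1.0, fresh_weight_g=0.50)    # GA: 1 mL from 0.50 g
print(aba, ga)  # 100.0 6.0
```

For gibberellin, the final volume is the 1 mL used to redissolve the ether extract, not the 15 mL aqueous extract, because the whole partitioned fraction ends up in that 1 mL.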

Determination of other terpene components

To minimize volatile losses of small-molecule terpenoids, terpenoids and other compounds in Salvia miltiorrhiza were extracted with supercritical CO2. For each run, an 8 cm³ extraction vessel was filled with approximately 3 g of freeze-dried Danshen root powder. Based on preliminary experiments, 20 mL of ethanol was used as the co-solvent (entrainer), the CO2 flow rate was 1.5 L/h, the extraction time was 110 min, and the temperature and pressure in the extraction vessel were 45 °C and 300 bar, respectively.

The extracts were analyzed on a Shimadzu GC-MS-QP2010 gas chromatography-mass spectrometry system. The GC oven was programmed from 30 to 310 °C at 5 °C/min and held at 310 °C for 10 min. The inlet and detector temperatures were set at 280 °C. Helium was used as the carrier gas at a constant flow rate of 1.0 mL/min. A 10 μL aliquot of the supercritical extract was injected with a split ratio of 1:10. The MS conditions were as follows: ionization voltage, 70 eV; ion source temperature, 220 °C; mass range, m/z 40-750. Terpene compounds were identified by comparing their mass spectra with the NIST 14 library.
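Library identification compares each unknown spectrum with reference spectra. As a simplified illustration, the sketch below scores matches by unweighted cosine similarity over the scanned m/z 40-750 range. NIST MS Search computes its match factor with a weighted dot product, so this is not its exact algorithm; the function names and spectra are hypothetical.

```python
import math
from typing import Dict


def cosine_match(query: Dict[int, float], reference: Dict[int, float],
                 mz_min: int = 40, mz_max: int = 750) -> float:
    """Unweighted cosine similarity of two unit-mass spectra within the scanned m/z range."""
    mzs = [mz for mz in set(query) | set(reference) if mz_min <= mz <= mz_max]
    dot = sum(query.get(mz, 0.0) * reference.get(mz, 0.0) for mz in mzs)
    norm_q = math.sqrt(sum(query.get(mz, 0.0) ** 2 for mz in mzs))
    norm_r = math.sqrt(sum(reference.get(mz, 0.0) ** 2 for mz in mzs))
    return dot / (norm_q * norm_r) if norm_q and norm_r else 0.0


def best_library_hit(query: Dict[int, float], library: Dict[str, Dict[int, float]]) -> str:
    return max(library, key=lambda name: cosine_match(query, library[name]))


# Hypothetical spectra (m/z: relative intensity) for illustration only.
query = {93: 100.0, 91: 40.0, 77: 25.0}
library = {"candidate_A": {93: 100.0, 91: 42.0, 77: 20.0},
           "candidate_B": {121: 100.0, 136: 50.0}}
print(best_library_hit(query, library))  # candidate_A
```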

Results and analysis
Transcriptome sequencing analysis and annotation analysis
Sequencing analysis and de novo assembly

Danshen was divided into 9 samples. Sequencing yielded a total of 586.27 Mb of raw reads from the 9 samples, an average of 68.2 Mb per sample. The sequencing quality indicators of each sample are shown in Table 3. After filtering, 574.68 Mb of clean reads and 40.31 Gb of clean bases were obtained, an average of 4.48 Gb of clean bases per sample. The Q20 of all nine samples exceeded 95%, the Q30 averaged above 80%, and the proportion of clean reads ranged from 82.21% to 90.08%, indicating that the sequencing quality was good enough for subsequent analysis.

Quality indicators of sample sequencing

Sample Raw reads (Mb) Clean reads (Mb) Clean bases (Gb) Q20 (%) Q30 (%) Ratio (%)
R1-1 67.92 64.96 4.59 95.35 87.47 90.83
R1-2 70.46 63.74 4.47 95.71 88.72 86.05
R1-3 67.97 64.54 4.55 95.71 88.68 90.2
R2-1 65.48 64.17 4.51 95.76 88.95 92.96
R2-2 65.42 63.69 4.46 95.74 88.78 92.36
R2-3 65.48 63.32 4.42 95.57 88.35 91.74
R3-1 65.48 63.03 4.4 95.61 88.44 91.32
R3-2 65.42 63.56 4.45 95.6 88.31 92.16
R3-3 65.48 63.67 4.46 95.69 88.71 92.25

The quality indicators of the transcripts are shown in Table 4. Trinity assembly of the clean reads yielded 676,255 transcripts with an average length of 654 nt and an N50 of 1,053 nt.

Quality indicators of transcripts

Sample Transcripts Total length (nt) Mean length (nt) N50 (nt) N70 (nt) N90 (nt) GC (%)
R1-1 46420 29653187 467 736 429 180 40.15
R1-2 84358 55318339 500 844 460 181 39.47
R1-3 63630 41105112 484 794 443 180 39.92
R2-1 77553 61553228 631 1161 632 214 40.05
R2-2 99127 72837938 580 1085 552 192 40.19
R2-3 99391 75704223 606 1082 588 209 38.88
R3-1 80400 63786716 632 1167 631 214 40.05
R3-2 73577 58385161 629 1162 631 213 40.09
R3-3 77548 61459178 630 1136 634 218 39.92
Total 702004 - 654 1053 - - -
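The N50, N70 and N90 values reported for the assemblies are length-weighted statistics: Nx is the length L such that contigs of length L or longer together contain at least x% of the assembled bases, so N50 ≥ N70 ≥ N90 always holds. A minimal sketch with hypothetical contig lengths:

```python
from typing import List


def nx(lengths: List[int], x: float) -> int:
    """Length L such that contigs >= L cover at least x% of the total assembly length."""
    if not lengths:
        raise ValueError("no contigs")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    target = sum(lengths) * x / 100
    running = 0
    for length in sorted(lengths, reverse=True):  # longest contigs first
        running += length
        if running >= target:
            return length
    raise AssertionError("unreachable: the running total reaches the full length")


lengths = [100, 200, 300, 400, 500]  # hypothetical contig lengths (nt)
print([nx(lengths, x) for x in (50, 70, 90)])  # [400, 300, 200]
```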

The quality indicators of the Unigenes are shown in Table 5. After redundancy removal, 116,731 Unigenes were obtained, with an average length of 978 nt; the N50, N70 and N90 values were 1,456 nt, 978 nt and 397 nt, respectively, and the GC content ranged from 38.89% to 40.26% (39.55% for the complete Unigene set).

Quality indicators of Unigenes

Sample Unigenes Total length (nt) Mean length (nt) N50 (nt) N70 (nt) N90 (nt) GC (%)
R1-1 43889 24414215 455 610 401 202 40.23
R1-2 31285 43781698 527 774 486 217 39.59
R1-3 39983 32687437 494 696 448 210 40.05
R2-1 55141 49431632 688 1113 681 270 40.1
R2-2 56317 58566372 618 1041 589 225 40.26
R2-3 41928 61193397 641 1012 605 249 38.89
R3-1 38344 50372121 683 1122 669 262 40.12
R3-2 40886 47304575 676 1106 667 261 40.16
R3-3 43889 49739930 678 1070 663 274 39.98
All-Unigene 116731 118032214 978 1456 978 397 39.55

The results of gene function annotation are shown in Table 6. The Unigene sequences were compared with the NR, GO, Nt, KEGG, COG, InterPro and Swiss-Prot databases, and 67,564 Unigenes (57.88% of the 116,731 Unigenes) received annotation information. Of the 116,731 Unigenes, 61,243 (52.47%) were annotated in NR, 27,632 (23.67%) in Nt, 34,326 (29.41%) in GO, 52,155 (44.68%) in COG, 46,542 (39.87%) in KEGG, 41,652 (35.68%) in Swiss-Prot and 51,432 (44.06%) in InterPro.

Annotation results of gene function

Metric Total NR Nt Swiss-Prot KEGG COG InterPro GO Intersection Overall
Count 116731 61243 27632 41652 46542 52155 51432 34326 13757 67564
Percentage 100% 52.47% 23.67% 35.68% 39.87% 44.68% 44.06% 29.41% 11.79% 57.88%

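The "Overall" and "Intersection" columns can be read as set operations over the per-database annotation results: a Unigene counts toward "Overall" if any database annotates it, and toward "Intersection" only if every database does. That reading is an assumption about how the table was built; the sketch below uses hypothetical gene IDs and reproduces the NR percentage as a check.

```python
from typing import Dict, Iterable, Tuple


def annotation_summary(hits_by_db: Dict[str, Iterable[str]],
                       total_unigenes: int) -> Dict[str, Tuple[int, float]]:
    sets = {db: set(ids) for db, ids in hits_by_db.items()}
    overall = set().union(*sets.values())  # annotated in at least one database
    intersection = set.intersection(*sets.values()) if sets else set()  # in every database

    def pct(n: int) -> float:
        return round(100 * n / total_unigenes, 2)

    summary = {db: (len(s), pct(len(s))) for db, s in sets.items()}
    summary["Intersection"] = (len(intersection), pct(len(intersection)))
    summary["Overall"] = (len(overall), pct(len(overall)))
    return summary


hits = {"NR": ["u1", "u2", "u3"], "GO": ["u2", "u3"], "KEGG": ["u3", "u4"]}  # hypothetical IDs
print(annotation_summary(hits, total_unigenes=10))
# {'NR': (3, 30.0), 'GO': (2, 20.0), 'KEGG': (2, 20.0), 'Intersection': (1, 10.0), 'Overall': (4, 40.0)}
print(round(100 * 61243 / 116731, 2))  # 52.47, the NR percentage in Table 6
```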
NR database annotation analysis

Based on sequence similarity to closely related species in the NR database, the largest number of Salvia miltiorrhiza Unigenes matched boleback (15,322; 26.24%), followed by lotus (Nelumbo nucifera; 10,212; 14.2%), grapevine (3,156; 3.58%), walnut (996; 1.21%) and rubber tree (Hevea brasiliensis; 707; 1.12%), with other species accounting for 12.44%.

Functional classification of Unigenes
COG functional classification of Unigenes

The Unigenes were classified by homology against the COG database, and 73,292 Unigenes were annotated; the classification statistics are shown in Fig. 3. The Unigenes fell into rich and diverse COG functional categories involved in many life activities. The "General function prediction only" category contained the most Unigenes (15,673), yet these accounted for only 21.4% of the 73,292 annotated Unigenes, mainly because few Huang Cao Wu genes have been deposited in the databases. It was followed by "Signal transduction mechanisms" (9,134; 12.5%), "Posttranslational modification, protein turnover, chaperones" (5,891; 8%), "Function unknown" (5,121; 7%) and "Transcription" (4,910; 6.7%). The Unigenes of interest in this study fell into the "Secondary metabolites biosynthesis, transport and catabolism" category (2,109; 2.9%). The smallest category was cellular modification, with 108 Unigenes (0.15%).

Figure 3.

Statistical diagram of COG function distribution

GO classification of Unigenes

Blast2GO was used to map all Unigenes with NR hits to the GO database, and the genes were classified by their biological features as shown in Fig. 4, where panels (a), (b) and (c) show the biological process, cellular component and molecular function categories, respectively. In total, 170,089 Unigenes were annotated to GO and assigned to 57 subcategories within the three major categories: 42,803 Unigenes in biological process, 94,037 in cellular component and 42,803 in molecular function. Within biological process, 16,722 Unigenes were assigned to cellular process, 14,351 to metabolic process and 4,426 to biological regulation, and the fewest (4) to cell killing. Within cellular component, the largest subcategories were cell (16,224), cell part (15,121), membrane (14,752), membrane part (13,652) and organelle (12,118), and the smallest were other organism and other organism part, with 2 Unigenes each. Within molecular function, the largest subcategories were binding (18,685), catalytic activity (18,261) and transporter activity (2,245), and the smallest were translation regulator activity (2) and protein tag (6). This study focuses on Unigenes related to diterpenoid alkaloid metabolism, which fall within the metabolic process subcategory (14,351 Unigenes).

Figure 4.

Statistical diagram of GO function distribution

KEGG classification of Unigenes

The KEGG functional distribution is shown in Fig. 5. A total of 47,106 Unigenes were annotated in the KEGG database and assigned to 5 major categories and 19 subcategories. The major categories comprised cellular processes (2,267), environmental information processing (3,221), genetic information processing (11,702), metabolism (30,678) and organismal systems (1,722). Among the 19 subcategories, the most highly represented were global and overview maps (12,121), translation (4,345), carbohydrate metabolism (4,214) and folding, sorting and degradation (3,432), and the least represented was membrane transport (889). Unigenes related to diterpenoid alkaloids fall into the metabolism of terpenoids and polyketides subcategory (1,011 Unigenes). The annotation and classification of these 1,011 Unigenes are central to studying diterpenoid alkaloid metabolism because they allow the Unigenes involved in it to be identified, and these genes are also an important genetic resource for promoting diterpenoid alkaloid accumulation through future molecular breeding.

Figure 5.

Statistical diagram of KEGG function distribution

Correlation analysis

To further explore the relationship between terpenoids and candidate genes, we analyzed the correlation between terpenoid content and the expression of candidate synthesis genes. Because these compounds are the main triterpenoid types in the fruit, their content was correlated with the expression of the candidate synthesis genes, as shown in Fig. 6. Corosolic acid, betulinic acid and ursolic acid contents were significantly and positively correlated (r > 0.5) with the expression of ZjSQS (evm.model.Contig42.302), ZjCYP450 (evm.model.Contig66.109), ZjHMGR (evm.model.Contig21.0.64), ZjOSC (evm.model.Contig63.27) and ZjFPS (evm.model.Contig34.195), and these genes were also positively correlated with the accumulation of other terpenes. ZjHMGR3 (evm.model.Contig73.486), ZjAACT3 (evm.model.Contig63.92) and ZjCYP450/2 (evm.model.Contig108.402) were likewise correlated with corosolic, betulinic and ursolic acid but negatively correlated with the contents of the other triterpene monomers. ZjOSC2 (evm.model.Contig63.19) and ZjSQS2 (evm.model.Contig75.307) expression was weakly correlated with the contents of amygdalic acid, oleanolic acid, oleanolic keto acid and 3-keto acid ursolic acid, and negatively correlated with corosolic, betulinic and ursolic acid. The expression of ZjSQE1 (evm.model.Contig21.0.559), ZjSQE2 (evm.model.Contig57.343), ZjSQE3 (evm.model.Contig11.123), ZjHMGS1 (evm.model.Contig64.0.510) and ZjHMGR2 (evm.model.Contig112.45) was negatively correlated with all seven triterpenoids.

Overall, ZjSQS, ZjCYP450, ZjOSC, ZjFPS and ZjHMGR were highly correlated with triterpene content, suggesting that they may be key genes in Danshen triterpene biosynthesis. The correlation between triterpene content and the expression of candidate transcription factors is shown in Figure 7. The expression of the transcription factors ZjMYB39 and ZjMYB4 was highly positively correlated with triterpene content, whereas ZjWRKY11 expression was highly correlated with the contents of american teichoic acid and oleanolic acid but only weakly or not at all with the other triterpenes. The remaining transcription factors showed low or negative correlations with most compounds. In summary, we screened five structural genes and three transcription factors (ZjMYB39, ZjMYB4 and ZjWRKY11) that may be involved in the biosynthesis and regulation of triterpenes in Danshen, laying the foundation for further study of the mechanism of triterpene biosynthesis in Danshen.
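The gene-metabolite screen above can be sketched as Pearson correlation between each gene's expression and each compound's content across the same samples, keeping pairs with r > 0.5. Significance testing is omitted; the gene and compound names are reused from the text, but the function names and all numeric values are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def pearson_r(x: List[float], y: List[float]) -> float:
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired samples")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def screen(expression: Dict[str, List[float]], content: Dict[str, List[float]],
           threshold: float = 0.5) -> List[Tuple[str, str, float]]:
    """Return (gene, compound, r) for every pair whose r exceeds the threshold."""
    hits = []
    for gene, expr in expression.items():
        for compound, values in content.items():
            r = pearson_r(expr, values)
            if r > threshold:
                hits.append((gene, compound, round(r, 3)))
    return hits


expression = {"ZjSQS": [1.0, 2.0, 3.0, 4.0], "ZjSQE1": [4.0, 3.0, 2.0, 1.0]}
content = {"ursolic acid": [10.0, 20.0, 30.0, 40.0]}
print(screen(expression, content))  # [('ZjSQS', 'ursolic acid', 1.0)]
```

A negatively correlated gene such as ZjSQE1 in this toy data is dropped by the threshold, mirroring how negative correlations were treated as evidence against involvement.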

Figure 6.

Correlation between triterpene content and candidate gene expression in Danshen

Figure 7.

Correlation between triterpene content and candidate transcription factor expression

Expression of key genes and transcription factors for terpene synthesis

To further understand the expression patterns of key synthesis genes and transcription factors during Salvia miltiorrhiza development, we analyzed the transcript levels of the six structural genes and the eight candidate transcription factors above in different tissues and at different developmental stages (Figure 8). The six structural genes, ZjAACT, ZjHMGR, ZjFPS, ZjSQS, ZjOSC and ZjCYP450, showed similar expression patterns across jujube tissues, with higher expression in buds and young leaves and relatively lower expression in mature leaves. Across developmental stages, ZjAACT, ZjFPS, ZjSQS and ZjCYP450 were highly expressed in the middle and late stages of Salvia miltiorrhiza development (DAP80-110), ZjHMGR expression peaked at DAP80, and ZjOSC expression was concentrated in the expansion (DAP50) and white ripening (DAP80) stages of 'Qingjian Sour Jujube'. The expression patterns of the screened transcription factors during Salvia miltiorrhiza development are shown in Figure 9. ZjMYB39 and ZjMYB4 were highly expressed in the middle and late stages of bud and shoot development (DAP80-110). ZjWRKY11 was highly expressed in the buds of Salvia miltiorrhiza, and its transcript level peaked at the expansion stage of fruit development. Notably, unlike the other transcription factors, ZjWRKY40 was expressed mainly in the date flower.
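Relative transcript levels from qRT-PCR are commonly expressed with the 2^-ΔΔCt (Livak) method and plotted as mean ± SD. The source does not state how its expression data were normalized, so the sketch below is an assumption: the reference gene, the calibrator sample, the function names and the Ct values are all hypothetical.

```python
import statistics
from typing import List, Tuple


def relative_expression(ct_target: float, ct_reference: float,
                        ct_target_cal: float, ct_reference_cal: float) -> float:
    delta_ct = ct_target - ct_reference              # normalize to the reference gene
    delta_ct_cal = ct_target_cal - ct_reference_cal  # same for the calibrator sample
    return 2.0 ** -(delta_ct - delta_ct_cal)         # fold change relative to the calibrator


def mean_sd(replicates: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation, as plotted (mean ± SD) in the figures."""
    return statistics.mean(replicates), statistics.stdev(replicates)


print(relative_expression(22.0, 18.0, 25.0, 18.0))  # 8.0
print(mean_sd([7.0, 8.0, 9.0]))                     # (8.0, 1.0)
```

The method assumes near-100% amplification efficiency for both target and reference genes; efficiency-corrected models are preferable when that does not hold.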

Figure 8.

Expression of candidate structural genes in danshen (mean ± standard deviation, SD)

Figure 9.

Expression patterns of transcription factors during the development of Salvia miltiorrhiza (danshen)

Terpene biosynthetic pathways in medicinal plants

A correlation analysis between genes and metabolites was then performed: the screened functional genes were correlated with the synthesis precursors danshen and de-benzoyl danshen, yielding 21 cytochrome P450 (CYP450), 13 2-oxoglutarate-dependent dioxygenase (2ODD), and 14 UDP-glycosyltransferase (UGT) candidate genes.
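
The text gives neither the correlation cut-off nor whether a gene had to correlate with one or with both precursors. A minimal sketch of such a screen, assuming a hypothetical threshold of |r| ≥ 0.8 and a switch between the "any precursor" and "all precursors" rules; negative correlations are kept because the absolute value is used, which is also an assumption. The gene IDs, precursor labels, and r values are hypothetical.

```python
from collections import Counter


def screen_candidates(correlations, families, threshold=0.8, require_all=False):
    """Keep genes whose |r| with the target metabolites reaches the threshold
    and count the survivors per gene family.

    correlations: {gene_id: {metabolite: r}}
    families:     {gene_id: family label, e.g. "CYP450", "2ODD", "UGT"}
    require_all:  if True, a gene must pass for every metabolite; otherwise
                  passing for any one metabolite is enough.
    """
    combine = all if require_all else any
    kept = sorted(
        gene for gene, rs in correlations.items()
        if combine(abs(r) >= threshold for r in rs.values())
    )
    return kept, Counter(families[gene] for gene in kept)


# Hypothetical gene IDs and r values for illustration only.
correlations = {
    "gene1": {"precursor_1": 0.91, "precursor_2": 0.86},
    "gene2": {"precursor_1": 0.10, "precursor_2": -0.20},
    "gene3": {"precursor_1": -0.85, "precursor_2": 0.30},
}
families = {"gene1": "CYP450", "gene2": "UGT", "gene3": "UGT"}

print(screen_candidates(correlations, families))
print(screen_candidates(correlations, families, require_all=True))
```

The stricter `require_all=True` rule keeps fewer genes, so the choice of rule directly affects how many CYP450, 2ODD, and UGT candidates survive.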

Based on the metabolome and transcriptome analyses, the biosynthetic pathway in Salvia miltiorrhiza was roughly divided into three stages. The first stage is precursor synthesis: a total of 21 genes were screened as possibly involved in the synthesis of isopentenyl pyrophosphate (IPP) and dimethylallyl pyrophosphate (DMAPP). Twelve of these encode key enzymes of the mevalonate (MVA) pathway: 1 acetyl-CoA acetyltransferase (AACT), 2 3-hydroxy-3-methylglutaryl-CoA synthases (HMGS), 3 3-hydroxy-3-methylglutaryl-CoA reductases (HMGR), 3 mevalonate kinases (MVK), 1 phosphomevalonate kinase (PMK), and 2 diphosphomevalonate decarboxylases (MVD). The other 9 encode enzymes of the methylerythritol 4-phosphate (MEP) pathway: 2 1-deoxy-D-xylulose-5-phosphate synthases (DXS), 1 1-deoxy-D-xylulose-5-phosphate reductoisomerase (DXR), 2 2-C-methyl-D-erythritol 4-phosphate cytidylyltransferases (MCT), 1 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase (MDS), and 3 (E)-4-hydroxy-3-methylbut-2-enyl diphosphate synthases (HDS). The second stage is synthesis of the monoterpene skeleton: seven genes were screened as possibly involved in α-pinene synthesis, encoding 4 isopentenyl diphosphate isomerases (IDI), 2 geranylgeranyl pyrophosphate synthases, and 1 pinene synthase. The third stage is the conversion of α-pinene into paeoniflorin by a series of modifying enzymes, for which 48 candidate genes were screened: 21 CYP450, 13 2ODD, and 14 UGT genes.
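
The gene counts across the three stages can be cross-checked with simple arithmetic: 12 MVA + 9 MEP = 21 precursor genes, 4 + 2 + 1 = 7 skeleton genes, and 21 + 13 + 14 = 48 modification genes. The sketch below records the counts from the text as a nested dictionary. The enzyme abbreviations are standard shorthand added here for brevity; recording the sixth MVA enzyme as MVD assumes it is the mevalonate diphosphate decarboxylase of the canonical pathway.

```python
# Candidate gene counts per enzyme, as listed in the text. The abbreviations
# are standard shorthand added for this sketch (MVD is an assumed mapping).
PATHWAY = {
    "MVA pathway (precursors)": {"AACT": 1, "HMGS": 2, "HMGR": 3, "MVK": 3, "PMK": 1, "MVD": 2},
    "MEP pathway (precursors)": {"DXS": 2, "DXR": 1, "MCT": 2, "MDS": 1, "HDS": 3},
    "monoterpene skeleton": {"IDI": 4, "GGPPS": 2, "pinene synthase": 1},
    "post-modification": {"CYP450": 21, "2ODD": 13, "UGT": 14},
}


def stage_totals(pathway):
    """Sum the candidate gene counts within each stage."""
    return {stage: sum(enzymes.values()) for stage, enzymes in pathway.items()}


totals = stage_totals(PATHWAY)
for stage, n in totals.items():
    print(f"{stage}: {n} genes")

# The first stage combines both precursor routes.
precursor_total = totals["MVA pathway (precursors)"] + totals["MEP pathway (precursors)"]
print("precursor stage total:", precursor_total)
print("all stages:", sum(totals.values()))
```

Keeping the counts per enzyme rather than only per stage makes it easy to spot a mismatch between a stage total and its itemized list.
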
The bicyclic core is the main structural unit of these Salvia miltiorrhiza compounds, and side-group modifications generally take place while this core stays intact. This study therefore speculated that hydroxylases act on the bicyclic skeleton first, followed by glycosylation and acylation. Because the relative order of glycosylation and acylation has not been fully determined, two synthetic routes may exist in the post-modification stage. Taken together, these results allowed the complete de novo biosynthetic pathway in Salvia miltiorrhiza to be constructed.
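
The reasoning that leaves two candidate post-modification routes can be made explicit by enumerating orderings: if the hydroxylation step on the bicyclic skeleton comes first and only the relative order of glycosylation and acylation is open, there are 2! = 2 routes. A minimal sketch; the step labels simplify the text, and each label may stand for several enzymatic reactions.

```python
from itertools import permutations


def candidate_routes(fixed_steps, unordered_steps):
    """Every route that applies the fixed steps first, then the remaining
    steps in each possible order."""
    return [list(fixed_steps) + list(order) for order in permutations(unordered_steps)]


routes = candidate_routes(["hydroxylation"], ["glycosylation", "acylation"])
for i, route in enumerate(routes, start=1):
    print(f"route {i}: " + " -> ".join(route))
# route 1: hydroxylation -> glycosylation -> acylation
# route 2: hydroxylation -> acylation -> glycosylation
```

With k modifications of unknown relative order the number of routes grows as k!, so experimentally fixing even one step's position narrows the candidates quickly.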

Conclusion

Screening yielded 21 structural genes in the terpene biosynthesis pathway and 8 transcription factors that may be involved in terpenoid synthesis in Salvia miltiorrhiza, all with differing transcript levels during development. Correlation analysis showed that the expression patterns of ZjFPS, ZjSQS, ZjMYB39, and ZjMYB4 are highly correlated with triterpene accumulation in jujube.

ZjFPS and ZjSQS are key genes for jujube pentacyclic triterpene biosynthesis, and the transcription factors ZjMYB39 and ZjMYB4 are involved in the biosynthesis of danshen triterpenes.

Danshen and de-benzoyl danshenoside were significantly correlated with 21 CYP450, 13 2ODD, and 14 UGT genes. Five highly expressed UGT candidate genes were successfully cloned, and vector construction and crude protein extraction were completed.

Anhui Province University Natural Science Research Project "Development and Demonstration Application Research of Quality Identification Technology for Top Ten Wan Medicine (Chrysanthemum)" (NO: 2023AH052268).

Language: English
Publication frequency: Once a year
Journal subjects: Biological sciences, Biological sciences (other), Mathematics, Applied mathematics, General mathematics, Physics, Physics (other)