Global transcriptomic analysis

To characterize patterns of gene expression during development, 18 libraries were generated using samples from the six developmental stages of S. edulis (Fig. 1). A total of 774.59 million raw reads were generated by Illumina 150 bp paired-end sequencing. After cleaning and quality checks, 742.29 million clean reads were obtained, with an average of 41.23 million reads per library (Supplementary Table S1). More than 77.56% of the reads per library could be mapped to the S. edulis genome; the Q30 percentage of all sequences in the 18 libraries was >91%.

Figure 1

Developmental stages of Sarcomyxa edulis. (A) Mycelium grown to half of the bag. (B) Mycelium under cold stimulation after fully colonizing the bag. (C) Mycelium at primordium emergence and primordia. (D) Mushrooms at the harvest stage and the mature fruiting body.

Gene expression analysis

An FPKM value greater than 1 indicates that a gene is expressed, and higher FPKM values indicate higher expression. The gene expression analysis showed that the 18 samples had similar average expression values, with log10 FPKM values ranging from 3.82 to 4.59. The highest gene expression level was at stage B2 (log10 FPKM, 16.13) (Fig. 2; Supplementary Table S2).
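As a minimal sketch of the expression threshold described here, the snippet below computes FPKM with the standard formula (fragments scaled by transcript length in kilobases and by millions of mapped fragments) and applies the FPKM > 1 rule. The function names and the example numbers are hypothetical and are not taken from the study.

```python
import math


def fpkm(fragments: int, gene_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (gene_length_bp * total_mapped_fragments)


def is_expressed(value: float, cutoff: float = 1.0) -> bool:
    """Apply the 'FPKM greater than 1' rule stated in the text."""
    return value > cutoff


# Hypothetical gene: 500 fragments on a 2,000 bp transcript in a library
# with 40 million mapped fragments (illustrative numbers only).
example_value = fpkm(500, 2_000, 40_000_000)
print(example_value, is_expressed(example_value), math.log10(example_value))
```

The log10 transform in the last line corresponds to the scale used for the per-sample summaries and the violin plot.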

Figure 2

Violin plot of the distribution of gene expression levels (FPKM) in different samples. The abscissa is the sample name and the ordinate is log10(FPKM). The values from top to bottom represent, in turn, the maximum, the upper quartile, the median, the lower quartile, and the minimum. The width of each violin represents the number of genes at the same expression level.

Identification of differentially expressed genes across different developmental stages

The largest number of DEGs was observed between the B5 and B6 stages (3,171), followed by B4 vs B5 (2,478) and B1 vs B2 (2,243) (Fig. 3A). These results indicated that the expression profile of B5 differs markedly from those of B4 and B6. The largest number of unique DEGs was observed in the B5 vs B6 comparison (903). In contrast, only 153 DEGs were unique to the B3 vs B4 comparison (Fig. 3B). There were 215 DEGs common to all five comparisons made between the six stages. These were mainly enriched in the GO term 'catalytic activity' (GO:0003824) (Supplementary Fig. S2).
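The unique and common DEG counts in a Venn diagram come down to set operations. The sketch below shows one way to compute them, assuming each comparison is available as a set of gene IDs; the function name and the toy gene IDs are hypothetical.

```python
from typing import Dict, Set, Tuple


def unique_and_common(deg_sets: Dict[str, Set[str]]) -> Tuple[Dict[str, Set[str]], Set[str]]:
    """Return DEGs unique to each comparison and DEGs shared by all comparisons."""
    common = set.intersection(*deg_sets.values())
    unique = {}
    for name, genes in deg_sets.items():
        # Union of every other comparison's DEGs.
        others = set().union(*(g for n, g in deg_sets.items() if n != name))
        unique[name] = genes - others
    return unique, common


# Toy input with three comparisons (the study used five).
toy_degs = {
    "B1_vs_B2": {"g1", "g2", "g3"},
    "B2_vs_B3": {"g2", "g3", "g4"},
    "B3_vs_B4": {"g3", "g5"},
}
unique, common = unique_and_common(toy_degs)
print({name: sorted(genes) for name, genes in unique.items()}, sorted(common))
```

With real data, the sizes of the returned sets would correspond to the outer and central regions of the Venn diagram.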

Figure 3

Differentially expressed genes across different developmental stages. (A) Histogram of differentially expressed genes. (B) Venn diagram of differentially expressed genes.

Functional classification of differentially expressed genes

All DEGs were classified into three GO categories: biological process (BP), cellular component (CC), and molecular function (MF). The significantly enriched terms for DEGs in the B1 vs B2, B2 vs B3, B3 vs B4, B4 vs B5, and B5 vs B6 comparisons were very similar (Supplementary Fig. S3A–E).

DEGs were mapped to the Kyoto Encyclopedia of Genes and Genomes (KEGG) database to assess their functions. The significantly enriched pathways for DEGs in the B1 vs B2, B2 vs B3, B3 vs B4, B4 vs B5, and B5 vs B6 comparisons were highly similar. All significantly enriched pathways mapped to 'Global and overview maps' and 'Carbohydrate metabolism' within the 'Metabolism' pathways (Supplementary Table S3).

Building gene co-expression networks

The lowest power value at which the correlation coefficient reached a plateau (or exceeded 0.8) was selected for subsequent analyses (left panel of Supplementary Fig. S4). We also determined the changes in mean gene connectivity under different power values (right panel of Supplementary Fig. S4). A minimum power value of 8 was used for the following analysis.
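The selection rule described here can be sketched as follows, assuming a table of scale-free fit values per candidate power has already been computed. The function name and the fit values are hypothetical; the values are made up so that the lowest qualifying power is 8, and they are not read from Supplementary Fig. S4.

```python
from typing import Dict, Optional


def pick_soft_threshold(r2_by_power: Dict[int, float], cutoff: float = 0.8) -> Optional[int]:
    """Return the lowest power whose scale-free fit exceeds the cutoff, else None."""
    for power in sorted(r2_by_power):
        if r2_by_power[power] > cutoff:
            return power
    return None


# Made-up fit values for illustration only.
example_fit = {1: 0.12, 2: 0.35, 4: 0.61, 6: 0.76, 8: 0.83, 10: 0.87, 12: 0.88}
print(pick_soft_threshold(example_fit))  # → 8
```

Choosing the lowest qualifying power keeps mean connectivity as high as possible while still approximating a scale-free topology, which is why the connectivity curve is inspected alongside the fit.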

Nineteen modules (marked in different colors, Fig. 4A) were identified, each corresponding to a branch of the gene clustering tree. The 19 modules were associated with different stages of development, indicating that the expression profiles were specific to the developmental stage. The number of genes in each module is shown in Fig. 4B. The largest number of genes (2,312) was observed in the turquoise module, while the lowest number (one gene) was observed in the gray module.

Figure 4

Nineteen different modules were identified. (A) Gene clustering tree of the co-expression network and module cutting. Dynamic Tree Cut shows the modules divided according to the clustering results. Merged Dynamic shows the modules obtained by merging modules with similar expression patterns according to module similarity. Subsequent analysis was carried out on the merged modules. In the tree, the vertical distance represents the distance between two nodes (genes), and the horizontal distance is meaningless. (B) Number of genes per module. The abscissa represents each module, and the ordinate represents the number of genes.

Correlation analyses were performed between module eigengenes and specific trait and phenotype data to identify modules with potentially higher associations with traits and phenotypes. The modules described above were correlated with four physiological and biochemical traits of the samples at the different growth stages: the activities of laccase, acid xylanase (ACX), cellulase (CL), and lignin peroxidase (LiP) (Supplementary Table S4). Certain modules were closely associated with the physiological and biochemical traits (Fig. 5). For example, the blue module was significantly positively correlated with CL and ACX (r = 0.96 and r = 0.88, respectively). Furthermore, significant positive correlations were observed between the darkorange module and laccase (r = 0.93), between the steelblue module and both laccase and ACX (r = 0.82 and r = 0.86, respectively), and between the salmon module and LiP (r = 0.91). Genes in these four modules were further evaluated.
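A module-trait relationship of this kind is a correlation computed across samples between a module's summary expression profile and a measured trait. The sketch below uses a plain Pearson correlation in pure Python; the function names and the toy vectors are hypothetical (six values per vector rather than the 18 samples of the study), and the study's actual software settings are not stated in this section.

```python
import math
from typing import Dict, List, Tuple


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy / math.sqrt(sxx * syy)


def module_trait_correlations(
    eigengenes: Dict[str, List[float]], traits: Dict[str, List[float]]
) -> Dict[Tuple[str, str], float]:
    """Correlate every module profile with every trait across samples."""
    return {
        (module, trait): pearson_r(profile, values)
        for module, profile in eigengenes.items()
        for trait, values in traits.items()
    }


# Toy data: one module profile and two traits, one rising and one falling with it.
toy_eigengenes = {"blue": [-0.5, -0.3, -0.1, 0.1, 0.3, 0.5]}
toy_traits = {
    "CL": [1.0, 1.4, 1.8, 2.2, 2.6, 3.0],
    "LiP": [3.0, 2.6, 2.2, 1.8, 1.4, 1.0],
}
for pair, r in module_trait_correlations(toy_eigengenes, toy_traits).items():
    print(pair, round(r, 2))
```

Each resulting value would fill one cell of a module-by-trait heatmap, with the sign of r determining the cell color.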

Figure 5

Correlation analysis of gene co-expression network modules with physiological and biochemical traits. The horizontal axis represents the different traits, and the vertical axis represents each module. Red cells represent a positive correlation between a physiological trait and a module, while green cells represent a negative correlation.

GO annotation of target modules

We mapped the genes in each module to the GO database (http://www.geneontology.org/) to explore their functions further. We counted the number of genes for each term to obtain the list of genes and statistics associated with each GO function. Genes in these four modules were significantly enriched in several GO terms in BP, MF and CC (Supplementary Fig. S5). They were enriched in 'catalytic activity' and 'splicing', indicating that WGCNA effectively classifies genes into co-expression modules of biological interest. These modules were the focus of our subsequent analyses.

Screening and functional analysis of key genes in target modules

A typical feature of a scale-free network is that most nodes are connected to only a small number of other nodes, while a few nodes are connected to most of them. These highly connected nodes are the key nodes to be identified, and they contain the so-called hub genes. Hub genes have a high degree of connectivity within their modules, making them more biologically important than other nodes. Genes with a module membership (MM) value >0.8 and P < 0.01 in the above four modules were screened as hub genes. The four modules (blue, darkorange, steelblue, and salmon) contained 675, 146, 175, and 131 hub genes, respectively. These hub genes were then compared against the CAZy database (http://www.cazy.org, for carbohydrate-active enzyme genes)20, reducing the number of key genes to 64. Finally, combined with the annotations from the NR, GO and KEGG databases, a total of 12 key genes likely to be associated with lignocellulose degradation were selected from the four modules showing the highest correlations with the target traits (Table 1). Eleven key genes were identified from the blue module, and one key gene was identified from the steelblue module.
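The hub-gene screen is a two-condition filter. A minimal sketch is given below, assuming per-gene module membership values and significance values are already available and treating the second cutoff as a p-value (an assumption). The record type, function name, and gene records are hypothetical.

```python
from typing import List, NamedTuple


class GeneStat(NamedTuple):
    gene_id: str
    mm: float        # module membership: correlation of the gene with its module profile
    p_value: float   # significance of that correlation (assumed meaning of the cutoff)


def screen_hub_genes(
    stats: List[GeneStat], mm_cutoff: float = 0.8, p_cutoff: float = 0.01
) -> List[str]:
    """Keep genes with MM above the cutoff and p-value below the cutoff."""
    return [s.gene_id for s in stats if s.mm > mm_cutoff and s.p_value < p_cutoff]


# Made-up records for illustration only.
toy_stats = [
    GeneStat("gene_a", 0.93, 0.001),   # passes both cutoffs
    GeneStat("gene_b", 0.85, 0.05),    # fails the p-value cutoff
    GeneStat("gene_c", 0.60, 0.0001),  # fails the MM cutoff
    GeneStat("gene_d", 0.81, 0.009),   # passes both cutoffs
]
print(screen_hub_genes(toy_stats))  # → ['gene_a', 'gene_d']
```

The resulting list would then be intersected with database annotations, such as the CAZy comparison described in the text, to narrow the candidates further.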

Table 1 Basic information on the key genes.

The 12 key genes are shown in Table 1. They were all annotated in the CAZy database, indicating that they all encode carbohydrate-active enzymes. This further confirmed the robustness of our analysis. Six genes belong to the AA (auxiliary activities) family, six to the GH (glycoside hydrolase) family, one to the CBM (carbohydrate-binding module) family, and one to the PL (polysaccharide lyase) family. Of these, five endoglucanases and two exoglucanases, which are cellulose-degrading enzymes, were identified21. There were also a manganese peroxidase and two laccases, which are lignin-degrading enzymes22. A pyranose dehydrogenase, which acts as a coenzyme for lignin-degrading enzymes, was identified23. There was also one arabinofuranosidase involved in the hydrolysis of hemicellulose24.

Differential expression of genes related to lignocellulose degradation

The differential expression of the 12 genes described above was analyzed using TBtools (version 0.674) (https://github.com/CJ-Chen/TBtools/releases)25 (Fig. 6). SE.1A3347 and SE.1A4339 clustered together and displayed high expression levels in all six stages. SE.A1616, SE.1A1237, SE.1A8861, and SE.1A9306 clustered together and displayed low expression. When comparing the six growth stages, most genes were relatively highly expressed in B1, suggesting that they were associated with lignocellulose degradation. SE.1A8947 and SE.1A9306, which both encode laccases, clustered together, suggesting that they perform their biological functions together.

Figure 6

Clustered heatmap of the key genes across the six stages. Expression levels are represented by log2(FPKM) values after centering. Genes with similar expression patterns cluster together.

Screening and functional analysis of transcription factors (TFs) in target modules

Using the screening method described above for key genes in the target modules, combined with the Plant Transcription Factor Database (http://planttfdb.gao-lab.org/), a total of 37 TFs potentially associated with lignocellulose degradation were identified from the four modules showing the highest correlations with the target traits (Table 2). They belong to eight TF families, namely bHLH (basic helix-loop-helix), bZIP (basic leucine zipper), C2H2 (zinc finger containing two cysteines and two histidines), C3H (Cys3His zinc finger domain), GATA (proteins that interact with conserved WGATAR motifs), HB (homeobox), MYB (v-myb avian myeloblastosis viral oncogene homolog) and NF-YB (nuclear factor Y subunit B). The bZIP family was the largest, with 10 transcription factors, and the HB family the smallest, with 2 transcription factors.

Table 2 Basic information on the TFs.

Validation of DEGs by quantitative real-time PCR (qRT-PCR)

To check the reliability of the transcriptome data, eight DEGs associated with lignocellulose degradation were selected for qRT-PCR analysis (Supplementary Fig. S6). The expression patterns of the eight selected genes assessed by qRT-PCR were largely consistent with the transcriptome expression patterns across the six developmental stages. However, some genes differed at certain stages. These discrepancies may be due to systematic differences between transcriptome sequencing and qRT-PCR, which have been shown to have some degree of inconsistency (~30-40%)26. Overall, the results did not exceed the expected range of deviation, as the qRT-PCR results agreed with the transcriptome data for most genes.
