

    2.6. Gene set variation analysis (GSVA)
    GSVA is a gene set enrichment method that estimates the variation of pathway and biological process activity over a sample population in an unsupervised manner [31]. The gene set files “c2.cp.kegg.v6.2.symbols” and “h.all.v6.2.symbols”, downloaded from the Molecular Signatures Database, were employed for GSVA using the “GSVA” package for R. The significance threshold was set at an adjusted P < 0·05.
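As an illustration of the adjusted-P threshold, the following minimal sketch applies a multiple-testing correction to per-gene-set P values and keeps those below 0·05. The text does not state which correction was used; the Benjamini-Hochberg procedure is assumed here, and the gene set names and raw P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four gene sets (illustrative only).
raw = {"SET_A": 0.001, "SET_B": 0.02, "SET_C": 0.03, "SET_D": 0.6}
names = list(raw)
adj = benjamini_hochberg([raw[n] for n in names])

# Keep gene sets whose adjusted p-value is below the 0.05 threshold.
significant = [n for n, q in zip(names, adj) if q < 0.05]
print(significant)
```

In this toy input the three smallest P values remain below 0·05 after adjustment, while the fourth does not.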
    2.7. Statistical analysis
    For comparisons of two groups, the normality of the variables was tested via the Shapiro-Wilk normality test. The statistical significance of differences between normally distributed variables was estimated using the unpaired Student's t-test, and non-normally distributed variables were analysed via the Mann-Whitney U test. For comparisons of more than two groups, Kruskal-Wallis and one-way ANOVA tests were used as non-parametric and parametric methods, respectively. Correlation was computed using Spearman's and distance correlation analyses. Survival rates were calculated using the Kaplan–Meier method, and the significance of differences between survival curves was determined using the log-rank test. Because of the heterogeneity between different types of cancers, the best cut-off values for each continuous prognostic marker were recalculated using the “survminer” R package separately for different tumour types. Uni- and multivariate analyses were performed using Cox proportional hazard models with the stepwise method “LR forward”. Nomogram construction and validation were performed following Iasonos' guide [32]. The survival predictive accuracy of prognostic models was assessed based on time-dependent receiver operating characteristic (ROC) curve analysis and Harrell's concordance index (c-index) analysis. All statistical analyses were conducted using R software (version 3.5.0) and SPSS software (version 25.0), and P values were two-tailed. Statistical significance was set at P < 0·05.
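To make two of the survival quantities above concrete, the following minimal sketch computes a Kaplan–Meier survival curve and Harrell's c-index. It is not the study's R/SPSS code: the function names, follow-up times, event flags and risk scores are all hypothetical, and the c-index implementation simply skips pairs with tied follow-up times.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    events[i] is 1 for an observed event and 0 for a censored observation.
    """
    curve = []
    surv = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve


def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable pairs in which the higher risk score
    belongs to the patient with the shorter survival time."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is comparable only if the shorter time is an event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    return concordant / comparable


# Hypothetical follow-up (months), event flags and model risk scores.
times = [5, 8, 12, 12, 20]
events = [1, 0, 1, 1, 0]
risk = [0.9, 0.4, 0.3, 0.8, 0.5]

km_curve = kaplan_meier(times, events)
c_index = harrell_c_index(times, events, risk)
print(km_curve, c_index)
```

A c-index of 0·5 corresponds to random ordering and 1·0 to perfect ordering of patients by risk; in this toy example five of the six comparable pairs are concordant.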
    3. Results
    3.1. Colon cancer patient characteristics and robust prognostic gene identification
    A summary of all datasets used in this study is provided in Supplemental Table S1, and detailed patient characteristics are listed in Supplemental Table S2. A total of 990 patients diagnosed with stage I–III colon cancer from five GEO datasets (GSE17538, GSE33113, GSE37892, GSE38832, and GSE39582) were retrospectively analysed in this study. The median age at diagnosis was 69·0 years (range, 22·0–97·0 years) and 481 (48·6%) of the patients were male. RFS information was available for all 990 patients, with a mean survival of 146 months. OS information was documented for 678 patients, with a mean survival of 130 months. Through the 2-step analysis described in the “Materials and methods”, 1746 of 5952 genes passed the first filter, and 797 genes, the expression levels of which were stably and significantly correlated with prognosis, were eventually identified and defined as robust prognostic genes (Supplemental Table S3).
    3.2. Construction of molecular subgroups using TME-relevant robust prognostic genes
    First, we used unsupervised clustering to classify the 990 tumour samples into molecular subgroups based on the 797 robust prognostic genes. The “ConsensusClusterPlus” package was used to evaluate clustering stability and select the optimal cluster number. Two distinct patient clusters, termed tumour microenvironment cluster 1 (TMEC1) and TMEC2, were finally identified (Fig. 1a–b), and comparison of the proportion of patients from different GEO series between the two TMEC clusters showed no significant differences (Supplemental Table S4). Kaplan–Meier curves, compared via the log-rank test, indicated significant survival differences between the two clusters for both OS (Fig. 1c) and RFS (Fig. 1d). We further focused on the GSE39582 dataset, which provided the most comprehensive patient information, to characterise biological and clinical differences between these clusters. Samples in TMEC2 exhibited a more advanced tu-