Donor effects on bone marrow-derived MSC gene expression
A total of seven bone marrow-derived MSC (BM-MSC) samples from six donors were thawed and processed for scRNA-seq analysis (Tables 1, 2). First, we focused our analyses on post-freeze MSC products from BM and CT origins, since these are being widely used in numerous clinical trials.
Table 1 summarizes the sample data for all BM samples. Samples BM1 and BM2a were processed in laboratory A, while samples BM2b, BM3, BM4, BM5 and BM6 were processed in laboratory B (Table 1). An average of 305 cells were profiled per sample, with an average read depth of 29,703, representing 12,348 UMI (Unique Molecular Identifier) and 3836 expressed genes per cell (Table 1). The profiles were clustered with both Seurat [27] and SC3 pipelines [26]. Since the latter is optimized for relatively small experimental designs, we present the results of SC3, but note that similar findings were obtained with Seurat (Additional file 1: Fig. S2). Before characterizing differential expression among samples, we confirmed that the MSC-identity markers established by the ISCT, namely NT5E (CD73), THY1 (CD90) and ENG (CD105) were detected in the majority of cells in the bone marrow and umbilical cord tissue derived MSCs (Additional file 1: Fig. S3). Furthermore, CD34, CD14, CD19 and PECAM1—all markers of hematopoietic or lymphoid lineages, were absent.
Projecting each cell against the first two Principal Components (PC) of gene expression, three clusters of single cell profiles were observed (Fig. 1A). PC1 is highly negatively correlated with the total UMI count per cell. Accordingly, the smallest cluster located to the right, consists of 17% of the cells, all of which had low UMI counts, typically fewer than 1000 detected genes, and low ribosomal protein transcript counts (Fig. 1B). These low UMI counts' cells were more prevalent in two donors studied—one from each of the two laboratories (Lab A and Lab B; BM1 and BM4, respectively: Fig. 1C), suggesting the low UMI count may not be related to the lab in which they were manufactured. It is not clear whether the unusual profile of these cells is a technical artefact, or has a biological basis, but they appear to be of low quality and were excluded from all subsequent analyses.
Two clusters identified by SC3 in the remaining high-quality datasets largely differentiate along PC3 (Fig. 1D). Three of the five samples expanded in laboratory B (BM3, BM4 and BM6) were predominantly found in cluster BM-High_b; the other two samples along with both of the samples expanded in laboratory A (BM1, BM2a, BM2b and BM5) were predominantly found in cluster BM-High_a (Fig. 1E). Both samples from the donor whose cells were cultured in each of the laboratories are in cluster BM-High a (BM2a and BM2b), suggesting that the difference is more likely to be donor-related than due to a laboratory or technical effect. Nevertheless, Fig. 1E shows that even between the two laboratories, the cells from this donor tend to separate along PC3.
Single cell differential expression analysis for bone marrow-MSC samples
After normalizing gene expression values to counts per 10,000 UMI, differential expression analysis between clusters BM-High_a and BM- High_b was performed in Seurat using the Wilcoxon rank sum test, yielding 1667 genes at a adjusted P value of 5%. There were 767 genes upregulated in cluster 2a, and 900 upregulated in cluster 2b (Fig. 2A). Gene ontology analysis detected strong enrichment for multiple pathways involved in cell cycle regulation in cluster BM-High_b (Fig. 2B). By contrast, cluster BM-High_a showed upregulation of multiple pathways related to immune signaling and other processes expected to be characteristic of functional MSCs (Fig. 2B). On the basis of the cell cycle gene expression, cells in cluster BM-High_b may be preparing for, or undergoing cell cycle division, whereas the cluster BM-High_a MSCs are more likely to be in G0 phase. Alternately the two populations may simply express cell cycle related genes at different levels, without this reflecting cell cycle stages.
Next, we probed the magnitude of donor contributions to variance within clusters, by performing analysis of variance with donor as the fixed effect of interest. Within just the high-quality cluster BM-High_a cells, donor effects accounted for 8.5% of the variance. A similar result was observed for cluster BM-High_b.
Donor effects on umbilical cord tissue derived MSC gene expression
Umbilical cord tissue derived MSC (UCT-MSC) samples from four donors were used for scRNA-seq analyses. A total of six scRNA-seq samples were prepared, all from the same lab, including three biological replicates of one donor sample (UCT1a, UCT1b and UCT1c). An average of 349 cells were profiled per sample, with an average read depth of 27,417, representing 9700 UMI and 3057 expressed genes per cell (Table 2). The UCT-MSC gene expression profiles were also analyzed with the SC3 pipeline.
Two major clusters of single cell profiles were again observed in the projection of the first two principal components of the UCT-MSC data (Fig. 3A). The smallest of these, consisting of 13% of the cells, was characterized by cells with low UMI counts, typically fewer than 2,000 detected genes (Fig. 3B), similar to the BM-MSC analysis. These low UMI-count cells were present in every sample but again were more prevalent in two of the samples (UCT1c and UCT3: Fig. 3C). It is not clear whether the origin of these cells is a technical artefact, or has a biological basis, but they also appear to be of low quality and were again excluded from all subsequent analyses. Within the high-quality cells, SC3 once more identified two clusters of cells, UCT-High_a (UCT2 and UCT3) and UCT-High_b (UCT1a, UCT1b, UCT1c and UCT4), though in this case they did not clearly correspond to one of the Principal Components. The three replicates of donor UCT1 were primarily captured within the UCT-High_b cluster, suggesting consistency of technique.
Implementation of the Wilcoxon rank sum test for detecting differential gene expression between the two UCT-MSC clusters, after removing the low-quality cells, detected 587 genes at an adj P-value of 5%. Directional up-regulation of established marker genes for mitosis is evident in cluster UCT-High_b as indicated by blue points in the volcano plot Fig. 4A. Gene ontology analysis (Fig. 4B) indicates enrichment for up-regulation of collagen biosynthesis, integrin signaling, extracellular matrix (ECM) organization, and protein translation pathways in the cluster UCT-High_a MSCs, whereas the cluster UCT-High_b cells are enriched for cell cycle regulation, degradation of mitotic proteins, as well as various processes related to cell cycle progression, including CDC20 mediated degradation of Securin, and auto degradation of CDH1, suggesting potential donor-dependent heterogeneity in the gene expression profile of UCT-MSC.
Comparison between bone-marrow and cord-tissue derived MSC single-cell gene expression profiles
Direct comparison of MSCs derived from bone marrow and MSCs from umbilical cord tissue was performed by combining the analyses of the previous two datasets. As expected, Principal Component Analysis (PCA) of the raw single cell profiles again identified two major clusters of low and high UMI-abundance cells along PC1, but in this case PC2 of the joint analysis cleanly differentiates the BM and CT samples (Fig. 5A). This result implies that there are significant differences in gene profile between MSCs derived from the two source tissues.
2612 genes were found to be up-regulated in the BM-MSC, and 689 genes down-regulated, compared to UCT-MSCs. We highlighted the top 16 differentially expressed genes (Additional file 1: Fig. S4). Gene ontology analysis of all the significantly differential expressed genes indicates enrichment for metabolism of lipids and lipoproteins, cholesterol biosynthesis, mitochondrial translation, and metabolic pathways in the BM-MSCs, whereas the UCT- MSCs were enriched for ECM organization, collagen biosynthesis and signal transduction (Fig. 5B). UCT-MSC also showed relative up-regulation of mitotic cell cycle pathways, but this likely reflects the greater ratio of UCT-High_b to UCT-High_a cluster cells than of BM-High_b to BM-High_a cells, rather than a consistent trend favoring cell division in the UCT-MSC.
Pathway enrichment analysis also showed that the differences between the two clusters in each tissue are not consistently maintained. The chord diagram (Fig. 5C) highlights pathways overexpressed in BM-High_a and UCT-High_a, which are not the same. These data confirm differences in the gene expression of non-dividing cells as a function of tissue of origin and suggest that the two types of MSCs are likely to have divergent regulatory and functional potentials. Importantly, the UCT-High_a population exhibit higher expression of genes involving pro inflammatory mediation, ECM organization and collagen biosynthesis, whereas the BM-High_a population had higher expression of steroid biosynthesis, the citric acid cycle and neutrophil degranulation genes (Fig. 5C).
Next, we examined the expression level of genes that play important roles in the immunomodulatory response induced by MSCs. Focused comparison of expression of genes that are associated with cell adhesion, migration, immunosuppression and immunostimulation between the BM- and UCT-derived MSCs suggests tissue-of-origin and donor differences in gene activity (Additional file 1: Fig. S5). BM-High_a cells characteristically overexpress transcripts encoding the membrane proteins prostaglandin synthase (PTGES2) and Endoglin (ENG) as well as the lysosomal protein CD63, compared to BM-High_b, whereas BM-High_b cells overexpress the genes CD46 (encoding a complement cofactor), CD47 (an integrin-associated protein), and CD146 (MCAM, cell adhesion molecule), compared to BM-High_a. These genes are in general overexpressed in BM derived MSC compare to UCT derived MSC. BM derived MSCs have higher expression of the cell surface glycoprotein coding genes CD44 and CD59, as well as the nucleotidase NT5E and immune checkpoint molecule CD276 compared to UCT derived MSC. Conversely, when comparing UCT-High_a and UCT-High_b clusters, UCT-High_a cells overexpress the tetraspanin regulators of motility CD151, and cell surface protein coding genes CD99, THY1 and CD248, while UCT-High_b overexpressed CD9. The cell surface protein coding gene CD81 is not significantly differentially expressed between the clusters UCT-High_a and UCT-High_b. We looked at two genes associated with immunostimulation: CCL2 and CD109, which are overexpressed in UCT and BM derived MSCs, respectively. Post thaw BM-MSC or CT-MSC, co-cultured with THP-1-derived macrophages, (BioRxiv Pradhan, BM1 and BM6 and CT1-3 (hyperflask-expanded)), downregulated macrophage-mediated TNF alpha production. We also evaluated multiple pluripotent and stemness genes (Additional file 1: Fig. S6), none of which were found to be significantly differentially expressed between bone marrow and umbilical cord tissue.
We briefly compared two of the BM-MSC samples between pre-freeze and post-freeze conditions to understand freeze–thaw effects on MSC gene expression. Our analysis indicated a shift in the genetic profile between pre-freeze and post-freeze conditions (Additional file 1: Fig. S1). Pre-freeze samples showed significant overexpression of 1,743 genes relative to post-freeze samples, at the 5% false discovery rate (FDR) threshold, while 310 genes were significantly overexpressed in the post-freeze samples compared to the pre-freeze samples. Some of the pathways overexpressed in the pre-freeze samples are cytokine signaling (FOS, MMP2, TLN1, FOSB), cell proliferation and cell adhesion (ZYX, ITGA5, CLIC1 etc.), while the pathways over-expressed in the frozen, post-freeze samples are carbohydrate interconversions (UGP2), cholesterol/steroid biosynthesis and regulation of apoptosis (PSMA2, PSMB1). We acknowledge that to understand the impact of these differences we need further extensive study with more data and statistical power along with validation.