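Today's article is a bioRxiv preprint that summarises the developmental mechanisms that go wrong during carcinogenesis in a PABC model. Its title is: Single-cell RNAseq uncovers involution mimicry as an aberrant development pathway during breast cancer metastasis
Using Drop-seq on a pregnancy-associated breast cancer (PABC) transgenic mouse model (Elf5 overexpression), the authors describe the cellular composition and functional diversity of its breast tumours: 1. they infer the lineage of each subpopulation of mammary epithelial cells; 2. they reveal the mechanism of cancer progression in PABC: an aberrant involution process led by alveolar milk secretory cells and assisted by several cell types in the tumour microenvironment (TME); 3. they map the cellular and molecular pathway network in the TME: the involution process involves many interactions between cell types and is characterised by ECM remodelling and inflammation.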
1. General cell population characterization
1.1 Unbiased high-resolution scRNAseq captures cell heterogeneity of MMTV-PyMT mammary tumours
A) Experimental workflow showing a schematic representation of the transgenic MMTV-PyMT/Elf5 mouse model, the number of tumours analysed and the number of cells passing the QC filter in each genotype. B) Distribution of variable genes defined by expression and dispersion, highlighting typical canonical markers for each of the main lineages. C) Heatmap showing differential expression of the top expressed genes contributing to the epithelial, stromal and immune signature. The top right panel shows the score of the signature and the percentage of cells classified to each main cell lineage analysed. The tSNE visualisation shows the coordinates of each analysed cell after dimensional reduction coloured by its main cell lineage. Roman numerals define each of the spatially formed clusters (inset). The dot plot shows the top differential markers from each of the main cell lineages and their level of expression. Bottom left panel shows a representative contour plot of the cell composition of a MMTV-PyMT tumour analysed by FACS defined by EpCAM antibodies (epithelial cells), CD45 (leukocytes) and double negative cells (stroma). Violin plot shows distribution of the number of genes per cell in each of the main cell lineages. D) Feature tSNE plots showing the expression of typical canonical markers of each of the main cell lineages.
A) tSNE plot showing cell clusters defined in each of the main cell lineages and their relative frequency. Far right column depicts the main cell lineage of origin for each cluster, showing 6 clusters of epithelial origin, 5 immune and 5 stromal. B) Heatmap showing the top differentially expressed genes that define each of the clusters. C) Cluster tree modelling the phylogenic relationship of the different clusters in each of the main cell types compartments at different clustering resolutions. Dashed red line shows the resolution chosen. Coloured circles in the cluster tree represent the origin of the clusters represented in the tSNE plot shown in panel A, the full cluster tree can be found in SuppFig 4A. D) Visualisation of the top differential genes for each of the defined clusters in the immune lineage (top) and in the stroma (bottom). E) Cell identification using score values for each of the metasignatures of the xCell algorithm in the immune and stromal compartment divided by cluster.
2. Epithelial cell population: assign lineages for subpopulations and compare Elf5 OE vs WT models
2.1. PyMT cancer cells are organised in a structure that resembles the mammary gland epithelial hierarchy
A) tSNE visualisation of the cell groups defined by k-means clustering analysis. Bottom panel shows a gene-expression heatmap of the top expressed genes for each cell cluster. B) Distribution of cells by genotype in the defined tSNE dimensions. The Sankey diagram shows the contribution of each of the genotypes to the cell clusters. Cluster numbers are coloured by the dominant genotype (>2-fold cell content of one genotype), Elf5 (red), WT (green). Violin plots showing Elf5 (upper plot) and PyMT (bottom plot) expression in each cell cluster. C) Scatter plot showing FACS data to define the % alveolar versus luminal progenitors using canonical antibodies that define the epithelial mammary gland hierarchy (EpCAM, CD49f, Sca1 and CD49b), in PyMT tumours. Each dot represents one animal (WT n = 6 and Elf5 n = 5); bottom panels are representative FACS plots of one of the replicates for each genotype. D) Dot plot representing the expression level (red jet) and the number of expressing cells (dot size) of the transcriptional mammary gland epithelium markers in each PyMT cluster. These marker genes were grouped according to each mammary epithelial cell type as defined by Bach, et al.: Hormone sensing differentiated (Hs-d, dark pink), Hormone-sensing progenitor (Hs-p, light pink), Luminal progenitor (LP, orange), Alveolar differentiated (Alv-d, dark red), Alveolar progenitor (Alv-p, light red), Basal (B, light purple), Myoepithelial (Myo, dark purple), Undifferentiated (Multi, light blue). E) Dot plot of the expression level of the top differential marker genes in each of the PyMT clusters coloured by genotype. The yellow rectangles highlight the top genes represented by each cluster. The size of the dots represents the percentage of cells/cluster that express each particular gene (pct. exp) and the colour gradient shows the level of expression for each gene/cluster. 
Note both colours are shown only when the cluster was populated similarly by both genotypes according to panel B.
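The ">2-fold cell content of one genotype" rule in panel B can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `dominant_genotype`, the toy cell counts and the "mixed" label are hypothetical, and the rule is applied to raw cell counts, which may differ from how the authors normalised the two genotypes.

```python
from collections import Counter


def dominant_genotype(cells, fold=2.0):
    """Label each cluster by the genotype that contributes > `fold` times more cells.

    `cells` is a list of (cluster, genotype) pairs, genotype being "WT" or "Elf5".
    Clusters where neither genotype exceeds the fold threshold are labelled "mixed".
    Assumption: raw counts are compared, with no normalisation for sample size.
    """
    counts = Counter(cells)  # (cluster, genotype) -> number of cells
    clusters = sorted({cluster for cluster, _ in counts})
    labels = {}
    for cluster in clusters:
        wt = counts[(cluster, "WT")]
        elf5 = counts[(cluster, "Elf5")]
        if elf5 > fold * wt:
            labels[cluster] = "Elf5"
        elif wt > fold * elf5:
            labels[cluster] = "WT"
        else:
            labels[cluster] = "mixed"
    return labels


# Hypothetical toy data, not taken from the paper.
toy_cells = (
    [(0, "WT")] * 30 + [(0, "Elf5")] * 10      # 3-fold more WT cells
    + [(1, "WT")] * 5 + [(1, "Elf5")] * 40     # 8-fold more Elf5 cells
    + [(2, "WT")] * 20 + [(2, "Elf5")] * 25    # neither genotype exceeds 2-fold
)
print(dominant_genotype(toy_cells))
```

If the two genotypes contribute very different total cell numbers, dividing each count by its genotype total before comparing would be a reasonable variation.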
Analysis of the epithelial cells:
2.2 Dynamic relationship and states of the malignant lineages of PyMT tumours: implications for the cell of origin of cancer
A) Pseudotiming alignment of the PyMT cancer epithelial cells along the gene signatures that define the main lineages of the mammary gland epithelial hierarchy using the DDRTree method in Monocle2. Right panels show the distribution of the cell states by genotype. B) Projection of the states defined by pseudotime analysis into tSNE clustering coordinates overall and per pseudotime state (miniaturised tSNE plots). Right panels show the projection by genotype. C) Overlay representation of the cell identities (k-means clustering as per Fig. 3) and cell lineage identification (pseudotime analysis); the proportion of cells in each cluster that belong to each defined state is shown in the bar chart (right hand side). D) Enrichment analysis (GSVA score) for the gene signatures that define the main mammary gland lineages: Basal, Luminal Progenitor (LP) and Mature Luminal (ML) for each of the clusters. Bottom panels show the expression of each of the gene signatures at single cell resolution. The top bar shows the assigned mammary epithelial cell type as per section C. E) Frequency of the different cell lineages in each genotype. F) Cluster tree showing the phylogenetic relationship of the different clusters. Red arrow shows the resolution used (0.7). G) Illustration of cell diversity of PyMT tumours based on the canonical structure of the mammary gland epithelial lineages.
Aim A - Use a second set of mammary gland hierarchy gene signatures (Pal et al.) to build a trajectory and define 7 pseudotime states, then compare how the clusters correspond between the two classification schemes (I personally consider it to be kind of redundant)
Altogether, the combination analysis of gene signatures, gene markers and pseudotiming enabled the precise annotation of PyMT cell clusters within the mammary hierarchy proposed by Pal et al., identifying a large luminal lineage that retains most of the cell diversity and strong plasticity, a basal/myoepithelial compartment and a hormone-sensing lineage
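The comparison between the two classification schemes (expression clusters versus pseudotime states, as in the bar chart of panel C) boils down to a cross-tabulation. Below is a minimal sketch, assuming each cell carries one cluster label and one state label; the function name `state_proportions` and the toy labels are hypothetical and do not reproduce the paper's annotation.

```python
from collections import Counter, defaultdict


def state_proportions(cluster_labels, state_labels):
    """For each cluster, return the fraction of its cells assigned to each state."""
    if len(cluster_labels) != len(state_labels):
        raise ValueError("Both label lists must describe the same cells.")
    pair_counts = Counter(zip(cluster_labels, state_labels))
    cluster_sizes = Counter(cluster_labels)
    table = defaultdict(dict)
    for (cluster, state), n_cells in pair_counts.items():
        table[cluster][state] = n_cells / cluster_sizes[cluster]
    return dict(table)


# Hypothetical toy labels for eight cells.
clusters = ["C0", "C0", "C0", "C0", "C1", "C1", "C5", "C5"]
states = ["LP", "LP", "LP", "Alv", "Basal", "Basal", "LP", "Alv"]

props = state_proportions(clusters, states)
for cluster, fractions in sorted(props.items()):
    print(cluster, fractions)
```

A cluster whose cells fall almost entirely into one state is annotated consistently by both schemes; a cluster split across several states is where the two schemes disagree.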
2.3. Elf5 OE vs WT: molecular effects => upregulated involution signatures
Molecular mechanisms of cancer progression associated to cancer cells of Alveolar origin
A) Cell cycle stages of the PyMT cancer cells as defined by gene expression signatures using tSNE coordinates and their deconvolution (middle panel). Circled area shows the cycling cluster (C5 in Fig. 3) characterised by a total absence of G1 cells. The quantification of the proportion of cells in each stage grouped by genotype is shown in the bar chart. B) Enrichment GSVA analysis of gene expression metasignatures of cancer-related and Elf5-related hallmarks associated to PyMT/WT (green) and /Elf5 (red) tumours. C) tSNE representation of the EMT gene expression metasignature at the single cell level. Right panel shows a western blot of canonical EMT markers (E-Cadh, E-Cadherin and Vim, vimentin) on PyMT/WT or ELF5 full tumour lysates. Note: the two images correspond to the same western blot gel cropped to show the relevant samples. D) Hypoxia metasignature at the single cell level is shown in the tSNE plot; the bottom panel shows a bar plot of the extension of the hypoxic areas in PyMT/WT (green) and /Elf5 (red) tissue sections (tumours and lung metastasis) stained using IHC based on hypoxyprobe binding; representative images are shown in the right panels. E) Lactation and Late Involution (stage 4, S4) metasignatures at the single cell level are shown in the tSNE plots. Pictures show IHC with an anti-milk antibody in tissue sections from a lactating mammary gland at established lactation compared with a mammary gland from an age-matched virgin mouse; and in PyMT/WT and Elf5 tumours. F) Kaplan-Meier survival curves based on Elf5 expression using the METABRIC cohort. Patients were segregated according to Elf5 expression levels based on tertiles. Elf5-high patients (red) were defined as the top-tertile and Elf5-low patients (green) as the bottom-tertile. Log-rank p values <0.05 are shown in red. The bar chart (bottom panel) corresponds to the distribution of the PAM50 classified breast cancer subtypes in the top and bottom Elf5 expressing tertile of patients.
G) Upper panel: Survival analysis (Kaplan-Meier curves) for the expression of Elf5 (left hand side) and the Involution metasignature (right hand side) in luminal breast cancer patients as per section F. Bottom panel: Kaplan-Meier survival curves for the late involution metasignature in Elf5-high patients (left hand side, ELF5-H) and Elf5-low patients (right hand side, ELF5-L). Each group of patients (ELF5-H and ELF5-L) were segregated according to tertiles for the combined expression levels of the genes from the involution metasignature: Green, inv low, bottom third; Blue, inv mid, middle third and Red, inv high, top third. Log-rank p values <0.05 are shown in red.
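The survival analysis described in panels F and G (split patients into expression tertiles, then draw a Kaplan-Meier curve per group) can be sketched as follows. This is a simplified illustration, not the authors' pipeline: the function names and toy values are hypothetical, ties in expression are broken by sort order, and the log-rank test is left out.

```python
def split_by_tertiles(values):
    """Assign 'low', 'mid' or 'high' to each value using rank-based thirds."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    groups = [None] * n
    for rank, i in enumerate(order):
        if rank < n / 3:
            groups[i] = "low"
        elif rank < 2 * n / 3:
            groups[i] = "mid"
        else:
            groups[i] = "high"
    return groups


def kaplan_meier(times, events):
    """Kaplan-Meier estimate.

    `times` are follow-up times; `events` are 1 (death observed) or 0 (censored).
    Returns a list of (time, survival probability) at each observed event time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical toy values, not METABRIC data.
expression = [5.0, 1.0, 3.0, 9.0, 7.0, 2.0]
print(split_by_tertiles(expression))

toy_curve = kaplan_meier(times=[1, 2, 3, 4], events=[1, 0, 1, 1])
print(toy_curve)
```

In practice one would compute `kaplan_meier` separately for the "high" and "low" groups and compare the two curves with a log-rank test from a dedicated survival library.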
At this point, the downstream effects of Elf5 OE have been narrowed down to the involution process.
3. Fibroblast cell population: assign lineages for subpopulations and compare Elf5 OE vs WT models
3.1 Characterisation of cancer-associated fibroblasts in PyMT tumours
A) tSNE plot groups defined by k-means clustering analysis showing a total of three cell clusters defined within the fibroblast subtype. B) Metasignatures of Cancer-associated fibroblast (CAF) signature (upper plot) and myofibroblasts (bottom panel) plotted in the fibroblast tSNE. The gene list of each metasignature was manually annotated from published scRNAseq data in human tumours 64,65. C) Desmoplastic (upper plot), Inflammatory (middle plot) and Contractile (bottom plot) metasignatures plotted in the fibroblast tSNEs from public data 67. D) Violin plots displaying marker genes for each of the three fibroblast clusters defined in section A: ECM-CAFs (0), immune-CAFs (iCAFs, 1) and myofibroblasts (2). E) Upper plot: tSNE illustration of the involution signature from 62. Middle plot: tSNE plot defined by k-means clustering analysis at resolution 1 of the fibroblast population showing a total of nine cell clusters. Bottom section: Violin plots on these nine cell clusters of three of the four genes from the involution signature. F) Upper plot: Distribution of fibroblasts by genotype (Elf5: red, WT: green) in the defined tSNE dimensions. Bottom plot: Sankey diagram showing the contribution percentage of each of the genotypes to the cell clusters. Cluster numbers are coloured by the dominant genotype (>2-fold cell content of one genotype), Elf5 (red), WT (green). G) GSVA enrichment analysis of involuting mammary fibroblast metasignatures associated to PyMT/WT and /Elf5 tumours. Violin plots of Cxcl12, Mmp3 and Col1a1 genes in all fibroblasts of each genotype. PyMT/WT (green) PyMT/Elf5 (red).
A) Representative bright field images and quantification of total coverage of picrosirius red-stained PyMT/WT and PyMT/ELF5 tumour sections, n=4 mice per genotype with 10 regions of interest (ROI) per tumour. B) Representative maximum intensity projections of SHG signal and quantification of SHG signal intensity at depth (µm) and at peak in PyMT/WT and PyMT/ELF5 tumour sections, n=6 mice per genotype with 6 ROI per tumour. C) Polarised light imaging of picrosirius red stained PyMT/WT and PyMT/ELF5 tumour sections, and quantification of total signal intensity acquired via polarised light. Thick remodelled fibres/high birefringence (red-orange), medium birefringence (yellow) and less remodelled fibres/low birefringence (green), n=4 mice per genotype with 10 ROI. D) SHG images of PyMT/WT and PyMT/ELF5 tumours assessed for differences in fibre orientation angle and quantification of frequency of fibre alignment ranging from the peak alignment. Different colours correspond to specific angles of orientation, n=6 PyMT/WT and n=4 PyMT/ELF5. Inset shows the cumulative frequency of fibre alignment +/-10 degrees from peak.
4. Crosstalk between epithelial cells/fibroblasts/immune cells
Characterisation of the cell-to-cell interactions involved in the cancer-associated involution mimicry
A) Heatmap of the cell-cell interactions of all cell types from PyMT tumours based on CellPhoneDB. Cell classification was based on the annotation from Figure 3 for the epithelial compartment; from Figure 6 at resolution 1 in the case of fibroblasts, where the cycling cluster (Cluster 6) and the residual cluster of 15 cells (Cluster 8) were removed; in addition, Clusters 7 and 2 were considered as a sole group annotated as “Myofibroblasts”. The rest of the cells from the immune and stromal compartments were classified according to the annotation done in Figure 2 (See Supplementary Figure 9C for global annotation). The scale at the right-hand side shows the interaction strength based on the statistical framework included in CellPhoneDB (count of statistically significant (p<0.01) interactions above mean= 0.3, see methods). B) Graphical representation of all significant cell-cell interactions identified by CellPhoneDB using the parameters of more than 10 significant interactions with a mean score greater than 0.3, number cut as more than 10 connections and number split 10. The red circles correspond to the cell types from the epithelial compartment; the blue triangles represent the cells from the stromal compartment and the green squares are the cells from the immune compartment. The size of geometric figures is relative to the number of cells involved in the interactions (displayed as count). Different number splits were applied to establish the most significant interactions for the fibroblast and immune cell types. Fibroblasts showed the strongest interactions (highlighted as blue lines) when a number split of 67 (1st Tier) and 50 (2nd Tier) were used. The immune system showed weaker interactions (highlighted as green lines) at a number split of 15 (3rd Tier) and 11 (4th Tier). C) Representative dot plots of ligand (no background)-receptor (red background) pairs.
The size of the circles is relative to the number of cells within each annotated cluster that showed a positive expression of each gene and the blue gradient represents the average scaled expression. D) Violin plots of genes from canonical pathways known to recruit and expand MDSCs. E) Proposed molecular model of involution mimicry driven by Elf5 where CAFs and MDSCs are the major cell types involved.
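The intercellular interactome is inferred from known receptor-ligand pairs combined with their expression in each cell type.
This analysis ultimately builds a picture of the interactions between the cell subgroups in PyMT tumours (without emphasising the difference between Elf5 OE and WT).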
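The idea of inferring interactions from receptor-ligand expression can be illustrated with a small sketch. It follows the general logic of label-permutation approaches such as CellPhoneDB (score a pair by the mean ligand level in the sending cluster and the mean receptor level in the receiving cluster, then compare against shuffled cluster labels), but it is a simplified stand-in rather than the tool's actual implementation. The function names, the toy expression values and the choice of the Cxcl12/Cxcr4 pair are assumptions made purely for illustration.

```python
import random
from statistics import mean


def pair_score(cells, labels, ligand, receptor, sender, receiver):
    """Average the mean ligand level in `sender` and the mean receptor level in `receiver`."""
    ligand_mean = mean(c[ligand] for c, lab in zip(cells, labels) if lab == sender)
    receptor_mean = mean(c[receptor] for c, lab in zip(cells, labels) if lab == receiver)
    return (ligand_mean + receptor_mean) / 2


def permutation_p_value(cells, labels, ligand, receptor, sender, receiver,
                        n_perm=1000, seed=0):
    """Empirical p-value: fraction of label shuffles scoring at least the observed value."""
    rng = random.Random(seed)
    observed = pair_score(cells, labels, ligand, receptor, sender, receiver)
    shuffled = list(labels)  # shuffling keeps cluster sizes unchanged
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pair_score(cells, shuffled, ligand, receptor, sender, receiver) >= observed:
            hits += 1
    return observed, hits / n_perm


# Hypothetical toy expression values, not taken from the paper.
toy_cells = (
    [{"Cxcl12": 5.0, "Cxcr4": 0.0}] * 4    # toy fibroblasts expressing the ligand
    + [{"Cxcl12": 0.0, "Cxcr4": 4.0}] * 4  # toy MDSCs expressing the receptor
    + [{"Cxcl12": 0.0, "Cxcr4": 0.0}] * 4  # toy epithelial cells expressing neither
)
toy_labels = ["CAF"] * 4 + ["MDSC"] * 4 + ["Epithelial"] * 4

score, p_value = permutation_p_value(
    toy_cells, toy_labels, "Cxcl12", "Cxcr4", sender="CAF", receiver="MDSC"
)
print(score, p_value)
```

Counting the significant pairs for every sender-receiver combination gives the kind of matrix that the heatmap in panel A summarises. Running the same scoring separately on Elf5 OE and WT cells would be one way to look for the genotype difference that this section of the paper does not emphasise.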
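This article builds on a very good model (a preclinical transgenic mouse model of PABC), in which tumour development itself mimics pregnancy-associated alveolar epithelium differentiation. Although it does not dynamically track the development of pregnancy-related tumours, its snapshot of that process carefully describes a complete tumour ecosystem and addresses several major questions in the breast cancer field:
The first two points rely almost entirely on previously published signatures to repeatedly define the identity of the clusters in the samples. No new marker or lineage is discovered, but when several approaches point to a similar function the definition becomes more solid, and this also helps build the whole ecosystem in the third point. This part is therefore mostly explanatory: in effect, a precise quantitative description of TME interactions that earlier work had hypothesised.
What is specific to this article is the Elf5 OE model. Because some of the model's biology is already known, the bioinformatics validation is more accurate and reliable, and wet-lab validation is easier to obtain. Most of the results are nevertheless only explanatory. The new finding is that an involution-related signature is also present in CAFs, which is validated by collagen detection. Its prognostic power depends on Elf5 overexpression itself, so it is not especially striking. Finally, the interactome section barely distinguishes Elf5 OE from WT (possibly because no strong difference was found) and stays largely at a descriptive level.
This article is shared from the WeChat public account 单细胞天地 (sc-ngs). Author: Chelsea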