by a fellowship from the Swedish Research Council.

Based on this concept, we develop a computational platform (TransSyn) for identifying synergistic transcriptional cores that determine cell subpopulation identities. TransSyn leverages single-cell RNA-seq data and performs a dynamic search for an optimal synergistic transcriptional core using an information-theoretic measure of synergy. A large-scale TransSyn analysis identifies transcriptional cores for 186 subpopulations and predicts identity conversion TFs between 3786 pairs of cell subpopulations. Finally, TransSyn predictions enable experimental conversion of human hindbrain neuroepithelial cells into medial floor plate midbrain progenitors, which are capable of rapidly differentiating into dopaminergic neurons. Thus, TransSyn can facilitate the design of strategies for converting cell subpopulation identities, with potential applications in regenerative medicine.

Introduction

Recent advances in single-cell RNA-seq technologies have made it possible to classify cells into distinct subpopulations based on their gene expression profiles. The identity of these cell subpopulations can range from well-defined cell types and subtypes of the same cell type to cells of unclear character. It has been observed that a handful of specific TFs are sufficient to maintain cell subpopulation identity1. Identifying such core TFs can facilitate the characterization and conversion of any cell subpopulation, including rare and previously unknown ones, thus opening novel functional applications2. However, this remains challenging because the core TFs that determine the identity of such novel cell subpopulations are largely unknown. Importantly, the definition of identity TFs depends on the cellular context in which it is applied3. In the context of cell/tissue types, for example neurons versus hepatocytes, identity TFs are defined by comparing these largely different cell types.
However, in the context of cell subpopulations within a cell type, such as different subtypes of dopaminergic neurons4, the definition of identity TFs becomes subtler because these subpopulations have much more in common. Existing methods for identifying TFs for cell identity or cellular conversions5–7 rely on a set of gene expression profiles of bulk cell/tissue types. Consequently, these methods are limited to those bulk cell/tissue types and cannot be applied to novel cell subpopulations identified in a newly generated single-cell dataset. In addition, these methods detect potential identity TFs by focusing on properties of individual TFs, such as their expression levels or the number of their unique target genes, rather than on emergent properties of the candidate identity TFs as a group, such as transcriptional synergy among them. Combinatorial binding of specific TFs to enhancers is known to produce a synergistic activity that is essential for robust and specific transcriptional programmes during development8. How several TFs operate together to achieve a common output has been studied in detail in embryonic stem cells (ESCs), where a transcriptional core involving Pou5f1, Sox2, and Nanog controls pluripotency9. Furthermore, it has been observed in different systems that multiple TFs must function cooperatively to sustain the overall cellular phenotype10. Here, we propose the general concept that cell subpopulation identity is an emergent property arising from the synergistic activity of multiple TFs that stabilizes their gene expression levels. Based on this concept, we develop a computational platform, TransSyn, for identifying synergistic transcriptional cores that define cell subpopulation identities.
TransSyn does not depend on the inference of gene regulatory networks (GRNs), which are often incomplete and whose topology does not always capture the many direct and indirect interactions between genes. In addition, it requires only single-cell RNA-seq data of distinct subpopulations as input (Fig. 1a) and does not depend on pre-compiled gene expression datasets or any other prior knowledge. Consequently, TransSyn can infer subpopulation identities within a cell population and aid in designing strategies to convert cell subpopulation identities, especially between closely related subpopulations in functionally different states. Finally, as a direct application of TransSyn, we show that knowledge of subpopulation-specific synergistic transcriptional cores enables experimental conversion of human hindbrain neuroepithelial cells into medial floor plate midbrain progenitors, which rapidly differentiate into dopaminergic (DA) neurons. Thus, TransSyn can facilitate the conversion of cell subpopulation identities, with potential applications in regenerative medicine.

Fig. 1 Theory of transcriptional synergy and method overview. a The method requires single-cell RNA-seq data classified into distinct subpopulations as input and identifies the most synergistic transcriptional cores for each subpopulation. b Comparison of the pair-wise mutual information (MI) between each of two TFs and a third TF with the joint MI between the two TFs together and the third one. For a combination of TFs to be synergistic, the sum of the pair-wise MIs has to be less than the joint MI (i.e., a negative multivariate MI, MMI). Any permutation of the same set of TFs results in the same MMI value. c Dynamic search for identifying the most.
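To make the synergy criterion of Fig. 1b concrete, the minimal Python sketch below computes the MMI of three discretized variables with plug-in (frequency-count) entropy estimates, using the identity MMI(X;Y;Z) = I(X;Z) + I(Y;Z) - I(X,Y;Z). It assumes that expression values have already been discretized into a few states per cell; the function names, the estimator and the toy data are illustrative assumptions and do not reproduce TransSyn's actual implementation. An XOR-like relationship, in which neither X nor Y alone carries information about Z, yields a negative (synergistic) MMI, whereas three identical copies yield a positive (redundant) one.

```python
import math
from collections import Counter
from itertools import permutations
from typing import Hashable, Sequence


def entropy(states: Sequence[Hashable]) -> float:
    """Plug-in Shannon entropy (in bits) of a sequence of discrete states."""
    n = len(states)
    counts = Counter(states)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """I(A;B) = H(A) + H(B) - H(A,B), estimated from co-occurrence counts."""
    return entropy(a) + entropy(b) - entropy(list(zip(a, b)))


def multivariate_mi(x: Sequence[Hashable], y: Sequence[Hashable],
                    z: Sequence[Hashable]) -> float:
    """MMI(X;Y;Z) = I(X;Z) + I(Y;Z) - I(X,Y;Z).

    Negative: X and Y jointly tell more about Z than the sum of their
    pair-wise MIs (synergy). Positive: the information overlaps (redundancy).
    """
    xy = list(zip(x, y))  # treat the pair (X, Y) as one joint variable
    return (mutual_information(x, z) + mutual_information(y, z)
            - mutual_information(xy, z))


if __name__ == "__main__":
    # Hypothetical discretized (off/on) states of three TFs across four cells.
    tf_x = [0, 0, 1, 1]
    tf_y = [0, 1, 0, 1]
    tf_z = [a ^ b for a, b in zip(tf_x, tf_y)]  # XOR: purely synergistic
    print(multivariate_mi(tf_x, tf_y, tf_z))  # -1.0
    print(multivariate_mi(tf_x, tf_x, tf_x))  # 1.0 (fully redundant)

    # The value does not depend on the order in which the three TFs are given.
    values = [multivariate_mi(*p) for p in permutations((tf_x, tf_y, tf_z))]
    print(all(math.isclose(v, values[0]) for v in values))  # True
```

Because the plug-in MMI expands to H(X) + H(Y) + H(Z) - H(X,Y) - H(X,Z) - H(Y,Z) + H(X,Y,Z), it is symmetric in its three arguments, matching the permutation invariance noted in the caption. Plug-in estimates are biased when the number of cells is small relative to the number of joint states, so a real analysis would need more care with discretization and estimation.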