Fri, 22 Oct 2021
15:00 - 16:00
Pablo Cámara
University of Pennsylvania

One of the prevailing paradigms in data analysis involves comparing groups of samples to statistically infer features that discriminate them. However, many modern applications do not fit well into this paradigm because samples cannot be naturally arranged into discrete groups. In such instances, graph techniques can be used to rank features according to their degree of consistency with an underlying metric structure without the need to cluster the samples. Here, we extend graph methods for feature selection to abstract simplicial complexes and present a general framework for clustering-independent analysis. Combinatorial Laplacian scores take into account the topology spanned by the data and reduce to the ordinary Laplacian score when restricted to graphs. We show the utility of this framework with several applications to the analysis of gene expression and multi-modal cancer data. Our results provide a unifying perspective on topological data analysis and manifold learning approaches to the analysis of point clouds.

Further Information

Pablo G. Cámara is an Assistant Professor of Genetics at the University of Pennsylvania and a faculty member of the Penn Institute for Biomedical Informatics. He received a Ph.D. in Theoretical Physics from Universidad Autónoma de Madrid in 2006. He then worked in string theory for several years, with postdoctoral appointments at École Polytechnique, the European Organization for Nuclear Research (CERN), and the University of Barcelona. Drawn by the fundamental open questions in biology, in 2014 he shifted his research focus to problems in quantitative biology and joined the groups of Dr. Rabadan at Columbia University and Dr. Levine at the Institute for Advanced Study (Princeton). Building upon techniques from applied topology and statistics, he has devised novel approaches to the inference of ancestral recombination, human recombination mapping, the study of cancer heterogeneity, and the analysis of single-cell RNA-sequencing data from dynamic and heterogeneous cellular populations.

Last updated on 03 Apr 2022 01:32.