TDA analysis of flow cytometry data in acute lymphoblastic leukaemia patients

11 September 2020
Salvador Chulián García

The high dimensionality of biological data calls for new methods to unravel its complexity. The wealth of cancer-related biomedical data that hospitals routinely generate can benefit from these techniques. This is the case for diseases such as Acute Lymphoblastic Leukaemia (ALL), one of the most common cancers in childhood. Its diagnosis is based on high-dimensional flow cytometry data that capture the expression of immunophenotypic markers. Not only is the intensity of these markers meaningful for clinicians, but so is the shape of the point clouds they generate, which is fundamental for finding leukaemic clones. The mathematics of shape recognition in high dimensions can therefore become a critical tool for this kind of data, which is why we turn to tools from Topological Data Analysis such as Persistent Homology.


Given that almost 20% of ALL patients relapse, we provide a methodology to shed light on the shape of flow cytometry data for both relapsed and non-relapsed patients. We do so by combining the strength of topological data analysis with the versatility of machine learning techniques. The results show topological differences between the two patient sets, such as in the number of connected components and 1-dimensional loops. Using so-called persistence images, and for specially selected immunophenotypic markers, we obtain a classification of both cohorts, highlighting the need for new methods to provide better prognosis.
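
As a rough illustration of one quantity mentioned above, the number of connected components of a point cloud across scales, the following is a minimal sketch in plain Python. It is not the speaker's pipeline: the function names `h0_persistence` and `components_at_scale` and the toy two-marker point cloud are hypothetical, chosen only for this example. It computes 0-dimensional persistence of a Vietoris-Rips filtration with a union-find structure; 1-dimensional loops and persistence images would in practice call for a dedicated TDA library and are not covered here.

```python
import math
from itertools import combinations


def h0_persistence(points):
    """Death scales of the 0-dimensional classes of a Vietoris-Rips filtration.

    Every point is born as its own component at scale 0. A component dies
    when an edge first merges it into another one. The single component
    that never dies is not included in the returned list.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, processed by increasing Euclidean length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j   # the edge merges two components
            deaths.append(length)     # one of them dies at this scale
    return deaths


def components_at_scale(deaths, scale):
    """Number of connected components once all edges of length <= scale exist."""
    return 1 + sum(1 for d in deaths if d > scale)


# Hypothetical toy "cytometry" cloud: two well-separated clusters of events
# measured on two made-up markers.
cloud = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
deaths = h0_persistence(cloud)
print(deaths)
print(components_at_scale(deaths, 2.0))
```

In this toy cloud the three short-lived classes die at scale 1, while the long-lived one reflects the gap between the two clusters; it is this kind of persistent feature that a comparison between cohorts would look at.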

  • Applied Topology Seminar