Modern molecular biology research produces data on a massive scale. These data
are predominantly high-dimensional, consisting of genome-wide measurements of
the transcriptome, proteome and metabolome. Analyses of such data sets often
face the additional problem of small sample sizes, as experimental data points
may be difficult and expensive to obtain. Many analysis algorithms are based on
estimating the covariance structure from these high-dimensional,
small-sample-size data, with the consequence that the eigenvalues and
eigenvectors of the estimated covariance matrix differ markedly from their true
values.
Techniques from statistical physics and Random Matrix Theory allow us to
understand how these discrepancies in the eigenstructure arise and, in
particular, to locate the phase transition points at which the eigenvalues and
eigenvectors of the estimated covariance matrix begin to genuinely reflect the
underlying biological signals present in the data. In this talk I will give a
brief non-specialist introduction to the biological background motivating the
work and highlight some recent results obtained within the statistical physics
approach.
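The kind of phase transition mentioned here can be illustrated with a small simulation. This is a minimal sketch, not the method or model of the talk: it assumes the standard "spiked covariance" model, in which the true covariance is the identity plus a single rank-one signal of strength `spike` along a known direction (taken here to be the first coordinate axis). With p features and n samples, and gamma = p/n, the known Baik-Ben Arous-Péché (BBP) result predicts that the top sample eigenvector only starts to align with the signal once `spike` exceeds sqrt(gamma). The function names `sample_spiked`, `top_eigenpair_stats` and `bbp_prediction` are hypothetical and chosen for illustration only.

```python
import numpy as np


def sample_spiked(n, p, spike, rng):
    """Draw n zero-mean samples in p dimensions from N(0, I + spike * e1 e1^T).

    Assumption for illustration: the planted signal direction is the first
    coordinate axis e1; any fixed unit vector would behave the same way.
    """
    x = rng.standard_normal((n, p))
    x[:, 0] *= np.sqrt(1.0 + spike)  # inflate the variance along e1 to 1 + spike
    return x


def top_eigenpair_stats(x):
    """Return the largest eigenvalue of the sample covariance matrix and the
    squared overlap of its eigenvector with the signal direction e1."""
    n = x.shape[0]
    s = x.T @ x / n  # the mean is known to be zero, so no centering is done
    evals, evecs = np.linalg.eigh(s)  # eigenvalues in ascending order
    return evals[-1], evecs[0, -1] ** 2


def bbp_prediction(spike, gamma):
    """Limiting top eigenvalue and squared overlap as n, p -> infinity with
    p / n -> gamma, for the single-spike model above (unit noise variance)."""
    if spike <= np.sqrt(gamma):
        # Below the transition: the top eigenvalue sticks to the
        # Marchenko-Pastur bulk edge and its eigenvector carries no
        # information about the signal direction.
        return (1.0 + np.sqrt(gamma)) ** 2, 0.0
    ell = 1.0 + spike  # true eigenvalue along the signal direction
    lam = ell * (1.0 + gamma / spike)
    overlap = (1.0 - gamma / spike**2) / (1.0 + gamma / spike)
    return lam, overlap


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 200, 800  # far fewer samples than features: gamma = p / n = 4
    gamma = p / n
    for spike in (0.5, 1.0, 2.0, 4.0, 10.0):
        lam, ov = top_eigenpair_stats(sample_spiked(n, p, spike, rng))
        lam_th, ov_th = bbp_prediction(spike, gamma)
        print(f"spike={spike:5.1f}  top eig {lam:6.2f} (theory {lam_th:6.2f})  "
              f"overlap^2 {ov:.2f} (theory {ov_th:.2f})")
```

With p/n = 4 the predicted transition sits at `spike` = 2. For weaker signals the top sample eigenvalue stays near the bulk edge (1 + 2)^2 = 9 and its eigenvector is essentially unrelated to the signal, while for stronger signals the eigenvalue separates from the bulk and the overlap grows toward 1. The sample covariance also has rank at most n = 200, so 600 of its 800 eigenvalues vanish (up to rounding) even though every true eigenvalue is at least 1, a simple instance of the distortion in the eigenstructure that the abstract refers to.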