Martin Weigt (LCQB, Sorbonne): Protein Evolution in Sequence Landscapes - From Data to Models and Back

18 March, 11:30-12:30, C162

In the course of evolution, proteins diversify their sequences via a complex interplay between random mutations and natural selection. As a consequence, we can today observe protein sequences of common evolutionary origin, with almost identical three-dimensional folds and biological functions, that nevertheless differ in as much as 70-80% of their amino acids. In my presentation, I will review our efforts to model protein evolution across multiple timescales, from the emergence of single mutations in a protein up to deep evolutionary timescales. To this end, we first model protein fitness landscapes via generative probabilistic models trained on genomic data. Second, we describe evolution as a stochastic process in these landscapes. The proposed framework accurately reproduces the sequence statistics of both short-time (experimental) and long-time (natural) protein evolution, suggesting that it also applies to the relatively data-poor intermediate evolutionary timescales, which are currently inaccessible to evolution experiments. Our model uncovers a highly collective nature of epistasis: it gradually changes the fitness effect of mutations as the sequence context diverges, rather than acting via strong interactions between individual mutations. This collective nature triggers the emergence of a long evolutionary timescale, separating fast mutational processes inside a given sequence context from the slow evolution of the context itself. The model quantitatively reproduces the extent of contingency and entrenchment, as well as the loss of predictability in protein evolution observed in deep mutational scanning experiments of distant homologs. It thereby deepens our understanding of the interplay between mutation and selection in shaping protein diversity and novel functions, makes it possible to forecast evolution statistically, and challenges the prevailing independent-site models of protein evolution, which are unable to capture the fundamental importance of epistasis.
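
As a purely illustrative sketch of the two-step idea described in the abstract (a probabilistic sequence landscape, then a stochastic process moving in it), the toy example below may help. Everything in it is an assumption made for illustration: the abstract does not say which model class is used, so a pairwise Potts-like energy stands in for the generative model; the parameters are drawn at random rather than trained on genomic data; and single-site Metropolis moves stand in for the evolutionary dynamics. The names `random_potts`, `energy`, `delta_energy` and `evolve` are hypothetical.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
Q = len(ALPHABET)


def random_potts(length, rng, scale=0.5):
    """Toy parameters (random, NOT trained): fields h[i][a] and couplings J[(i, j)][a][b], i < j."""
    h = [[rng.gauss(0, scale) for _ in range(Q)] for _ in range(length)]
    J = {
        (i, j): [[rng.gauss(0, scale / length) for _ in range(Q)] for _ in range(Q)]
        for i in range(length)
        for j in range(i + 1, length)
    }
    return h, J


def energy(seq, h, J):
    """E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j); P(s) is proportional to exp(-E(s))."""
    e = -sum(h[i][a] for i, a in enumerate(seq))
    for (i, j), Jij in J.items():
        e -= Jij[seq[i]][seq[j]]
    return e


def delta_energy(seq, pos, new, h, J):
    """Energy change when site `pos` mutates to `new`, with the rest of the sequence as context."""
    old = seq[pos]
    d = -(h[pos][new] - h[pos][old])
    for k, other in enumerate(seq):
        if k == pos:
            continue
        if pos < k:
            Jij = J[(pos, k)]
            d -= Jij[new][other] - Jij[old][other]
        else:
            Jij = J[(k, pos)]
            d -= Jij[other][new] - Jij[other][old]
    return d


def evolve(seq, h, J, steps, rng):
    """Metropolis dynamics: propose single-site mutations, accept with probability min(1, exp(-dE))."""
    seq = list(seq)  # work on a copy so the ancestor is left untouched
    for _ in range(steps):
        pos = rng.randrange(len(seq))
        new = rng.randrange(Q)
        if new == seq[pos]:
            continue
        d = delta_energy(seq, pos, new, h, J)
        if d <= 0 or rng.random() < math.exp(-d):
            seq[pos] = new
    return seq


if __name__ == "__main__":
    rng = random.Random(0)
    h, J = random_potts(10, rng)
    ancestor = [rng.randrange(Q) for _ in range(10)]
    descendant = evolve(ancestor, h, J, steps=500, rng=rng)
    print("".join(ALPHABET[a] for a in ancestor))
    print("".join(ALPHABET[a] for a in descendant))
```

In this sketch, `delta_energy` makes the context dependence explicit: the effect of the same mutation depends on the amino acids at all other sites through the couplings, which is the simplest way to represent epistasis in such a model. An independent-site model would correspond to setting all couplings to zero.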





ÉCOLE SUPÉRIEURE DE PHYSIQUE ET DE CHIMIE INDUSTRIELLES DE LA VILLE DE PARIS
10 Rue Vauquelin, 75005 Paris