“This is, in a way, a familiar idea. For example, I teach students in my evolutionary theory course that master chefs are experts at understanding and predicting epistatic effects. The specific challenge of cooking dishes with sophisticated recipes, or of being able to improvise new ones on the fly, is in understanding that ingredients can interact in surprising ways. And so there’s a long tradition of fields that address why the whole can be more than — or at least different from — the sum of its parts.”
— C. Brandon Ogbunu, How Genetic Surprises Complicate the Old Doctrine of DNA
🔗 https://www.quantamagazine.org/how-genetic-surprises-complicate-the-old-doctrine-of-dna-20230731/
#epistasis #genetics #biology #science
#Epistasis between #promoter activity and coding mutations shapes gene #evolvability | Science Advances
#Epistasis lowers the #genetic #barrier to #SARS-CoV-2 neutralizing #antibody #escape https://www.nature.com/articles/s41467-023-35927-0
The first descriptor is the mean Hamming distance of the training MSA data to the mutated sequence (D in the figure). This quantifies the "quality" of the data for the given problem - the closer to the mutated sequence, the better.
We show that the Hamming distance is connected to the statistical #bias of the inferred model, with a prefactor (J0) that depends on the amount of high-order #epistasis that is not explicitly accounted for by the model.
(in the figure: B = number of sequences)
🚨 🧬 Non-neuro paper alert 🧬🚨
Is more data always better? Or should we select less data of higher quality?
We explored this question in the context of 🧬 #protein mutation fitness prediction from #MSA sequence data, finding a scaling law that relates the performance of statistical models to two simple data descriptors:
📜 https://www.biorxiv.org/content/10.1101/2022.12.12.520004v1 📜
The first descriptor is the mean Hamming distance of the training MSA data to the mutated sequence. This quantifies the "quality" of the data for the given problem - the closer to the mutated sequence, the better.
We show that the Hamming distance is connected to the statistical #bias of the inferred model, with a prefactor that depends on the amount of high-order #epistasis that is not explicitly accounted for by the model.
The second is the number of sequences in the training MSA. The more the merrier, given they are of similar quality. We show that the number of sequences is - perhaps unsurprisingly - connected to the #variance of the inferred statistical model.
Therefore, given a bunch of training data, there is a clear trade-off between selecting a few "good" training points and including more, but of lower overall quality. We provide some heuristics to select the optimal subset of data given a model and a prediction problem.
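For anyone who wants to play with the two descriptors, here is a minimal Python sketch. It only computes the mean Hamming distance to the mutated sequence and the number of sequences for nested "closest-first" subsets of an MSA; it does not reproduce the scaling law or the selection heuristics from the preprint. The function names, the toy sequences, and the choice to treat gap characters as ordinary symbols are all assumptions made for illustration.

```python
from typing import List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def mean_hamming(msa: List[str], target: str) -> float:
    """Mean Hamming distance of the MSA rows to the target sequence."""
    if not msa:
        raise ValueError("MSA must contain at least one sequence")
    return sum(hamming(seq, target) for seq in msa) / len(msa)


def closest_subset(msa: List[str], target: str, k: int) -> List[str]:
    """The k sequences closest to the target (ties keep input order)."""
    return sorted(msa, key=lambda seq: hamming(seq, target))[:k]


def descriptors_by_subset_size(
    msa: List[str], target: str
) -> List[Tuple[int, float]]:
    """(number of sequences, mean Hamming distance) for each nested
    closest-first subset, from 1 sequence up to the full MSA."""
    ranked = closest_subset(msa, target, len(msa))
    return [
        (size, mean_hamming(ranked[:size], target))
        for size in range(1, len(ranked) + 1)
    ]


# Hypothetical toy alignment; real MSAs are far larger.
target = "ACDEF"
msa = ["GHIKL", "ACDEY", "ACDEF", "GCDEY"]

for size, distance in descriptors_by_subset_size(msa, target):
    print(f"B={size}  D={distance:.2f}")
```

Adding sequences increases B but, once the close ones are used up, also increases D, which is the trade-off described above in miniature.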
RT @GokhaleCS
An excellent review on #fitnesslandscapes and #epistasis - a topic I love - by the amazing @Claudiabank
https://www.annualreviews.org/doi/full/10.1146/annurev-ecolsys-102320-112153
🐦🔗: https://twitter.com/GokhaleCS/status/1589613885182869504
#introduction
I am a #PhD student, and I would love to be in the loop about the following topics:
#fungicide #resistance #adaptation #GWAS #PopGen #Epigenetics #TE #genomics #EvolutionaryBiology #machinelearning #Epistasis #Phenomics #phenotyping #high #throughput #screening #genome #geneticnetworks #effectors #plantpathogen #gene #smallRNA #RNA #R #python #Bash #Linux #conda #GitHub and more!!