Florence Lenaers · @flloaers
44 followers · 49 posts · Server mathstodon.xyz

“This is, in a way, a familiar idea. For example, I teach students in my evolutionary theory course that master chefs are experts at understanding and predicting epistatic effects. The specific challenge of cooking dishes with sophisticated recipes, or of being able to improvise new ones on the fly, is in understanding that ingredients can interact in surprising ways. And so there’s a long tradition of fields that address why the whole can be more than — or at least different from — the sum of its parts.”

— C. Brandon Ogbunu, How Genetic Surprises Complicate the Old Doctrine of DNA

🔗 quantamagazine.org/how-genetic

🧬

#epistasis #genetics #biology #science

Last updated 1 year ago

Vivek Mutalik · @vivek_mutalik
656 followers · 796 posts · Server fediscience.org

Epistasis between promoter activity and coding mutations shapes gene evolvability | Science Advances

science.org/doi/10.1126/sciadv

#evolvability #promoter #epistasis

Last updated 2 years ago

Giuseppe Michieli · @GMIK69
54 followers · 969 posts · Server mstdn.science
Lorenzo Posani · @lorenzo
129 followers · 24 posts · Server neuromatch.social

The first descriptor is the mean Hamming distance of the training MSA data to the mutated sequence (D in the figure). This quantifies the "quality" of the data for the given problem - the closer to the mutated sequence, the better.

We show the Hamming distance is connected to the statistical bias of the inferred model, with a prefactor (J0) that depends on the amount of high-order epistasis that is not explicitly accounted for by the model.

(in the figure: B = number of sequences)
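As a concrete sketch of how D could be computed (the function name, integer encoding, and normalization below are my own assumptions, not necessarily the preprint's):

import numpy as np

def mean_hamming_distance(msa, target):
    """Mean normalized Hamming distance D between each aligned
    training sequence and the mutated target sequence.
    msa: (B, L) integer-encoded alignment; target: (L,) sequence."""
    msa = np.asarray(msa)
    target = np.asarray(target)
    # Fraction of mismatching positions, averaged over sequences and sites
    return float((msa != target).mean())

# Toy check: B = 3 sequences of length L = 4
msa = [[0, 1, 2, 3],
       [0, 1, 2, 0],
       [1, 1, 2, 3]]
print(mean_hamming_distance(msa, [0, 1, 2, 3]))  # 2 mismatches / 12 sites ≈ 0.167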

#bias #epistasis

Last updated 2 years ago

Lorenzo Posani · @lorenzo
84 followers · 4 posts · Server neuromatch.social

🚨 🧬 Non-neuro paper alert 🧬🚨

Is more data always better? Or should we select less data of higher quality?

We explored this question in the context of 🧬 mutation fitness prediction from sequence data, finding a scaling law that relates the performance of statistical models to two simple data descriptors:

📜 biorxiv.org/content/10.1101/20 📜

The first descriptor is the mean Hamming distance of the training MSA data to the mutated sequence. This quantifies the "quality" of the data for the given problem - the closer to the mutated sequence, the better.

We show the Hamming distance is connected to the statistical bias of the inferred model, with a prefactor that depends on the amount of high-order epistasis that is not explicitly accounted for by the model.

The second is the number of sequences in the training MSA. The more the merrier, provided they are of similar quality. We show that the number of sequences is - perhaps unsurprisingly - connected to the variance of the inferred statistical model.
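Schematically, the scaling law described in this thread could be summarized as (my notation; the exact functional form and the prefactor J0 are spelled out in the preprint, and the constant c here is purely illustrative):

\[
\epsilon(D, B) \;\approx\; \underbrace{J_0\, D}_{\text{bias (unmodeled epistasis)}} \;+\; \underbrace{\frac{c}{B}}_{\text{variance (finite sampling)}}
\]

so shrinking D (closer training data) cuts the bias, while shrinking B (fewer sequences) inflates the variance.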

Therefore, given a bunch of training data, there is a clear trade-off between selecting a few "good" training points and including more, but of lower overall quality. We provide some heuristics to select the optimal subset of data given a model and a prediction problem.
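A toy version of such a heuristic, built on the schematic error above (an illustrative helper, not the preprint's actual procedure): sort candidate sequences by distance to the target and keep the prefix size K that minimizes the estimated bias-plus-variance.

import numpy as np

def select_training_subset(msa, target, j0=1.0, c=1.0):
    """Keep the K closest sequences minimizing the schematic
    error j0 * D(K) + c / K, where D(K) is the mean Hamming
    distance of the K retained sequences to the target."""
    msa = np.asarray(msa)
    target = np.asarray(target)
    d = (msa != target).mean(axis=1)   # per-sequence distance to target
    order = np.argsort(d)              # closest sequences first
    ks = np.arange(1, len(d) + 1)
    mean_d = np.cumsum(d[order]) / ks  # D(K) for each prefix size K
    err = j0 * mean_d + c / ks         # schematic bias + variance
    return order[:int(ks[np.argmin(err)])]  # indices of selected subset

Larger j0 (more unmodeled epistasis) pushes the optimum toward fewer, closer sequences; larger c pushes it toward keeping more data.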

#protein #msa #bias #epistasis #variance

Last updated 2 years ago

Admin Córdoba de León · @AdminCordoba
71 followers · 348 posts · Server mstdn.science

RT @GokhaleCS
An excellent review on fitness landscapes and epistasis - a topic I love - by the amazing @Claudiabank
annualreviews.org/doi/full/10.

#fitnesslandscapes #epistasis

Last updated 2 years ago

Erik Svensson · @EvolOdonata
715 followers · 277 posts · Server ecoevo.social

RT @GokhaleCS@twitter.com

An excellent review on fitness landscapes and epistasis - a topic I love - by the amazing @claudiabank@twitter.com
annualreviews.org/doi/full/10.

🐦🔗: twitter.com/GokhaleCS/status/1

#fitnesslandscapes #epistasis

Last updated 2 years ago

Guido Puccetti · @GuidoPuccetti
5 followers · 3 posts · Server ecoevo.social