Florence Lenaers · @flloaers
44 followers · 49 posts · Server mathstodon.xyz

“This is, in a way, a familiar idea. For example, I teach students in my evolutionary theory course that master chefs are experts at understanding and predicting epistatic effects. The specific challenge of cooking dishes with sophisticated recipes, or of being able to improvise new ones on the fly, is in understanding that ingredients can interact in surprising ways. And so there’s a long tradition of fields that address why the whole can be more than — or at least different from — the sum of its parts.”

— C. Brandon Ogbunu, How Genetic Surprises Complicate the Old Doctrine of DNA

🔗 quantamagazine.org/how-genetic

🧬

#epistasis #genetics #biology #science

Last updated 1 year ago

Vivek Mutalik · @vivek_mutalik
656 followers · 796 posts · Server fediscience.org

Epistasis between promoter activity and coding mutations shapes gene evolvability | Science Advances

science.org/doi/10.1126/sciadv

#evolvability #promoter #epistasis

Last updated 2 years ago

Giuseppe Michieli · @GMIK69
54 followers · 969 posts · Server mstdn.science
Lorenzo Posani · @lorenzo
129 followers · 24 posts · Server neuromatch.social

The first descriptor is the mean Hamming distance of the training MSA data to the mutated sequence (D in the figure). This quantifies the "quality" of the data for the given problem - the closer to the mutated sequence, the better.

We show the Hamming distance is connected to the statistical bias of the inferred model, with a prefactor (J0) that depends on the amount of high-order epistasis that is not explicitly accounted for by the model.

(in the figure: B = number of sequences)
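As a concrete sketch of how D could be computed (the function name, integer encoding, and normalization below are my own assumptions, not necessarily the preprint's):

import numpy as np

def mean_hamming_distance(msa, target):
    """Mean normalized Hamming distance D between each aligned
    training sequence and the mutated target sequence.
    msa: (B, L) integer-encoded alignment; target: (L,) sequence."""
    msa = np.asarray(msa)
    target = np.asarray(target)
    # Fraction of mismatching positions, averaged over sequences and sites
    return float((msa != target).mean())

# Toy check: B = 3 sequences of length L = 4
msa = [[0, 1, 2, 3],
       [0, 1, 2, 0],
       [1, 1, 2, 3]]
print(mean_hamming_distance(msa, [0, 1, 2, 3]))  # 2 mismatches / 12 sites ≈ 0.167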

#bias #epistasis

Last updated 2 years ago

Lorenzo Posani · @lorenzo
84 followers · 4 posts · Server neuromatch.social

🚨 🧬 Non-neuro paper alert 🧬🚨

Is more data always better? Or should we select less data of higher quality?

We explored this question in the context of 🧬 mutation fitness prediction from sequence data, finding a scaling law that relates the performance of statistical models to two simple data descriptors:

📜 biorxiv.org/content/10.1101/20 📜

The first descriptor is the mean Hamming distance of the training MSA data to the mutated sequence. This quantifies the "quality" of the data for the given problem - the closer to the mutated sequence, the better.

We show the Hamming distance is connected to the statistical bias of the inferred model, with a prefactor that depends on the amount of high-order epistasis that is not explicitly accounted for by the model.

The second is the number of sequences in the training MSA. The more the merrier, provided they are of similar quality. We show that the number of sequences is - perhaps unsurprisingly - connected to the variance of the inferred statistical model.
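Schematically, the scaling law described in this thread could be summarized as (my notation; the exact functional form and the prefactor J0 are spelled out in the preprint, and the constant c here is purely illustrative):

\[
\epsilon(D, B) \;\approx\; \underbrace{J_0\, D}_{\text{bias (unmodeled epistasis)}} \;+\; \underbrace{\frac{c}{B}}_{\text{variance (finite sampling)}}
\]

so shrinking D (closer training data) cuts the bias, while shrinking B (fewer sequences) inflates the variance.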

Therefore, given a bunch of training data, there is a clear trade-off between selecting a few "good" training points and including more, but of lower overall quality. We provide some heuristics to select the optimal subset of data given a model and a prediction problem.
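A toy version of such a heuristic, built on the schematic error above (an illustrative helper, not the preprint's actual procedure): sort candidate sequences by distance to the target and keep the prefix size K that minimizes the estimated bias-plus-variance.

import numpy as np

def select_training_subset(msa, target, j0=1.0, c=1.0):
    """Keep the K closest sequences minimizing the schematic
    error j0 * D(K) + c / K, where D(K) is the mean Hamming
    distance of the K retained sequences to the target."""
    msa = np.asarray(msa)
    target = np.asarray(target)
    d = (msa != target).mean(axis=1)   # per-sequence distance to target
    order = np.argsort(d)              # closest sequences first
    ks = np.arange(1, len(d) + 1)
    mean_d = np.cumsum(d[order]) / ks  # D(K) for each prefix size K
    err = j0 * mean_d + c / ks         # schematic bias + variance
    return order[:int(ks[np.argmin(err)])]  # indices of selected subset

Larger j0 (more unmodeled epistasis) pushes the optimum toward fewer, closer sequences; larger c pushes it toward keeping more data.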

#protein #msa #bias #epistasis #variance

Last updated 2 years ago

Admin Córdoba de León · @AdminCordoba
71 followers · 348 posts · Server mstdn.science

RT @GokhaleCS
An excellent review on fitness landscapes and epistasis - a topic I love - by the amazing @Claudiabank
annualreviews.org/doi/full/10.

#fitnesslandscapes #epistasis

Last updated 2 years ago

Erik Svensson · @EvolOdonata
715 followers · 277 posts · Server ecoevo.social

RT @GokhaleCS@twitter.com

An excellent review on fitness landscapes and epistasis - a topic I love - by the amazing @claudiabank@twitter.com
annualreviews.org/doi/full/10.

🐦🔗: twitter.com/GokhaleCS/status/1

#fitnesslandscapes #epistasis

Last updated 2 years ago

Guido Puccetti · @GuidoPuccetti
5 followers · 3 posts · Server ecoevo.social