We had an excellent #CompMS session at the #ISMBECCB2023 conference last week.
Many thanks to keynote speakers @lgoracci01@twitter.com, @RenardLab@twitter.com, and @tomas_pluskal@twitter.com; all selected speakers; and poster presenters for showcasing the latest computational advances in mass spectrometry, with applications across #proteomics, #metabolomics, #lipidomics, and more.
#compms #ismbeccb2023 #proteomics #metabolomics #lipidomics
RT @: Keep calm, Pfam is still running! But now it's hosted on the InterPro website! At #ISMBECCB2023, we had the opportunity to learn more about @PfamDB and its integration with the @InterProDB website. We even won these really cool t-shirts. Thanks!
Mark Gerstein at #ISMBECCB2023: Deep learning is exciting, but let's not forget about the physical and biological models underlying the science we're interested in. Let's make biomedical data science more like weather forecasting.
Névéol: What can we do?
Understand the stakes better.
Facilitate levers like data sharing, shared tasks, and policy.
Write more documentation, for protocols, etc.; elicit audits.
See Cohen-Boulakia et al 2017 Future Gen Comput Syst
Aurélie Névéol:
How can we make clinical NLP more reproducible? Can NLP also help with reproducibility? Even word or sentence tokenization can be inconsistent. Most NLP folks have, at least once, failed to repeat someone else's experiment, or even their own. Sometimes it's due to differences in preprocessing, software versions, training vs test splits, or other boring things. Availability issues, page limits, and the bias toward novelty don't help either.
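A quick sketch of the tokenization point (my own toy example, not from the talk): two reasonable-looking tokenizers disagree on the same clinical-style sentence, so anything counted or trained downstream won't match either. The sentence and both function names are made up for illustration.

```python
import re


def whitespace_tokenize(text):
    # Split on runs of whitespace only; punctuation stays attached to words.
    return text.split()


def regex_tokenize(text):
    # Emit runs of word characters, and each punctuation mark as its own token.
    return re.findall(r"\w+|[^\w\s]", text)


# Hypothetical clinical-style snippet, invented for this example.
note = "Pt. denies chest pain; BP 120/80."

ws_tokens = whitespace_tokenize(note)
rx_tokens = regex_tokenize(note)

print(len(ws_tokens), ws_tokens)
print(len(rx_tokens), rx_tokens)
```

Same input, two different token lists. If a paper doesn't say which one it used, good luck repeating the numbers.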
I think I've worked out the confusion: the partly overlapping hashtags #BOSC2023 #ISMBECCB2023 made me think that it was a satellite meeting for #SMBE2023
But it's actually completely separate?
or are #BOSC2023 and #ISMBECCB2023 also separate to each other?
some of it was recorded? how long is that available for?
is recording access for registered attendees only?
cc other mes @kirt@mastodon.social @kirt@ecoevo.social @kirt@genomic.social
and friends @quinsibell @TashTaylor
now I need to work out what I'm registered for … I might have registered for clashing things because I neglected to put #smbe2023 in my calendar…
#bosc2023 #ismbeccb2023 #smbe2023
One perk of attending #ISMBECCB2023 virtually: watching the recording of a keynote I missed instead of the talk I had planned to watch, which turned out not to interest me.
(I guess you could also plug in your headphones and do the same if you're there in person, but that's noticeably ruder.)
note to self, people and topics to follow from my @kirt@ecoevo.social profile
@OpenBio
@openbioeconomy@bird.makeup
KB: cell type matching across species https://github.com/kbiharie/TACTiCS #ISMBECCB2023
Sylwia Szymanska: Word embeddings capture functions of low complexity regions: scientific literature analysis using a transformer-based language model
Low-complexity regions in proteins are biologically important. But there isn't a database or even a list of these relationships. So let's extract them with a language model.
#ismbeccb2023
#textmining
Brett Beaulieu-Jones: Can we use large language models with clinical notes to estimate likelihood of seizure recurrence? Yes - and even with good results - but models are difficult to interpret. So can we build a model that includes things we really care about, then add an instructable layer? Yes! Use note metadata as weak supervision -> instructions for the model. A tuned Flan-T5 model does really well.
Krallinger: Organizing shared tasks. Some processes can take years. Examples - CANTEMIST, CodiEsp, MESINESP, MEDDOCAN, MEDDOPROF, ClinSpEn, DisTEMIST. Most recently MEDDOPLACE, PharmaCoNER
#ismbeccb2023
#textmining
Krallinger: It's important to engage clinical experts from the beginning. That includes their considerations on the content sources.
Annotation guidelines are necessary. See the guides at http://zenodo.org/communities/medicalnlp
Translating these to languages beyond English helps the community.
Krallinger: Developing language models for clinical data in Spanish. Since clinical text varies so much in structure and content, you need a balance between general language and domain-specific optimization. Need some clear annotation guidelines too.
Really need a set of clear clinical use cases, too.
#ismbeccb2023 #textmining
Hi #ismbeccb2023.
I'm in Text Mining today.
Martin Krallinger: Unstructured text from clinical narratives is still underused. There are many other text sources too, like patient forums or drug leaflets, but clinical narratives are especially difficult. No out of the box NLP solution works. Need data, infrastructure, and reproducible benchmarks.
Day 4 recap from #ISMBECCB2023: gene regulation, single-cell data, and visualization of spatial transcriptomics. Papers/preprints/links for highlights are in the description.
https://youtube.com/shorts/TkKDmY6lmZU?feature=share
Oh today I saw more alternative splicing goodies at #ismbeccb2023 #ismb2023
* Zachary Flamholz: Unannotated *viral* proteins. There are many of them, and annotation is usually done by homology. See the PHROGs database of phage genomes - representations of these sequences can accurately identify functional category. Also enables identifying some novel protein families.
* Miguel Fernández Martín: Comparing bacterial protein interactomes to find antibiotic resistance genes. (Back In My Day, we did this with a lot of Y2H). An adaptation of ContextMirror that takes coevolutionary context into account should work. Spoiler: it does. Likely a good way to assemble experimental interactomes with better guidance.
#ismbeccb2023
Back to Function!
* Aysun Urhan: What to do with proteins of unknown function? A new species -> new genes. We can make protein sequence embeddings to try to infer homology, though most embedding approaches so far haven't focused on bacteria. Use what we know about operons (including predicting them if they haven't been confirmed) and combine with protein embeddings. Then assign GO terms w/ cosine similarity. This does work better than using AAs alone.
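A toy sketch of the last step only (assigning GO terms by cosine similarity between embeddings). This is not the speaker's code: the vectors, the protein names, the `assign_go_terms` helper, and the 0.8 threshold are all made up for illustration, and the operon part is left out entirely.

```python
import math


def cosine_similarity(a, b):
    # Dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_go_terms(query, annotated, threshold=0.8):
    # Hypothetical helper: copy GO terms from the most similar annotated
    # protein, but only if the similarity clears the (arbitrary) threshold.
    best_name, best_sim = None, -1.0
    for name, (vector, _terms) in annotated.items():
        sim = cosine_similarity(query, vector)
        if sim > best_sim:
            best_name, best_sim = name, sim
    if best_name is None or best_sim < threshold:
        return set(), best_sim
    return set(annotated[best_name][1]), best_sim


# Invented 3-dimensional "embeddings"; real ones have hundreds of dimensions.
annotated = {
    "protA": ([1.0, 0.0, 0.0], {"GO:0003677"}),
    "protB": ([0.0, 1.0, 0.0], {"GO:0016301"}),
}

query = [0.9, 0.1, 0.0]
terms, sim = assign_go_terms(query, annotated)
print(terms, round(sim, 3))
```

A query that isn't close to anything annotated gets an empty set rather than a bad guess, which seems like the safer default for unknown proteins.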