RT @Pooya_r_m@twitter.com
Happy to announce that our work on 'Bot Detection in GitHub Repositories' with @NatarajanC37@twitter.com has won the Hackathon Best Paper Award of #msr2022
🏆🥳
🐦🔗: https://twitter.com/Pooya_r_m/status/1534245318019190790
RT @msrconf
Let's give thanks to our #msr2022 general chair David Lo (@davidlo2015)! – a thread 🧵
RT @zacchiro
My last paper at @msrconf this year, co-authored with @ZeinabAbouKha and presented by her on Tuesday, is « The #GeneralIndex of #SoftwareEngineering Papers », inspired by @carlmalamud's General Index. Brief thread below with preprint link at the end. #msr2022
The #dataset is available as a portable Postgres database dump and released as #opendata. To learn more about the dataset, check out the #openaccess #preprint of the paper at: https://arxiv.org/abs/2204.03254 #msr2022
The #dataset helps make #metaresearch in #SoftwareEngineering reproducible and independently verifiable, unlike studies conducted using 3rd-party, non-open scholarly indexing services (*cough* Google Scholar *cough*). #msr2022
The #dataset serves use cases in the field of #metaresearch, allowing researchers to introspect the output of #SoftwareEngineering research even when access to papers or scholarly search engines is not possible (e.g., for contractual reasons). #msr2022
We introduce the #GeneralIndex of #SoftwareEngineering Papers, a #dataset of full-text-indexed papers from the most prominent scientific venues in the field of Software Engineering. #msr2022
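To make "full-text-indexed" concrete, here is a minimal inverted-index sketch in Python. It is a generic illustration only, not how the General Index dataset is actually built (it ships as a Postgres dump whose indexing internals are not described here); the function names, the naive tokenizer, and the sample titles are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase, then keep runs of ASCII letters/digits as tokens (a deliberately naive tokenizer).
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(papers: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index: token -> IDs of papers whose full text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for paper_id, text in papers.items():
        for token in tokenize(text):
            index[token].add(paper_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return IDs of papers containing every query token (boolean AND)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


if __name__ == "__main__":
    # Hypothetical sample "papers" (titles only, standing in for full text).
    papers = {
        "p1": "Bot detection in GitHub repositories",
        "p2": "Geographic diversity in public code contributions",
    }
    idx = build_index(papers)
    print(sorted(search(idx, "GitHub bot")))  # ['p1']
```

The point of an index like this is that queries never touch the original papers, which is what lets meta-research run on the index alone.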
Thanks for presenting the discussion results today, @laci_noire@twitter.com! @msrconf@twitter.com
RT @azaidman@twitter.com
.@laci_noire@twitter.com summarising the results of an #msr2022 breakout group on making it easier to use executables for mining research. And this is also Caro’s first in-person conference action 😀👍💪
To learn more about this work, check out the #openaccess #preprint of the paper at https://hal.archives-ouvertes.fr/hal-03622621/ or, if you are at #msr2022, come and talk to me (in person, go figure) at a coffee break!
Next up in my @icseconf+@msrconf trip: tomorrow I'll present at #msr2022 the paper « Geographic #Diversity in Public Code Contributions », joint work with D. Rossi from @unibo. https://conf.researchr.org/details/msr-2022/msr-2022-technical-papers/14/Geographic-Diversity-in-Public-Code-Contributions Brief thread with preprint link at the end.
An #openaccess preprint of the paper that accompanies the dataset (which has also won the #msr2022 Data and Tool Show Award!) is available at: https://hal.archives-ouvertes.fr/hal-03624198v1 (obviously it points to the dataset itself, which is released as #opendata).
If you are attending @msrconf, you can drop by the session tomorrow to discuss more (or just find me at the conference!): https://conf.researchr.org/details/msr-2022/msr-2022-data-showcase/19/A-Large-scale-Dataset-of-Open-Source-License-Text-Variants #msr2022
The dataset serves use cases such as large-scale free/open source software license analysis, training of license-detection tools that are much in demand in the software industry, and NLP analyses of legal licensing document corpora. #msr2022
The dataset is distributed as a big tarball with all the license blobs (deduplicated by SHA1) + a set of portable CSV files that implement a relational data model with all the above information. #msr2022
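Deduplicating blobs by SHA1 means each distinct license text is stored once, while a separate relation records which file paths map to which blob. The Python sketch below illustrates that idea under stated assumptions: the SHA1 is computed over the raw file bytes, and the two-column `path,sha1` CSV layout and function names are hypothetical, not the dataset's actual schema.

```python
import csv
import hashlib
import io


def dedup_blobs(files: list[tuple[str, bytes]]) -> tuple[dict[str, bytes], list[tuple[str, str]]]:
    """Store each distinct content once, keyed by the SHA1 hex digest of its raw bytes.

    Returns (blobs, occurrences): blobs maps sha1 -> content; occurrences lists
    (path, sha1) pairs so every original file still points to its blob.
    """
    blobs: dict[str, bytes] = {}
    occurrences: list[tuple[str, str]] = []
    for path, content in files:
        sha1 = hashlib.sha1(content).hexdigest()
        blobs.setdefault(sha1, content)  # keep only the first copy of identical content
        occurrences.append((path, sha1))
    return blobs, occurrences


def occurrences_to_csv(occurrences: list[tuple[str, str]]) -> str:
    """Serialize the path -> sha1 relation as CSV text (hypothetical table layout)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["path", "sha1"])
    writer.writerows(occurrences)
    return buf.getvalue()


if __name__ == "__main__":
    files = [
        ("projA/LICENSE", b"MIT License\n"),
        ("projB/LICENSE", b"MIT License\n"),
        ("projC/COPYING", b"GNU GENERAL PUBLIC LICENSE\n"),
    ]
    blobs, occ = dedup_blobs(files)
    print(len(blobs))  # 2 distinct blobs for 3 files
    print(occurrences_to_csv(occ))
```

Keying blobs by content hash makes the CSV side of the data model small and portable, while the bulky texts live once each in the tarball.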
We have further mined all blobs to detect their MIME type (using libmagic) and most likely FOSS license (using ScanCode), and to find a sample origin that distributed each license as well as the earliest known commit that did so (using swh-graph). #msr2022
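The "earliest known commit" step boils down to a per-blob minimum over commit dates. The actual work was done with swh-graph over the Software Heritage graph; the Python sketch below is only an in-memory simplification that assumes the (blob, commit, date) pairs have already been enumerated. The `Occurrence` record and its fields are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Occurrence:
    """Hypothetical record: a commit that contains a given license blob."""
    blob_sha1: str
    commit_id: str
    commit_date: datetime


def earliest_commits(occurrences: list[Occurrence]) -> dict[str, Occurrence]:
    """For each blob, keep the occurrence with the oldest commit date."""
    earliest: dict[str, Occurrence] = {}
    for occ in occurrences:
        best = earliest.get(occ.blob_sha1)
        if best is None or occ.commit_date < best.commit_date:
            earliest[occ.blob_sha1] = occ
    return earliest


if __name__ == "__main__":
    records = [
        Occurrence("aaa", "c2", datetime(2012, 5, 1, tzinfo=timezone.utc)),
        Occurrence("aaa", "c1", datetime(2005, 3, 9, tzinfo=timezone.utc)),
        Occurrence("bbb", "c3", datetime(2019, 1, 1, tzinfo=timezone.utc)),
    ]
    for sha1, occ in sorted(earliest_commits(records).items()):
        print(sha1, occ.commit_id, occ.commit_date.date())
```

A single linear pass is enough here because only the running minimum per blob needs to be kept.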
We have retrieved from 150 million projects archived by @swheritage all the versions of all files whose names match patterns commonly used by developers to distribute software licenses (COPYING, LICENSE, …), obtaining a whopping 6.5 million license blobs. #msr2022
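Selecting candidate files by name can be sketched as a regular-expression match on the last path component. The full pattern list used for the dataset is elided above; the Python sketch below only covers names beginning with COPYING or LICENSE (case-insensitive, optionally followed by a `.`, `_` or `-` suffix), and the `LICENSE_NAME` pattern and `is_license_file` helper are hypothetical.

```python
import re

# Illustrative pattern only: base names starting with COPYING or LICENSE,
# case-insensitive, optionally followed by a "."/"_"/"-" suffix
# (e.g. "LICENSE", "COPYING.LESSER", "License.md"). Not the dataset's real pattern list.
LICENSE_NAME = re.compile(r"^(copying|license)([._-].*)?$", re.IGNORECASE)


def is_license_file(path: str) -> bool:
    """Check whether the last component of a '/'-separated path looks like a license file name."""
    basename = path.rsplit("/", 1)[-1]
    return LICENSE_NAME.match(basename) is not None


if __name__ == "__main__":
    paths = ["src/LICENSE", "COPYING.LESSER", "docs/License.md", "README.md", "licenses.py"]
    print([p for p in paths if is_license_file(p)])
```

Matching on names keeps the first pass cheap over the whole archive; identifying which license a matched blob actually contains is left to later content-based analysis.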