J. Manrique López · @jsmanrique
45 followers · 3056 posts · Server mastodon.social

RT @Pooya_r_m@twitter.com

Happy to announce that our work on 'Bot Detection in GitHub Repositories' with @NatarajanC37@twitter.com has won the Hackathon Best Paper Award of MSR 2022 🏆🥳

🐦🔗: twitter.com/Pooya_r_m/status/1

#msr2022

Last updated 2 years ago

Jesus M. Gonzalez-Barahona · @jgbarah
203 followers · 6428 posts · Server floss.social

RT @Pooya_r_m
Happy to announce that our work on 'Bot Detection in GitHub Repositories' with @NatarajanC37 has won the Hackathon Best Paper Award of MSR 2022 🏆🥳

#msr2022

Last updated 2 years ago

Jan Götze · @deelite
67 followers · 477 posts · Server dresden.network

Waiting for the first ones...

#msr2022

Last updated 2 years ago

Jesus M. Gonzalez-Barahona · @jgbarah
203 followers · 6428 posts · Server floss.social

RT @msrconf
Let's give thanks to our general chair David Lo (@davidlo2015)! – a thread 🧵

#msr2022

Last updated 2 years ago

Jesus M. Gonzalez-Barahona · @jgbarah
203 followers · 6428 posts · Server floss.social

RT @zacchiro
My last paper at @msrconf this year, co-authored with @ZeinabAbouKha and presented by her on Tuesday, is « The General Index of Software Engineering Papers », inspired by @carlmalamud's General Index. Brief thread below with preprint link at the end.

#msr2022 #softwareEngineering #GeneralIndex

Last updated 2 years ago

Stefano Zacchiroli · @zacchiro
1011 followers · 981 posts · Server mastodon.xyz

The dataset is available as a portable Postgres database dump and released as open data. To learn more about the dataset, check out the open access preprint of the paper at: arxiv.org/abs/2204.03254

#dataset #opendata #openaccess #preprint #msr2022

Last updated 2 years ago

Stefano Zacchiroli · @zacchiro
1011 followers · 981 posts · Server mastodon.xyz

The dataset helps make metaresearch in software engineering reproducible and independently verifiable, as opposed to what happens when such studies are conducted using 3rd-party and non-open scholarly indexing services (*cough* Google Scholar *cough*).

#dataset #metaresearch #softwareengineering #msr2022

Last updated 2 years ago

Stefano Zacchiroli · @zacchiro
1011 followers · 981 posts · Server mastodon.xyz

The dataset serves use cases in the field of metaresearch, making it possible to introspect the output of software engineering research even when access to papers or scholarly search engines is not possible (e.g., due to contractual reasons).

#dataset #metaresearch #softwareengineering #msr2022

Last updated 2 years ago

Stefano Zacchiroli · @zacchiro
1011 followers · 981 posts · Server mastodon.xyz

The dataset includes bibliographic info and indexed n-grams (sequences of contiguous words after removal of stopwords and non-words, for a total of ~0.5 billion unique n-grams) with length 1 to 5 for 44'581 papers retrieved from 34 venues over the 1971–2020 period.

#dataset #msr2022

Last updated 2 years ago
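
The n-gram indexing described in the post above can be illustrated with a minimal sketch. It is not the authors' pipeline: the stopword list, the tokenization rule (lowercase ASCII letters only), and the function name `extract_ngrams` are all assumptions made for illustration.

```python
import re

# Tiny illustrative stopword list. This is an assumption: the post does not
# say which stopword list the dataset actually uses.
STOPWORDS = {"a", "an", "the", "of", "in", "and", "to", "is", "for"}


def extract_ngrams(text, min_n=1, max_n=5):
    """Return contiguous word n-grams of length min_n..max_n after filtering."""
    # Crude "non-word" removal: keep only runs of lowercase ASCII letters,
    # then drop stopwords. Real tokenization is likely more careful.
    tokens = [t for t in re.findall(r"[a-z]+", text.lower())
              if t not in STOPWORDS]
    ngrams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of n tokens over the filtered token list.
        for i in range(len(tokens) - n + 1):
            ngrams.append(" ".join(tokens[i:i + n]))
    return ngrams
```

Note that the n-grams are built after filtering, so words that were separated only by a stopword in the original text end up adjacent, which matches the "after removal of stopwords and non-words" wording of the post.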

Stefano Zacchiroli · @zacchiro
1011 followers · 981 posts · Server mastodon.xyz

We introduce the General Index of Software Engineering Papers, a dataset of fulltext-indexed papers from the most prominent scientific venues in the field of Software Engineering.

#GeneralIndex #softwareengineering #dataset #msr2022

Last updated 2 years ago

Stefano Zacchiroli · @zacchiro
1011 followers · 981 posts · Server mastodon.xyz

My last paper at @msrconf this year, co-authored with @ZeinabAbouKha and presented by her on Tuesday, is « The General Index of Software Engineering Papers », inspired by @carlmalamud's General Index. Brief thread below with preprint link at the end.

#softwareengineering #msr2022 #GeneralIndex

Last updated 2 years ago

Christoph Matthies · @chrisma
116 followers · 1591 posts · Server mstdn.social

Thanks for presenting the discussion results today, @laci_noire@twitter.com! @msrconf@twitter.com

RT @azaidman@twitter.com

.@laci_noire@twitter.com summarising the results of an MSR 2022 breakout group on making it easier to use executables for mining research. And this is also Caro’s first in-person conference action 😀👍💪

🐦🔗: twitter.com/azaidman/status/15

#msr2022

Last updated 2 years ago

Stefano Zacchiroli · @zacchiro
1011 followers · 981 posts · Server mastodon.xyz

To learn more about this work, check out the open access preprint of the paper at hal.archives-ouvertes.fr/hal-0 or, if you are at MSR 2022, come and talk to me (in person, go figure) at a coffee break!

#openaccess #preprint #msr2022

Last updated 2 years ago

Stefano Zacchiroli · @zacchiro
1011 followers · 981 posts · Server mastodon.xyz

Next up in my @icseconf+@msrconf trip: tomorrow I'll present at MSR 2022 the paper « Geographic Diversity in Public Code Contributions », joint work with D. Rossi from @unibo. conf.researchr.org/details/msr Brief thread with preprint link at the end.

#msr2022 #diversity

Last updated 2 years ago

Stefano Zacchiroli · @zacchiro
1011 followers · 981 posts · Server mastodon.xyz

An open access preprint of the paper that accompanies the dataset (which has also won the MSR 2022 Data and Tool Show Award!) is available at: hal.archives-ouvertes.fr/hal-0 (obviously it points to the dataset itself, which is released as open data).

#openaccess #msr2022 #opendata

Last updated 2 years ago

Stefano Zacchiroli · @zacchiro
1011 followers · 981 posts · Server mastodon.xyz

If you are attending @msrconf you can drop by the session tomorrow to discuss more (or just find me at the conference!): conf.researchr.org/details/msr

#msr2022

Last updated 2 years ago

Stefano Zacchiroli · @zacchiro
1011 followers · 981 posts · Server mastodon.xyz

The dataset serves use cases such as large-scale free/open source software license analysis, training of license-detection tools that are much in demand in the software industry, and NLP analyses of legal licensing document corpora.

#msr2022

Last updated 2 years ago

Stefano Zacchiroli · @zacchiro
1011 followers · 981 posts · Server mastodon.xyz

The dataset is distributed as a big tarball with all the license blobs (deduplicated by SHA1) + a set of portable CSV files that implement a relational data model with all the above information.

#msr2022

Last updated 2 years ago
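
Deduplicating blobs by SHA1, as mentioned in the post above, can be sketched as follows. This is a minimal illustration under stated assumptions, not the dataset's actual tooling: the function name `deduplicate_blobs` is hypothetical, and the digest here is a plain SHA1 of the raw bytes.

```python
import hashlib


def deduplicate_blobs(blobs):
    """Map SHA1 hex digest -> content, keeping one copy per distinct content."""
    unique = {}
    for content in blobs:
        # Plain SHA1 over the raw bytes (assumption: no Git-style header).
        digest = hashlib.sha1(content).hexdigest()
        # setdefault keeps the first blob seen for each digest.
        unique.setdefault(digest, content)
    return unique
```

Keying the store on the digest means that identical license texts found in many projects are kept only once, while the digest can serve as the join key in the accompanying CSV files.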

Stefano Zacchiroli · @zacchiro
1011 followers · 981 posts · Server mastodon.xyz

We have further mined all blobs to detect their MIME type (using libmagic) and most likely FOSS license (using ScanCode), and to find a sample origin that distributed the license as well as the earliest known commit that did so (using swh-graph).

#msr2022

Last updated 2 years ago

Stefano Zacchiroli · @zacchiro
1011 followers · 981 posts · Server mastodon.xyz

We have retrieved from 150 million projects archived by @swheritage all the versions of all files whose names match patterns commonly used by developers to distribute software licenses (COPYING, LICENSE, …), obtaining a whopping 6.5 million license blobs.

#msr2022

Last updated 2 years ago
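
Selecting files by name, as described in the post above, might look like the sketch below. Only the names COPYING and LICENSE come from the post; the optional file extension, the case-insensitive matching, and the helper name `looks_like_license_file` are assumptions, and the real pattern list is certainly longer.

```python
import re

# Hypothetical pattern: COPYING and LICENSE are from the post; the optional
# extension (e.g. ".txt", ".md") and case-insensitivity are assumptions.
LICENSE_NAME = re.compile(r"(copying|license)(\.[a-z0-9]+)?", re.IGNORECASE)


def looks_like_license_file(path):
    """Return True if the final path component matches the license pattern."""
    # Only the file name matters, not the directory it lives in.
    name = path.rsplit("/", 1)[-1]
    return LICENSE_NAME.fullmatch(name) is not None
```

For example, `looks_like_license_file("src/LICENSE")` and `looks_like_license_file("COPYING.txt")` both hold, while `looks_like_license_file("docs/license_notes.md")` does not, because the whole file name must match.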