Yesterday I submitted my first manuscript where I managed, synced and backed up all the data and files with :gitannex: #gitAnnex and @datalad.
I use #Apptainer containers for #reproducibility across machines (data analysis and #TexLaTeX environments).
It's a very reassuring feeling to be able to work on it anywhere and to know all files are securely backed up in a decentralized manner.
Also, reusing individual parts is possible. I'll use this workflow from now on 💪
#phdlife #gitannex #Apptainer #reproducibility #texlatex #DataLad #reproducibleresearch
@winnie git-annex is pretty much just a symlink manager, but unlike git, you don’t check the files themselves into version control (saving you the problem of checking out a multi-gigabyte repo if you just want to work on the code). You can also set up retention policies, e.g. there must be X remotes that have the actual data. It's a great system, built to be fault-tolerant.
Coupled with #datalad, you can manage all sorts of nested annexes with different ACLs. It’s great.
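For anyone wondering what "symlink manager" plus "retention policy" means in practice, here is a toy Python sketch of the idea. It is not git-annex's actual implementation: the `.toy-annex` directory, the key format, and the `annex_add` and `may_drop` helpers are all made up for illustration. It assumes a filesystem that supports symlinks.

```python
import hashlib
import os
from pathlib import Path


def annex_add(repo: Path, file_path: Path) -> str:
    """Move a file's content into a key-addressed store and leave a symlink behind.

    Hypothetical helper: only the small symlink would be tracked by version
    control, while the (possibly huge) content lives in the object store.
    """
    repo = repo.resolve()
    file_path = file_path.resolve()
    data = file_path.read_bytes()
    key = "SHA256-" + hashlib.sha256(data).hexdigest()

    store = repo / ".toy-annex" / "objects"  # made-up location
    store.mkdir(parents=True, exist_ok=True)
    target = store / key

    if target.exists():
        file_path.unlink()          # identical content is already stored
    else:
        file_path.replace(target)   # move the content into the store

    # Replace the original file with a relative symlink to the stored content.
    file_path.symlink_to(os.path.relpath(target, file_path.parent))
    return key


def may_drop(copies_elsewhere: int, numcopies: int) -> bool:
    """Retention rule: drop the local copy only if enough other copies exist."""
    return copies_elsewhere >= numcopies
```

With a policy of two required copies, `may_drop(1, 2)` refuses the drop and `may_drop(2, 2)` allows it. In real git-annex, the corresponding setting is `git annex numcopies`.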
@jcolomb Yes, #DataLad is great: for those that don't know, it uses git-annex under the hood and makes it more usable (though I must say, I haven't used it before). It's also funded a lot of work on git-annex.
Should we recommend DataLad as the way to get started for researchers/RDM? When would someone need to go straight to git-annex?
Hey, cool information (especially the video), though mostly for data managers and RSEs who need to understand the tool's background.
For researchers, I do present how to work with datalad and/or GIN (set rules about what gets annexed, use simple commands later on).
Note that we want something like git not only for large files, but also for many files: hence the use of submodules in #datalad.
The above advantages make #Apptainer an ideal candidate for #reproducibleResearch. It's directly supported by @datalad (see the `datalad-container` extension) and is bliss to use compared to #Docker, in my opinion. Synchronizing the images with #gitAnnex or #DataLad works very well. Overall, I'm very impressed by how straightforward #containerization can be! 👍
#Apptainer #reproducibleresearch #docker #gitannex #DataLad #containerization
What are your favorite resources for learning #versioncontrol with #git and #DataLad @datalad?
We will soon start preparing teaching materials (which will all be shared openly) for a full-semester course on "Version control of code and data using Git and DataLad" at the University of Hamburg (project with @nicoschuck), supported by a grant awarded by the Digital and Data Literacy in Teaching Lab #DDLitLab 🎉
#ddlitlab #DataLad #git #versioncontrol
Looks like @datalad is the next thing I'll look into. Using #gitAnnex for reproducible #science #dataAnalysis workflows, exactly what I need right now!
#gitannex #science #dataanalysis #DataLad #reproducibleresearch #git
@Russpoldrack For my talks on #rdm, #git, #datalad etc. (made with #rstats xaringan package), I set up a “gallery page” (based on #rmarkdown) that separately displays slides of all previous talks. Past talks are “archived” to a separate folder in the same repo. The CI that deploys to gh-pages builds all slide decks + the gallery page. Not super happy with this because I already had to fix broken links in past slides (still need to change it) https://lennartwittkuhn.com/talk-rdm/
#rmarkdown #rstats #DataLad #git #rdm