If anyone is interested in testing a backport of #SLURM 23.02 for Ubuntu 22.04 LTS, I published a PPA yesterday that contains the necessary Debian packages (I had to backport rocm-smi-lib too):
```bash
sudo add-apt-repository ppa:ubuntu-hpc/slurm-wlm-23.02
sudo apt install <slurm package>  # e.g. slurmd, slurmctld, or slurm-client
```
I'm keen on making sure that it works for other people too and not just me 😅
Job keeps running out of memory, and I keep increasing the memory request, until I notice I've got a typo: it's been running on the default 4 GB every time ... 🤦
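In case you're wondering how that's even possible: one classic way is a typo in the "#SBATCH" prefix itself, because sbatch then treats the line as a plain comment and the request never reaches the scheduler. A made-up script showing the trap (job name, memory, and command are just placeholders):

```bash
#!/bin/bash
#SBATCH --job-name=analysis
#SBTACH --mem=32G        # typo in the "#SBATCH" prefix: sbatch sees an ordinary
                         # comment, so the job falls back to the default memory
#SBATCH --time=02:00:00

srun ./run_analysis.sh   # placeholder for the actual workload
```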
#slurm #programming #analysis
@Danwwilson @Mehrad @rstats with #orgmode you have a lot of options here: you can open and edit files on the server in your local Emacs, or you can edit local files in a local Emacs but run the code on a remote server.
It's a little roundabout, but I even use this system for editing multiple #rstats scripts on a local machine and submitting them as #slurm jobs to a remote cluster. I have been meaning to write this up for a while, maybe later in the summer.
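Until the proper write-up happens: the submission end is nothing special, basically a plain sbatch wrapper around Rscript, roughly like this (module name, resources, and file names are placeholders):

```bash
#!/bin/bash
#SBATCH --job-name=rstats-analysis
#SBATCH --time=01:00:00
#SBATCH --mem=8G
#SBATCH --output=logs/%x-%j.out

# Placeholder environment setup; adjust to whatever the cluster provides.
module load R

# Run the R script that was edited locally.
Rscript analysis.R
```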
Depends on coreutils and colout: https://github.com/nojhan/colout
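If you haven't seen colout before, the gist is piping Slurm output through regex-based colouring, roughly like this (the patterns and colours here are only an illustration):

```bash
# Illustration only: colour-code squeue output by job state with colout.
squeue -u "$USER" | colout 'RUNNING' green bold | colout 'PENDING' yellow
```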
#shell #bash #slurm #HPC
#distributedcomputing Hello Mastodon!
Are some of you using #GNUParallel or #SLURM to distribute computations across a set of network nodes?
Requirements:
- multiple executions of a single C++ app, with different arguments
- no communication between executions required
- approximately 20000 executions to distribute across 5-10 nodes
- possibility to run 2-4 tasks in parallel on the same node to speed things up
- retrieval of individual output files for further analysis
What do you suggest?
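To make it concrete, the GNU Parallel variant I'm imagining looks roughly like this (hostnames, slot counts, paths, and the binary name are placeholders, and it assumes the binary and data sit on a filesystem every node can see):

```bash
# nodes.txt limits concurrent tasks per node via "slots/host", e.g.:
#   4/node01
#   4/node02

# Run ./my_app once per line of arglist.txt (one argument set per line),
# working in a shared directory, logging each run, and keeping per-run
# stdout/stderr under results/ for later analysis.
parallel --sshloginfile nodes.txt \
         --workdir /shared/project \
         --joblog parallel.log \
         --results results/ \
         ./my_app {} :::: arglist.txt
```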
#distributedcomputing #gnuparallel #slurm
I don't want to brag but I kinda do 🤭 I ran my own little #rustlang program on two clusters today. It's a tiny tool that runs a #Slurm command, processes the output with #RegEx, and outputs #prometheus metrics. But it works really nicely, it's easy to adapt the RegEx to what I need to get, and it's easy to configure the metrics the way I want. Right now it's just counting the GPUs in use, labeling them with GPU type, EC2 instance type, etc., but technically it could be...
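The Rust itself isn't worth pasting here, but conceptually it boils down to this kind of pipeline (the squeue format string, regex, and label names are illustrative, not necessarily what I use):

```bash
# Count GPUs used by running jobs, grouped by GPU type, and print them in
# Prometheus text format. Assumes jobs request typed GRES (e.g. gpu:a100:2);
# untyped requests would need a different pattern.
squeue -h -t RUNNING -o "%b" \
  | grep -oP 'gpu:\K[^:]+:\d+' \
  | awk -F: '{n[$1] += $2}
       END {for (t in n) printf "slurm_gpus_in_use{gpu_type=\"%s\"} %d\n", t, n[t]}'
```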
#rustlang #slurm #regex #prometheus
Good news, everyone! 😀 I wrote a little piece of #rustlang 🦀 code, which is now running on a cluster, like, at work!
It's a simple #prometheus exporter that runs a #SLURM sinfo command and exposes the metric. But it works, and it takes some args (port and update interval). The original idea was to figure out how to modify the existing exporter written in Go, but as I know almost nothing about Go, and this opportunity showed up anyway... go Rust! 🙃
AWS ParallelCluster has a new user interface. The software automates the setup of multi-node HPC clusters in the public cloud.
https://day1hpc.com/post/announcing-the-new-parallelcluster-ui-launched-today/
Optimize your #HPC workloads with #AWS #ParallelCluster #Slurm-based memory-aware scheduling. Automatically balance memory usage for improved efficiency and reduced wait times 🚀☁️ https://day1hpc.com/post/slurm-based-memory-aware-scheduling-in-aws-parallelcluster-32/
#hpc #aws #parallelcluster #slurm
On a #HPC cluster, array jobs let you parallelize things without parallelizing your code: parallelize an easy script instead. For many tasks, this is enough! The basic idea is the same code, slightly different data, and #ShellScript glue connecting it all. Our array tutorial explains the concepts, provides copy-and-paste examples, and works on any #Slurm cluster. #RSEng #SciComp #tip
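Boiled down to a minimal sketch (script name, resources, and the input handling are placeholders):

```bash
#!/bin/bash
#SBATCH --array=0-99             # 100 tasks: same script, different data
#SBATCH --time=00:30:00
#SBATCH --mem=2G
#SBATCH --output=logs/%A_%a.out  # one log file per array task

# Each task picks its own input line based on SLURM_ARRAY_TASK_ID.
INPUT=$(sed -n "$((SLURM_ARRAY_TASK_ID + 1))p" inputs.txt)
./process_one "$INPUT"
```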
#hpc #shellscript #slurm #rseng #scicomp #tip
Nice post by @yakshavers on Slurm accounting, which adds flexibility, transparency, and control to operating an #HPC cluster on AWS using #AWS #ParallelCluster! Version 3.3.0 can now automatically configure #Slurm accounting whether you are using your own database or Amazon #Aurora.
https://aws.amazon.com/blogs/hpc/leveraging-slurm-accounting-in-aws-parallelcluster/
#hpc #aws #parallelcluster #slurm #aurora
🚨Release Alert: #AWS #ParallelCluster 3.4.1 is out today on PyPI. It fixes an issue with #Slurm where nodes could become inaccessible or backed by the wrong #EC2 instance type 🐛 #HPC https://pypi.org/project/aws-parallelcluster/
#aws #parallelcluster #slurm #ec2 #hpc
@mbauman Aha, that is good to know, as is the fact that there is a #slurm option to disable turbo mode. Impressive results then:
https://discourse.julialang.org/t/how-to-achieve-perfect-scaling-with-threads-julia-1-7-1/92603/31
@rkdarst @minrk @priesgo @SciCompAalto at #NERSC we also offer #JupyterHub on #HPC using #Slurm. I was not involved in setting it up or running it, but I can connect you with the right people if you have questions.
#nersc #jupyterhub #hpc #slurm
So here is your first Christmas present: #Snakemake 7.19 is released, adding native #SLURM support, which @rupdecat and I have implemented over the last few months. Apart from that, the release provides various bug fixes. #sciworkflows #reproducibility https://snakemake.github.io
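If you want to give it a spin, the invocation is roughly along these lines (account and partition names are placeholders; the Snakemake docs have the authoritative flags and resource names):

```bash
# Submit workflow jobs through Slurm natively; account/partition are placeholders.
snakemake --slurm \
          --default-resources slurm_account=myaccount slurm_partition=mypartition \
          --jobs 100
```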
#snakemake #slurm #sciworkflows #reproducibility
So... I think I finally got it to work. I have a tiny #SLURM cluster 🙃 Well, cluster is a big word. A clusterino, as Ned Flanders would put it. I followed some of the "Raspberry Pi cluster" articles and tutorials, but as I only have one #raspi, it's the head node, and the compute node is my desktop. And a 32GB USB key is the NFS 😂 But it is fun for sure 🙃 #hpc
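For the curious, the clusterino is basically a handful of lines of slurm.conf, roughly like this (hostnames and hardware specs are placeholders):

```
# Minimal sketch of the relevant slurm.conf entries (placeholder names/specs).
ClusterName=clusterino
SlurmctldHost=raspi-head                                # the Raspberry Pi
NodeName=desktop CPUs=8 RealMemory=16000 State=UNKNOWN  # the desktop compute node
PartitionName=main Nodes=desktop Default=YES MaxTime=INFINITE State=UP
```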
#SLURM presentations from #SC22 now available: https://slurm.schedmd.com/publications.html
There's one topic I had specifically been hoping to see discussed for a long time, and it finally happened: Slurm and/or/versus Kubernetes.
It talks about potentially getting Slurm to work with K8s. That may not strictly be necessary, since engineers tend to run on-prem Slurm and K8s as separate stacks, but there are still good reasons to think about integration, like managing one infrastructure instead of two.
After many years, I'm coming back to play with job schedulers for cluster computing. 😀 This is a humble project, but I'm happy to see that "I Still Do" #Slurm