Michael · @mmeier
210 followers · 3592 posts · Server social.mei-home.net

Hm, after some further thought, it might be possible. With just one additional SSD and HDD, I could set up two Ceph clusters. Each with two SSD/HDD OSDs. That should still be enough (albeit barely). Then I could either import/export the RBD volumes, or perhaps even mirror them. A bigger problem is the CephFS, which doesn't seem to have import/export capability.

That would also allow me to do the migration piecemeal, instead of having to swing it all at once, somehow.

#homelab #Ceph

Last updated 1 year ago

Michael · @mmeier
210 followers · 3571 posts · Server social.mei-home.net

After digging into Ceph Rook for almost a day now, I'm still highly intrigued. But there seems to be a pretty large problem: There doesn't seem to be any official migration path from baremetal/cephadm to Ceph Rook. I've been turning over several scenarios in my head, but in the end, I just don't have the necessary hardware to run two clusters in parallel during a migration.

Ceph Rook was honestly the only thing which got me excited about migrating to k8s. 😔

#homelab #kubernetes #Ceph

Last updated 1 year ago

Michael · @mmeier
193 followers · 3114 posts · Server social.mei-home.net

I definitely need to solve my netbooting-with-ceph-rbd-root-disks problem. During every update, there is at least one Pi which doesn't come up again because it gets blacklisted by Ceph. This is probably because I'm not properly releasing the lock on the root disk RBD before the reboot. But I've honestly got no idea how to properly unmap the RBD when it's the root disk of a completely diskless Pi.

#Ceph #homelab

Last updated 1 year ago

Michael · @mmeier
138 followers · 2261 posts · Server social.mei-home.net

Well excellent. Suddenly, all my hosts are perfectly able to break the header locks for RBD mounts and netboot properly every time. Sometimes I really have the feeling that my entire setup is held together by a couple of arcane incantations and a large helping of fate deciding that today, I was a good boy and don't deserve errors. 🤷

#homelab #Ceph

Last updated 2 years ago

Michael · @mmeier
134 followers · 2221 posts · Server social.mei-home.net

What I have gathered up to now: This happens due to the fact that I have created the RBDs with the "exclusive-lock" enabled. The problem is now that during shutdown, the host does not properly unmap the RBD root volume, and consequently, during the next boot, the RBD is still locked and the host gets blocklisted when trying to map a locked RBD volume.

So now I have to see where I can hook into the shutdown process to properly unmap the root disk.

Does anyone have any pointers?

#Ceph #homelab

Last updated 2 years ago

Michael · @mmeier
134 followers · 2220 posts · Server social.mei-home.net

For quite a while now, I've had a weird problem with my devices netbooting with a Ceph RBD volume as their root disk. When those machines reboot, they end up on an OSD blocklist. As a consequence, they are not able to mount their boot disk and end up not booting at all. I can easily remove them from the blocklist, but that's currently manual, and if I miss the window to do it, the host is going to be stuck.
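
The manual step could be scripted along these lines. This is only a sketch: it assumes the plain-text output of `ceph osd blocklist ls` has one entry per line, with the client address (`ip:port/nonce`) first and an expiry timestamp after it. `blocklist_rm_commands` is a hypothetical helper name and the sample addresses are made up. It only builds the `ceph osd blocklist rm` commands, it doesn't run them.

```python
def blocklist_rm_commands(ls_output, host_ip):
    """Build one `ceph osd blocklist rm` argv per blocklist entry of host_ip."""
    commands = []
    for line in ls_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        addr = fields[0]
        # Assumed entry format: "<ip>:<port>/<nonce>"
        if addr.startswith(host_ip + ":"):
            commands.append(["ceph", "osd", "blocklist", "rm", addr])
    return commands


# Made-up sample output, for illustration only.
sample = (
    "10.0.0.21:0/3710147553 2023-05-01T12:00:00.000000+0000\n"
    "10.0.0.22:0/1234567890 2023-05-01T12:05:00.000000+0000\n"
)
for cmd in blocklist_rm_commands(sample, "10.0.0.21"):
    print(" ".join(cmd))
```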

1/2

#Ceph #homelab

Last updated 2 years ago

Michael · @mmeier
131 followers · 2179 posts · Server social.mei-home.net

New blog article with a write-up of this weekend's Ceph host migration, and some thoughts and plots about the low performance I was seeing:

blog.mei-home.net/posts/ceph-m

#blog #homelab #Ceph

Last updated 2 years ago

Michael · @mmeier
131 followers · 2178 posts · Server social.mei-home.net

To override the recovery configs, one has to override the mclock scheduler by setting "osd_mclock_override_recovery_settings" to "true" and then set "osd_max_backfills" to the appropriate value. 20 seems to work pretty well for me.
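
As commands, the two settings from this post might look like the sketch below. It assumes they are applied to all OSDs through `ceph config set osd ...`. `recovery_override_commands` is just a hypothetical helper that builds the argument lists without running anything, and 20 is the value that worked here, not a general recommendation.

```python
def recovery_override_commands(max_backfills=20):
    """Return the `ceph config set` argv lists for overriding mclock's limits."""
    return [
        # Allow manual recovery/backfill limits while mclock is active.
        ["ceph", "config", "set", "osd",
         "osd_mclock_override_recovery_settings", "true"],
        # Raise the number of concurrent backfills per OSD.
        ["ceph", "config", "set", "osd",
         "osd_max_backfills", str(max_backfills)],
    ]


for cmd in recovery_override_commands():
    print(" ".join(cmd))
```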

I find it a bit weird that the cluster isn't able to figure out how to use at least more of the available resources on its own but needs me to tell it.

2/2

#homelab #Ceph

Last updated 2 years ago

Michael · @mmeier
131 followers · 2177 posts · Server social.mei-home.net

I believe I've finally found the reason for the incredibly slow backfill/recovery on my Ceph cluster.

There's a new scheduler, "mclock". When it is active, the old "osd_max_backfills" and "osd_recovery_max_active" configs have no effect. That new scheduler has a "high_recovery_ops" profile. With that one active, my recoveries go from max 10MB/s to 20MB/s. Which is still slower than it should be on a GbE network.
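
For a rough sense of the gap, here is the arithmetic as a small sketch. It assumes GbE means a 1 Gbit/s line rate and ignores protocol overhead, so the real ceiling is somewhat lower. `fraction_of_gbe` is a made-up helper name.

```python
GBE_BITS_PER_SECOND = 1_000_000_000  # 1 Gbit/s line rate (assumed)


def fraction_of_gbe(rate_mb_per_s):
    """Share of the raw GbE line rate used by a transfer rate given in MB/s."""
    line_rate_mb_per_s = GBE_BITS_PER_SECOND / 8 / 1_000_000  # 125 MB/s
    return rate_mb_per_s / line_rate_mb_per_s


for rate in (10, 20):
    print(f"{rate} MB/s is {fraction_of_gbe(rate):.0%} of the raw line rate")
```

So even with the faster profile, recovery uses well under a fifth of what the link could carry in theory.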

1/2

#homelab #Ceph

Last updated 2 years ago

Michael · @mmeier
130 followers · 2157 posts · Server social.mei-home.net

Still the same problem. The last few percent of the recovery are going very very slowly. I thought that it might be due to a bug, as I was still running 17.2.0, while the newest Ceph is 17.2.6. But a quick upgrade did not bring any improvement in the recovery speed.

None of the resources is used at all on the relevant hosts. Neither CPU, nor disk IO, nor network IO. The backfill is just dripping in with anywhere between 2 and 10 MiB/s. 🙄

#homelab #Ceph

Last updated 2 years ago

Michael · @mmeier
130 followers · 2141 posts · Server social.mei-home.net

First vacation project started: Migrating one of the Ceph VMs from my tower to my previous home server.

First step: Draining the Ceph VM's OSDs.

#homelab #Ceph

Last updated 2 years ago

Michael · @mmeier
118 followers · 1863 posts · Server social.mei-home.net

Today's evening entertainment: Updating my Ceph cluster from v16 to v17, after I'm no longer held back by Arch Linux' lack of recent Ceph packages.

I accept bets below this post. 😅

#homelab #Ceph

Last updated 2 years ago

Michael · @mmeier
120 followers · 1835 posts · Server social.mei-home.net

Alright. That was highly dangerous. I had a technician here to repair my oven. Whenever I switched it on, my (not fuse, but something measuring leak current? Don't know the English word) went out, and my apartment was without power. It was supposed to be fixed now. So I made myself a pizza. Five minutes later, no power. The Homelab was not shut down.

But everything came up again? It seems?

I think there is a hurray in order for my storage layer, Ceph.

#homelab #Ceph #uhoh #thatwasclose

Last updated 2 years ago

Michael · @mmeier
118 followers · 1810 posts · Server social.mei-home.net

What I've always appreciated about Ceph: They know that I'm a little bit absent-minded from time to time, and they are really serious about destructive command confirmations. 😅

#homelab #Ceph

Last updated 2 years ago

Michael · @mmeier
109 followers · 1665 posts · Server social.mei-home.net

Another blog in the "What does my current Homelab look like" series: blog.mei-home.net/posts/homela

This time, it's about Ceph and how I use it.

The series of articles has also been renamed to "Current Homelab"...because I was a little bit too slow to get to call it "Homelab in 2022". 😳

#homelab #Ceph #blogging

Last updated 2 years ago

Michael · @mmeier
100 followers · 1624 posts · Server social.mei-home.net

An interesting point on data collection in FOSS projects: matt-rickard.com/should-oss-pr

For me, the first thing which always comes to mind on the topic is how Ceph does it. Not only is telemetry opt-in, but they are also very open about what is sent to them. For the cherry on top, they also provide public Grafana dashboards displaying some of the telemetry data:
telemetry-public.ceph.com/

#homelab #Ceph

Last updated 2 years ago

Michael · @mmeier
71 followers · 1182 posts · Server social.mei-home.net

The initial update playbook for the Ceph OSD hosts is done. And it ran through cleanly for all three hosts.
But: Something seems to have broken, as some of my Nomad cluster nodes had problems with their RBD based volumes for a bit. I think I might need to integrate a 2 minute pause between each OSD host's upgrade, to make absolutely sure the cluster has settled again.
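
An alternative to a fixed pause would be to poll the cluster until it reports healthy again before moving on to the next host. A minimal sketch, assuming the first word of the `ceph health` output is the status (`HEALTH_OK`, `HEALTH_WARN` or `HEALTH_ERR`). `wait_until_healthy` and its parameters are made up for illustration, and the health check is passed in as a function so the loop can be tried without a cluster.

```python
import subprocess
import time


def ceph_health():
    """Return the status word printed by `ceph health`, e.g. HEALTH_OK."""
    out = subprocess.run(["ceph", "health"], capture_output=True,
                         text=True, check=True).stdout
    words = out.split()
    return words[0] if words else ""


def wait_until_healthy(get_health=ceph_health, timeout=600, interval=10,
                       sleep=time.sleep):
    """Poll until the cluster is HEALTH_OK; return False once timeout elapses."""
    waited = 0
    while True:
        if get_health() == "HEALTH_OK":
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval
```

Waiting for a strict `HEALTH_OK` may be too picky on a cluster that carries unrelated warnings; in that case the check would need to look at the specific warning instead.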

#homelab #Ceph

Last updated 2 years ago

Michael · @mmeier
70 followers · 1160 posts · Server social.mei-home.net

Alright, here's the magic formula which makes it so that you can now access "blog.mei-home.net/posts/ceph-m" and "blog.mei-home.net/posts/ceph-m" (with trailing slash) and get a working article in both instances:

(.*)(?:\\/$|(\\/[^\\.\\/]*)$)

🎉

Enter that into Traefik's RegexpReplace and use "${1}${2}/index.html" as the replacement string to work with a Hugo blog stored in Ceph S3.
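
To see what the pattern does, here is a small check in Python. It is only an approximation: Traefik evaluates the pattern with Go's regexp engine and `${1}`-style references, while Python's `re` uses `\g<1>`, and the doubled backslashes in the post are assumed to be config-file escaping, so the sketch uses single ones. `rewrite_path` and the example paths are hypothetical.

```python
import re

# Same pattern as in the post, without the config-level double escaping.
PATTERN = re.compile(r"(.*)(?:\/$|(\/[^\.\/]*)$)")


def rewrite_path(path):
    """Append /index.html to directory-style paths, leave file paths alone."""
    # An unmatched group 2 is substituted as an empty string (Python 3.5+).
    return PATTERN.sub(r"\g<1>\g<2>/index.html", path, count=1)


for p in ("/posts/example/", "/posts/example", "/css/style.css"):
    print(p, "->", rewrite_path(p))
```

Paths whose last segment contains a dot, such as stylesheets or images, don't match and pass through unchanged.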

#homelab #traefik #hugo #Ceph

Last updated 2 years ago

Michael · @mmeier
70 followers · 1155 posts · Server social.mei-home.net

I'm currently on a "small" quest to solve a problem with my blog: When you access a post without a trailing "/", you get a 404. This is due to the fact that:
1) My blog is hosted on Ceph S3 which doesn't support index document handling
2) I'm just working with Traefik's regex path replacing, replacing a trailing "/" with "/index.html"

But there doesn't seem to be another way. Just tried Caddy, but turns out that try_files doesn't work with reverse proxy.

#homelab #traefik #Ceph

Last updated 2 years ago

Michael · @mmeier
69 followers · 1146 posts · Server social.mei-home.net

And another new blog post: blog.mei-home.net/posts/ceph-m

This time, I'm describing my experience migrating my Ceph cluster's MON daemons to three Raspberry Pis. One interesting thing was that seemingly, not all Ceph daemons are automatically updated with new MON addresses and need to be updated manually.

#blog #Ceph #homelab

Last updated 2 years ago