How do you manage TLS certificates for the applications in your Nomad cluster? #hashicorpnomad
I'm sorely tempted to give the now out-of-beta Podman driver a try. Only thing holding me back for now is that I've been using the Fluentd log driver of Docker to pipe all job logs directly into a Fluentbit agent running in a system job, and I'm not yet sure what to do for logging for Podman jobs.
The first great thing: There's now an extra container label on jobs, which only contains the job name - leaving out the "/periodic-TIMESTAMP" suffix added for periodic jobs. This is going to simplify my logging setup, as I no longer have to filter out the "/periodic-..." part.
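For anyone still on the old labels, the filtering mentioned above can be sketched roughly like this. It is a minimal Python sketch, assuming the suffix is always "/periodic-" followed by a numeric Unix timestamp; the function name `base_job_name` and the sample job names are made up for illustration and are not part of Nomad or Fluent Bit.

```python
import re

# Assumption: periodic child jobs end in "/periodic-<digits>" (a Unix timestamp).
_PERIODIC_SUFFIX = re.compile(r"/periodic-\d+$")


def base_job_name(job_id: str) -> str:
    """Return the job name with any trailing periodic suffix removed."""
    return _PERIODIC_SUFFIX.sub("", job_id)


# Hypothetical job IDs, only for illustration.
print(base_job_name("backup/periodic-1689843600"))
print(base_job_name("traefik"))
```

With the new label carrying just the job name, a step like this should no longer be needed in the log pipeline.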
Currently working on the regular Homelab host update.
The highlight this time around is definitely the Nomad update to 1.6.
@hhg
I would look at #HashiCorpNomad.
Super lightweight orchestrator that's easy to deploy and expand, with a whole host of extra tricks up its sleeve.
People, it happened: The new Nomad 1.6 finally has a command line function to reschedule a job.
There are also some other nice things, e.g. the Podman task driver being called "Production ready" now.
https://www.hashicorp.com/blog/nomad-1-6-adds-node-pools-ux-updates-and-more
Finally finished my blog article on using Tasmota plugs, Mosquitto and mqtt2prometheus to measure the power draw of my IT equipment: https://blog.mei-home.net/posts/power-measurement/
#homelab #blog #iot #hashicorpnomad
Finally figured out why my #hashicorpnomad cluster had such frequent `rpc error: Not ready to serve consistent reads` errors. Had to bump the default heartbeat TTL/grace period from 10s to 30s (yes, pity me and my sluggish home network, https://developer.hashicorp.com/nomad/docs/configuration/server#client-heartbeats). Feels much more stable now.
And another smallish article, this time about a Consul error I recently encountered, with some explanation of what a Consul Connect Service Mesh is and how to debug certificate expiration issues in it:
#blog #homelab #hashicorpconsul #hashicorpnomad
It is a weird feeling sitting here waiting for my cluster to crash again in the hopes that the debug logs show something more.
#homelab #hashicorpnomad #hashicorpconsul
Just to reinforce: It happened, on the minute, exactly three days after the services were started again after the last occurrence. Not three days after the last occurrence - three days after the services were started again. Something must break in Consul Connect after three days.
Hmmmm, don't Consul Connect mTLS certs have a 72 hour TTL, now that I think about it?
#homelab #hashicorpnomad #hashicorpconsul
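The arithmetic behind that hunch is easy to check. This is a minimal Python sketch, assuming the 72 hour TTL mentioned above and a made-up service start time; `leaf_cert_expiry` is a hypothetical helper, not a Consul API, and it ignores any renewal that would normally happen before expiry.

```python
from datetime import datetime, timedelta, timezone

# Assumption taken from the post: mesh mTLS certs live for 72 hours.
LEAF_CERT_TTL = timedelta(hours=72)


def leaf_cert_expiry(issued_at: datetime, ttl: timedelta = LEAF_CERT_TTL) -> datetime:
    """Return the moment a cert issued at `issued_at` stops being valid."""
    return issued_at + ttl


# Hypothetical restart time, only for illustration.
started = datetime(2023, 3, 10, 18, 30, tzinfo=timezone.utc)
expires = leaf_cert_expiry(started)
print(expires.isoformat())
```

A cert issued when the services came back up would run out exactly three days later, to the minute, which matches a failure that tracks the restart time rather than the time of the previous breakage.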
And it happened again. My entire Nomad cluster broke. Again same picture, all Jobs are up, most health checks green.
This time, I used nsenter on one of the services and tried connecting to their upstream services in the Mesh via curl. Got connection reset by peer.
The most significant thing: It happened precisely three days after the services came back up again after the last occurrence. Still not enough info to write a useful bug, though.
#homelab #hashicorpnomad #hashicorpconsul
Wow this Nomad/Consul update was unfortunate.
First, Nomad clients weren't able to discover servers through Consul after 1.5.1, due to this bug: https://github.com/hashicorp/nomad/issues/16470
When trying to work around it, it took me way too long to figure out that the hardcoded server IPs go into the client/server blocks, not top level, in the conf.
Then nothing in my Consul service mesh was able to connect to anything, due to a breaking change in Consul.
Finally, everything is up again.
Running into a weird issue with Nomad, and I do not see an open issue for it on GitHub.
When I start a new job, the resource usage shows as expected, but after a while, both CPU and memory report as "0".
I checked the cli for usage with 'nomad alloc status' and it shows 0/{{limit}} there as well.
This started happening after the last update, and was part of why I stood up a new cluster, thinking I borked something, but it is now happening with this new cluster as well.
Anyone have any ideas what to check?
Got the new Consul Cluster (3 nodes) up and configured with TLS and auto encrypt enabled. Then got the Nomad cluster with 3 servers and 4 clients.
Then migrated workloads over to it from the old cluster, and updated the configuration for anti-social and the haproxy server.
Then I deployed a registry container for my custom images.
Going to work on keycloak tomorrow, assuming I don't have to take one of the kids to Urgent Care.
#HashiCorp #hashicorpconsul #hashicorpnomad
I'm finally done with the big migration I started in December. I've just shut down the last VM serving as a Nomad cluster node. My cluster now consists only of 8x Raspberry Pi CM4 and an Udoo X86 II, just in case I ever come across an x86-only service I'd like to run.
The only things running on my old x86 machine now are two Ceph VMs, but I'm waiting for some 3D printing before I can replace those as well.
Spent the better part of last night and this morning troubleshooting an issue with the Consul UI, only to decide just 30 minutes ago to check whether it is a known issue with version 1.15.0.
GitHub Issues confirms it is a known bug, and it will be fixed in 1.15.1.
I wish I had checked that before starting to stand up a new cluster for Consul, Nomad, and hey, while I am at it, let's toss Vault in there too.
I may have wanted to do that anyway, before I put a lot of "production" stuff in the Nomad cluster.
At least I know I didn't screw up the update.
#hashicorpconsul #hashicorpnomad #hashicorpvault
Updating my Nomad and Consul Versions. Hold on to your butts...
#hashicorpconsul #hashicorpnomad
LibreTranslate is running in containers on my Nomad Cluster, and mapped to the local port via the service mesh.
The translate request is passed to 1 of 4 translate containers I created.
It has sped up the translate process significantly.
#mastoadmin #hashicorpconsul #hashicorpnomad
Mastodon Update for anti-social.online will happen later today. I am also going to look at moving the translate feature to containers running on nomad, and have it be connected over the service mesh.
#today #hashicorpconsul #hashicorpnomad