in0rdr · @in0rdr
7 followers · 100 posts · Server m.in0rdr.ch

How do you manage TLS certificates for the applications in your Nomad cluster?

#hashicorpnomad

Last updated 1 year ago

Michael · @mmeier
191 followers · 3081 posts · Server social.mei-home.net

I'm sorely tempted to give the now out-of-beta Podman driver a try. Only thing holding me back for now is that I've been using the Fluentd log driver of Docker to pipe all job logs directly into a Fluentbit agent running in a system job, and I'm not yet sure what to do for logging for Podman jobs.

#homelab #hashicorpnomad

Last updated 1 year ago
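
For readers unfamiliar with the setup mentioned in the post above, here is a minimal sketch of a Nomad job that sends container logs through Docker's Fluentd log driver. The post does not show the actual configuration, so the job name, image, address, and tag below are all placeholder assumptions, and this applies to the Docker driver only, not Podman.

```hcl
# Hypothetical job: name, image, address, and tag are placeholders.
job "example" {
  datacenters = ["dc1"]

  group "app" {
    task "web" {
      driver = "docker"

      config {
        image = "nginx:stable"

        # Hand container logs to Docker's fluentd log driver, which
        # forwards them to a Fluentd/Fluent Bit listener.
        logging {
          type = "fluentd"
          config {
            fluentd-address = "localhost:24224"
            tag             = "nomad.example"
          }
        }
      }
    }
  }
}
```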

Michael · @mmeier
191 followers · 3081 posts · Server social.mei-home.net

The first great thing: There's now an extra container label on jobs, which only contains the job name - leaving out the "/periodic-TIMESTAMP" suffix added for periodic jobs. This is going to simplify my logging setup, as I no longer have to filter out the "/periodic-..." part. 🎉

#homelab #hashicorpnomad

Last updated 1 year ago

Michael · @mmeier
190 followers · 3075 posts · Server social.mei-home.net

Currently working on the regular Homelab host update.

The highlight this time around is definitely the Nomad update to 1.6.

#homelab #hashicorpnomad

Last updated 1 year ago

gester · @gester
8 followers · 31 posts · Server social.vivaldi.net

@hhg
I would look at Nomad.
Super lightweight orchestrator that's easy to deploy and expand, with a whole host of extra tricks up its sleeve.

#hashicorpnomad

Last updated 1 year ago

Michael · @mmeier
154 followers · 2824 posts · Server social.mei-home.net

People, it happened: The new Nomad 1.6 finally has a command line function to reschedule a job. 🎉

There are also some other nice things, e.g. the Podman task driver being called "Production ready" now.

hashicorp.com/blog/nomad-1-6-a

#homelab #hashicorpnomad

Last updated 1 year ago

Michael · @mmeier
151 followers · 2712 posts · Server social.mei-home.net

Finally finished my blog article on using Tasmota plugs, Mosquitto and mqtt2prometheus to measure the power draw of my IT equipment: blog.mei-home.net/posts/power-

#homelab #blog #iot #hashicorpnomad

Last updated 1 year ago

in0rdr · @in0rdr
6 followers · 56 posts · Server m.in0rdr.ch

Finally figured out why my cluster had such frequent `rpc error: Not ready to serve consistent reads` errors. Had to bump the default heartbeat ttl/grace period from 10s to 30s (yes, pity me and my sluggish home network, developer.hashicorp.com/nomad/). Feels much more stable now 🤠

#hashicorpnomad

Last updated 1 year ago
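
As a rough sketch of the change described in the post above: Nomad's `server` block has a `heartbeat_grace` option and a `min_heartbeat_ttl` option, both defaulting to 10s. The post does not say which of the two was raised, so treat the choice below as an assumption rather than the author's actual configuration.

```hcl
# Server agent configuration (sketch, not the author's actual file).
server {
  enabled = true

  # Extra time granted beyond the heartbeat TTL before a client node
  # is marked as down. Default is "10s".
  heartbeat_grace = "30s"

  # Assumption: the post may instead (or also) refer to the minimum
  # heartbeat TTL, which also defaults to "10s".
  # min_heartbeat_ttl = "30s"
}
```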

Michael · @mmeier
124 followers · 2050 posts · Server social.mei-home.net

And another smallish article, this time about a Consul error I recently encountered, with some explanation of what a Consul Connect Service Mesh is, and how to debug certificate expiration issues in it:

blog.mei-home.net/posts/consul

#blog #homelab #hashicorpconsul #hashicorpnomad

Last updated 2 years ago

Michael · @mmeier
123 followers · 2019 posts · Server social.mei-home.net

It is a weird feeling sitting here waiting for my cluster to crash again in the hopes that the debug logs show something more.

#homelab #hashicorpnomad #hashicorpconsul

Last updated 2 years ago

Michael · @mmeier
122 followers · 1978 posts · Server social.mei-home.net

Just to reinforce: It happened, on the minute, exactly three days after the services were started again after the last occurrence. Not three days after the last occurrence - three days after the services were started again. Something must break in Consul Connect after three days.

Hmmmm, don't Consul Connect mTLS certs have a 72 hour TTL, now that I think about it?

#homelab #hashicorpnomad #hashicorpconsul

Last updated 2 years ago

Michael · @mmeier
122 followers · 1977 posts · Server social.mei-home.net

And it happened again. My entire Nomad cluster broke. Same picture again: all jobs are up, most health checks green.

This time, I used nsenter on one of the services and tried connecting to its upstream services in the mesh via curl. Got connection reset by peer.

The most significant thing: It happened precisely three days after the services came back up again after the last occurrence. Still not enough info to write a useful bug report, though.

#homelab #hashicorpnomad #hashicorpconsul

Last updated 2 years ago

Michael · @mmeier
122 followers · 1947 posts · Server social.mei-home.net

Wow this Nomad/Consul update was unfortunate.

First, Nomad clients weren't able to discover servers through Consul after 1.5.1, due to this bug: github.com/hashicorp/nomad/iss
When trying to work around it, it took me way too long to figure out that the hardcoded server IPs go into the client/server blocks, not top level, in the conf.

Then nothing in my Consul service mesh was able to connect to anything, due to a breaking change in Consul.

Finally, everything is up again.

#homelab #hashicorpnomad

Last updated 2 years ago
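
To illustrate the workaround mentioned in the post above, here is a minimal sketch of where hardcoded server addresses go in a Nomad agent configuration. The IP addresses are documentation placeholders and `bootstrap_expect` is an assumed value; the author's actual config is not shown in the post.

```hcl
# On client agents: the server list lives inside the client block,
# not at the top level of the file. Addresses are placeholders.
client {
  enabled = true
  servers = ["192.0.2.10:4647", "192.0.2.11:4647", "192.0.2.12:4647"]
}

# On server agents: peers to join go inside the server block,
# here via a server_join stanza. Values are assumptions.
server {
  enabled          = true
  bootstrap_expect = 3

  server_join {
    retry_join = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]
  }
}
```

The two blocks are shown together for brevity; in most clusters they would live in the configuration of different agents.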

Josh Knapp :verified: · @GoTakeAKnapp
355 followers · 1941 posts · Server anti-social.online

Running into a weird issue with Nomad, and I do not see an active issue on GitHub.

When I start a new job, the resource usage shows as expected, but after a while, both cpu and memory report as "0"

I checked the CLI for usage with 'nomad alloc status' and it shows 0/{{limit}} there as well.

This started happening after the last update, and was part of why I stood up a new cluster, thinking I borked something, but it is now happening with this new cluster as well.

Anyone have any ideas on what to check?

#hashicorpnomad

Last updated 2 years ago

Josh Knapp :verified: · @GoTakeAKnapp
355 followers · 1933 posts · Server anti-social.online

Got the new Consul Cluster (3 nodes) up and configured with TLS and auto encrypt enabled. Then got the Nomad cluster up with 3 servers and 4 clients.

Then migrated workloads over to it from the old cluster, and updated the configuration for anti-social and the haproxy server.

Then I deployed a registry container for my custom images.

Going to work on keycloak tomorrow, assuming I don't have to take one of the kids to Urgent Care.

#HashiCorp #hashicorpconsul #hashicorpnomad

Last updated 2 years ago

Michael · @mmeier
118 followers · 1808 posts · Server social.mei-home.net

I'm finally done with the big migration I started in December. I've just shut down the last VM serving as a Nomad cluster node. My cluster now consists only of 8x Raspberry Pi CM4 and an Udoo X86 II, just in case I ever come across an x86-only service I'd like to run.

The only things running on my old x86 machine now are two Ceph VMs, but I'm waiting for some 3D printing before I can replace those as well.

#homelab #hashicorpnomad

Last updated 2 years ago

Josh Knapp :verified: · @GoTakeAKnapp
351 followers · 1900 posts · Server anti-social.online

Spent the better part of last night and this morning troubleshooting an issue with the Consul UI, only to decide just 30 minutes ago to check whether it is a known issue with version 1.15.0.

GitHub Issues confirms it is a known bug that will be fixed in 1.15.1.

I wish I had checked that before starting to stand up a new cluster for Consul, Nomad, and hey, while I am at it, let's toss Vault in there too.

I may have wanted to do that anyway, before I put a lot of "production" stuff in the Nomad Cluster.

At least I know I didn't screw up the update.



#hashicorpconsul #hashicorpnomad #hashicorpvault

Last updated 2 years ago

Josh Knapp :verified: · @GoTakeAKnapp
350 followers · 1895 posts · Server anti-social.online

Updating my Nomad and Consul Versions. Hold on to your butts...

#hashicorpconsul #hashicorpnomad

Last updated 2 years ago

Josh Knapp :verified: · @GoTakeAKnapp
327 followers · 1729 posts · Server anti-social.online

LibreTranslate is running in containers on my Nomad Cluster, and mapped to the local port via the service mesh.

The translate request is passed to 1 of 4 translate containers I created.

It has sped up the translate process significantly.

#mastoadmin #hashicorpconsul #hashicorpnomad

Last updated 2 years ago

Josh Knapp :verified: · @GoTakeAKnapp
325 followers · 1715 posts · Server anti-social.online

Mastodon Update for anti-social.online will happen later today. I am also going to look at moving the translate feature to containers running on nomad, and have it be connected over the service mesh.

#today #hashicorpconsul #hashicorpnomad

Last updated 2 years ago