Yeah, but 15 GB/s is 120 gbit. Your storage nodes are going to need more than 2x800 gbit if you want to take advantage of that bandwidth once you put 14 or more drives in. Also, those 14 drives probably won't do more than 30M iops combined. Your typical 2U storage node has something like 24 drive bays, so you'll probably be bottlenecked by network bandwidth or iops whether you put in 15 GB/s drives or 7 GB/s drives.
Maybe it makes sense these days, I haven’t seen any big storage servers myself, I’m usually working with cloud or lab environments.
nah datacenters care more about capacity or iops, throughput is meaningless, since you’ll always be bottlenecked by network
The sqlite database that Jellyfin uses tends to get corrupted easily, especially if the disk gets full.
The main big feature that the Jellyfin devs are working on right now is a complete overhaul of the internal database system:
Damn, is that even legal to take on a plane?
Zed is great! Not as many features as IntelliJ, but insanely fast, and new features are being added all the time.
Insane that GitHub blocked their entire development without discussing it with them, though. Ban the contributor, not the entire open source project.
This conversation is about ssds vs hdds in a server environment, but I'm not sure if those claims are true in either environment.
sata ssds look identical to sata hdds as far as the host is concerned, the drive is just able to service reads and writes faster.
I could see some argument about nvme interrupts/polling being slower than sata at scale, but you’re not going to see a difference on a modern CPU with less than 10 nvme drives.
Sequential performance is meaningless these days, workstation and server performance are both limited by iops and latency. Raid increases latency slightly, but iops scale linearly until you run out of CPU or memory bandwidth.
Any file system will always be faster on an ssd than on an hdd. xfs/ext4/btrfs don’t have any hdd specific optimizations as far as I know. ZFS does, but it’s not going to make ssds slower than hdds, it just causes some write amplification.
Enterprise ssds are cheaper and faster than consumer ssds, you can buy them super cheap on eBay. 2TB with PLP for $100. However, you need to make sure you can fit a 22110 m.2 or have an adapter cable for u.2.
You’re always going to be better off building raid on ssd than hdd as long as you have the budget for it.
It's about a 5:1 cost ratio these days; honestly it's pretty worthwhile to just go all nvme when you consider the reliability, performance and noise benefits. A raid 5 of nvme can be cheaper and faster than a raid 1 of hdds.
I don’t think I’m adding any more hard drives to my home ceph array at this point.
Yeah, I think you pick up things from all over the place as a consultant. I see lots of different environments and learn from them.
Ah yeah, the external-dns operator is great! It's maybe a bit basic at times, but it's super convenient to just have A/AAAA records appear for all your loadbalancer svcs and HTTPRoutes. Saves a ton of time.
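For anyone who hasn't tried it, this is roughly what it looks like (service name and hostname here are just placeholders, and I'm going from memory on the annotation): you annotate the LoadBalancer Service and external-dns publishes the record for you.

```yaml
# Hypothetical example: external-dns watches Services of type LoadBalancer
# and creates A/AAAA records for the annotated hostname automatically.
apiVersion: v1
kind: Service
metadata:
  name: jellyfin                # placeholder name
  annotations:
    external-dns.alpha.kubernetes.io/hostname: jellyfin.example.com
spec:
  type: LoadBalancer
  selector:
    app: jellyfin
  ports:
    - port: 80
      targetPort: 8096
```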
That's super unfortunate that the certs are siloed off. Maybe they can give you an NS record for a subdomain that you can use ACME on? I've seen that at some customers. Super important that all engineers have access to self-service certs, imo.
Rook is great! It definitely can be quite picky about hardware and balancing, as I’ve learned from trying to set it up with two nodes at home with spare hdds and ssds 😅 Very automated once it’s all set up and you understand its needs, though. NFS provisioner is also a good option for a storageclass as a first step, that’s what I used in my homelab from 2021 to 2023.
Here's my Rook config:
https://codeberg.org/jlh/h5b/src/branch/main/argo/external_applications/rook-ceph-helm.yaml
https://codeberg.org/jlh/h5b/src/branch/main/argo/custom_applications/rook-ceph
Up to 3 nodes and 120TiB now and I’m about to add 4 more nodes. I probably would recommend just automatically adding disks instead of manually adding them, I’m just a bit more cautious and manual with my homelab “pets”.
I'm not very far on my RHCE yet tbh 😅 Red Hat courses are a bit hard to follow 😅 But hopefully I'll make some progress before the summer.
The CKA and CKS certs are great! Some really good courses for those on udemy and acloudguru, there’s a good lab environment on killer.sh, and the practice exams are super useful. I definitely recommend those certs, you learn a lot and it’s a good way to demonstrate your expertise.
Well, my point was to explain how Kubernetes simplifies devops to the point of being simpler than most proxmox or Ansible setups. That’s especially true if you have a platform/operations team managing the cluster for you.
Some details that got missed here: the external-dns and cert-manager operators usually handle the DNS records and certs for you in k8s, you just have to specify the hostname in the HTTPRoute/VirtualService and in the Certificate. For storage, ansible probably simplifies some of this away, but LVM is likely more manual to set up and manage than pointing a PVC at a storageclass and saying "100Gi".
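Roughly what that looks like in practice, just as a sketch (all names, hostnames and the storageclass are placeholders):

```yaml
# The hostname declared here is all external-dns and cert-manager need
# to create the DNS record and certificate for this route.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: myapp
spec:
  parentRefs:
    - name: public-gateway        # placeholder Gateway name
  hostnames:
    - myapp.example.com
  rules:
    - backendRefs:
        - name: myapp
          port: 8080
---
# Storage: ask for 100Gi from a storageclass instead of managing LVM by hand.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myapp-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: ceph-block    # placeholder storageclass
  resources:
    requests:
      storage: 100Gi
```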
Either way, I appreciate the discussion, it’s always good to compare notes on production setups. No hard feelings even in the case that we disagree on things. I’m a Red Hat Openshift consultant myself these days, working on my RHCE, so maybe we’ll cross paths some day in a Red Hat environment!
On rhel you're not using a reverse proxy, so you'll also need to make sure the ports you want are open, set up a dns record for it, and set up certbot.
On k8s, I believe istio gateways are meant to be reused across services. Since you're behind a reverse proxy, the ports are already open, so there's no need for firewall-cmd. What would be wrong with the Service included in the elasticsearch chart?
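Just to sketch the reuse idea (names, hosts and the cert secret are placeholders, not from your chart):

```yaml
# One shared Istio Gateway, reused by many VirtualServices.
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: shared-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
    - port: { number: 443, name: https, protocol: HTTPS }
      hosts: ["*.example.com"]
      tls:
        mode: SIMPLE
        credentialName: wildcard-cert   # placeholder TLS secret
---
# Each new service just attaches a VirtualService to the existing gateway.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: elasticsearch
spec:
  hosts: ["elasticsearch.example.com"]
  gateways: ["shared-gateway"]
  http:
    - route:
        - destination:
            host: elasticsearch   # the Service from the elasticsearch chart
            port: { number: 9200 }
```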
It’s also worth looking at the day 2 implications.
For backups you're looking at bespoke cronjobs that either rsync your database or clone your entire 100gb disk image, versus using velero or just backing up the underlying storage.
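With velero that's basically a single Schedule object (the namespaces and retention here are made-up values, just to show the shape of it):

```yaml
# Nightly backups of selected namespaces to the configured object storage,
# kept for a week.
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: nightly
  namespace: velero
spec:
  schedule: "0 3 * * *"
  template:
    includedNamespaces: ["jellyfin", "lemmy"]   # placeholder namespaces
    ttl: 168h0m0s
```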
For updates, you need to run system updates manually on rhel, likely requiring a full reboot of the node, while in kubernetes, renovate can handle rolling updates in the background with minimal downtime. Not to mention the process required to find a new repo when rhel 11 comes out.
There’s much more tooling for containerd containers than there is for LXC
I have 33 database servers in my homelab across 11 postgres clusters, all with automated barman backups to S3.
Here is the entire config for the db cluster that runs my Lemmy instance:
This stuff is all automated these days.
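Not my actual config, but for anyone curious, a CNPG cluster with barman backups to S3 looks roughly like this (every name and bucket here is a placeholder):

```yaml
# Illustrative sketch only: a 3-instance postgres cluster with continuous
# barman WAL archiving and base backups shipped to S3.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: lemmy-db              # placeholder name
spec:
  instances: 3
  storage:
    size: 20Gi
  backup:
    retentionPolicy: 30d
    barmanObjectStore:
      destinationPath: s3://db-backups/lemmy   # placeholder bucket
      s3Credentials:
        accessKeyId:
          name: s3-creds
          key: ACCESS_KEY_ID
        secretAccessKey:
          name: s3-creds
          key: SECRET_ACCESS_KEY
```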
Yeah I’m not saying everybody has to go and delete their infra, I just think that all new production environments should be k8s by default.
The production-scale Grafana LGTM stack only runs on Kubernetes fwiw. Docker and VMs are not supported. I’m a bit surprised that Kubernetes wouldn’t have enough availability to be able to co-locate your general workloads and your observability stack, but that’s totally fair to segment those workloads.
I’ve heard the argument that “kubernetes has more moving parts” a lot, and I think that is a misunderstanding. At a base level, all computers have infinite moving parts. QEMU has a lot of moving parts, containerd has a lot of moving parts. The reason why people use kubernetes is that all of those moving parts are automated and abstracted away to reduce the daily cognitive load for us operations folk. As an example, I don’t run manual updates for minor versions in my homelab. I have a k8s CronJob that runs renovate, which goes and updates my Deployments in git, and ArgoCD automatically deploys the changes. Technically that’s a lot of moving parts to use, but it saves me a lot of manual work and thinking, and turns my whole homelab into a sort of automated cloud service that I can go a month without thinking about.
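To make that concrete, the renovate job is basically just this (image tag, repo name and env vars are from memory / placeholders, so double-check them against the renovate docs):

```yaml
# A CronJob runs renovate nightly, renovate opens update PRs in git,
# and ArgoCD syncs whatever lands on the main branch.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: renovate
spec:
  schedule: "0 4 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: renovate
              image: renovate/renovate:latest   # pin a real tag in practice
              env:
                - name: RENOVATE_REPOSITORIES
                  value: "example/homelab"      # placeholder repo
                - name: RENOVATE_TOKEN          # git platform token
                  valueFrom:
                    secretKeyRef:
                      name: renovate-secrets
                      key: token
              # platform/endpoint settings omitted for brevity
```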
I'm not sure if container break-out attacks are a reasonable concern for homelabs. See the relatively minor concern in the announcement I made as an Unraid employee last year when Leaky Vessels happened. Keep in mind that containerd uses kernel namespaces and cgroups under the hood.
Yeah, apparmor/selinux aren't very popular in the k8s space. I think it's easy enough to use them, there's plenty of documentation out there; but Openshift/okd is the only distribution that runs selinux out of the box.
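If anyone wants to try it, it's mostly a few pod-level fields; something like this (profile names and labels are placeholders, and you'd normally only use whichever of apparmor/selinux your nodes actually run):

```yaml
# Sketch of per-pod hardening via securityContext and the AppArmor annotation.
apiVersion: v1
kind: Pod
metadata:
  name: hardened-app
  annotations:
    # AppArmor (annotation form; newer clusters can also use securityContext.appArmorProfile)
    container.apparmor.security.beta.kubernetes.io/app: runtime/default
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault
    seLinuxOptions:
      level: "s0:c123,c456"   # example MCS labels, like Openshift assigns per namespace
  containers:
    - name: app
      image: nginx            # placeholder image
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
```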
Yeah, that’s fair. I have set up Openshift Virtualization for customers using 3rd party appliances. I’ve even worked on some projects where a 3rd party appliance is part of the original spec for the cluster, so installing Openshift Virtualization to run VMs is part of the day 1 installation of the Kubernetes cluster.
Sure!
I haven’t used quadlets yet, but I did set up a few systemd services for containers back in the day before quadlets came out. I also used to use docker compose back in 2017/2018.
From a homelab admin's perspective, Docker compose and Kubernetes are very similar. Docker compose syntax is a little less verbose, and it has some shortcuts for storage and networking, but that also means it's less flexible if you're doing more complex things. Docker compose doesn't start containers on boot by default I think(?), which is pretty bad for application hosting. Docker compose also has no way of automatically deploying from git like ArgoCD does.
Kubernetes also has a lot of self-healing automation built in, like:

- health checks that can pull a failing app out of the load balancer and/or restart its container
- automatic killing of containers when resources run low, and refusing to schedule new containers when there's no room
- gradual roll-out of new versions, so the old container doesn't get killed until the new one is up and healthy (helpful in case the new config is broken)
- mounting secrets as files in a container
- automatic retries for failed containers
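A few of those as they'd appear on an actual Deployment, just as a sketch (names, ports and paths are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0       # old pod stays up until the new one is healthy
  selector:
    matchLabels: { app: myapp }
  template:
    metadata:
      labels: { app: myapp }
    spec:
      containers:
        - name: myapp
          image: myapp:1.2.3
          readinessProbe:     # failing -> pod is pulled out of the load balancer
            httpGet: { path: /healthz, port: 8080 }
          livenessProbe:      # failing -> container gets restarted
            httpGet: { path: /healthz, port: 8080 }
          resources:
            requests: { memory: 256Mi, cpu: 100m }
            limits: { memory: 512Mi }
          volumeMounts:
            - name: secrets   # secrets mounted as files
              mountPath: /etc/myapp
      volumes:
        - name: secrets
          secret:
            secretName: myapp-secrets
```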
There are also a lot of ubiquitous automation tools in the Kubernetes space, like cert-manager for setting up certificates (both ACME and local CA), Ingress for setting up reverse proxies, CNPG for setting up postgres clusters with automated backups, and first-class instrumentation/integration with prometheus and loki (both were designed for kubernetes first).
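A cert-manager Certificate is about this much yaml (issuer name and hostname are placeholders, assuming you've already set up a ClusterIssuer); cert-manager handles the ACME challenge and renewals and drops the cert into a Secret:

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: myapp-tls
spec:
  secretName: myapp-tls
  dnsNames:
    - myapp.example.com
  issuerRef:
    name: letsencrypt-prod    # ClusterIssuer set up once for the whole cluster
    kind: ClusterIssuer
```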
The main downsides with Kubernetes in a homelab are the roughly 1-2GiB RAM overhead for small clusters, and the fact that most documentation and examples are written for docker-compose, so you have to convert apps into a Deployment (you get used to writing deployments for new apps pretty quickly). I would say installing things like Ingress or CNPG is probably easier than setting up similar reverse-proxy or database automation on docker-compose, though.
Not QEMU in particular, poor phrasing on my part. I just mean setting up new environments that run applications on VMs.
I don't have any VMs running in my homelab.
Most of my customers run their Kubernetes nodes either on bare metal or on cloud-provisioned VMs from AWS/GCP/Azure, etc.
Buy used Samsung PM983s on eBay. Super cheap, super fast, and they have power-loss protection. Only downside is that they're M.2 22110, not M.2 2280. There's also a bunch of cheap Samsung and HGST U.2 drives on eBay, but you'll need an adapter.