Server setup

A rough overview of what I went through to get an R710 configured with bare metal Ubuntu Server and Microk8s.

Installation

This R710 is 11th generation, so it has iDRAC6, which is ancient. The virtual console does not run on Java 8 because support for SSLv3 ended with Update 31. I had to re-enable SSLv3 by editing {JRE_HOME}\lib\security\java.security and removing SSLv3 from the jdk.tls.disabledAlgorithms line. Since leaving SSLv3 enabled is dangerous, I now install the JRE every time I need the iDRAC and uninstall it afterwards.
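The edit itself is a one-line change. A sketch of the idea, demonstrated on a throwaway copy rather than the real {JRE_HOME} file (the algorithm list shown is illustrative, not the exact contents of any particular JRE):

```shell
# Demonstrated on a sample copy; the real file is {JRE_HOME}\lib\security\java.security.
demo=/tmp/java.security.demo
printf 'jdk.tls.disabledAlgorithms=SSLv3, TLSv1, RC4, MD5withRSA\n' > "$demo"
# Remove SSLv3 from the disabled list so the iDRAC6 applet can negotiate it
sed -i 's/SSLv3, //' "$demo"
cat "$demo"   # jdk.tls.disabledAlgorithms=TLSv1, RC4, MD5withRSA
```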

At the time of installation, I only had one of the four ethernet ports connected. This caused Ubuntu Server to stall on startup, waiting for the three unconnected interfaces to come up. This can be remedied by informing Netplan that the disconnected ports are optional:

sudo nano /etc/netplan/00-installer-config.yaml
network:
    ethernets:
        # ...
        eno2:
            dhcp4: true
            optional: true
        # ...

Updates

Drivers can be found on Dell's support site. Art of Server has an incredibly useful R710 series.

The basic flow was wget the update, chmod it for execution, then apply it. I ran all of this from the top-mounted KVM so a dropped SSH connection couldn't interrupt an update and brick the machine.

wget https://dl.dell.com/FOLDER05012856M/1/BIOS_0F4YY_LN_6.6.0.BIN
chmod +x BIOS_0F4YY_LN_6.6.0.BIN
sudo ./BIOS_0F4YY_LN_6.6.0.BIN

Microk8s

I used this as an introductory guide. I then installed helm and kubectl and mapped them to the Microk8s directories.

sudo snap install kubectl --classic
sudo snap install helm --classic
sudo mkdir -p /var/snap/microk8s/current/bin
sudo ln -s /snap/bin/helm /var/snap/microk8s/current/bin/helm
mkdir -p $HOME/.kube
microk8s.kubectl config view --raw > $HOME/.kube/config

I modified the default Kubernetes editor to be nano. Not a fan of vi.

nano ~/.bash_profile
export KUBE_EDITOR="nano"

I enabled many Microk8s add-ons that I knew I would be using:

sudo microk8s enable dashboard dns helm3 ingress storage

Dashboard

I enable and use the Kubernetes dashboard only when necessary by way of a restricted ingress:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
    name: dashboard
    annotations:
        nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
        nginx.ingress.kubernetes.io/configuration-snippet: |
            rewrite ^(/dashboard)$ $1/ redirect;
        nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
        nginx.ingress.kubernetes.io/rewrite-target: /$2
        nginx.ingress.kubernetes.io/whitelist-source-range: 192.168.0.0/16
    namespace: kube-system
spec:
    rules:
      - http:
            paths:
              - path: /dashboard(/|$)(.*)
                pathType: Prefix
                backend:
                    service:
                        name: kubernetes-dashboard
                        port:
                            number: 443
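Since the dashboard is sensitive, the ingress could also be locked behind credentials on top of the source-range restriction. A sketch using ingress-nginx's basic-auth annotations; the secret name here is hypothetical and would need to be created from an htpasswd file first:

```yaml
metadata:
    annotations:
        nginx.ingress.kubernetes.io/auth-type: basic
        # hypothetical secret containing an htpasswd-generated "auth" key
        nginx.ingress.kubernetes.io/auth-secret: dashboard-basic-auth
        nginx.ingress.kubernetes.io/auth-realm: "Authentication Required"
```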

TLS

I use cert-manager with letsencrypt. To install cert-manager along with its CustomResourceDefinitions:

microk8s kubectl apply --validate=false -f https://github.com/jetstack/cert-manager/releases/download/v1.2.0/cert-manager.yaml

And then add a ClusterIssuer:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
    name: letsencrypt
spec:
    acme:
        server: https://acme-v02.api.letsencrypt.org/directory
        email: {email}
        privateKeySecretRef:
            name: letsencrypt
        solvers:
          - http01:
                ingress:
                    class: nginx

This requires configuring each ingress with the following annotations:

metadata:
    annotations:
        acme.cert-manager.io/http01-edit-in-place: "true"
        cert-manager.io/cluster-issuer: letsencrypt
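The annotations alone only wire up the HTTP-01 solver; each ingress spec also needs a tls section naming the host and the secret cert-manager should populate. A sketch, where the host and secret name are placeholders:

```yaml
spec:
    tls:
      - hosts:
          - example.com              # placeholder host
        secretName: example-com-tls  # cert-manager stores the issued certificate here
```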

Mistakes

  • At one point I decided I wanted to rename the server. Kubernetes proceeded to assume I had a two-node cluster and attempted to run both nodes on the same machine. The nodes did not agree with each other on anything and nothing worked. I opted to reinstall everything.
  • I accidentally used the same label and selector for all Deployment resources for this website. This caused very irregular network behavior and was extremely difficult to diagnose.
  • Because I am running on a bare metal single-node cluster, I set resource limits for all of my Deployments to be comically high. I neglected to set resource requests, causing my requests to match my limits. This caused my pods to begin to fail scheduling due to lack of available resources.
  • I installed the RAM modules in incorrect configurations multiple times. Eventually I just filled all slots. ECC memory is not cheap.
  • I have regularly neglected to appropriately back up my PersistentVolumeClaims resulting in many ugly restores.
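The label and resource mistakes above both come down to Deployment spec details. A sketch of what I should have written from the start; the names, image, and numbers are illustrative:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
    name: website-api   # illustrative name
spec:
    selector:
        matchLabels:
            app: website-api   # unique per Deployment, never shared across them
    template:
        metadata:
            labels:
                app: website-api
        spec:
            containers:
              - name: api
                image: example/website-api:latest   # placeholder image
                resources:
                    requests:   # without this, requests default to the limits
                        cpu: 100m
                        memory: 128Mi
                    limits:
                        cpu: "2"
                        memory: 2Gi
```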