Server setup

A rough overview of what I went through to get an R710 configured with bare metal Ubuntu Server and Microk8s.

Installation

This R710 is 11th generation, so it has iDRAC6, which is ancient. The virtual console does not run on Java 8 because support for SSLv3 ended with Update 31. I had to re-enable SSLv3 by editing {JRE_HOME}\lib\security\java.security and removing SSLv3 from the jdk.tls.disabledAlgorithms line. Since leaving SSLv3 enabled is dangerous, I now install the JRE every time I need the iDRAC and uninstall it afterwards.
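The edit itself is a one-line change. A sketch of the idea, demonstrated on a throwaway copy rather than the real {JRE_HOME} file (the algorithm list shown is illustrative, not the exact contents of any particular JRE):

```shell
# Demonstrated on a sample copy; the real file is {JRE_HOME}\lib\security\java.security.
demo=/tmp/java.security.demo
printf 'jdk.tls.disabledAlgorithms=SSLv3, TLSv1, RC4, MD5withRSA\n' > "$demo"
# Remove SSLv3 from the disabled list so the iDRAC6 applet can negotiate it
sed -i 's/SSLv3, //' "$demo"
cat "$demo"   # jdk.tls.disabledAlgorithms=TLSv1, RC4, MD5withRSA
```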

At the time of installation, I only had one of the four ethernet ports connected. This caused Ubuntu Server to stall on startup, waiting for the three unconnected interfaces to come up. This can be remedied by informing Netplan that the disconnected ports are optional:

sudo nano /etc/netplan/00-installer-config.yaml
network:
    ethernets:
        # ...
        eno2:
            dhcp4: true
            optional: true
        # ...

Updates

Drivers can be found on Dell's support site. Art of Server has an incredibly useful R710 series.

The basic flow was wget the update, chmod it for execution, then apply it. I ran all of this from the top-mounted KVM so a dropped SSH connection couldn't interrupt an update and brick the machine.

wget https://dl.dell.com/FOLDER05012856M/1/BIOS_0F4YY_LN_6.6.0.BIN
chmod +x BIOS_0F4YY_LN_6.6.0.BIN
sudo ./BIOS_0F4YY_LN_6.6.0.BIN

Microk8s

I used this as an introductory guide. I then installed helm and kubectl and mapped them to the Microk8s directories.

sudo snap install kubectl --classic
sudo snap install helm --classic
sudo mkdir -p /var/snap/microk8s/current/bin
sudo ln -s /snap/bin/helm /var/snap/microk8s/current/bin/helm
mkdir -p $HOME/.kube
microk8s.kubectl config view --raw > $HOME/.kube/config

I modified the default Kubernetes editor to be nano. Not a fan of vi.

nano ~/.bash_profile
export KUBE_EDITOR="nano"

I enabled many Microk8s add-ons that I knew I would be using:

sudo microk8s enable dashboard dns helm3 ingress storage

Dashboard

I enable and use the Kubernetes dashboard only when necessary by way of a restricted ingress:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
    name: dashboard
    annotations:
        nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
        nginx.ingress.kubernetes.io/configuration-snippet: |
            rewrite ^(/dashboard)$ $1/ redirect;
        nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
        nginx.ingress.kubernetes.io/rewrite-target: /$2
        nginx.ingress.kubernetes.io/whitelist-source-range: 192.168.0.0/16
    namespace: kube-system
spec:
    rules:
      - http:
            paths:
              - path: /dashboard(/|$)(.*)
                pathType: Prefix
                backend:
                    service:
                        name: kubernetes-dashboard
                        port:
                            number: 443
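Since the dashboard is sensitive, the ingress could also be locked behind credentials on top of the source-range restriction. A sketch using ingress-nginx's basic-auth annotations; the secret name here is hypothetical and would need to be created from an htpasswd file first:

```yaml
metadata:
    annotations:
        nginx.ingress.kubernetes.io/auth-type: basic
        # hypothetical secret containing an htpasswd-generated "auth" key
        nginx.ingress.kubernetes.io/auth-secret: dashboard-basic-auth
        nginx.ingress.kubernetes.io/auth-realm: "Authentication Required"
```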

TLS

I use cert-manager with letsencrypt. To install cert-manager along with its CustomResourceDefinitions:

microk8s kubectl apply --validate=false -f https://github.com/jetstack/cert-manager/releases/download/v1.2.0/cert-manager.yaml

And then add a ClusterIssuer:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
    name: letsencrypt
spec:
    acme:
        server: https://acme-v02.api.letsencrypt.org/directory
        email: {email}
        privateKeySecretRef:
            name: letsencrypt
        solvers:
          - http01:
                ingress:
                    class: nginx

This requires configuring each ingress with the following annotations:

metadata:
    annotations:
        acme.cert-manager.io/http01-edit-in-place: "true"
        cert-manager.io/cluster-issuer: letsencrypt
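The annotations alone only wire up the HTTP-01 solver; each ingress spec also needs a tls section naming the host and the secret cert-manager should populate. A sketch, where the host and secret name are placeholders:

```yaml
spec:
    tls:
      - hosts:
          - example.com              # placeholder host
        secretName: example-com-tls  # cert-manager stores the issued certificate here
```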

Mistakes

  • At one point I decided I wanted to rename the server. Kubernetes proceeded to assume I had a two-node cluster and attempted to run both nodes on the same machine. The nodes did not agree with each other on anything and nothing worked. I opted to reinstall everything.
  • I accidentally used the same label and selector for all Deployment resources for this website. This caused very irregular network behavior and was extremely difficult to diagnose.
  • Because I am running on a bare metal single-node cluster, I set resource limits for all of my Deployments to be comically high. I neglected to set resource requests, causing my requests to match my limits. This caused my pods to begin to fail scheduling due to lack of available resources.
  • I installed the RAM modules in incorrect configurations multiple times. Eventually I just filled all slots. ECC memory is not cheap.
  • I have regularly neglected to appropriately back up my PersistentVolumeClaims resulting in many ugly restores.
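The label and resource mistakes above both come down to Deployment spec details. A sketch of what I should have written from the start; the names, image, and numbers are illustrative:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
    name: website-api   # illustrative name
spec:
    selector:
        matchLabels:
            app: website-api   # unique per Deployment, never shared across them
    template:
        metadata:
            labels:
                app: website-api
        spec:
            containers:
              - name: api
                image: example/website-api:latest   # placeholder image
                resources:
                    requests:   # without this, requests default to the limits
                        cpu: 100m
                        memory: 128Mi
                    limits:
                        cpu: "2"
                        memory: 2Gi
```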