UTHPC - Kubernetes network connections – Incident details

Kubernetes network connections

Resolved
Degraded performance
Started 6 days ago
Lasted about 2 hours

Affected

kubernetes.hpc.ut.ee

Degraded performance from 2:15 PM to 3:21 PM, Operational from 3:21 PM to 3:46 PM

minu.etais.ee

Degraded performance from 2:15 PM to 3:21 PM, Operational from 3:21 PM to 3:46 PM

UT HPC webservices

Degraded performance from 2:15 PM to 3:21 PM, Operational from 3:21 PM to 3:46 PM

hpc.ut.ee

Degraded performance from 2:15 PM to 3:21 PM, Operational from 3:21 PM to 3:46 PM

docs.hpc.ut.ee

Degraded performance from 2:15 PM to 3:21 PM, Operational from 3:21 PM to 3:46 PM

registry.hpc.ut.ee

Degraded performance from 2:15 PM to 3:21 PM, Operational from 3:21 PM to 3:46 PM

Updates
  • Resolved

    The issue was related to the number of connections to and from Kubernetes. Our firewalls were configured with a connection rate limit, and as we add new nodes to the cluster, the baseline connection rate has grown beyond that limit. We have revised the limits and are adding new monitoring to prevent the same issue in the future.

  • Monitoring
    We implemented a fix and are currently monitoring the result.
  • Investigating

    We're currently seeing elevated request error rates and request latency for applications in Kubernetes. We are working to identify the cause.
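
As a rough illustration of the kind of check the resolution update describes (watching the connection rate against a firewall limit), here is a minimal sketch. It is not the monitoring UTHPC actually deployed: the function names, the 80% warning threshold, and the idea of sampling a cumulative connection counter are all assumptions made for the example.

```python
def connection_rate(prev_count, curr_count, interval_s):
    """Return new connections per second between two cumulative counter samples.

    prev_count and curr_count are hypothetical readings of a cumulative
    "connections opened" counter taken interval_s seconds apart.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # A counter reset (for example after a device restart) would produce a
    # negative delta; treat that as zero rather than reporting a negative rate.
    return max(curr_count - prev_count, 0) / interval_s


def check_headroom(rate, limit, warn_fraction=0.8):
    """Classify a measured rate against a configured limit.

    warn_fraction is an assumed threshold: warn once the rate reaches that
    share of the limit, so growth in the baseline is noticed before the
    limit itself is hit.
    """
    if rate >= limit:
        return "critical"
    if rate >= warn_fraction * limit:
        return "warning"
    return "ok"
```

With a check like this, a baseline that creeps upward as nodes are added would show up as a warning before connections start being dropped at the limit.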