I recently went through a flurry of interviews trying to hire a strong Platform Engineer for my team.
Since our platform is built on Kubernetes, a solid grasp of its fundamentals isn’t optional; it’s table stakes.
Here is my favorite question.
“Imagine a multi-tenant Kubernetes cluster running many different workloads.
There’s a mission-critical, memory-intensive workload called prime — and as the name suggests, it needs prime access to resources.
From capacity planning, we know prime needs roughly 4–6 GiB of memory to function correctly.
What is the best way to configure requests and limits so that this workload is never OOM-killed?”
Pause here for a moment and think it through.
Ready for the answer that would get you through the interview?
To answer this correctly, you need to understand how Kubernetes thinks about resources.
Kubernetes makes resource management decisions across two distinct phases:
Scheduling — Requests help the scheduler decide where a Pod can run.
Enforcement — Limits and QoS determine what happens when things go wrong, such as memory pressure on a node.
Requests: What the Scheduler Actually Uses
Resource requests are promises made to the scheduler. If a Pod requests 3 GiB of memory, the scheduler guarantees that it will be placed on a node that can accommodate that request.
However, if the node has additional free resources available, the container is allowed to use more than its requested amount — requests are only about placement, not enforcement.
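A minimal sketch of what that looks like (the Pod name and image are placeholders): a container that requests 3 GiB of memory and declares no limit.

apiVersion: v1
kind: Pod
metadata:
  name: requests-only-demo            # placeholder name
spec:
  containers:
  - name: app
    image: registry.example.com/app   # placeholder image
    resources:
      requests:
        memory: 3Gi   # scheduler only considers nodes with at least 3Gi of allocatable memory left
        # no limit set, so at runtime the container may use more than 3Gi if the node has spare memory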
Caveats to Keep in Mind
Pod without limits: If a Pod specifies requests but no limits, it can use any available resources on the node beyond its request.
Pod with limits but no requests: Kubernetes automatically sets the request equal to the limit. This ensures the scheduler can place the Pod safely.
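For instance, a container spec that only sets a limit (a minimal sketch, the value is arbitrary):

resources:
  limits:
    memory: 2Gi   # Kubernetes defaults requests.memory to this same value

If you inspect the Pod after creation (kubectl get pod <name> -o yaml), you will see requests.memory: 2Gi filled in, so the scheduler treats it exactly like an explicit request.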
Limits: Where Reality Is Enforced
Resource limits are enforced at runtime by the container runtime and the kubelet. Unlike requests, which only affect scheduling, limits define hard boundaries on how a container can consume resources.
CPU Limits
Exceeding a CPU limit does not kill the container.
Instead, the container is throttled using CFS quotas, which slows it down.
This allows other containers on the node to get CPU time fairly.
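As a sketch, here is a container capped at half a CPU core; the comment shows how that limit maps onto the kernel’s CFS quota, assuming the default 100 ms period:

resources:
  requests:
    cpu: 250m
  limits:
    cpu: 500m   # cfs_quota_us = 50000 with cfs_period_us = 100000: at most 50 ms of CPU time per 100 ms window

Once the container’s threads burn through that 50 ms, they sit idle until the next period begins, which shows up as added latency rather than a restart.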
Memory Limits
Exceeding a memory limit results in an OOMKill — the container is terminated immediately.
Memory cannot be throttled, only restricted.
Under node pressure, Pods are also evicted according to their QoS class.
This asymmetry — throttled CPU vs deadly memory — is critical to understand when planning mission-critical workloads like prime.
Bonus points if you call this out!
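To make the asymmetry concrete, here is a sketch of a single container carrying both kinds of limits, with the consequence of exceeding each one noted inline (the values are arbitrary):

resources:
  requests:
    cpu: 500m
    memory: 1Gi
  limits:
    cpu: "1"      # exceeded: the container is throttled, never killed
    memory: 2Gi   # exceeded: the container is OOMKilled (exit code 137) and restarted per the Pod's restartPolicy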
QoS Classes: The Final Piece
Kubernetes assigns every Pod a QoS class based on the resource requests and limits of its component Containers. QoS classes are derived, not configured.
Kubernetes uses QoS to decide which Pods are evicted first when the node is under resource pressure.
Guaranteed
Every container in the Pod has both requests and limits specified for CPU and memory, and requests == limits.
Strongest guarantees
Last to be evicted
Burstable
The Pod does not meet the criteria for Guaranteed, and at least one container has a memory or CPU request or limit.
Evicted after all BestEffort Pods are evicted.
BestEffort
None of the containers have any requests or limits
First to be evicted.
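You never set the QoS class yourself; Kubernetes derives it and records it in the Pod status, so you can check what a Pod ended up with (assuming the Pod is simply named prime):

kubectl get pod prime -o jsonpath='{.status.qosClass}'
# prints Guaranteed, Burstable, or BestEffort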
Your “Final” Answer
For a mission-critical, memory-intensive workload like prime, the concepts above point to one answer: give it the Guaranteed QoS class by setting requests equal to limits for both memory and CPU (Guaranteed requires both resources to be pinned, not just memory):

resources:
  requests:
    memory: 6Gi
    cpu: "2"      # CPU value is illustrative; Guaranteed requires CPU requests == limits as well
  limits:
    memory: 6Gi
    cpu: "2"

This reserves the full 6 GiB at scheduling time, eliminates the risk of memory overcommitment for this Pod, and puts it last in line for eviction under node pressure.
So there you go. If you ever come across a similar question, you know how to impress.

