tutorials

Developing with Docker and Kubernetes locally, and avoiding the errors that "Hello World" examples do not show


Manual pod scaling - without losing requests

Simple Kubernetes tutorials do not deal with scenarios where scale-up or scale-down actions have to happen without losing requests. There are several things to observe:

Challenges

So we have several challenges to simulate in our server, and to solve in such a way that Kubernetes does not generate errors.

Simulating a startup time for our service

We added a default time.Sleep of 25 seconds to the hello service, see hello.go at line 43. This means that the server only starts serving requests after the startup period is finished. This simulates a startup time that should be respected.
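A minimal sketch of how such a simulated startup time could look is shown here. It is not the code from hello.go: the names parseDelay and startAfter are hypothetical, and the demo uses a short delay and an in-process httptest server so that it finishes quickly, where a real service would sleep and then call http.ListenAndServe.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// defaultStartupDelay mirrors the 25 seconds mentioned in the text.
const defaultStartupDelay = 25 * time.Second

// parseDelay (hypothetical helper) turns a string such as "2s" into a
// duration and falls back to the given default when the value is empty,
// invalid, or negative.
func parseDelay(raw string, fallback time.Duration) time.Duration {
	d, err := time.ParseDuration(raw)
	if err != nil || d < 0 {
		return fallback
	}
	return d
}

// startAfter (hypothetical helper) blocks for the delay and only then
// starts serving. Nothing is listening while the sleep is running.
func startAfter(delay time.Duration, h http.Handler) *httptest.Server {
	time.Sleep(delay)
	return httptest.NewServer(h)
}

func main() {
	// A short delay keeps the demo quick; the tutorial default is 25s.
	delay := parseDelay("50ms", defaultStartupDelay)
	srv := startAfter(delay, http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "hello")
	}))
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status after startup:", resp.StatusCode)
}
```

The important property is that no request can be answered before the sleep has finished, which is exactly what the probes have to account for.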

Simulating long running requests

We added a default time.Sleep of 5 seconds to the hello endpoint, see hello.go at line 77. This means that the server serves the request, but the request takes that long to complete. This simulates long-running requests that should not fail.
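As a rough illustration, a handler that simulates a long-running request might look like the sketch below. The names slowHello and callSlow are hypothetical and not taken from hello.go, and the demo uses 100 milliseconds instead of several seconds so that it runs quickly.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// slowHello (hypothetical name) answers only after sleeping for delay,
// imitating a long-running request.
func slowHello(delay time.Duration) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(delay)
		fmt.Fprint(w, "hello")
	})
}

// callSlow invokes the handler in-process and reports the status code,
// the body, and the elapsed time in milliseconds.
func callSlow(delayMs int) (int, string, int64) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/hello", nil)
	start := time.Now()
	slowHello(time.Duration(delayMs)*time.Millisecond).ServeHTTP(rec, req)
	return rec.Code, rec.Body.String(), time.Since(start).Milliseconds()
}

func main() {
	code, body, ms := callSlow(100)
	fmt.Printf("status=%d body=%q took at least 100ms: %v\n", code, body, ms >= 100)
}
```

A request like this is still in flight when a pod is asked to stop, which is why it makes a useful test case for the scale-down scenario.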

Simulating load

There is a Makefile target named load that generates some “load” on the service. To be honest, the service is able to deliver many more requests per second, but for this demo the load is more than enough. We use the classic ab (ApacheBench, see doc) for generating load.

The load is generated using a docker container:

docker run 

Scale Up

Scaling up only works without errors when liveness and readiness probes are configured correctly. See the documentation for details about how to configure such a probe.

For those probes to work, you have to provide an endpoint that accepts these requests. Note that readiness and liveness endpoints should be two different endpoints in real-world applications. In the example we added the /status endpoint in hello.go at line 21 and line 53.
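To illustrate the point about two different endpoints, here is a minimal sketch that separates liveness from readiness. It is a variation on the example, not the code from hello.go: the paths /live and /ready, the health type, and the probe helper are all assumptions made for illustration. In this variation the server listens right away but reports "not ready" until startup has finished.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// health (hypothetical type) tracks whether startup has finished.
type health struct {
	ready atomic.Bool
}

// mux registers two separate endpoints. The paths are assumptions.
func (h *health) mux() *http.ServeMux {
	mux := http.NewServeMux()
	// Liveness: the process is up and able to answer at all.
	mux.HandleFunc("/live", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	// Readiness: the service may receive traffic only after startup.
	mux.HandleFunc("/ready", func(w http.ResponseWriter, r *http.Request) {
		if !h.ready.Load() {
			w.WriteHeader(http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	return mux
}

// probe calls a path in-process and returns the status code.
func probe(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	h := &health{}
	m := h.mux()
	fmt.Println("before startup:", probe(m, "/live"), probe(m, "/ready"))
	h.ready.Store(true) // startup work has finished
	fmt.Println("after startup: ", probe(m, "/live"), probe(m, "/ready"))
}
```

With this split, a slow start only keeps traffic away from the pod and does not by itself look like a dead process.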

The deployment has to have both probes configured in order to scale up without errors. Note that initialDelaySeconds is set to 20 seconds and that the probe frequency, specified in periodSeconds, is 10 seconds. Remember that we implemented a startup time of at least 25 seconds, so the first probe fails and the pod stays unready until the second probe hits. Only after a probe succeeds is traffic routed to the pod.

If the probe failed a second time, the pod would be restarted.
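The probe timing can be checked with a little arithmetic. The sketch below uses a simplified model (probes fire exactly at initialDelaySeconds and then every periodSeconds, and a probe passes once the startup time has elapsed). It ignores probe timeouts and scheduling jitter, and firstPassingProbe is a hypothetical helper, not part of Kubernetes.

```go
package main

import "fmt"

// firstPassingProbe returns the 1-based number of the first probe that
// fires at or after the startup time, and the second at which it fires.
// All arguments are in seconds.
func firstPassingProbe(initialDelay, period, startup int) (n, at int) {
	if period < 1 {
		period = 1 // Kubernetes requires periodSeconds to be at least 1
	}
	n, at = 1, initialDelay
	for at < startup {
		at += period
		n++
	}
	return n, at
}

func main() {
	// Values from the text: initialDelaySeconds=20, periodSeconds=10,
	// simulated startup time of 25 seconds.
	n, at := firstPassingProbe(20, 10, 25)
	fmt.Printf("probe %d passes at %ds\n", n, at) // probe 2 passes at 30s
}
```

Under this model the first probe at 20 seconds fails and the second probe at 30 seconds succeeds, which matches the behavior described for the example deployment.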

Scale Down

The scale-down scenario is more complicated. We have to take care of several things:

Graceful Shutdown

Orchestrate service and pod

TODO:

delaying shutdown to wait for pod deletion propagation

original comment in issue regarding pod deletion propagation

Configure correct grace period

The pod's terminationGracePeriodSeconds setting is the number of seconds Kubernetes waits, once termination of the pod has started, before it kills the pod. The preStop hook runs within this period, before the SIGTERM signal is sent to the process. In the optimal configuration the value is therefore longer than the configured graceful shutdown time plus the duration of the configured preStop hook. Only then is the pod guaranteed to receive the SIGTERM signal and to be able to finish its graceful shutdown. When terminationGracePeriodSeconds has expired, Kubernetes sends SIGKILL to the processes still running in the pod's containers, and they are shut down forcefully.
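The relationship between these settings, and what a graceful shutdown does with in-flight requests, can be sketched as follows. This is an illustration under assumptions, not the tutorial's implementation: gracePeriodOK, serveUntil, and demo are hypothetical helpers, and closing the stop channel stands in for receiving SIGTERM (a real service would typically close it from an os/signal handler).

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"net/http"
	"time"
)

// gracePeriodOK reports whether terminationGracePeriodSeconds leaves room
// for the preStop hook plus the graceful shutdown. All values in seconds.
func gracePeriodOK(gracePeriod, preStop, shutdown int) bool {
	return gracePeriod > preStop+shutdown
}

// serveUntil serves on ln until stop is closed, then lets in-flight
// requests finish for up to timeout before giving up.
func serveUntil(ln net.Listener, h http.Handler, stop <-chan struct{}, timeout time.Duration) error {
	srv := &http.Server{Handler: h}
	errCh := make(chan error, 1)
	go func() { errCh <- srv.Serve(ln) }()
	select {
	case err := <-errCh:
		return err // the server failed before a stop was requested
	case <-stop:
	}
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		return err
	}
	if err := <-errCh; !errors.Is(err, http.ErrServerClosed) {
		return err
	}
	return nil
}

// demo starts a slow request, requests shutdown while it is in flight,
// and returns the request's status code and the server's result.
func demo() (int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	slow := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(200 * time.Millisecond)
		fmt.Fprint(w, "done")
	})
	stop := make(chan struct{})
	served := make(chan error, 1)
	go func() { served <- serveUntil(ln, slow, stop, 2*time.Second) }()

	status := make(chan int, 1)
	go func() {
		resp, err := http.Get("http://" + ln.Addr().String())
		if err != nil {
			status <- 0
			return
		}
		resp.Body.Close()
		status <- resp.StatusCode
	}()
	time.Sleep(50 * time.Millisecond) // the request is now in flight
	close(stop)                       // stands in for SIGTERM
	return <-status, <-served
}

func main() {
	fmt.Println("30s grace, 10s preStop, 15s shutdown ok:", gracePeriodOK(30, 10, 15))
	code, err := demo()
	fmt.Println("in-flight request status:", code, "server error:", err)
}
```

The design choice here is to stop accepting new connections first and then wait for running requests, which is what http.Server.Shutdown does. The timeout passed to it plays the role of the graceful shutdown time in the inequality above.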

Takeaways

So, here are some key takeaways:

Contact

You can contact me on Twitter (my profile). If you have comments regarding this tutorial, visit me on GitHub and file an issue or create a pull request.