Earlier this week, I released an open source version of “Whack a Pod,” a demo that we at Google Cloud have been using at Google Cloud Next, Google I/O and at various regional events. For those that haven’t seen the demo, it turns a Kubernetes cluster into a Whack-a-mole game, where Kubernetes Pods are the moles, and you are trying to take down enough pod/moles to disrupt the service those pod/moles are serving up.

The versions we have used at our events include a physical whack-a-mole machine that was hooked up to our game, so that you could actually physically kill Kubernetes pods by swinging a hammer at them.  The work on the physical rig was done by Sparks and is not included in this repo. But, you can run this version anywhere, with minimal hardware requirements – a screen and an interface, touchscreen or mouse based.

Why Build it?

I wanted an easy and fun way to explain Kubernetes. I wanted something that could be hooked up to a real thing, and allow you to touch, as it were, something in the cloud. I wanted to get the idea of Kubernetes being resilient across.

I also wanted to make a series of Carnival based games, all with the same look and feel as the carnival version of whack a pod.  They were shot down for being too whimsical. But that’s another story.

How does it work?

The entire application consists of three separate applications that are all hosted on the same Kubernetes cluster. We also create three services with which to expose the applications.


This is application that are represented by the moles.  We launch a deployment that creates a replica set with instructions to keep 12 of them running at all times.


This is the basic service that the pods are keeping up. It is tremendously simple – when polled it returns a random hexadecimal color value.


This is a slightly tweaked version of the color api above for use with the advanced interface. In addition to the color, it returns the unique Kubernetes generated name of the pod returning that generated the random color.


This is a set of commands that allow our front-end to issue commands against the Kubernetes cluster without need for credentialing. It is basically a proxy to the Kubernetes API with a restricted set of actions possible.


Creates a deployment for running the pods that serve up the API application. Used in all interfaces when you start up.


Deletes all pods for the API deployment.


Deletes the deployment for the API application. Used in all interfaces when clean finish with the deployment


Deletes a single pod.  Used in all interfaces when you whack a pod.


Cordons a node to prevent it from any pods being scheduled on it, then it kills all the API pods that are running on the node. Used in the advanced interface.


Gets information about the nodes of the Kubernetes cluster. Used in the advanced interface.


Gets information about all of the pods running the API service.  Used in all of the interfaces to populate the list of pod/moles.


Resets a node so that it can start accepting newly scheduled pods.  Used in the advanced interface.


The game consists of a few separate HTML/JS/CSS apps joined together.  All of them work in the same general way.

The demo starts with no pods running. Each version will prompt you to deploy the pods. Deploying creates a replica set with 12 pods running.

The ui regularly polls the api/color service.  If it gets a result, the service is up, if it doesn’t get a result, the service is down. There is an indicator towards the top that give the player feedback on service status.

The pods are displayed in a grid, their status is indicated by color differences (or mole position differences).  The statuses are: starting, running, terminated. Starting and terminated pods cannot be “whacked.”  When you whack a running pod, the UI calls admin/api/k8s/deletepod/. The pods remain for awhile after they have been terminated, and are replaced by pods in the started state.


This is the basic game.  It has a fun carnival theme, and is designed to be more a fun distracting game then a real lesson about Kubernetes.


This is the basic game but without the carnival theme. It’s more in line with the branding for the Google Cloud Next events.  We have used this version on a touchscreen at a few of our regional Cloud Next Event.

It also has a panel that displays an abridged version of the json response from Kubernetes commands?  Why abridged?  Because most of the single responses from Kubernetes are over 100 lines of formatted json.  The app shows the salient details.


When showing the demo at various events, we found ourselves wanting another view of the information for when conversations started to go deeper.  The advanced view is a response to that.  It gets rid of the time element, and instead displays the pods as they populate the cluster nodes, and not in a fixed grid. We show the service responses directly. We also show off which pod is actually responding to the service request. The interface includes the ability to drain a cluster node to simulate the node going down. Killing an actual node takes much longer, so this seemed like a reasonable way to simulate node death.


Why 3 services?

I originally wrote it as 1 app. When you killed all of those pods, and disrupted the service, you killed the UI too.  This combined with inconsistent caching behavior caused really odd issues when you ran the demo. It made much more sense to split them up so as to not kill the game while you were playing the game.

Why PHP?

For starters I do outreach for Google Cloud to the PHP community. But also I started writing this as a quick and dirty prototype.  When I want something done quickly, I write it in PHP. It’s very productive for me. I used the Docker image for App Engine flexible environment’s PHP runtime to just get the thing to work.

When the time came to tighten up everything, I considered re-writing all of the apis to use Golang instead. But one of the side effects of choosing PHP and the GAE flexible runtime was that there is a little overhead to starting the service – as opposed to writing a lean go app that only does just one thing.  That overhead is only a couple of seconds, but having it allows the demo to illustrate the full lifecycle of a pod.

Why No Public Demo?

I haven’t figured out a way to convert this demo to a multi tenant demo. So it’s one front end client to one Kubernetes Cluster. So multiple players would interfere with each other.   Right now it involves trying to take down a service directly tied to a specific IP. I’m sure there is a way to rewrite it to do so, I just haven’t had the time to do so.

What I learned from it?

Kubectl is awesome

In a lot of cases, I was just trying to recreate a kubectl command to tie to the front-end. Specific examples are kubectl delete deployment, and kubectl drain.  In both cases, these are actually doing a lot of work, but hiding it behind one command.  In the case of kubectl delete deployment, the command is deleting the deployment, then the replica set, then each pod and making it all one thing.  If you just delete the deployment via the API, all of the children remain – and if you aren’t expecting it, you’ll be confused.

The fact that kubectl can be called with a flag to reveal those underlying calls is very much appreciated.  The flag is “-v=8” in case you need it.

Kubernetes is a weeble not a fortress

I think this was one of the more surprising things I learned.  I didn’t go out of my way to make the system super resilient or super brittle, but under most conditions if you were able to directly delete all of your Pods, you can cause major outages for your services.  However, those outages would seldom be very long. In fact, at Google I/O we were tracking how long people kept the service down, the most we saw was 50% downtime. And this was two very motivated people hitting moles as soon as they appeared.  Most of the time we saw downtown of less than 30%. Again, keeping in mind that the point of the game mechanics is to cause as much downtime as possible.

A dead Pod is only mostly dead

At a few events we ran into issues where none of the visible pods were in a “running” state, but the service was still up. I thought it was due to UI weirdness.  Turns out it was due to the fact that running the Kubernetes delete pod command marks the pod as “terminating” and then allows for graceful termination.  So if request gets routed to them, and they are still responding to requests, they can, even if marked terminated.  I would have known it already if I had bothered to read the documentation a little clearer.

Whack-a-mole machines rest when the moles are down

When the moles are up, the machines are doing work, when they are down the machine is resting.  When Sparks first hooked up the physical whack-a-mole machine to the Kubernetes cluster (figuratively – one is in our data center, the other at the event) the moles were up all the time, because, well, that’s what Kubernetes does.  Long story short, our first whack-a-mole machine burnt itself out,  because when 21st century cloud technology collides with 1970’s arcade technology, the cloud wins.


This was a really fun project. It helped me learn a lot about Kubernetes, while at the same time enabling a great educational experience about Kubernetes. Plus I got people to build a whack-a-mole machine. It was pretty awesome.

Get the source on github.

Makefile – Delete Forwarding Rules

I have a demo where I build a Kubernetes cluster on Container Engine to run a LAMP app. In the demo, I script out a complete build process from an empty project to the full running app.  Testing this requires a clean up that takes me all the way back to an empty project with no cluster.  

There is one thing I do not tear down – static IP addresses.  I don’t tear these down because they are locked to host names in Google Domains, and I use those IPs in my Kubernetes setup to make sure that my cluster app is available at a nice URL and not just a randomly assigned IP.

But I have been running into a problem with this. Sometimes the static IPs hold on to Forwarding Rules that are autogenerated with crazy randomized names by Container Engine. It appears to  happen only when I do a full clean.  I suspect that I am deleting the cluster before it has a chance to issue the command to delete the forwarding rules itself.

In any case, I got tired of dealing with this manually, so I made a Makefile solution. First I get the dynamic list of crazy random forwarding rule names using the Makefile technique I outlined earlier.  Then I pass that list to a gcloud command:

RULELIST = $(shell gcloud compute forwarding-rules list --format='value[terminator=" "](name)')
	-gcloud compute forwarding-rules delete $(RULELIST) --region $(REGION) -q

Note that I had to make sure I passed a region, otherwise the command would have prompted me to enter it manually.

How Kubernetes Updates Work on Container Engine

I often get asked when I talk about Container Engine (GKE):

How are upgrades to Kubernetes handled?


As we spell out in the documentation, upgrades to Kubernetes masters on GKE are handled by us. They get rolled out automatically.  However, you can speed that up if you would like to upgrade before the automatic update happens.  You can do it via the command line:

gcloud container clusters upgrade CLUSTER_NAME --master

You can also do it via the web interface as illustrated below.

GKE notifies you that upgrades are available.
GKE notifies you that upgrades are available.
You can then upgrade the master, if the automatic upgrade hasn’t happened yet.
You can then upgrade the master, if the automatic upgrade hasn’t happened yet.
Once there, you’ll see that the master upgrade is a one way trip.
Once there, you’ll see that the master upgrade is a one way trip.


Updating nodes is a different story. Node upgrades can be a little more disruptive, and therefore you should control when they happen.  

What do I mean by “disruptive?”

GKE will take down each node of your cluster killing the resident pods.  If your pods are managed via a Replication Controller or part of a Replica Set deployment, then they will be rescheduled on other nodes of the cluster, and you shouldn’t see a disruption of the services those pods serve. However if you are running a Pet Set deployment, using a single Replica to serve a stateful service or manually creating your own pods, then you will see a disruption. Basically, if you are being completely “containery” then no problem.  If you are trying to run a Pet as a containerized service you can see some downtime if you do not intervene manually to prevent that downtime.  You can use a manually configured backup or other type of replica to make that happen.  You can also take advantage of node pools to help make that happen.   But even if you don’t intervene, as long as anything you need to be persistent is hosted on a persistent disk, you will be fine after the upgrade.

You can perform a node update via the command line:

gcloud container clusters upgrade CLUSTER_NAME [--cluster-version=X.Y.Z]

Or you can use the web interface.

Again, you get the “Upgrade Available” prompt.
Again, you get the “Upgrade Available” prompt.
You have a bunch of options. (We recommend you stay within 2 minor revs of your master.)
You have a bunch of options. (We recommend you stay within 2 minor revs of your master.)

A couple things to consider:

  • As stated in the caption above, we recommend you say within 2 minor revs of your master. These recommendations come from the Kubernetes project, and are not unique to GKE.
  • Additionally, you should not upgrade the nodes to a version higher than the master. The web UI specifically prevents this. Again, this comes from Kubernetes.
  • Nodes don’t automatically update.  But the masters eventually do.  It’s possible that the masters could automatically update to a version more than 2 minor revs beyond the nodes. This can a cause compatibility issues. So we recommend timely upgrades of your nodes. Minor revs come out at about once every 3 months.  Therefore you are looking at this every 6 months or so.

As you can see, it’s pretty straightforward. There are a couple of things to watch out for, so please read the documentation.

Making Kubernetes IP addresses static on Google Container Engine

I’ve been giving a talk and demo about Kubernetes for a few months now, and during my demo, I have to wait for an ephemeral, external IP address from a load balancer to show off that Kubernetes does in fact work.  Consequently, I get asked “Is there any way to have a static address so that you can actually point a hostname at it?” The answer is: of course you can.

Start up your Kubernetes environment, making sure to configure a service with a load balancer.

apiVersion: v1
kind: Service
    name: frontend
  name: frontend
  type: LoadBalancer
    # The port that this service should serve on.
    - port: 80
      targetPort: 8080
      protocol: TCP
  # Label keys and values that must match in order to receive traffic for this service.
    app: "todotodo-fe"

Once your app is up, make note of the External IP using kubectl get services.


Now go to the Google Cloud Platform Console -> Networking -> External IP Addresses.

Find the IP you were assigned earlier. Switch it from “Ephemeral” to “Static.” You will have to give it a name and it would be good to give it a description so you know why it is static.


Then modify your service (or service yaml file) to point to this static address. I’m going to modify the yaml.   


Once your yaml is modified you just need to run it; use kubectl apply -f service.yaml.

To prove that the IP address works, you should kubectl delete the service and then kubectl apply, but you don’t have to do that. If you do that though, please be aware that although your IP address is locked in, your load balancer still needs a little bit of time to fire up.  

Instead of this method, you can create a static IP address ahead of time and create the forwarding rules manually. I think that’s its own blog post, and  I think it is just easier to let Container Engine do it.

I got lots of help for this post from wernight’s answer on StackOverflow, and the documentation on Kubernetes Services.

I can confirm this works with Google Container Engine. It should work with a Kubernetes cluster installed by hand on Google Cloud Platform.  I couldn’t ascertain if it works on other cloud providers.

Kubernetes Secrets Directly to Environment Variables

kubernetes-secretsI’ve found myself wanting to use Kubernetes Secrets for a while, but I every time I did, I ran into the fact that secrets had to be mounted as files in the container, and then you had to programmatically grab those secrets and turn them into environment variables.  This works and there are posts like this great one from my coworker, Aja Hammerly that tell you how to do it.

It always seemed a little suboptimal for me though.  Mostly because you had to alter your Docker image in order to use secrets. Then you lose some of the flexibility to use a Dockerfile in both Docker and Kubernetes. It’s not the end of the world – you can write a conditional script –  but I never liked doing this.  It would be awesome if you could just write Secrets directly to ENV variables.

Well it turns out you can. Right there in the documentation there’s a whole section on Using Secrets as Environment Variables. It’s pretty straightforward:

Make a Secrets file, remembering to base64 encode your secrets.

apiVersion: v1
kind: Secret
  name: wordpress-secrets
type: Opaque
  username: d293IHlvdSBkZWNvZGVkIGl0  
  password: Z29vZCBmb3IgeW91
  host: bm90aGluZyBqdWljeSB0aG91Z2g=

Then configure your pod definition to use the secrets.

apiVersion: extensions/v1beta1
kind: Deployment
  name: wordpress-deployment
  replicas: 2
      type: RollingUpdate
        app: wordpress
        visualize: "true"
      - name: "wordpress"
        image: "wordpress"
        - containerPort: 80
        - name: WORDPRESS_DB_USER
              name: wordpress-secrets
              key: username
              name: wordpress-secrets
              key: password
        - name: WORDPRESS_DB_HOST
              name: wordpress-secrets
              key: host

That’s it. It’s a great addition to the secrets API.  I’m trying to track down when it was added. It looks like it came in 1.2.  The first reference I could find to it in the docs was in this commit  updating Kubernetes Documentation for 1.2.