At Yext, we are moving more of our hybrid on-premises / cloud workload from a home-grown job management system that runs un-sandboxed Linux processes to containers orchestrated with Kubernetes.

One of the requirements for moving to Kubernetes is building a container image to run your application in, which is an extra step that can feel foreign if you’re used to a separate host provisioning process using Ansible or similar. Since we use Bazel in our polyglot monorepo to build fat jars / binaries, it was a natural extension to use distroless and rules_docker to build containers for our applications with a minimum of overhead.

When developing the base image, I didn’t find the process of customizing the container image to be very straightforward, so I wanted to publish a simplified example in hopes that it’s useful for others.

Requirements

To emulate our Ansible-provisioned production servers for running Java applications, I had these requirements:

  • Time zone set to America/New_York
  • Locale set to en_US.UTF-8
  • Python 2.7

For bonus points, we’ll even include unit tests that verify the configuration.

Here’s how.

How to

First, you’ll need to pull in a list of the Debian packages you need in your Bazel WORKSPACE file:

# ================================================================
# Debian Package manager courtesy of distroless
# https://github.com/GoogleContainerTools/distroless/tree/master/package_manager
#
# These are used to build up base container images.
# ================================================================

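# Newer Bazel releases no longer provide git_repository natively,
# so load it explicitly.
load("@bazel_tools//tools/build_defs/repo:git.bzl", "git_repository")

git_repository(
    name = "distroless",
    remote = "https://github.com/GoogleContainerTools/distroless",
    commit = "813d1ddef217f3871e4cb0a73da100aeddc638ee",
)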

load(
    "@distroless//package_manager:package_manager.bzl",
    "package_manager_repositories",
    "dpkg_src",
    "dpkg_list",
)

package_manager_repositories()

dpkg_src(
    name = "debian_stretch",
    arch = "amd64",
    distro = "stretch",
    sha256 = "4cb2fac3e32292613b92d3162e99eb8a1ed7ce47d1b142852b0de3092b25910c",
    snapshot = "20180406T095535Z",
    url = "http://snapshot.debian.org/archive",
)

dpkg_list(
    name = "package_bundle",
    packages = [
        "locales",
        "locales-all",
        "libpython2.7-minimal",
        "python2.7-minimal",
        "libpython2.7-stdlib",
        # Add any other Debian packages you need here.
    ],
    sources = [
        "@debian_stretch//file:Packages.json",
    ],
)

Next, you’ll want to refer to the Debian documentation for time zones and locales. With some trial and error, we’ve found a container image definition that meets our requirements:

# //kubernetes/images/BUILD
load("@io_bazel_rules_docker//container:image.bzl", "container_image")
load("@package_bundle//file:packages.bzl", "packages")

container_image(
    name = "java_base_image",
    base = "@distroless//java:debug", # :debug includes busybox
    env = {
        "LANG": "en_US.UTF-8",
        "TZ": "America/New_York",
    },
    symlinks = {
        "/etc/localtime": "/usr/share/zoneinfo/America/New_York",
    },
    debs = [
        packages["locales"],
        packages["locales-all"],
        packages["python2.7-minimal"],
        packages["libpython2.7-minimal"],
        packages["libpython2.7-stdlib"],
    ],
)

With this definition, we can easily build the container image, load it into our local Docker daemon, and drop into a shell to poke around and try stuff.

$ bazel run kubernetes/images:java_base_image
INFO: Analysed target //kubernetes/images:java_base_image (48 packages loaded).
INFO: Found 1 target...
Target //kubernetes/images:java_base_image up-to-date:
  bazel-bin/kubernetes/images/java_base_image-layer.tar
INFO: Elapsed time: 49.634s, Critical Path: 10.98s
INFO: 26 processes: 24 darwin-sandbox, 2 local.
INFO: Build completed successfully, 38 total actions
INFO: Build completed successfully, 38 total actions
2ee7753b7481: Loading layer [==================================================>]  18.11MB/18.11MB
94d29ce93c46: Loading layer [==================================================>]  1.126MB/1.126MB
6189abe095d5: Loading layer [==================================================>]  1.966MB/1.966MB
4bfa477d2f61: Loading layer [==================================================>]  99.95MB/99.95MB
d278a061cd4a: Loading layer [==================================================>]  3.973MB/3.973MB
76d2a543d663: Loading layer [==================================================>]  112.6kB/112.6kB
16a41d9f690b: Loading layer [==================================================>]  1.597MB/1.597MB
Loaded image ID: sha256:9c663f9c2b78f81ff22ffa182c8f66aadc497d287d1d883f99741c9a38acd1fb
Tagging 9c663f9c2b78f81ff22ffa182c8f66aadc497d287d1d883f99741c9a38acd1fb as bazel/kubernetes/images:java_base_image

$ docker run -it --entrypoint=sh 9c663f9c2b78f81ff22ffa182c8f66aadc497d287d1d883f99741c9a38acd1fb
/ #

How do we know this achieves the goals? We can write Java programs that demonstrate the time zone and locale configuration:

// UTF8Test.java
class UTF8Test {
    public static void main(String[] args) throws Exception {
        System.out.write("Îƞŧéřƞȧŧǐøƞȧŀǐẑȧŧǐøƞ".getBytes());
    }
}

// TZTest.java
class TZTest {
    public static void main(String[] args) throws Exception {
        System.out.println(java.util.TimeZone.getDefault().getID());
    }
}
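
Checking the zone ID is the simplest test. To see what that ID means in practice, here is a minimal sketch (the class TZOffsetDemo and its helper offsetAt are hypothetical names, not part of our build) that prints the zone the JVM picked up, plus the UTC offsets America/New_York applies in winter and in summer. It assumes Java 8 or later for java.time.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical helper, not part of the original setup.
class TZOffsetDemo {
    // UTC offset that the given zone applies at the given instant.
    static ZoneOffset offsetAt(String zoneId, String isoInstant) {
        return ZoneId.of(zoneId).getRules().getOffset(Instant.parse(isoInstant));
    }

    public static void main(String[] args) {
        // Zone the JVM picked up from its environment.
        System.out.println("default zone: " + ZoneId.systemDefault());
        // January falls in standard time, July in daylight saving time.
        System.out.println("winter: " + offsetAt("America/New_York", "2018-01-15T12:00:00Z"));
        System.out.println("summer: " + offsetAt("America/New_York", "2018-07-15T12:00:00Z"));
    }
}
```

The winter line reports -05:00 and the summer line -04:00. The first line depends on the environment the JVM runs in, which is exactly what the image configuration is meant to pin down.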

Then configure a container_test to run and verify them automatically:

# tz_test.yaml
schemaVersion: "1.0.0"
commandTests:
  - name: 'java.util.TimeZone.getDefault() should return America/New_York'
    command: ['java', '-jar', 'TZTest_deploy.jar']
    expectedOutput: ['America/New_York']


# utf8_test.yaml
schemaVersion: "1.0.0"
commandTests:
  - name: 'default encoding'
    command: ['java', '-jar', 'UTF8Test_deploy.jar']
    expectedOutput: ['Îƞŧéřƞȧŧǐøƞȧŀǐẑȧŧǐøƞ']
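
The “default encoding” test works because getBytes() with no argument encodes using the JVM’s default charset, which JVMs before JDK 18 derive from the locale (JDK 18 and later default to UTF-8 regardless). The sketch below is an illustration only, with a hypothetical class name, showing what the test would catch: if the default charset were US-ASCII, every non-ASCII character would be replaced with “?”. The sample string uses Unicode escapes so the file compiles under any source encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original setup.
class CharsetDemo {
    // Encode and then decode a string with one charset, mimicking what a JVM
    // whose default charset is `cs` would do to text on its way to stdout.
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // I-circumflex, e-acute, o-slash, written as escapes.
        String sample = "\u00ce\u00e9\u00f8";
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("UTF-8 preserved: "
                + roundTrip(sample, StandardCharsets.UTF_8).equals(sample));
        System.out.println("US-ASCII result: "
                + roundTrip(sample, StandardCharsets.US_ASCII));
    }
}
```

The last line prints three question marks, which is the kind of output the expectedOutput check above would reject.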

And the BUILD entries:

# //kubernetes/images/BUILD
load("@io_bazel_rules_docker//contrib:test.bzl", "container_test")

# Default Locale of en_US.UTF-8
java_binary(
    name = "UTF8Test",
    srcs = ["UTF8Test.java"],
    main_class = "UTF8Test",
)

container_image(
    name = "testutf8",
    base = ":java_base_image",
    files = [":UTF8Test_deploy.jar"],
)

container_test(
    name = "utf8_test",
    size = "small",
    configs = ["utf8_test.yaml"],
    image = ":testutf8",
)

# .. similar rules for TZTest

Now a single “bazel test” command is all I need to verify that the configuration works.

$ bazel test kubernetes/images:*
...
//kubernetes/images:tz_test         PASSED in 2.7s
//kubernetes/images:utf8_test       PASSED in 2.4s
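
The two tests above cover the time zone and locale, but not the Python 2.7 requirement. A minimal sketch of a third test program in the same style follows. The class name PythonVersionTest is hypothetical, and it assumes the python2.7 binary installed by python2.7-minimal is on the container’s PATH. It would need its own java_binary, container_image, and container_test rules, like the ones for UTF8Test.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

// Hypothetical test program, not part of the original setup.
class PythonVersionTest {
    // True when the interpreter's version banner reports a 2.7.x release.
    static boolean isPython27(String banner) {
        return banner != null && banner.trim().startsWith("Python 2.7.");
    }

    public static void main(String[] args) throws Exception {
        // Python 2 prints its version on stderr, so merge stderr into stdout.
        Process p = new ProcessBuilder("python2.7", "--version")
                .redirectErrorStream(true)
                .start();
        String banner;
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            banner = r.readLine();
        }
        p.waitFor();
        System.out.println(banner);
        // A non-zero exit code marks the check as failed.
        if (!isPython27(banner)) {
            System.exit(1);
        }
    }
}
```

Keeping the banner check in a separate method means it can be exercised without a Python interpreter installed.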

One last thing

Although we have a working base image, the locales-all package adds about 140 MB to the image size! We can avoid such a heavy cost: a tarball containing just the /usr/lib/locale/en_US.utf8 directory does the job just as well. Extract that directory from the container image we built in the previous step (for example, with docker cp from a container created from that image), pack it into a tarball that keeps the usr/lib/locale/en_US.utf8 path, and update the container_image definition to look like this:

# //kubernetes/images/BUILD
load("@io_bazel_rules_docker//container:image.bzl", "container_image")
load("@package_bundle//file:packages.bzl", "packages")

container_image(
    name = "java_base_image",
    base = "@distroless//java:debug",
    env = {
        "LANG": "en_US.UTF-8",
        "TZ": "America/New_York",
    },
    symlinks = {
        "/etc/localtime": "/usr/share/zoneinfo/America/New_York",
    },
    tars = [
        "files/en_US.utf8.tar.gz",
    ],
    debs = [
        packages["python2.7-minimal"],
        packages["libpython2.7-minimal"],
        packages["libpython2.7-stdlib"],
    ],
)

Bazel FTW

In the end, our base image for Java ends up being about 135 MB, with 15 MB of that for Python, 100 MB for Java, and 15 MB for BusyBox.

Bazel is more restrictive about how you build containers than a Dockerfile’s “run any commands you want at build time” approach. But the benefits make getting over the initial learning curve well worth the effort: a single build tool, reproducible and fast, cache-friendly builds, integration with automated tests, and the ability to define reusable image layers (not shown here).