Building Windows Unreal Engine projects in a Kubernetes cluster

Posted by Eugene McArdle and Aiden De Loryn on 16 February 2022

Tags: Admiral, Kubernetes, Tekton Pipelines, Unreal Engine, Windows

At TensorWorks we are developing Admiral, a CI/CD platform for Unreal Engine projects that brings modern software development best practices to developers working with the Unreal Engine. Admiral is a Kubernetes-native application written in Go, making it horizontally scalable, highly available, and easy to install and manage either on-premises or in the cloud. At the core of any CI/CD platform is the concept of a pipeline - a series of tasks that run to accomplish an overall goal. Some pipeline tasks may be dependent on other tasks, while others can run concurrently to improve performance. Admiral is built on Tekton Pipelines, an open source project that brings pipelines to Kubernetes.

When we started using Tekton Pipelines, it could not run Windows workloads. This was a problem for us, as Unreal Engine developers love using Windows and are unlikely to stop using it anytime soon, so we needed to figure out how to run Windows workloads on Tekton Pipelines. Fortunately for us, the maintainers of the Tekton project are lovely humans and openly welcomed our contributions when we expressed interest in adding this functionality to their project.

Before we jump into how we added Windows workloads to Tekton, we will give a brief lesson in Tekton Pipelines.

Tekton Pipelines 101
Gigantic Windows workloads in Tekton/Kubernetes
How we added Windows workloads to Tekton
Unforeseen problems before landing
The happy world of building Windows Unreal Engine projects in Admiral

Tekton Pipelines 101

A Tekton Pipeline consists of one or more Tasks, each made up of one or more Steps. Tasks are independent of one another: they can run in parallel and on different nodes. The Steps of a Task each run in a separate container within a single Pod, and those containers share a common Workspace (mounted as a Kubernetes emptyDir volume). This allows Steps to share data, so the results of one Step can be consumed by the next. Each Step can use a different container image, or we can use one Task-level image for all Steps. One problem solved by Tekton Pipelines is the ordering of Steps within a Task: Kubernetes does not natively support running the containers of a Pod in sequence, but this is essential for any pipelining system.

An example Task for the Unreal Engine use case might involve two Steps: 1) building an Unreal project, and 2) pushing it in a container to a registry. These two Steps must run in order, as publishing a project before it’s built is unlikely to be useful 🙂. Tekton Pipelines also allow Tasks to depend on other Tasks, so we could, for example, run unit tests in one Task and then, if everything passes, build and package the Unreal Engine project. Alternatively, we could have builds for different operating systems running in separate, parallel Tasks. These are just a few examples of how Tekton Pipelines could add value for Unreal Engine developers.
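The dependency behaviour described above can be sketched as a small graph problem. This is only an illustration of the idea, not how Tekton schedules work internally: the Task names are hypothetical, and the "waves" are simply groups of Tasks whose dependencies have all finished and which could therefore run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical Task names; each maps to the set of Tasks it must run after.
pipeline = {
    "unit-tests": set(),
    "build-windows": {"unit-tests"},
    "build-linux": {"unit-tests"},
    "publish": {"build-windows", "build-linux"},
}


def execution_waves(tasks):
    """Group Tasks into waves. Tasks in the same wave do not depend on each
    other, so they could run in parallel (possibly on different nodes)."""
    sorter = TopologicalSorter(tasks)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every Task whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(pipeline), start=1):
        print(number, wave)
```

Here the two builds land in the same wave because they both depend only on the unit tests, while the hypothetical publish Task waits for both builds.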

Gigantic Windows workloads in Tekton/Kubernetes

Tekton Pipelines, and Kubernetes in general, expect containers to use lightweight, minimal images to improve efficiency. Unreal Engine container images are exceptionally large, which makes them unusual in the world of containers. Tekton does allow Results to be emitted from Tasks, but they are expected to be very small. The current size limit is 4096 bytes, and while there are proposals to increase this limit, the maximum is still likely to be under 4MB. In contrast, a built Unreal project is likely to be many gigabytes in size, which sets our use case apart from typical Tekton workloads.

At the start of development on Admiral we were very conscious of a critical issue we would need to address: Kubernetes, and the world of containers in general, is very heavily geared towards Linux as a primary operating system. As we mentioned above, the vast majority of Unreal Engine game developers need Windows support more than they need Linux support. Unfortunately for us, the available resources and support for Windows containers and nodes are less mature than those for Linux. In the present state of Windows containers on Kubernetes, Windows builds must run on Windows nodes. One potential workaround is to run Windows builds in a Windows VM inside a Linux container on a Linux node, but the overhead of this approach is prohibitive. We knew that supporting Windows builds of Unreal projects would require us to run Windows workloads on Windows nodes, managed by Tekton Pipelines. However, when we set out on our product development journey Tekton Pipelines was not able to run Windows workloads.

How we added Windows workloads to Tekton

Initial exploration and discussions with the Tekton team gave us an outline of the degree of Windows support we would provide (see TEP-0057). Tekton Pipelines is made up of two parts: 1) the Tasks, Pipelines, TaskRuns and PipelineRuns which make up the actual “jobs”, and 2) the core components, which are responsible for managing, creating, and monitoring those elements. Because Kubernetes clusters already require Linux nodes for their control plane, it was decided that we didn’t need (or want) the core components to run on Windows nodes, so those components would remain Linux-only. Only the TaskRuns and PipelineRuns that make up the actual workloads of a Tekton Pipeline would need to run on Windows nodes.

To differentiate from Linux workloads we decided that Windows workloads would be specified by users in the podTemplate supplied as part of the TaskRun or PipelineRun. Early discussions around whether this should be done as a nodeAffinity rule or a nodeSelector examined the suitability of each. NodeSelectors are always a hard rule (the requirement must be met or the pod cannot be scheduled on this node) and only allow for direct label matches. NodeAffinity rules on the other hand are far more flexible, allowing for hard or soft rules (requirement or preference) as well as more complex condition checks. We decided that the requirement for a Windows node would be available through either a nodeSelector or nodeAffinity rules, leaving this decision up to the user.
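To make the two options concrete, here is a minimal sketch that builds a TaskRun manifest as a Python dictionary with either form of rule in its podTemplate. The TaskRun and Task names are hypothetical, and we assume the standard kubernetes.io/os node label and the tekton.dev/v1beta1 API version; check both against your own cluster and Tekton release.

```python
import json

# Assumption: nodes carry the well-known "kubernetes.io/os" label.
OS_LABEL = "kubernetes.io/os"


def windows_node_selector():
    """Hard rule, direct label match only."""
    return {"nodeSelector": {OS_LABEL: "windows"}}


def windows_node_affinity(required=True):
    """Affinity rule: hard (required) or soft (preferred).

    For a Windows build the hard form is the relevant one; a soft
    preference would still allow the pod to land on a Linux node.
    """
    expression = {"key": OS_LABEL, "operator": "In", "values": ["windows"]}
    if required:
        rule = {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [{"matchExpressions": [expression]}]
            }
        }
    else:
        rule = {
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {"weight": 100, "preference": {"matchExpressions": [expression]}}
            ]
        }
    return {"affinity": {"nodeAffinity": rule}}


def task_run(name, task_name, pod_template):
    """Build a TaskRun manifest; both names are hypothetical examples."""
    return {
        "apiVersion": "tekton.dev/v1beta1",
        "kind": "TaskRun",
        "metadata": {"name": name},
        "spec": {"taskRef": {"name": task_name}, "podTemplate": pod_template},
    }


if __name__ == "__main__":
    manifest = task_run("build-windows-run", "build-unreal-project", windows_node_selector())
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be applied to a cluster as-is, since Kubernetes accepts JSON manifests as well as YAML.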

In addition to adding functionality to run Windows workloads in a Tekton Pipeline, a further goal of the Tekton team was that there should be no noticeable difference for users working with homogeneous Linux clusters. That is, all Tasks and Pipelines should work exactly as they did before support for Windows workloads was added. Thus, Windows specific functionality would be exclusively opt-in, and if no nodeSelector is specified then that workload would be scheduled to a Linux node.

The Tekton components that were identified as needing to run on Windows nodes were:

Entrypoint binary: copied into each container by Tekton Pipelines and used to manage step ordering and execution.
Script mode: allows Tekton Pipelines users to provide an inline script that will be run in the Step’s container instead of providing a command and arguments.
No-op image: used to manage sidecars, which are containers that run alongside a Task providing services as needed.

During testing we built a Windows container image for the entrypoint using a Dockerfile and were able to get step ordering to work for a Task on Windows. Interestingly, the entrypoint executable ran without the .exe extension that Windows normally requires. This served as a promising proof of concept for Windows support in Tekton Pipelines, as the entrypoint binary is the core component responsible for ensuring that containers (or Steps) are executed sequentially within a pod (or Task), enabling CI/CD with Tasks and Pipelines.
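To illustrate how a wrapper process can impose an order on containers that all start at roughly the same time, here is a simplified sketch. It assumes a marker-file handshake on a shared directory: each Step waits for the previous Step's marker, runs, then writes its own. Threads stand in for containers, and all names are hypothetical; the real entrypoint binary may differ in its details.

```python
import tempfile
import threading
import time
from pathlib import Path


def run_step(name, wait_file, post_file, log, poll_seconds=0.01):
    """Wait for the previous Step's marker file, run, then write our own."""
    while wait_file is not None and not wait_file.exists():
        time.sleep(poll_seconds)
    log.append(name)   # stand-in for running the Step's real command
    post_file.touch()  # signal the next Step that it may start


def run_task(step_names):
    """Run every Step concurrently and return the order they executed in."""
    log = []
    with tempfile.TemporaryDirectory() as shared:
        shared_dir = Path(shared)
        threads = []
        previous_marker = None
        for index, name in enumerate(step_names):
            marker = shared_dir / str(index)
            threads.append(
                threading.Thread(target=run_step, args=(name, previous_marker, marker, log))
            )
            previous_marker = marker
        # Start in reverse to show that ordering comes from the marker
        # files, not from the order in which the "containers" start.
        for thread in reversed(threads):
            thread.start()
        for thread in threads:
            thread.join()
    return log


if __name__ == "__main__":
    print(run_task(["build", "publish"]))
```

Even though the publish thread starts first, it blocks until the build Step has written its marker file.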

Similarly to the entrypoint container image, we successfully tested a build of the no-op image for Windows. The no-op image is used to stop sidecar containers, and for scheduling Tasks that share a persistent workspace to the same node (this is known as an affinity assistant). It should be noted that when Windows and Linux Tasks (or pods) are used together and share the same persistent volume, the affinity assistant must be disabled as these two Tasks cannot run on the same node.

Script mode compatibility with Windows was a trickier problem, as the approach taken for Linux scripts was unsuitable for Windows. Linux scripts are simple: the script is written to a randomly named file, the file is made executable with chmod, and the entrypoint uses the file as the executable that runs for this Step. If the script does not contain a shebang indicating how it is to be run, one is inserted. Unfortunately Windows is not flexible enough to allow this approach without additional work.
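The Linux flow just described can be sketched in a few lines. This is an illustration rather than Tekton's implementation: the function name, the file name prefix, and the default shebang are all assumptions.

```python
import os
import stat
import tempfile

# Assumption: the shebang inserted when the script has none.
DEFAULT_SHEBANG = "#!/bin/sh\n"


def place_script(script, directory):
    """Write the script to a randomly named file and make it executable."""
    if not script.startswith("#!"):
        script = DEFAULT_SHEBANG + script
    # mkstemp picks a random, unused file name inside `directory`.
    descriptor, path = tempfile.mkstemp(prefix="script-", dir=directory)
    with os.fdopen(descriptor, "w") as handle:
        handle.write(script)
    # Equivalent of `chmod +x`: add the execute bits to the current mode.
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scripts_dir:
        print(place_script("echo hello\n", scripts_dir))
```

The returned path is what would then be handed to the entrypoint as the Step's executable; the kernel reads the shebang to decide which interpreter runs it.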

Since we wanted this to behave as similarly to Linux as possible, we introduced a custom Windows shebang, to be inserted at the start of a Windows script in the Step specification, which follows the pattern: #!win <executable> <args…>. The executable given in the shebang is the Windows executable that is to be used to interpret the script file, and any args needed must also be provided. For example, the Windows command to use PowerShell to execute the commands in a file is: powershell -File <filename>, and so the shebang line for this script would need to be #!win powershell -File. In the absence of a shebang line the script contents will be written to a .cmd file and executed. This approach should work for any script whose commands can be read from a file, and has been tested with PowerShell (classic and core), cmd files, and Python scripts. All this work is done in a “place-scripts” container, which is added as one of the Init Containers to the pod which runs the Task, and allows Windows scripts to behave in a similar way to Linux scripts.
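A rough sketch of how the #!win shebang could be interpreted follows. The function name and return shape are hypothetical, and two details are assumptions not stated above: that the shebang line is removed from the file contents, and that the script file keeps the path it was given when a shebang is present. Arguments are split on whitespace only, so quoted arguments containing spaces are not handled.

```python
WIN_SHEBANG = "#!win"


def plan_windows_script(script, script_path):
    """Return (argv, file_path, file_contents) for a Windows script Step."""
    first_line, _, rest = script.partition("\n")
    first_line = first_line.rstrip()
    if first_line == WIN_SHEBANG or first_line.startswith(WIN_SHEBANG + " "):
        interpreter = first_line[len(WIN_SHEBANG):].split()
        if not interpreter:
            raise ValueError("'#!win' must be followed by an executable")
        # The script file is passed as the interpreter's final argument,
        # e.g. powershell -File <filename>.
        return interpreter + [script_path], script_path, rest
    # No shebang: write the script to a .cmd file and execute it directly.
    cmd_path = script_path + ".cmd"
    return [cmd_path], cmd_path, script


if __name__ == "__main__":
    argv, path, contents = plan_windows_script(
        "#!win powershell -File\nWrite-Host hello\n", r"C:\scripts\step-build"
    )
    print(argv)
```

With the PowerShell example from the text, the resulting command is the shebang's executable and arguments followed by the script file path.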

Unforeseen problems before landing

During implementation we discovered some unforeseen problems that had to be solved before Windows support in Tekton could land. As soon as we had Tekton Pipelines running on a mixed-OS cluster we encountered critical failures: no Tekton Pipelines components had nodeSelectors, so the core components and workloads could be scheduled to Windows nodes, where they would not work. To solve this we added affinity rules to the Tekton Pipelines controller and webhook deployments, ensuring that Tekton Pipelines performs the same way on a mixed-OS cluster as it does on a homogeneous Linux cluster. Users running Tekton Pipelines on a mixed-OS cluster must still add nodeSelectors or nodeAffinity rules to their TaskRuns or PipelineRuns as needed.
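As an illustration of the kind of affinity rule involved, the sketch below adds a hard "not a Windows node" requirement to a Deployment manifest held as a Python dictionary. The helper name is hypothetical, and the exact expression (kubernetes.io/os NotIn windows) is our assumption about a reasonable rule rather than a quote from the Tekton manifests.

```python
import copy

# Assumption: exclude nodes labelled kubernetes.io/os=windows.
NOT_WINDOWS = {"key": "kubernetes.io/os", "operator": "NotIn", "values": ["windows"]}


def keep_off_windows(deployment):
    """Return a copy of the Deployment whose pods cannot land on Windows nodes."""
    patched = copy.deepcopy(deployment)
    pod_spec = patched["spec"]["template"]["spec"]
    terms = (
        pod_spec.setdefault("affinity", {})
        .setdefault("nodeAffinity", {})
        .setdefault("requiredDuringSchedulingIgnoredDuringExecution", {})
        .setdefault("nodeSelectorTerms", [])
    )
    if not terms:
        terms.append({"matchExpressions": [copy.deepcopy(NOT_WINDOWS)]})
    else:
        # nodeSelectorTerms are ORed together, so the expression must be
        # added to every existing term to remain a hard requirement.
        for term in terms:
            term.setdefault("matchExpressions", []).append(copy.deepcopy(NOT_WINDOWS))
    return patched


if __name__ == "__main__":
    controller = {
        "kind": "Deployment",
        "metadata": {"name": "tekton-pipelines-controller"},
        "spec": {"template": {"spec": {"containers": []}}},
    }
    print(keep_off_windows(controller)["spec"]["template"]["spec"]["affinity"])
```

Using NotIn rather than requiring kubernetes.io/os=linux is a design choice in this sketch: it leaves scheduling unchanged on nodes that do not carry the label at all.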

Our second unforeseen problem was related to the build process used by the Tekton team. The Tekton team use ko to build the images, but when we started working on adding Windows support, ko did not support building Windows images. All our initial testing was done with a custom build of Tekton Pipelines, where we manually built a mixed-OS entrypoint image using docker manifest and pointed the Tekton Pipelines controller directly at it. As a result of our work on the Windows support feature, Jason Hall of Red Hat (one of the founders of the Tekton project) added support for mixed-OS images to ko, meaning that as of Tekton Pipelines 0.29 it is possible to build mixed-OS images - thus Tekton Pipelines is now “built for Windows”.

The happy world of building Windows Unreal Engine projects in Admiral

With Windows workloads working in Tekton Pipelines, Admiral can now build both Windows and Linux Unreal Engine projects. If automated and scalable builds of your Unreal Engine projects appeal to you, check out the Admiral website for more information and to register your interest!