Faster GitHub PR check time with multi-level matrix strategy
Posted on September 26, 2022 in dotnet
The Hazelcast .NET[1] Client is provided as a library that is supported on .NET (Framework) versions 4.6.2 to 4.8, .NET (Core) 3.1, and .NET (Just .NET) 5.0 and 6.0. For .NET Core and Just .NET, the library is supported both on Windows and on Linux. This means that we want to run our complete test suite on both platforms, for each supported .NET version.
Originally, we defined a Build PR workflow that would trigger on each PR and build and run the tests for each .NET version supported by the platform, one after another. In order to test both platforms, we used an OS matrix strategy:
name: Build PR
on: pull_request
jobs:
  build-pr:
    name: Build PR (${{ matrix.os }})
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        os: [ ubuntu-latest, windows-latest ]
    steps:
      - name: Build and Test the PR
        (etc)
Thanks to this, on each PR, two jobs would run in parallel: one for Linux and one for Windows. However, on each platform, the tests for each .NET version would run sequentially. So if the complete test suite takes about 20 minutes to run, the Windows job would take net462 + net48 + netcoreapp3.1 + net5.0 + net6.0 = 5 times 20 minutes, or 1h40. That is quite a lot of time to validate a PR.
Entering the Second Dimension
We want to reduce that time, and the idea is to parallelize across .NET versions, in addition to platforms. On each PR, we would fork one job for .NET 5.0 on Linux, one job for .NET 6.0 on Linux, one job for .NET 4.8 on Windows, etc. The total build time would be approximately the build time of one version or, in our example above, 20 minutes instead of 1h40.
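The before/after figures can be checked with a few lines of shell. This is only a sketch: the 20-minute duration is the round number used in this post, the variable names are illustrative, and the parallel case assumes every run takes equally long and that enough runners are available.

```shell
# Framework list for the Windows job, as listed earlier in the post.
frameworks="net462 net48 netcoreapp3.1 net5.0 net6.0"
minutes_per_run=20   # round figure from the post, not a measurement

# Count the frameworks.
count=0
for fw in $frameworks; do
  count=$((count + 1))
done

sequential=$((count * minutes_per_run))   # runs happen back to back
parallel=$minutes_per_run                 # assumption: all runs start together

printf 'sequential: %dh%02d\n' $((sequential / 60)) $((sequential % 60))
printf 'parallel:   %dh%02d\n' $((parallel / 60)) $((parallel % 60))
```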
GitHub supports two-dimensional matrices. For instance:
matrix:
  os: [ ubuntu-latest, windows-latest ]
  dotnet: [ net5.0, net6.0 ]
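Such a matrix behaves like a cross product: every os value is paired with every dotnet value, and each pair becomes one job. A rough local imitation in shell, where the function name expand_matrix is purely illustrative:

```shell
# The two dimensions of the example matrix.
os_list="ubuntu-latest windows-latest"
dotnet_list="net5.0 net6.0"

# Print one line per (os, dotnet) combination, like one job per matrix cell.
expand_matrix() {
  for os in $os_list; do
    for fw in $dotnet_list; do
      echo "$os / $fw"
    done
  done
}

expand_matrix
```

Note that both lists are fixed up front: the inner list is the same for every os value.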
In our situation, however, the list of versions to put in the dotnet dimension is not fixed and actually depends on the OS. We would need to write something akin to the following code, where get_dotnet_versions would be a function returning the .NET versions corresponding to the OS:
matrix:
  os: [ ubuntu-latest, windows-latest ]
  dotnet: get_dotnet_versions(matrix.os)
Alas, GitHub does not support that type of construct. The only thing it supports is referencing a variable, such as an input, in the matrix. So we would need to produce a single variable, and yet its value would depend on matrix.os... again, that type of construct is not supported. All in all, I could not find a way to get one matrix dimension to depend on the other within one workflow.
Divide and Conquer
Now... what if, instead of trying to do everything in one workflow, we used two? Our first workflow looks like:
jobs:
  getversions:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ ubuntu-latest, windows-latest ]
    outputs:
      # beware! this cannot be dynamic = must match the OS matrix
      versions-ubuntu-latest: ${{ steps.getversions.outputs.versions-ubuntu-latest }}
      versions-windows-latest: ${{ steps.getversions.outputs.versions-windows-latest }}
    steps:
      - name: Determine DotNet Versions
        id: getversions
        shell: bash
        run: |
          VERSIONS=$(....)
          echo "::set-output name=versions-${{ matrix.os }}::$VERSIONS"
  buildtest:
    name: Build&Test ${{ matrix.os }}
    needs: getversions
    strategy:
      fail-fast: false
      matrix:
        os: [ ubuntu-latest, windows-latest ]
    uses: ./.github/workflows/build-pr-on.yml
    secrets: inherit
    with:
      os: ${{ matrix.os }}
      versions: ${{ needs.getversions.outputs[format('versions-{0}', matrix.os)] }}
Let's go through this workflow. It contains two jobs, getversions and buildtest.
The first job, getversions, runs a single step that determines the versions based on the platform, and sets a job output variable accordingly. That variable contains a JSON list of versions, e.g. [ "net462", "net48" ]. Actually, it sets one job output per platform, and I could not find a way to avoid hard-coding the platform names.
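The command that computes the version list is elided in the workflow above. Purely as an illustration, here is one way such a step could map a runner label to a JSON list. The helper name dotnet_versions_for is hypothetical, and the lists are an assumption derived from the support matrix described at the top of this post, not the actual script.

```shell
# Hypothetical helper: maps a runner label to a JSON list of target frameworks.
# The lists are assumptions based on the support matrix described in this post.
dotnet_versions_for() {
  case "$1" in
    windows-*) echo '[ "net462", "net48", "netcoreapp3.1", "net5.0", "net6.0" ]' ;;
    ubuntu-*)  echo '[ "netcoreapp3.1", "net5.0", "net6.0" ]' ;;
    *)         echo "unsupported runner: $1" >&2; return 1 ;;
  esac
}

# Show the name=value pair that each platform's job would publish as its output.
for os in ubuntu-latest windows-latest; do
  echo "versions-$os=$(dotnet_versions_for "$os")"
done
```

In a real step, the name=value pair would be handed to GitHub rather than printed. Note that GitHub has since deprecated the ::set-output command in favour of appending such lines to the file named by $GITHUB_OUTPUT.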
The second job, buildtest, does not run any steps, but instead triggers another workflow, passing two input parameters: the platform, and the list of versions. And this is where we manage to specify versions that depend on the platform.
And then, the magic happens in that other workflow. First, in order to be "callable" with input parameters, it requires this on statement:
on:
  workflow_call:
    inputs:
      os:
        required: true
        type: string
      versions:
        required: true
        type: string
And then its job is defined as:
jobs:
  buildtest:
    # the job name uses the same matrix key that is declared below
    name: ${{ matrix.version }}
    runs-on: ${{ inputs.os }}
    strategy:
      fail-fast: false
      matrix:
        version: ${{ fromJson(inputs.versions) }}
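To picture what that matrix amounts to, here is a rough local imitation in shell: each element of the JSON list becomes one job on the given platform. The function name expand_versions is illustrative, and the parsing is a deliberate shortcut that only handles a flat list of simple strings; it is not how GitHub evaluates fromJson.

```shell
# Turn a flat JSON list of simple strings into one element per line.
# Shortcut parsing for illustration only; real JSON needs a real parser.
expand_versions() {
  printf '%s\n' "$1" | sed -e 's/[][" ]//g' | tr ',' '\n'
}

os="windows-latest"                 # stands in for inputs.os
versions='[ "net462", "net48" ]'    # stands in for inputs.versions

# One line per matrix entry, i.e. one job per version.
for v in $(expand_versions "$versions"); do
  echo "job: $v on $os"
done
```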
And there we have it: a build and test job that runs on the specified platform, for each specified version. The last piece of the puzzle consists in wrapping it all into one single conclusion point. This is done via an additional job in the first workflow:
  report:
    name: Build&Test Result
    runs-on: ubuntu-latest
    if: always()
    needs: buildtest
    steps:
      - name: report
        shell: bash
        run: |
          # the needs context exposes the outcome of a job as 'result'
          if [ "${{ needs.buildtest.result }}" == "success" ]; then
            echo "All Build&Test checks completed successfully."
          else
            echo "At least one Build&Test check has failed."
            echo "::error::At least one Build&Test check has failed."
            # the ::error:: annotation alone does not fail the step
            exit 1
          fi
This job "needs" the buildtest job; in other words, it will only run once all our forked build & test jobs have completed. It always runs, and succeeds or fails depending on the outcome of all the forked jobs. This means that this is what we see in GitHub Actions:
The block to the left shows the two jobs that run, one per platform, and figure out the framework versions for each platform. The center block shows each combination of platform / version build - they all run in parallel. Finally, the block to the right shows the global status of the build.
That last "Build&Test Result" block can be used as a required check in our GitHub branch configuration, thus ensuring that all test combinations must be successful before any PR can be merged.
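For a required check to block a merge, the reporting step has to end with a non-zero exit status whenever the aggregated result is anything other than success. The decision can be exercised locally with a small stand-alone sketch. The function name report is illustrative, and the four values in the loop are the job results GitHub documents (success, failure, cancelled, skipped).

```shell
# Stand-alone version of the report decision.
# $1 stands in for the aggregated result of the buildtest job.
report() {
  if [ "$1" = "success" ]; then
    echo "All Build&Test checks completed successfully."
  else
    echo "At least one Build&Test check has failed."
    return 1   # a non-zero status is what marks the check as failed
  fi
}

# Try every result value and show how the check would end up.
for result in success failure cancelled skipped; do
  if report "$result" >/dev/null; then
    echo "$result -> check passes"
  else
    echo "$result -> check fails"
  fi
done
```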
All in about 20 minutes, not 1h40!
[1] Or maybe it is the "Hazelcast dotnet Client" or "Hazelcast Dotnet Client", depending on the success of the #dropTheDot movement initiated by Khalid at a time the .NET Drama Advisory (or should it be the Dotnet Drama Advisory) was pretty low and he got bored.