Optimizing the CI/CD Pipeline for Robot Framework Browser Using Pabot and RF Logs

One challenge in developing an open-source project that runs on users' machines is that the software has to work in many different, unknown environments. This puts pressure on testing every combination one can think of: everything should work on all the major operating systems (Windows, macOS, and Linux), and on the latest Python and Node.js versions as well as on older ones.

Another challenge, more specific to Robot Framework library development, is that a proper library has many keywords. The Robot Framework Browser library has 135 keywords as of version 18.9.1, so many tests are needed to cover that large user interface: the RF Browser test suite contains around 700 test cases.

We could be testing across all combinations of Python version (3.9, 3.10, 3.11, and 3.12), operating system (Windows, Linux, and macOS), and Node.js version (18.x and 20.x).

In the GitHub Actions config, this can be done with a matrix strategy:


strategy:
  matrix:
    python-version: [ "3.9", "3.10", "3.11", "3.12" ]
    os: ["windows-latest", "ubuntu-latest", "macos-latest"]
    node-version: ["18.x", "20.x"]

This setup adds up to 24 combinations, each of which runs the full set of roughly 700 tests.
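
To see the full matrix concretely, the combinations can be enumerated locally. The following is a minimal sketch using only the Python standard library; the version lists simply mirror the matrix above:


from itertools import product

# The same axes as in the GitHub Actions matrix above
python_versions = ["3.9", "3.10", "3.11", "3.12"]
oses = ["windows-latest", "ubuntu-latest", "macos-latest"]
node_versions = ["18.x", "20.x"]

# Full cartesian product: every Python version on every OS with every Node.js version
combinations = list(product(python_versions, oses, node_versions))
print(len(combinations))  # 4 * 3 * 2 = 24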

Strategic Optimization with Pabot, Pairwise Testing, and RF Logs

Pabot has an option to split the test set into shards and execute only one of those shards. A shard is a subset of the tests. For example, a test set could be split into 4 shards; once all four shards have run, every test has been executed:


pabot --shard 1/4 .
pabot --shard 2/4 .
pabot --shard 3/4 .
pabot --shard 4/4 .

Sharding allows decentralized distribution: each GitHub runner can work independently without a central manager. In the current setup, the tests are split into 4 shards.

This can again be done with the matrix strategy:


strategy:
  matrix:
    python-version: [ "3.9", "3.10", "3.11", "3.12" ]
    os: ["windows-latest", "ubuntu-latest", "macos-latest"]
    node-version: ["18.x", "20.x"]
    shard: [ 1, 2, 3, 4 ]
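
Each runner then only needs to pass its own shard number to Pabot, as in the sketch below (the step name and the test directory are illustrative and not copied from the actual RF Browser workflow):


- name: Run acceptance tests for this shard
  # matrix.shard is 1-4, matching the --shard N/4 syntax shown earlier
  run: pabot --shard ${{ matrix.shard }}/4 .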

Pairwise Testing

Although sharding enables nice parallelism in GitHub Actions, running all 24 combinations with 4 shards each would take 96 GitHub runner jobs, which would easily consume all the runners available to the project.

By employing a pairwise testing strategy, the project tests parameter values in pairs rather than in all possible combinations. This significantly reduces the number of required runners while still catching most defects caused by parameter interactions.

The pairwise strategy is applied across the Python version, the operating system, the Node.js version, and the shard number.

To generate the chosen combinations, I used allpairspy:


from allpairspy import AllPairs

parameters = [
    ["3.9", "3.10", "3.11", "3.12"],  # python-version
    ["windows-latest", "ubuntu-latest", "macos-latest"],  # os
    ["18.x", "20.x"],  # node-version
    [1, 2, 3, 4]  # shard
]

print("Pairwise combinations for GitHub Actions matrix:")
print("\nFormat: [python-version, os, node-version, shard]")
print("-" * 60)
for i, pairs in enumerate(AllPairs(parameters)):
    # Convert numbers to strings for consistent formatting
    formatted_pairs = [str(p) for p in pairs]
    print(f"{i+1:2d}: {formatted_pairs}")

# Generate GitHub Actions matrix format
print("\nGitHub Actions matrix format:")
print("-" * 60)
print("strategy:")
print("  matrix:")
print("    include:")
for i, pairs in enumerate(AllPairs(parameters)):
    print(f"      - python-version: '{pairs[0]}'")
    print(f"        os: '{pairs[1]}'")
    print(f"        node-version: '{pairs[2]}'")
    print(f"        shard: {pairs[3]}")

This gives around 14–17 sets to run, significantly fewer than 96, while still ensuring that every shard (and therefore every test) is run against each OS, Node.js version, and Python version.
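
The generated combinations are then written into the workflow as an explicit include list. A shortened, illustrative sketch of the result (the entries below are examples, not the project's actual pairwise set):


strategy:
  matrix:
    include:
      # Each entry is one pairwise-generated environment; only two are shown here
      - python-version: "3.9"
        os: "ubuntu-latest"
        node-version: "18.x"
        shard: 1
      - python-version: "3.10"
        os: "windows-latest"
        node-version: "20.x"
        shard: 2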

Test Result Visualization with RF Logs

After execution, test results are automatically uploaded to RF Logs, a service for sharing and analyzing Robot Framework logs and reports. One key benefit is that the log and report are served directly from the web service and can be linked to, which simplifies referring to the files and discussing them in places like Slack or Jira.

RF Logs organizes test results with metadata tags (like OS, Python version, Node.js version, shard number, and branch name). This simplifies filtering and investigating issues across different environments and test configurations. Detailed timing information also makes it possible to optimize test run times further.

Here is the actual configuration from the RF Browser workflow (note that if: always() uploads the results even when the tests themselves fail, and the trailing || true keeps an upload failure from failing the build):


- name: Install rflogs
  if: always()
  run: |
    pip install rflogs
- name: Upload test results to RF Logs
  if: always()
  env:
    RFLOGS_API_KEY: ${{ secrets.RFLOGS_API_KEY }}
  working-directory: ${{ github.workspace }}/atest/output
  run: |
    rflogs upload --tag branch:${{ github.head_ref || github.ref_name }} --tag shard:${{ matrix.shard }} --tag os:${{ matrix.os }} --tag python-version:${{ matrix.python-version }} --tag node-version:${{ matrix.node-version }} || true

Conclusion

Optimizing your CI/CD pipeline doesn't require testing every possible environment combination. By strategically selecting environment combinations with pairwise testing, leveraging tools for parallel execution, and utilizing result visualization, you can maintain high quality without sacrificing efficiency. The Robot Framework Browser project demonstrates this by effectively managing 767 tests across multiple environments and shards, ensuring comprehensive coverage while keeping the testing process streamlined and efficient.

You can check the complete GitHub Actions configuration and the test runs here.