
112 Commits

Author SHA1 Message Date
Guilherme Gallo
e1d54be524 ci/lava: Avoid eval when generating env script
Remove use of `eval` when writing `dut-job-env-vars.sh`, as it's
unnecessary. The script only needs to declare variables, not evaluate
them.

Using `eval` introduces parsing issues when variables contain both
single and double quotes, as commit titles sometimes do. For example,
this job:
https://gitlab.freedesktop.org/mesa/mesa/-/jobs/77995175#L3188
failed to parse `CI_COMMIT_TITLE` and `CI_MERGE_REQUEST_TITLE`
correctly due to the mixed quoting in:

    Revert "ci: disable Collabora's farm due to maintenance"
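
A minimal Python sketch of declaring variables without `eval`. It assumes the script is rendered from a dict of variable names to values and that each value is quoted with the standard library's `shlex.quote`. `render_env_script` is a hypothetical helper for illustration, not the submitter's actual code.

```python
import shlex


def render_env_script(env: dict[str, str]) -> str:
    """Render `export NAME=value` lines for a POSIX shell, without eval.

    Hypothetical helper: shlex.quote() wraps each value in single quotes
    and escapes embedded single quotes, so mixed quoting survives intact.
    """
    lines = [f"export {name}={shlex.quote(value)}" for name, value in env.items()]
    return "\n".join(lines) + "\n"


title = 'Revert "ci: disable Collabora\'s farm due to maintenance"'
script = render_env_script({"CI_COMMIT_TITLE": title})
print(script, end="")
```

Sourcing the generated line assigns the title verbatim. The shell only has to parse a quoted literal, so nothing is evaluated twice.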

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35421>
2025-06-10 18:19:21 -03:00
Guilherme Gallo
9024e0df83 ci/lava: Don't fail if the section times mismatch
Time drift can occur during LAVA job execution due to transitions
between three different clocks.

The process begins in the GitLab job [1], using the CI_JOB_STARTED_AT
variable. If SSH is enabled, we then connect to the DUT through an
Alpine-based SSH client container inside the LAVA dispatcher [2], where
some GitLab-related steps are timestamped by lava_job_submitter.
Finally, the DUT [3] runs and uses the setup-test-env.sh helper to
handle GitLab sections, potentially using a third distinct clock.
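
One way to tolerate this drift, sketched in Python. It assumes each section records its start and end as timezone-aware datetimes that may come from different clocks. `section_duration` is hypothetical, and clamping to zero is an illustrative strategy, not necessarily what this commit implements.

```python
from datetime import datetime, timedelta, timezone


def section_duration(start: datetime, end: datetime) -> timedelta:
    """Return a section's duration, tolerating drift between clocks.

    start and end may be stamped by different hosts (GitLab runner,
    LAVA dispatcher, DUT), so end can appear to precede start.
    """
    delta = end - start
    if delta < timedelta(0):
        # Clock drift between hosts: report zero instead of failing the job.
        return timedelta(0)
    return delta


# Example: the DUT clock runs 2 s ahead of the dispatcher clock.
dut_start = datetime(2025, 6, 5, 22, 0, 2, tzinfo=timezone.utc)
dispatcher_end = datetime(2025, 6, 5, 22, 0, 0, tzinfo=timezone.utc)
print(section_duration(dut_start, dispatcher_end))
```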

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35222>
2025-06-05 22:32:37 +00:00
Guilherme Gallo
19357b9a84 ci/lava: Fix type hint errors in GitlabSection
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35222>
2025-06-05 22:32:36 +00:00
Guilherme Gallo
bfca9fbbb3 ci/lava: SSH tweaks
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35126>
2025-06-04 15:43:40 +00:00
Eric Engestrom
5a5b00cfca ci: drop unneeded printing of pass/fail alongside the exit_code
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35214>
2025-05-29 07:29:25 +00:00
Valentine Burley
f6dce6dee1 ci: Add a minimal Alpine container for running LAVA jobs
Compared to the existing Debian-based x86_64_pyutils container, this
Alpine-based variant reduces the image size by approximately 83%.

Include all the necessary Python artifacts, including lava_job_submitter,
in the container to avoid having to download them at the start of each
test job.

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34980>
2025-05-26 17:25:40 +00:00
Valentine Burley
8b37cfae2e ci/lava: Forward environmental variables to DUT directly
Instead of uploading the environment variables to S3, append them to
the job definition.

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35051>
2025-05-26 15:30:47 +00:00
Valentine Burley
ffe8a2e023 ci/lava: Use init-stage2 and setup-test-env.sh from Mesa install
init-stage2.sh and setup-test-env.sh are already downloaded on the DUT as
part of the mesa-build overlay, which downloads the Mesa artifacts from
S3.

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35051>
2025-05-26 15:30:47 +00:00
Guilherme Gallo
e9e98d997d ci/lava: Parametrize message burst length on unit tests
Some jobs, such as the pytest ones, have lower job timeout values, given
by the CI_JOB_TIMEOUT environment variable.

The previously hardcoded burst length of 1000 messages at a simulated
rate of 1 msg/sec caused tests to exceed these shorter timeouts and fail
unexpectedly.
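
For scale, 1000 messages at 1 msg/sec take 1000 s, about 16.7 minutes of simulated time. That already exceeds a 15-minute job timeout (an illustrative value). Below is a minimal sketch of deriving the burst length from CI_JOB_TIMEOUT (in seconds) instead of hardcoding it. `burst_length`, its safety `fraction` and the 3600 s fallback are all hypothetical.

```python
import os


def burst_length(job_timeout_s: int, msg_per_s: float = 1.0, fraction: float = 0.5) -> int:
    """Size a message burst so its simulated duration stays under the timeout.

    Hypothetical: `fraction` is an assumed safety margin, i.e. the burst may
    use at most this share of the job timeout.
    """
    return max(1, int(job_timeout_s * fraction * msg_per_s))


# CI_JOB_TIMEOUT is in seconds; 3600 is only an illustrative fallback.
timeout_s = int(os.environ.get("CI_JOB_TIMEOUT", "3600"))
print(burst_length(timeout_s))
```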

Reported-by: Eric Engestrom <eric@igalia.com>
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34907>
2025-05-19 22:44:21 +00:00
Guilherme Gallo
e7f6b4bdae ci/lava: Improve timeout estimation logic for case/suite runs
Some jobs, like those using pytest, have lower `CI_JOB_TIMEOUT` values.
This change ensures that the estimated LAVA overhead (in minutes) fits
within the actual job timeout, failing early with an assertion when it
does not, which avoids mismatches and unintended timeouts.
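
A minimal sketch of the early check described above. It assumes the remaining test time is the job timeout minus a fixed LAVA overhead, both expressed in minutes. `remaining_test_minutes` and the overhead values are hypothetical.

```python
def remaining_test_minutes(ci_job_timeout_s: int, lava_overhead_min: int) -> int:
    """Minutes left for the test phase once the LAVA overhead is reserved.

    Hypothetical helper: ci_job_timeout_s is CI_JOB_TIMEOUT (seconds).
    """
    job_timeout_min = ci_job_timeout_s // 60
    # Fail early if the overhead alone would consume the whole job timeout.
    assert lava_overhead_min < job_timeout_min, (
        f"LAVA overhead ({lava_overhead_min} min) does not fit in "
        f"CI_JOB_TIMEOUT ({job_timeout_min} min)"
    )
    return job_timeout_min - lava_overhead_min


print(remaining_test_minutes(3600, 5))
```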

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34907>
2025-05-19 22:44:21 +00:00
Valentine Burley
34012d5af3 ci: Remove EXTERNAL_KERNEL_TAG variable
The EXTERNAL_KERNEL_TAG variable is no longer needed.
For LAVA and bare-metal, we can override the KERNEL_TAG variable to fetch
both the kernel image and modules from a different tag than the default
mainline gfx-ci/linux kernel.

For LAVA, this also avoids the issue where jobs using EXTERNAL_KERNEL_TAG
would still have mainline kernel modules downloaded by the LAVA overlay.

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34873>
2025-05-09 07:18:53 +00:00
Valentine Burley
53c7a04d12 ci/lava: Ensure firmware directory exists before downloading a660_zap.mbn
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34451>
2025-04-28 20:08:32 +00:00
Valentine Burley
da71656dd9 ci/lava: Merge and deduplicate log sections
Drop the duplicated rootfs preparation sections, and combine the
TEST_SUITE and TEST_DUT_SUITE sections.

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34523>
2025-04-16 21:34:35 +00:00
Valentine Burley
70b033d2ad ci/lava: Don't include the timeout in the log sections
Reduces visual clutter in the job logs.

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34523>
2025-04-16 21:34:35 +00:00
Valentine Burley
f7224dd159 ci/lava: Collapse more log sections
These sections were not collapsed, causing the setup sections to clutter
the job logs.

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34523>
2025-04-16 21:34:35 +00:00
Guilherme Gallo
fb224e9016 ci/lava: Fix LAVA lima jobs
lima uses a different version from other farms, where some log output
patches have not been delivered yet, so let's use a temporary fix to
make those job traces look as nice as the other ones.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33994>
2025-03-13 00:45:59 -03:00
Guilherme Gallo
0330522e99 ci/lava: Fix LAVA lima jobs
lima uses a different version from other farms, where some log output
patches have not been delivered yet, so let's use a temporary fix to
make those job traces look as nice as the other ones.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33994>
2025-03-13 03:31:31 +00:00
Guilherme Gallo
8fcc52b8d7 ci/lava: Don't print empty lines when changing sections
Make `print_log` section-aware so it stops printing newlines whenever a
section changes.
This also caught a bug, now fixed: `handle_exception` was passing an
exception type to `print_log`.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33906>
2025-03-10 05:44:25 +00:00
Guilherme Gallo
422e65557d ci/lava: Tweak timeouts
LAVA actions follow a hierarchical structure, where most subactions have
their timeouts overridden if the parent action supports a retry
mechanism, such as the `depthcharge-retry` action.

The timeout is calculated as: [1]

```
parent action timeout / failure_retry value
```

To adjust a subaction's timeout, we need to modify the nearest parent
action.

[1]
https://gitlab.collabora.com/lava/lava/-/blob/collabora/production/lava_dispatcher/action.py#L149
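
The formula above, worked through in Python as an illustration. `effective_subaction_timeout` is a hypothetical name and the 600 s parent timeout is an example value. It shows why lengthening a subaction means raising the parent's timeout.

```python
def effective_subaction_timeout(parent_timeout_s: float, failure_retry: int) -> float:
    """Timeout each subaction gets under a retrying parent action.

    Implements: parent action timeout / failure_retry.
    """
    if failure_retry < 1:
        raise ValueError("failure_retry must be >= 1")
    return parent_timeout_s / failure_retry


# A 10-minute depthcharge-retry with failure_retry=3 leaves 200 s per attempt.
print(effective_subaction_timeout(600, 3))
```

Getting 300 s per attempt with the same retry count would need a 900 s parent timeout.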

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33906>
2025-03-10 05:44:25 +00:00
Guilherme Gallo
a33c0e1867 ci/lava: Split boot action into deploy and boot
The boot action was wrapping the deploy action, which could cause
timeout misalignment. For example, the boot `GitlabSection` timeout was
shorter than the deploy timeout in LAVA, leading to cases where LAVA
jobs were canceled during their own retry mechanism.

By splitting these actions, we can align the timeouts properly,
preventing interference and unnecessary job cancellations.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33906>
2025-03-10 05:44:25 +00:00
Guilherme Gallo
d85af615f9 ci/lava: Remove depthcharge-start timeout
It has no effect, as it is overridden by depthcharge-retry timeout /
failure_retry.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33906>
2025-03-10 05:44:25 +00:00
Guilherme Gallo
47659ddf70 ci: Simplify LAVA farm detection
Refactor the LAVA farm detection to use a simpler environment
variable-based approach:
- Remove the complex regex-based farm detection
- Replace LavaFarm enum with a simple string-based farm identification
- Update related tests and job definition logic
- Remove hypothesis testing dependency

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33888>
2025-03-08 02:45:02 +00:00
Guilherme Gallo
fbc55afbdf ci/lava: Properly detect VMWARE farm
This will make the job definition default to the UART format for vmware
jobs, as only Collabora's farm relies on the SSH job definition due to
the unreliable Chromebook UART in LAVA.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33874>
2025-03-04 16:46:17 +00:00
Guilherme Gallo
1dbebd2619 ci/lava: Add U-Boot action timeout for rockchip DUTs
Add a specific timeout for the U-Boot action in LAVA job definitions for
rockchip devices. This ensures sufficient time for U-Boot to download
the kernel and set up early network, preventing potential job failures
due to timeout constraints.

This behavior started with LAVA version 2025.02.

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33839>
2025-03-04 01:17:50 +00:00
Guilherme Gallo
1169f704d3 ci/lava: Propagate errors in SSH tests
The `lava_ssh_test_case` wrapper was missing the `set -e` shell option,
which made LAVA interpret the job as succeeding: the `container`
namespace exited normally even though the `dut` namespace was failing.
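
The following is not the wrapper itself. It is a small Python demonstration, via `subprocess` and a POSIX `sh`, of the `set -e` behaviour the fix relies on. The script string is purely illustrative.

```python
import subprocess

SCRIPT = "false; echo still running"

# Without -e, the shell keeps going and exits with the last command's status (0).
plain = subprocess.run(["sh", "-c", SCRIPT], capture_output=True, text=True)
# With -e, the first failing command aborts the script with its status (1).
strict = subprocess.run(["sh", "-ec", SCRIPT], capture_output=True, text=True)

print(plain.returncode, plain.stdout.strip())
print(strict.returncode, strict.stdout.strip())
```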

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33839>
2025-03-04 01:17:50 +00:00
Guilherme Gallo
02a86b3284 ci/lava: Drop the repeating quotes on lava-test-case
LAVA was recently patched [1] with a fix to how parameters are parsed in
`lava-test-case`, so we don't need to repeat quotes to send the
arguments properly to it.

[1] 18c9cf7976

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33839>
2025-03-04 01:17:50 +00:00
Valentine Burley
b2105fe162 ci/lava: Allow passing extra cmdline arguments
The LAVA_CMDLINE variable is appended to extra_nfsroot_args.

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33282>
2025-02-05 14:01:03 +00:00
Valentine Burley
61d9c47944 ci/lava: Use CI_JOB_TIMEOUT instead of separate variable
The CI_JOB_TIMEOUT variable is the GitLab-defined job timeout in
seconds.
Use this variable in LAVA instead of the separate JOB_TIMEOUT,
which was intended to represent the test phase timeout (job timeout
minus 5 minutes), but was often overlooked.

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32609>
2024-12-18 09:23:27 +00:00
Guilherme Gallo
b2c2f0d187 ci/lava: Set default exit code to 1 for failed jobs
Sets the default exit code to 1 to ensure the GitLab job fails when the
LAVA job fails or is interrupted. Adds tests to verify the exit code is
correctly set based on the logs or the lack of them (unexpected
termination, such as timeouts and cancellations).
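
A minimal sketch of the "fail unless proven otherwise" default. It assumes the submitter either parses the test script's exit code from the LAVA logs or gets nothing, when the job times out or is cancelled. `job_exit_code` is a hypothetical helper.

```python
from typing import Optional


def job_exit_code(result_from_logs: Optional[int]) -> int:
    """Pick the GitLab job's exit code for a finished LAVA job.

    Hypothetical helper: result_from_logs is the test script's exit code if
    the logs contained it, or None if the job ended unexpectedly.
    """
    # Default to 1: only a result actually parsed from the logs can pass.
    return 1 if result_from_logs is None else result_from_logs


print(job_exit_code(None))
```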

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32163>
2024-11-21 04:10:52 +00:00
Guilherme Gallo
bc86b73bbe ci/lava: Fix lava-tags parsing
python-fire auto-converts `item1,item2` into a tuple, but if there is a
dash `-` inside the argument, it treats it as a string.

Let's validate the data whether it arrives as a `str` or a `tuple`.

For more details, here are the tested scenarios:

| --lava-tags= | Type  | Value            |
|--------------|-------|------------------|
| None         | bool  | True             |
| ''           | str   | ''               |
| tag1         | str   | 'tag1'           |
| tag1,        | tuple | ('tag1',)        |
| tag-1,tag-2  | str   | 'tag-1,tag-2'    |
| tag1,tag2    | tuple | ('tag1', 'tag2') |
| ','          | str   | ','              |
| ',,'         | str   | ',,'             |
| 'tag1,,'     | str   | 'tag1,,'         |

See also:
https://google.github.io/python-fire/guide/#argument-parsing
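
A minimal sketch of normalizing every row of the table above into a list of tags. `parse_lava_tags` is hypothetical. Treating a bare `--lava-tags` (fire passes `True`) as "no tags" is an assumption, as is dropping empty entries left by stray commas.

```python
def parse_lava_tags(value) -> list[str]:
    """Normalize --lava-tags, as delivered by python-fire, into a tag list.

    Hypothetical helper. Fire may pass a bool, a str or a tuple depending on
    the argument's contents.
    """
    # Assumption: a bare flag (True) or a missing value means "no tags".
    if value is None or isinstance(value, bool):
        return []
    if isinstance(value, str):
        items = value.split(",")
    elif isinstance(value, tuple):
        items = list(value)
    else:
        raise TypeError(f"unexpected --lava-tags type: {type(value).__name__}")
    # Drop empty entries produced by trailing or repeated commas.
    return [str(item).strip() for item in items if str(item).strip()]


print(parse_lava_tags("tag-1,tag-2"))
```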

Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31882>
2024-10-31 18:00:27 +00:00
Daniel Stone
f44970173d ci/lava: Provide list of overlays to submitter
Instead of providing a hardcoded set of arguments, allow overlays to be
added to the submitter script. Passing Python dicts as a string
representation and relying on coercion from strings is far from great,
but fire doesn't give anything else, so.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Co-authored-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31882>
2024-10-31 18:00:27 +00:00
Daniel Stone
f32a2de26d ci/lava: Provide LAVA rootfs URL directly
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31882>
2024-10-31 18:00:27 +00:00
Daniel Stone
2b3839c9c7 ci/lava: Use LAVA rootfs overlays for build/per-job
We compose the rootfs from a mixture of the base rootfs (exported from
the container build stage, currently lava_build.sh, which can be reused
as long as the container isn't rebuilt), the Mesa build overlay
(exported from the debian-* build job, which can be reused for every job
in that pipeline), and the per-job rootfs (containing job-specific
variables which cannot be reused).

Instead of having LAVA pull the base rootfs and separately downloading
the build/per-job parts on the DUT, get LAVA to compose the whole thing
by using overlays.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Co-authored-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31882>
2024-10-31 18:00:26 +00:00
Daniel Stone
021d7d8b77 ci/lava: Remove duplicate build download
We already do it above.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31882>
2024-10-31 18:00:26 +00:00
Daniel Stone
d171f47f44 ci/lava: Coalesce post-processed job information
We can combine a section _and_ a print.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31602>
2024-10-20 11:32:42 +01:00
Daniel Stone
9be46b29f0 ci/lava: Print relative timestamps in sections
Follow what the shell executor does and print the time since the job
started in the section header.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31602>
2024-10-20 11:32:42 +01:00
Daniel Stone
8ee6241a8c ci/hw: Wrap pre-test setup in collapsed section
Most people don't care about environment variables and starting Weston.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31602>
2024-10-20 11:32:42 +01:00
Daniel Stone
970f37be09 ci/lava: Change default section colour to cyan
This matches the sections from the shell prints, which are quite nice.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31602>
2024-10-20 11:32:42 +01:00
Daniel Stone
3c7b53e27c ci/lava: Be a little less enthusiastic with bold
Some things can just fade away into the background.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31602>
2024-10-20 11:32:42 +01:00
Daniel Stone
dead2b7e62 ci/lava: Fix colour definitions
All the foreground colours pass 1 to ANSI SGR, which sets bold. The
other arguments are either a colour from 30-37 (passed directly), or
38;5;nnn, where nnn is an index into the extended 256-colour palette. It
looks like most of the definitions were cargo-culted from FG_RED, which
correctly sets an extended colour; in the copies, the trailing arguments
were being parsed as setting blinking (5), followed by 197, which was
ignored as unknown.

Fix them to just set the original definition.
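
The sketch below shows how a terminal reads these parameter lists. It decodes only the handful of SGR codes discussed here, not the full standard. The sample strings `1;32;5;197` (a basic colour followed by leftover `5;197`) and `1;38;5;197` illustrate the failure mode described above; they are not quoted from the source. `sgr` and `describe_sgr` are hypothetical helpers.

```python
def sgr(*params: int) -> str:
    """Build an ANSI SGR escape sequence, e.g. ESC[1;31m."""
    return "\x1b[" + ";".join(str(p) for p in params) + "m"


def describe_sgr(params: str) -> list[str]:
    """Decode a small subset of SGR parameters (bold, blink, fg colours)."""
    codes = [int(p) for p in params.split(";")]
    out: list[str] = []
    i = 0
    while i < len(codes):
        code = codes[i]
        if code == 1:
            out.append("bold")
        elif code == 5:
            out.append("blink")
        elif 30 <= code <= 37:
            out.append(f"fg colour {code - 30}")
        elif code == 38 and i + 2 < len(codes) and codes[i + 1] == 5:
            # 38;5;n selects entry n of the 256-colour palette.
            out.append(f"fg 256-colour {codes[i + 2]}")
            i += 2
        else:
            out.append(f"unknown {code}")
        i += 1
    return out


print(describe_sgr("1;32;5;197"))  # basic green, then blink and a stray 197
print(describe_sgr("1;38;5;197"))  # bold, extended colour 197
```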

Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31602>
2024-10-20 11:32:42 +01:00
Daniel Stone
3068279280 ci/lava: Truncate printed times
We don't need to go down to the microsecond.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31602>
2024-10-20 11:32:42 +01:00
Daniel Stone
65f05f2231 ci/lava: Explicitly pass UTC timezone
Rather than putting it into the environment, just pass it explicitly
every time we need a timestamp.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31602>
2024-10-20 11:32:42 +01:00
Daniel Stone
2b4d468421 ci/lava: Hide more boot details into sections
Make sure we keep as much of the boot as we can behind sections, so
by default people only see the actual test run.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31602>
2024-10-20 11:32:42 +01:00
Daniel Stone
964b979131 ci/lava: Add section for device wait
This way it's easier to see how long it took.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31602>
2024-10-20 11:32:42 +01:00
Daniel Stone
586abb1e10 ci/lava: Break section-header print into separate function
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31602>
2024-10-20 11:32:42 +01:00
Daniel Stone
46c8423489 ci/lava: Remove pointless messages
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31602>
2024-10-20 11:32:42 +01:00
Vignesh Raman
d43fec5da9 ci/lava: set exit code in exception case
Set exit_code to 1 in case of an exception; otherwise,
the job exits with 0, and GitLab shows the job as successful.

Fixes: b9cee06f9e ("ci/lava: handle non-zero exit codes")
Signed-off-by: Vignesh Raman <vignesh.raman@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31556>
2024-10-10 02:16:22 +00:00
Vignesh Raman
b9cee06f9e ci/lava: handle non-zero exit codes
The LAVA job submitter always exits with code 1, regardless
of the HWCI_TEST_SCRIPT's exit code. This commit fixes the
LAVA job submitter to exit with the actual code returned by
the test.

Signed-off-by: Vignesh Raman <vignesh.raman@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31189>
2024-09-20 10:29:39 +00:00
Martin Krastev
8bcd18c90e svga/ci: change DNS server for vmware jobs
The previous vmware-specific DNS server is no longer valid.

Signed-off-by: Martin Krastev <martin.krastev@broadcom.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30344>
2024-07-24 21:26:26 +00:00
Deborah Brouwer
72c182f873 ci/lava: Detect a6xx gpu recovery failures
Sporadically, the a6xx GPU fails to recover, causing the LAVA job
a660_vk_full to loop on error messages for three hours before timing
out.

A few sporadic error messages may still be recoverable, but when multiple
errors occur over a short period, successful recovery is unlikely. Parse
the logs to look for repeated error messages within a short time period.
If found, cancel the LAVA job and rerun it.
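
A minimal sliding-window sketch of "repeated error messages within a short time period". The matched substring, the count threshold and the window length are hypothetical; the source gives neither the actual error text nor the limits. `RepeatedErrorDetector` is an illustrative name.

```python
from collections import deque


class RepeatedErrorDetector:
    """Flag an error message that repeats too often within a time window."""

    def __init__(self, pattern: str, max_count: int, window_s: float) -> None:
        self.pattern = pattern  # hypothetical error substring to match
        self.max_count = max_count
        self.window_s = window_s
        self._hits: deque = deque()

    def feed(self, timestamp_s: float, line: str) -> bool:
        """Record a log line; return True once the burst threshold is reached."""
        if self.pattern not in line:
            return False
        self._hits.append(timestamp_s)
        # Forget hits that fell out of the sliding window.
        while self._hits and timestamp_s - self._hits[0] > self.window_s:
            self._hits.popleft()
        return len(self._hits) >= self.max_count


detector = RepeatedErrorDetector(pattern="gpu hang", max_count=3, window_s=10)
print([detector.feed(t, "msm: gpu hang detected") for t in (0, 1, 2)])
```

Isolated errors age out of the window and never trigger. Only a dense burst does, which is when recovery is unlikely and the job is worth cancelling.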

Also add unit tests for this behaviour.

cc: mesa-stable

Reported-by: Valentine Burley <valentine.burley@gmail.com>
Acked-by: Daniel Stone <daniel.stone@collabora.com>
Reviewed-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Signed-off-by: Deborah Brouwer <deborah.brouwer@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30032>
2024-07-19 23:41:13 +00:00