fran/mesa - mesa - GNLUG git store

fran/mesa

Author	SHA1	Message	Date
Valentine Burley	f6dce6dee1	ci: Add a minimal Alpine container for running LAVA jobs Compared to the existing Debian-based x86_64_pyutils container, this Alpine-based variant reduces the image size by approximately 83%. Include all the necessary python artifacts, including lava_job_submitter in the container to avoid having to download them at the start of each test job. Signed-off-by: Valentine Burley <valentine.burley@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34980>	2025-05-26 17:25:40 +00:00
Deborah Brouwer	72c182f873	ci/lava: Detect a6xx gpu recovery failures Sporadically a6xx gpu will fail to recover causing the lava job a660_vk_full to loop on error messages for three hours before timing out. A few sporadic error messages may still be recoverable, but when multiple errors occur over a short period, successful recovery is unlikely. Parse the logs to look for repeated error messages within a short time period. If found, cancel the lava job and rerun it. Also add unit tests for this behaviour. cc: mesa-stable Reported-by: Valentine Burley <valentine.burley@gmail.com> Acked-by: Daniel Stone <daniel.stone@collabora.com> Reviewed-by: Guilherme Gallo <guilherme.gallo@collabora.com> Signed-off-by: Deborah Brouwer <deborah.brouwer@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30032>	2024-07-19 23:41:13 +00:00
Guilherme Gallo	41cd32d10e	ci/lava: Broader R8152 error handling The r8152 error detection is now considering any order of the known patterns to detect variations of the r8152 issues during the test phase. This includes a small refactoring for eventual new issues. Additionally, adjusted the timing for setting the `start_time` in `test_lava_job_submitter.py` to ensure consistency and reliability in test execution, aligning the start time closer to the job submission process. With this fix, the bad state shown in the following job will be detected: https://gitlab.freedesktop.org/drm/msm/-/jobs/55033953 Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27688>	2024-02-20 00:48:24 +00:00
Guilherme Gallo	ffe2b31f9a	ci/lava: Detect hard resets during test phase Hard resets should not occur during the test phase. Therefore, let's detect them through specific log messages and raise an exception for a known issue if it occurs. Without this detection, the job will continue running on both Gitlab and LAVA until a timeout occurs. Real case: https://gitlab.freedesktop.org/mesa/mesa/-/jobs/53546660 Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26995>	2024-01-23 22:47:24 +00:00
Guilherme Gallo	de2c847c24	ci/lava: Detect r8152 issue during boot phase This week we found that the r8152 issue can happen during the boot phase; make the necessary adjustments to detect it. https://gitlab.freedesktop.org/vigneshraman/linux/-/jobs/53651940 Notes: - The kernel messages during the boot phase are being redirected to the feedback messages due to the namespaces from the SSH job. - Update the unit tests: - Add boot phase detection - Correctly set the boot phase when mocking LogFollower Reported-by: Vignesh Raman <vignesh.raman@collabora.com> Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27081>	2024-01-16 17:22:04 +00:00
Guilherme Gallo	bfd50f72eb	ci/lava: Turn the r8152 issue check into a counter We were just detecting if a log like [ 143.080663] r8152 2-1.3:1.0 eth0: Tx status -71 happened once before [ 316.389695] nfs: server 192.168.201.1 not responding, still trying But we can use a counter to be more assured that the device is struggling to recover, and we can also let this detection happen during the boot phase. This mimics how other freedreno devices deal with this problem, see `cros_servo_run.py:64` for example. Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27081>	2024-01-16 17:22:04 +00:00
Guilherme Gallo	70f1291d8e	ci/lava: Add canceled job status We should be explicit that we are cancelling jobs once the script finds some log messages that are linked with known issues. That means the script preemptively retried the job without giving chances to recover. Adds magenta color to cancelled jobs. Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17389>	2022-07-08 12:26:05 +00:00
Guilherme Gallo	2c51b7a9c9	ci/lava: Detect R8152 issues preemptively and retry Implement a log-based retry hint for R8152 issue described in #6681, which is based on detecting these two consecutive lines: ``` r8152 <USB> eth0: Tx status -71 nfs: server <IP> not responding, still trying ``` Where <IP> and <USB> could be any IP and USB addresses, respectively. This commit is a temporary fix since it requires a section-aware log follower, implemented in !16323. When the cited MR is merged, one will make a proper fix on top of that. Closes: #6681 Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17389>	2022-07-08 12:26:05 +00:00

8 Commits
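
The r8152 commits above describe a log-based check: a `Tx status -71` line followed later by an `nfs: server ... not responding` line, counted so that a single occurrence does not trigger a retry. The following is a minimal sketch of that idea only, not the code from the Mesa tree. The class name `R8152Detector`, the regular expressions, and the `max_hits` threshold are all assumptions made for illustration.

```python
import re

# Hypothetical patterns, modelled on the two log lines quoted in the
# commit messages; the real scripts may match them differently.
TX_ERROR = re.compile(r"r8152 \S+ eth0: Tx status -71")
NFS_STALL = re.compile(r"nfs: server \S+ not responding, still trying")


class R8152Detector:
    """Count Tx-error/NFS-stall pairs seen in a stream of log lines."""

    def __init__(self, max_hits=2):
        # max_hits is an assumed threshold, not a value from the source.
        self.max_hits = max_hits
        self.hits = 0
        self.seen_tx_error = False

    def feed(self, line):
        """Process one log line; return True once the threshold is reached."""
        if TX_ERROR.search(line):
            self.seen_tx_error = True
        elif NFS_STALL.search(line) and self.seen_tx_error:
            # A stall only counts when a Tx error was seen before it.
            self.seen_tx_error = False
            self.hits += 1
        return self.hits >= self.max_hits
```

A caller would feed each incoming log line to `feed()` and cancel and resubmit the job when it returns `True`. Counting pairs rather than reacting to the first match gives the device a chance to recover from a single transient error.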