We apparently don't have anything else making sure that it's flushed in
between use as a render target and use as a texture source, so bypass-mode
depth texture sampling could get stale data.
Fixes consistent (as far as I could see) failures in FD_MESA_DEBUG=nogmem
on:
dEQP-GLES31.functional.texture.multisample.samples_*.use_texture_depth_2d
dEQP-GLES31.functional.stencil_texturing.render.depth24_stencil8_draw
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8146>
(cherry picked from commit e4cdbeab81)
We need to pick 1u vs 1.0f based on the type of the texture, just like for
normal samples. Move the decision up to the create_sampler_view, and use
that value from both sampler paths.
Cc: mesa-stable
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8012>
(cherry picked from commit 4ba884b814)
Commit 64d6f56ad2 ("panfrost: Allocate syncobjs in panfrost_flush")
aimed at optimizing the fencing logic but it looks it also broke the
fence-based synchronization in subtle ways.
Indeed, now that the fence only waits on a single syncobj, we're not
guaranteed that all jobs queued in panfrost_flush_all_batches() will
be done when the fence is signaled, because jobs at the top level
(those stored in the batches hashmap) have not inter-dependencies.
Commit 9e397956b0 ("panfrost: signal syncobj if nothing is going to
be flushed") made this even more apparent by signaling the fence right
away if nothing was left to be drawn in the current context, thus
ignoring any of the batches left to flushed in the ->batches map.
If we want to keep relying the existing kernel APIs there's clearly no
ideal solution here. We can either go back to the original fencing
mechanism where each fence contained an array of syncobjs to be tested
or serialize jobs that have no explicit dependencies so we know the last
submitted job will also be the last one to return. The orginal approach
has proven to add quite a significant overhead (caused by the amount of
ioctls and the time spent in kernel space to gather dma fences attached
to those syncobjs and test them). So let's go for the simple solution
where we have a single syncobj bound to the context which we update to
point to the last job out_sync every time we submit a top-level job.
This approach implies reworking the way we create fences since we
need to capture the syncobj state at the time the fence is created.
Unfortunately, there's not SYNCOBJ_CLONE ioctl, which forces us to
export/create/import a fence so we have a new object that's not
subject to changes done to the context syncobj.
If we want to further optimize the logic, we should probably explore
some of those options:
1/ Adding array based SYNCOBJ ioctls (SYNCOBJ_{CREATE,DESTROY,CLONE}_ARRAY)
so we can mitigate the cost of ioctls when we need to manipulate
arrays of syncobjs
2/ Support synchronization jobs. That is, jobs that have a NULL job chain
but an array of sync_in and a sync_out to allow creating
synchronization points
3/ Add syncobj aggregators so we only have to wait on one syncobj from
userspace. The syncobj aggregator would wait for all sub syncobjs to
be signaled before signaling the top-level one.
Fixes: 64d6f56ad2 ("panfrost: Allocate syncobjs in panfrost_flush")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7831>
(cherry picked from commit 29f938a0ec)
The recent change to install kernel modules for AMD included a sed job to
disable kernel modules in the defconfig. This somehow broke booting on
a307, except the commit failed to bump the arm64_test tag so it wasn't
noticed until the next uprev. (I didn't notice when landing the next
change to that container to add the deqp runner, because I didn't get a
git conflict on rebasing my tag bump so I didn't bump the tag again to
pull in the kernel changes and catch the fail).
I've spent a while trying to debug what's happened (including what
*should* be a replication of the kernel build on my local db410c) and come
up empty. Just punt and disable the AMD kernel module changes on
baremetal to fix it. Bump every container using lava_build.sh to make
sure we don't screw anything up with the script changes.
Fixes: 60c5729d16 ("ci: Distribute ADMGPU driver to LAVA as a module")
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6971>
(cherry picked from commit bf576b449e)
I found the C++ runner hard to develop on, and we had stability issues and
outstanding feature needs that made me want something I felt good about
hacking on. Thus, Rewrite It In Rust of the deqp runner.
The new runner includes:
- Skip lists don't reshuffle the test list.
- Known-flake handling without resorting to skip lists (fixing our main CI
reliability issue on a3xx right now).
- Per-thread Vulkan shader caches should speed up VK CI runtime.
- Tracking of crashes separate from fails (so we can see progress on that
front).
- Logging of deqp stderr spam (particularly assertion failures!) in the CI
log.
- Integrated QPA filtering so we don't have bash perf issues for it.
- Logging of what caselist to go look at for a given error report (in red,
so it's easier to find in your CI log).
- The code is 1/3 unit tests, and easy to extend for more coverage.
- Non-LAVA CI runs create a failures.csv in artifacts that you can check
in as your deqp-*-fails.txt file.
- Test runtime is included in results.csv so you can debug how to speed up
your CI job.
- Pretty summary at the end of the run of slow/flaky/failed tests.
Since this is a new runner with a different RNG, the test groups are
shuffled one more time. This seems to result in some panfrost T720
stability issues (See its new deqp-panfrost-t720-flakes.txt), and one new
flake in freedreno a630.
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7434>
As it needs firmware to probe, and we cannot bundle it within the kernel
image because it is incompatible with the GPL.
Currently we rebind the driver after boot but that's slow and fragile,
as unloads of DRM drivers aren't generally tested.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7420>
This will create code that is easier to combine into MADs/FMA when the
last component is 1.0.
nir_opt_algebraic_late has an optimization to do something similar but it
only works for inexact code, if the multiplication-by-1 optimization is
done before it and if the backend enables fuse_ffma.
fossil-db (Navi):
Totals from 85583 (74.64% of 114665) affected shaders:
SGPRs: 4556060 -> 4558596 (+0.06%); split: -0.07%, +0.12%
VGPRs: 3315060 -> 3312984 (-0.06%); split: -0.23%, +0.17%
SpillSGPRs: 13552 -> 13553 (+0.01%)
CodeSize: 184962756 -> 184431388 (-0.29%); split: -0.32%, +0.03%
MaxWaves: 1208693 -> 1209361 (+0.06%); split: +0.17%, -0.11%
Instrs: 35678819 -> 35361617 (-0.89%); split: -0.91%, +0.02%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5631>
This causes our TGSI to use far more temps, since NTT is currently not
releasing temps from registers. On the other hand, this interpreter is
already spectacularly slow, and if we wanted to go fast we should probably
write a scalar NIR intrepeter.
For now, using NTT means that we test that codepath in preparation for
switching TGSI-consuming HW drivers over, so that we can eventually
garbage collect st_glsl_to_tgsi.
As this is a major restructuring, there are some impacts on piglit:
- Several tests start assert failing about 64-bit NIR registers for temp
arrays not getting split to vec2s:
- fs-frexp-dvec4-variable-index.shader_test
- arb_gpu_shader_fp64/uniform_buffers/{vs,fs,gs}-array-copy.shader_test
- arb_gpu_shader_int64/execution/indirect-array-two-accesses.shader_test
- dEQP-GLES31.functional.primitive_bounding_box.wide_points.global_state.vertex_geometry_fragment.fbo_bbox_larger
starts crashing depending on various bits of state (previous tests run
before it, presence of valgrind, presence of glib's memcheck). Doesn't
seem really NTT-specific, added to flakes list with other GS flakes.
- Almost 200 fp64/int64-related tests start passing, mostly around i/o loayout.
shader-db:
total instructions in shared programs: 3492656 -> 3081674 (-11.77%)
total loops in shared programs: 1418 -> 1387 (-2.19%)
total temps in shared programs: 340041 -> 615527 (81.02%)
total const in shared programs: 3158970 -> 1528630 (-51.61%)
total imm in shared programs: 117586 -> 101349 (-13.81%)
Total CPU time (seconds): 430.36 -> 900.94 (109.35%)
FPS results:
glmark2 texture +7.32484% +/- 3.76528% (n=10)
glmark2 desktop:effect=shadow +20% +/- 0% (n=10)
glmark2 shadow +6.49351% +/- 3.65335% (n=7)
glmark2 conditionals +18.75% +/- 2.74658% (n=9)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3395>
I wasted a bunch of time today tracking down a spurious test results
change due to a driver invoking UB by running tests where NIR validation
had failed (instruction reading from components beyond vector size). If
we need to shrink our coverage to get runtimes down, it will still be
better to be catching validation errors in CI.
To keep the test jobs runtime under 10 minutes, I've split a530's gles2 to
two different jobs.
Acked-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7203>
as implied in the surrounding code, left_components can be 0 here, in which
case creating a left swizzle is unnecessary (and triggers an assert)
this moves a failing assert farther down the stack to a more useful location
when trying to pack e.g., struct[3] { dvec3; float; }
ref spec@arb_gpu_shader_fp64@execution@inout@vs-out-fs-in-s1-s2@3-dvec2-float
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7134>
To support Android drivers, we're going to want to be tracking that Mesa's
build succeeds on a real android toolchain. This still uses the android
stubs since these libs aren't in the NDK.
Note that I had to drop the Intel and AMD drivers currently: we don't have
LLVM cross-compiled for Android in this container, and I'm honestly hoping
ACO saves us from that. Intel has dependencies on libexpat, which AOSP
really doesn't want to bring in, and it looks to me like those dependencies
could be optional.
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6700>