We cannot decompress from the compute queue. While I'm pretty sure
VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL is only useful on the
graphics queue, I cannot find a VU that prevents the transition
from happening on another queue, so we need to be careful here.
This patch ensures we do the decompression on the barrier that changes
the queue ownership.
Another problem was that DCC images were considered fast-clearable
when not DCC compressed, which resulted in a mess with concurrent
queue ownership.
Cc: <mesa-stable@lists.freedesktop.org>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3387
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6252>
(cherry picked from commit e362ccb20c)
Currently Linux kernel gathers performance counters at fixed
intervals (~5-10ms), so if application uses AMD_performance_monitor
extension and immediately after glFinish() asks GL driver for HW
performance counter values it might not get any data (values == 0).
Fix this by moving the "read counters from kernel" code from
"is query ready" to "get counter values" callback with a loop around
it. Unfortunately it means that the "read counters from kernel"
code can spin for up to 10ms.
Ideally kernel should gather performance counters whenever we ask
it for counter values, but for now we have deal with what we have.
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5788>
(cherry picked from commit 2fbab5a1b3)
The code applied the restriction too late, which could overflow LDS size,
which started happening more often after the minimum vertex count was
increased for Sienna.
Incorporate the clamping into the previous code for rounding up the counts.
Now the LDS size can never overflow, but it may use vector lanes less
efficiently (max_gsprims can be decreased more), which will be addressed
in the next commit.
Fixes: 4ecc39e1aa ("radeonsi/gfx10: NGG geometry shader PM4 and upload")
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6137>
(cherry picked from commit 64c741ffb7)
One day, we may want copy_prop_vars or other passes to be able to see
through certain types of casts such as when someone casts a uint64_t to
a uvec2. However, for now we should just avoid casts all together.
Fixes: d8e3edb784 "nir/deref: Support casts and ptr_as_array in..."
Tested-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6072>
(cherry picked from commit 611f654fcf)
It's possible an SSA value depends on a register; in this case, chasing
the source would result in a crash as the chase helper in NIR asserts
is_ssa. Instead we should check a priori that all the argments are in
fact SSA, bailing otherwise.
In the piglit shader exhibiting this bug (by looping over the index),
bailing on the ishl instruction is -necessary-. This is not merely us
being cowardly to avoid seeing through the registers; indeed, if we
wrote away the ishl instruction, the shift itself would have to be
stored in a load/store register (r26/r27) which would preclude reading
it in the loop, creating a register allocation failure later in the
compile. So this is the correct solution due to the restricted
semantics.
Closes#3286
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reported-by: Icecream95 <ixn@keemail.me>
Fixes: f5401cb886 ("pan/midgard: Add address analysis framework")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6144>
(cherry picked from commit b2f475251e)
This effectively reverts part of 2907faee, which changed dri2_make_current() to
always take a dri2_dpy reference regardless of whether or not a new context or
surface(s) were being bound. This led to a reference count imbalance as there
was no corresponding code added to drop a reference on the dri2_dpy. As a
consequence, any application that called eglInitialize() on a default/native
display after having called eglTerminate() would always get back the old
dri2_dpy, inheriting its previous state.
As the reference count is there to prevent the dri2_dpy from being destroyed
between eglTerminate() and eglInitialize() calls when a context is still bound,
a reference should only be taken when a successful call to
dri2_dpy->core->bindContext() has been made. Fix the issue by restoring the old
reference counting behaviour.
Fixes: 4e8f95f64d ("egl_dri2: Always unbind old contexts")
Fixes: 2907faee7a ("egl/dri2: try to bind old context if bindContext failed")
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Luigi Santivetti <luigi.santivetti@imgtec.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Tested-by: Nicolas Cortes <nicolas.g.cortes@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3328
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6105>
(cherry picked from commit d0e32e5f81)
We were space-leaking iris_compiled_shader objects, leaving them around
basically forever - long after the associated iris_uncompiled_shader was
deleted. Perhaps even more importantly, this left the BO containing the
assembly referenced, meaning those were never reclaimed either. For
long running applications, this can leak quite a bit of memory.
Now, when freeing iris_uncompiled_shader, we hunt down any associated
iris_compiled_shader objects and pitch those (and their BO) as well.
One issue is that the shader variants can still be bound, because we
haven't done a draw that updates the compiled shaders yet. This can
cause issues because state changes want to look at the old program to
know what to flag dirty. It's a bit tricky to get right, so instead
we defer variant deletion until the shaders are properly unbound, by
stashing them on a "dead" list and tidying that each time we try and
delete some shader variants.
This ensures long running programs delete their shaders eventually.
Fixes: ed4ffb9715 ("iris: rework program cache interface")
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6075>
(cherry picked from commit 128cbcd3a7)
Fixes some dEQP-VK.renderpass2.* flakes. Valgrind:
Test case 'dEQP-VK.renderpass2.dedicated_allocation.attachment.8.724'..
==754520== Conditional jump or move depends on uninitialised value(s)
==754520== at 0x575B21C: radv_layout_is_htile_compressed (radv_image.c:1690)
==754520== by 0x572F470: radv_handle_depth_image_transition (radv_cmd_buffer.c:5855)
==754520== by 0x572F2F2: radv_handle_image_transition (radv_cmd_buffer.c:6123)
==754520== by 0x572EEC6: radv_handle_subpass_image_transition (radv_cmd_buffer.c:3385)
==754520== by 0x572A104: radv_cmd_buffer_begin_subpass (radv_cmd_buffer.c:4843)
==754520== by 0x572A007: radv_CmdBeginRenderPass (radv_cmd_buffer.c:4913)
==754520== by 0x572A197: radv_CmdBeginRenderPass2 (radv_cmd_buffer.c:4921)
Why false?
A renderloop happens when the same attachment is both used as input
attachment and output (color, ds) attachment in a subpass. Of course
this doesn't happen outside of a renderpass and hence we can initialize
it to false at the start of the renderpass.
Fixes: 66131ceb8b "radv: Pass through render loop detection to internal layout decisions."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3074
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6068>
(cherry picked from commit 18fe130ec9)
This avoids some performance regressions on Gen12 platforms caused by
SIMD32 fragment shaders reported in titles like Dota2, TF2, Xonotic,
and GFXBench5 Car Chase and Aztec Ruins.
The most obvious pattern in the regressing shaders I identified among
these workloads is that they all had non-uniform discard statements,
which are handled rather optimistically by the current IR analysis
pass: No penalty is currently applied to the SIMD32 variant of the
shader in the form of differing branching weights like we do for other
control flow instructions in order to account for the greater
likelihood of divergence of a SIMD32 shader.
Simply changing that by giving the same treatment to discard
statements as we give to other branching instructions seemed to hurt
more than it helped on platforms earlier than Gen12, since it reversed
most of the improvement obtained from SIMD32 fragment shaders in
Manhattan for no measurable benefit in other workloads (Manhattan has
a handful of shaders with statically non-uniform discard statements
which actually perform better in SIMD32 mode due to their approximate
dynamic uniformity). For that reason this change is applied to Gen12+
platforms only.
I've been running a number of tests trying to understand the
difference in behavior between Gen12 and earlier platforms, and most
of the evidence I've gathered seems to point at EU fusion being the
culprit: Unlike previous generations, on Gen12 EUs are arranged in
pairs which execute instructions in lockstep, giving an effective warp
size of 64 threads in SIMD32 mode, which seems to increase the
likelihood for control flow divergence in some of the affected shaders
significantly.
Fixes: 188a3659ae "intel/ir: Import shader performance analysis pass."
Reported-by: Caleb Callaway <caleb.callaway@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5910>
(cherry picked from commit 4d73988f6f)