This fixes a compilation issue in Marvel Rivals where the legalization
logic and the encoding logic don't line up, which results in an
assertion failure on this instruction:
r17 = hfma2 r17.xx -r18.xx 0x3c003c00
The fix here is a little overly restrictive because it turns out we
actually do have modifiers for all 3 sources. Those modifiers will
be added in later commits.
Fixes: 567cae69c3 ("nak: Add 16-bits float operations")
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34750>
(cherry picked from commit 1ff7135691)
Most of the churn in this commit is changing unit tests that were
testing things that are now invalid.
shader-db:
All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 17122204 -> 17122669 (<.01%)
instructions in affected programs: 120669 -> 121134 (0.39%)
helped: 0 / HURT: 124
total cycles in shared programs: 895602370 -> 895613210 (<.01%)
cycles in affected programs: 17868974 -> 17879814 (0.06%)
helped: 35 / HURT: 85
fossil-db:
All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 210736518 -> 210743769 (+0.00%)
Cycle count: 30377733040 -> 30377699060 (-0.00%); split: -0.00%, +0.00%
Max live registers: 66056852 -> 66056966 (+0.00%)
Totals from 1505 (0.21% of 706776) affected shaders:
Instrs: 1890151 -> 1897402 (+0.38%)
Cycle count: 48397408 -> 48363428 (-0.07%); split: -0.11%, +0.04%
Max live registers: 256821 -> 256935 (+0.04%)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 020b0055e7 ("i965/fs: Propagate conditional modifiers from compares to adds")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34509>
(cherry picked from commit e26270249b)
When I originally wrote that code, I didn't understand what a jerk NaN
can be.
v2: Remove the brw_type_is_uint stuff. This function is currently only
called for float types. In a later commit, integer types will be
supported but only for NZ and Z conditions. Noticed by Matt.
shader-db:
All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 17122197 -> 17122204 (<.01%)
instructions in affected programs: 1691 -> 1698 (0.41%)
helped: 0 / HURT: 4
total cycles in shared programs: 895602484 -> 895602370 (<.01%)
cycles in affected programs: 912964 -> 912850 (-0.01%)
helped: 2 / HURT: 2
fossil-db:
All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 210736388 -> 210736518 (+0.00%)
Cycle count: 30377728900 -> 30377733040 (+0.00%); split: -0.00%, +0.00%
Totals from 130 (0.02% of 706776) affected shaders:
Instrs: 169911 -> 170041 (+0.08%)
Cycle count: 18021210 -> 18025350 (+0.02%); split: -0.00%, +0.02%
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 020b0055e7 ("i965/fs: Propagate conditional modifiers from compares to adds")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34509>
(cherry picked from commit 0dab520a19)
A8_UNORM on v9+ is using RGBA8_UNORM as a pixel format with the
A8_UNORM clump format to dealing with the diffences between
RGBA8 and the actual A8 in-memory layout.
The problem is, LEA_TEX only loads the InternalConversionDescriptor
which contains only the pixel format, and that's what ST_CVT uses
to do the conversion, so we'll actually store 4 components instead
of one.
This shows up with
dEQP-VK.image.load_store.without_any_format.buffer.a8_unorm* after
enabling maintenance5.
For now I've turned off the image storage capability for A8_UNORM
on all gens, but I'd be fine disabling it only on v9+ if you think
that's preferable.
Fixes: d95423686f ("pan/format: Add PAN_BIND_STORAGE_IMAGE flag")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34648>
(cherry picked from commit 9d1262e108)
This is the most serious bug we've had in a long time due to a fundamental
misunderstanding of the hardware (due to incomplete reverse-engineering). It
caught me off guard.
The texture descriptor has "mode" bits which configure two aspects of how the
address pointer is interpreted:
* whether it is indirected, pointing to a secondary page table for sparse
* whether it writes texture access counters (for Metal's idea of sparse).
...Neither of these is a "null texture" mode.
So why did I see Apple's blob using a non-normal mode for null textures, and why
did I copy those settings?
1. Because the hardware texture access counters provide a cheap way to detect
null texture accesses after the fact, which I think their GPU debug tools
use. I'm not sure why release builds of the driver do/did that, but whatever.
2. Because I assumed Cupertino knew best and I didn't bother looking too close.
We can't use them here (without doing extra memory allocations), since then
the GPU will increment access counters. And since our null texture address used
to just be a pointer in the command buffer, that mean the GPU will trash
whatever memory happened to be 0x400 bytes after the start of the null texture
descriptor. The symptom being random faults.
This bug was caught when trying to use the zero-page instead, which raised a
permission fault when the GPU tried to write counts. Then I remembered the
sparse mechanism and had a bit of a eureka moment. Immediately followed by an
"oh, f#$&" moment as I realized how many random bugs could potentially be root
caused to this.
The fix is two-fold:
1. Use normal layout instead.
2. Set the address to the zero-page (which is a fixed VA) and detect null
textures by checking the address, instead of the mode.
The latter is a good idea anyway, but both parts needs to be done atomically to
maintain bisectability.
Backport-to: 25.1
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34703>
(cherry picked from commit 3eb7575679)
Since headless overrides create_mem, it needs to override finish_create
too. Fixes a segfault in nvk that was caused by us mixing
wsi_create_null_image_mem with wsi_finish_create_blit_context, which
would then call CmdCopyImageToBuffer with image->blit.buffer == NULL
Fixes a cts failure on nvk in:
dEQP-VK.image.swapchain_mutable.headless.2d.r8g8b8a8_unorm_b8g8r8a8_unorm_clear_copy_format_list
and several others
Fixes: 579578f10a ("vulkan/wsi/drm: Break create_prime_image in pieces")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34646>
(cherry picked from commit 60452e016e)
Currently mesa-clc bundles OpenCL headers from Clang only if the static
LLVM is used (which means Clang / LLVM are not present on the target
system). In some cases (e.g. when building in OpenEmbedded environemnt)
it is desirable to have shared LLVM library, but skip installing the
whole Clang runtime just to compile shaders. Add an option that forces
OpenCL headers to be bundled with the mesa-clc binary.
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34551>
(cherry picked from commit 419a9e9d42)
I don't understand why, but somehow the changes in e38631ad...3d9ac270
is causing this 1) in a file that has not changed, and 2) on lines that
are indented identically, with the `for` in the macro being terminated
with a `;` semicolon by the caller, so it looks all good to me.
Silencing this allows me to get the release through, but I probably
won't look back either, so hopefully there won't be a legitimate
instance of that warning in the future 😇
Xe2 changed the MOCS field in few instructions, those now have a field
for the MOCS index and other the encryption enable bit but ISL returns
the combination of both aka MEMORY_OBJECT_CONTROL_STATE.
To minimize changes I have added 2 macros to extract the values
from the value returned by isl.
From all the instructions changed Mesa only make use of two, so the
other instruction will be handled in the next patch.
Cc: mesa-stable
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34592>
(cherry picked from commit 161c412a82)
si_destroy_context needs to call context->set_debug_callback(...) to
avoid the debug logs to access the destroyed context.
Adding this change introduced a different problem: when an aux context
is destroyed from si_destroy_screen, parts of the screen have been
freed already: the shader_compiler_queue_*.
c467a87e06 ("radeonsi: Destroy queues before the aux contexts") moved
the util_queue_destroy calls above the context destruction, but with
the 59a3f38ff6 change, it's not needed anymore: si_destroy_context
will finish the screen shader queues before proceeding with releasing,
so use-after-free isn't possible.
Fixes: 59a3f38ff6 ("radeonsi: clear the debug callback on ctx destroy")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12035
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34574>
(cherry picked from commit 2a381bbc3c)
Letting the shader offset instanceID by baseInstance works only if
the divisor is one. If the divisor is greater than one, the firstInstance
parameter shouldn't be applied this divisor, but it currently is. Zero
divisors are also problematic, in that they will force use of the
instance zero attribute all the time.
The only way to fix that is to tweak the offsets of the per-instance
attributes instead, like is done in the JM backend.
Fixes: 1570f0172e ("panvk: Fix base_{instance,vertex} handling")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Olivia Lee <benjamin.lee@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34642>
(cherry picked from commit b2a8e3838d)
DRM_FORMAT_MOD_QCOM_COMPRESSED forces the image to be UBWC regardless
of what's better for perf, we should respect that.
The regression is seen in GTK4 when it tries to create tiny swapchain
images.
Fixes: fc50fb35b0
("tu,freedreno: Enable linear mipmap tail for UBWC images")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34628>
(cherry picked from commit 36f22cc951)
When doing the flushing, I forgot that because the staging buffer can be
used with different formats with different cpp, we need to make sure
that CCU is properly flushed and invalidated between each copy to the
staging buffer to prevent stale cache entries from creeping in, as the
CCU seems to rely on the cpp staying the same, even on a7xx which
dropped some of the other restrictions like using the same RT
index/layer. For "normal" user-visible copies this is done via
transitioning from UNDEFINED.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34611>
(cherry picked from commit ee10938bee)
We were forgetting to reset the map count to 0 in case of dyn_bufs in
create_copy_table.
This was causing invalid copy entries to be added to the table causing
invalid copies in most situation with holes in the set definition while
still binding set 0 or at worst an assert to be triggered in
cmd_fill_dyn_bufs.
This fixes "dEQP-GLES3.functional.ubo.*" and
dEQP-GLES31.functional.ubo.*" on PanVK+ANGLE.
Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com>
Fixes: e350c334b6 ("panvk: Extend the descriptor lowering pass to support Valhall")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34652>
(cherry picked from commit 8d2e16cc11)