We can't exceed c_int::MAX, because the CTS casts to ints in a few places.
We also need to take into account max pixel size when restricting to
max_mem_alloc as this cap is pixel based, not byte based.
Cc: mesa-stable
Reviewed-by: @LingMan
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30739>
(cherry picked from commit eef1af8128)
vkcube, vkgears and vkmark are crashing with the following error/segfault:
$ NVK_I_WANT_A_BROKEN_VULKAN_DRIVER=1 vkcube
WARNING: NVK is not a conformant Vulkan implementation, testing use only.
Selected GPU 0: GeForce GT 640 (NVK GK107), type: DiscreteGpu
ERROR: couldn't get DataFile for op ldc_nv
Segmentation fault (core dumped)
Handling nir_intrinsic_ldc_nv as per nir_intrinsic_load_ubo in Converter::getFile()
allows to run vkcube, vkgears and vkmark on Nvidia GT640
Fixes: dc99d9b2 ("nvk,nak: Switch to nir_intrinsic_ldc_nv")
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30832>
(cherry picked from commit 7b32df696e)
Make sure to preserve the depth or stencil components of D24S8 using the
fixed codepath just added. While we're here, fix the detection of
whether an attachment is bound.
Fixes: cb0f414b ("tu: Add support for suspending and resuming renderpasses")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26154>
(cherry picked from commit 812c8f6abe)
We need to make sure that we don't trash a passthrough depth/stencil
aspect if we need to store the whole attachment by loading it
beforehand.
Fixes: cb0f414b ("tu: Add support for suspending and resuming renderpasses")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26154>
(cherry picked from commit 5377219ca0)
At least one ray tracing shader in cp2077 is over 4MB on Xe2. There
isn't a memory pool large enough for the allocation, so the driver
crashes instead. This commit adds 8MB and 16MB pools.
I intend this as a stop gap fix. I would prefer to figure out why this
shader is so much larger than on previous platforms. The shader in
question has 3824 spills and 8625 fills. That is not good. I suspect
dealing with that will also solve the problem, but that will require a
bit more time.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11739
Suggested-by: Lionel Landwerlin
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30751>
(cherry picked from commit 09cf9fe8ab)
We were currently emitting logical atomic instructions with a packed
destination region for sub-dword LSC atomics, along the lines of:
> untyped_atomic_logical(32) dst<1>:HF, ...
However, these instructions use an LSC data size D16U32, which means
that the 16b data on the return payload is expanded to 32b by the LSC
shared function, so we were lying to the compiler about the location
of the individual channels on the return payload, its execution
masking, etc. This is why the hacks that manually set the
'inst->size_written' of the instruction were required.
In some cases this worked, but any non-trivial manipulation of the
instruction destination by lowering or optimization passes could have
led to corruption, as has been reproduced in deqp-vk during
lower_simd_width() for shaders that use 16-bit atomics in SIMD32
dispatch mode.
Note that LSC sub-dword reads aren't affected by this because they use
raw UD destinations and specify the actual bit size of the operation
datatype as the immediate SURFACE_LOGICAL_SRC_IMM_ARG, which doesn't
work for atomic operations since that immediate specifies the atomic
opcode.
Instead, have the logical operation implement the behavior of 16-bit
destinations correctly instead of silently replacing the 16-bit region
with an inconsistent 32-bit region -- This is done by emitting the MOV
instructions used to pack the data from the UD temporary into the
packed destination from the lower_logical_sends() pass instead of from
the NIR translation pass.
Fixes: 43169dbbe5 ("intel/compiler: Support 16 bit float ops")
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30683>
(cherry picked from commit 71ca8529c5)
this pbo shader works by iterating over the framebuffer size
and storing a value to an offset for each source pixel. if the
number of pixels being written out does not correspond to fragcoord
to the extent that certain source pixels are not written at all, however,
then this method should not be used in order to avoid giving broken results
cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30689>
(cherry picked from commit c2dcecffc5)
SBID SET can only be used on SEND, SENDC, or DPAS instructions. The
existing code was handling SET for SEND/SENDC, but was using the wrong
encoding for DPAS. Add a new case to handle that and make it clear that
the existing code is only for SEND/SENDC.
While here, rewrite the encoder to use 2-bit binary immediates shifted
up into the mode [9:8] field, rather than pre-shifted hex values. This
matches the documentation better and is a little easier to follow.
On the decode side, we were incorrectly decoding MATH instructions.
Because they're marked is_unordered, we were hitting the SEND/SENDC
decoding, which is incorrect for MATH.
Fixes 22 cooperative matrix tests on Lunar Lake.
Huge thanks to Paulo Zanoni for bisecting failures to one of my commits,
then analyzing shaders and experimenting to discover that the failure
was really an unrelated bug, just being provoked by different choices of
registers. His work narrowing the problem down made it much easier to
discover and fix this bug.
Backport-to: 24.2
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30705>
(cherry picked from commit d22d6d814d)