this was only implemented for textures (I assume because drivers which implement
the corresponding intrinsic don't support multisampled images), but it's also
used for shader images
Fixes: 22fdb2f855 ("nir/spirv: Update to the latest revision")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9682>
(cherry picked from commit 50881d59e6)
The calculate_tess_lds_size function already returns the size in blocks
of the encoding granule, but we forgot to adjust config->lds_size.
This variable is not used to actually set the LDS size used for TCS,
but by ACO to make scheduling decisions.
Fossil DB stats on Sienna Cichlid:
Please note that the +3729.43% is NOT a regression.
The real LDS size used didn't change, it was just reported incorrectly.
Totals from 1342 (0.96% of 139391) affected shaders:
VGPRs: 60880 -> 80240 (+31.80%); split: -0.05%, +31.85%
CodeSize: 3378456 -> 3381224 (+0.08%); split: -0.23%, +0.31%
LDS: 687104 -> 26312192 (+3729.43%)
MaxWaves: 29794 -> 23962 (-19.57%)
Instrs: 644194 -> 644610 (+0.06%); split: -0.32%, +0.39%
Cycles: 2675068 -> 2676804 (+0.06%); split: -0.31%, +0.38%
VMEM: 428840 -> 517418 (+20.66%); split: +22.53%, -1.88%
SMEM: 91831 -> 88587 (-3.53%); split: +5.70%, -9.23%
VClause: 22740 -> 19384 (-14.76%); split: -16.18%, +1.42%
SClause: 19116 -> 18373 (-3.89%); split: -4.34%, +0.46%
Copies: 66662 -> 63448 (-4.82%); split: -5.55%, +0.73%
Fixes: cf89bdb9ba "radv: align the LDS size in calculate_tess_lds_size()"
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9557>
Vertex attribute bounds checking is supposed to be done per-attribute:
is_oob = index * stride + attrib_offset + attrib_size > buffer_size
but we were obtaining num_records by dividing the buffer size by the
stride, making it per-vertex:
is_oob = index * stride + (stride - 1) >= buffer_size
An example from Dead Cells (Wine) is:
attribute bindings: 0, 1, 2
attribute formats: r32g32, r32g32, r32g32b32a32
attribute offsets: 0, 0, 0
binding buffers: all the same buffer
binding offsets: 0, 8, 16
binding sizes: 128, 120, 112
binding strides: 32, 32, 32
Workaround this issue without switching to per-attribute descriptors by
rounding up the division. This is still incorrect, but it should now no
longer consider in-bounds attributes out-of-bounds.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3796
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4199
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9557>
Fixes several dEQP-VK.robustness.robustness2.* tests on GFX8. Generations
other than GFX8 don't fail the tests because bounds-checking is done using
the index (making it per-vertex).
fossil-db (Polaris):
Totals from 1387 (0.99% of 140385) affected shaders:
(no statistics affected)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fixes: 03a0d39366 ("aco: use MUBUF in some situations instead of splitting vertex fetches")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9557>
ACO attempts to store the output of an instruction in the same register
occupied by its operands where possible. Importantly this only works if
the operands are large enough to store the result register size. The code
failed to consider subdword operands when checking for this, causing
entire register slots to be freed up even though subdword parts were still
used.
In Mafia 3, this affected the following code:
v2b: %363:v[2][0:16], v2b: %362:v[2][16:32] = p_split_vector %360:v[2]
v1: %116:v[2] = v_cvt_f32_f16 %362:v[2][16:32]
v1: %117:v[2] = v_cvt_f32_f16 %363:v[2][0:16]
where v[2] is allocated to %116 even though its original lower 16 bits are
still used in the instruction after.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3717
Fixes: 031edbc4a5
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9557>
gl_MaxVaryingFloats was not removed from core until 4.20 and is still
available in compat shaders. Found while writing some new CTS to test
the correct declarations of this constant.
Fixes: 0ebf4257a385i ("glsl: define some GLES3 constants in GLSL 4.1")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9514>
(cherry picked from commit 684f97de80)
Before we introduced the submission thread in 829699ba63, once we
returned from vkQueueSubmit, all signaled syncobj would have a
i915_request/dma-fence waiting to be signaled by some work that would
submitted to HW by i915.
After this submission thread that is no longer the case. We added a
few checks in places like vkQueuePresentKHR() to wait for the binary
semaphores to materialize before we would hand things over to the WSI
code.
Unfortunately 829699ba63 forgot to reset the signaled binary
semaphore.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 829699ba63 ("anv: implement shareable timeline semaphores")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4276
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
(cherry-picked from commit cb74cd816c)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9554>
Starting with d0d039a4d3, we emit writes to the push constant chunk
of the payload to stomp out-of-bounds data to zero for Vulkan. Then, in
369eab9420, we started emitting shader preamble code for emulated
push constants on Gen12.5 parts. In either of these cases, we can run
into issues if we don't have a proper live range for some of the payload
registers where they get used for something and then smashed by our push
handling code. We've not seen many issues with this yet because it only
happens when you have dead push constants.
Fixes: d0d039a4d3 "anv: Emit pushed UBO bounds checking code..."
Fixes: 369eab9420 "intel/fs: Emit code for Gen12-HP indirect..."
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9501>
(cherry picked from commit b9e9f92f73)
The container is moved from before and hence returns size 0. To get the
correct value, the new instruction container must be used instead.
This was flagged by clang-tidy. The fixed call still triggers the
corresponding diagnostic, hence this change silences it by adding a
redundant clear() after move.
Fixes: 7f1b537304 ("aco: add new NOP insertion pass for GFX6-9")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9432>
(cherry picked from commit 97c97781f6)