The `stride` and `offset` attributes are meaningful for the "virtual"
register files (VGRFs, UNIFORMs and ATTRs). Accumulator is an ARF so
validation should check `hstride` (part of the <V,W,H> triple) and `subnr`
instead.
Fixes: 12d7aaf2b8 ("intel/compiler: add more validation for acc register usage")
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28059>
(cherry picked from commit e324fbbe68)
This ensure the region triple <V,W,H> is set correctly, in this case the
desired region is a sequential like <8,8,1>. Without the helper the
sequence we get is <0,1,0> -- which the generator currently partially
adjusts when emitting code, but is not sufficient when doing validation
earlier.
The code generated code is slightly modified. From crucible test
func.shader.subtractSaturate.uint in the fragment shader for SIMD8, the
diff looks like
```
mov(8) acc0<1>UD g21<8,8,1>UD { align1 1Q $0.dst };
-add.sat(8) g22<1>UD -acc0<0,1,0>UD g16<8,8,1>UD { align1 1Q @1 $0.dst };
+add.sat(8) g22<1>UD -acc0<8,8,1>UD g16<8,8,1>UD { align1 1Q @1 $0.dst };
```
Note that without the patch generator adjusted the hstride for acc0 used
as destination (see brw_set_dest), but kept the src region as is. For
the source, it is not clear to me why the <0,1,0> would work correctly
here since it is a scalar, but using <8,8,1> it is correct.
Fixes: 58907568ec ("intel/fs: Add SHADER_OPCODE_[IU]SUB_SAT pseudo-ops")
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28059>
(cherry picked from commit db8022dc4d)
It was correct for the parameters that the driver was using, but incorrect
for other parameters.
1. The address computation must multiply the workgroup size (wave size)
by num_mem_ops to fix the case when num_dwords_per_thread > 4.
2. nir_load_ssbo shouldn't set the number of components to 4 when
num_dwords_per_thread < 4.
Fixes: 6584088cd5 - radeonsi: "create_dma_compute" shader in nir
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28119>
(cherry picked from commit e99765df08)
this should yield more consistent results and avoid weird cases where
various formats are queried for things they don't support and won't use
Fixes: 9a412c10b7 ("zink: set all usage flags when querying sparse features")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28115>
(cherry picked from commit 8fa413fef0)
only certain formats are required to have the storage bit, so be more
tolerant of failure in the case where drivers actually check flags
and reject storage usage when it's actually unsupported
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28115>
(cherry picked from commit 61e5b6ad9d)
the existing guesswork during format selection for teximage is
accurate most of the time, but it's not accurate all of the time.
GL/ES each have a set of sized formats that are required to be
color renderable, and so any time one of these is allocated as a
texture, it MUST have the rendertarget usage bit attached so that
it can later be bound as a framebuffer attachment
an alternative might be to relax this and then try to do migration
to a different format/buffer later if necessary, but that's hard and
probably not actually as useful
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28055>
(cherry picked from commit 0f66589c2a)
This fixes an issue hit by one of darktable's kernels, where the sampler
argument got assigned the location of a dead kernel parameter turning it
into a zombie and leading us to trash the kernel input buffer's layout.
Fixes: 25b8a34b48 ("rusticl/kernel: inline samplers")
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28121>
(cherry picked from commit 2df640c4f6)
If you look at the sampler message header on Gfx9+, you'll see that we
mostly only use 2 dwords (dw2 & dw3). DW2 has a bunch of sampler
parameters, DW3 is the sampler handle.
On Gfx9 we can micro optimize by copying r0 into the header because
the HW mostly doesn't care about other DWs. We just have to clear dw2
on non VS/FS stages.
On Gfx11+, we always have to do a careful copy of the r0.3 bits to
mask out the bottom unrelated bits. So there, just clearing the entire
header makes more sense.
On Xe2+, the dw4 of the header references the sampler feedback surface
handle and bit0 is a boolean to know whether to use that surface or
not. So it *REALLY* matters to have that as 0. If we copy r0, we'll
get random bits in dw4, leading to enable that surface.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28082>
(cherry picked from commit 75c6ad9907)
With RADV, when VS/TES and FS are compiled separately, the PrimitiveId
is exported unconditionally because it's not possible to know if the
FS reads it or not. This happens with fast-link GPL and shader object.
Though, the PrimitiveID should be ignored when it's implicitly exported
because otherwise the stream output LDS offset is incorrect.
This fixes a bunch of failures with transform feedback and Zink/RADV
when shader object is enabled on RDNA3.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27981>
(cherry picked from commit d12984edb8)
The algorithm used to rendering smooth lines worked under the assumption
that line coords were in the [0, 1] range. This was correct when using
an orthogonal projection, but not when using a perspective projection.
With a perspective projection (where the value for 1/Wc set in the VPM
is not 1.0), line coords values are also affected by this projection, so
the values are not in this range.
To deal with this, we normalize the line coords using the Wc value so
the range becomes [0, 1], and the smooth line rendering works as
expected.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10496
Fixes: ee4d51f8b2 ("v3d: Add a lowering pass for line smoothing")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28072>
(cherry picked from commit 69fbd5cb90)
The previous version had an optimization where, instead of actually
waiting on the FALCON to return, it would just do a bunch of nops in
some cases. This seems broken at least on Turing+ and results in
registers not ending up with the right values. It only really shows up
when you set two registers back-to-back in which case the second
SET_PRIV_REG may mess up the first.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27927>
(cherry picked from commit 0ed7bce8e5)
The driver is written that we should support ETNA_NUM_VARYINGS and reporting
a bigger number will cause some troubles. I had a quick look at galcore's
hw database and there are entries that report a higher value.
So I think what we want is to the minimum value of what kernel driver reports
and what the gallium driver should be able to handle.
Fixes: 84816c22e4 ("etnaviv: ask kernel for max number of supported varyings")
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27923>
(cherry picked from commit 93255abe30)
As per spec, any colors, or color components, associated with a fragment
that are not written by the fragment shader are undefined.
So we might as well just write vec4(1.0) to output, since HW doesn't allow
us to have an empty FS.
Backport-to: 23.3
Backport-to: 24.0
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24855>
(cherry picked from commit 6998c48f77)
the ES spec imposes additional requirements for copy commands,
specifically that the formats have matching component sizes
the existing check used the driver's internal formats to check
for a match, which is broken since the spec requires the match be
between the passed internalFormat and the buffer's effective internal
format (i.e., this has no relation to what the driver supports)
fixes KHR-GLES3.copy_tex_image_conversions.forbidden* on a bunch of drivers
cc: mesa-stable
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28030>
(cherry picked from commit 2cd192f879)
Batches must be ignored if batch count is zero, so all batch inspections
have to be gated behind batch count. For memcpy, it's UB if either src
or dst is NULL even when size is zero.
Side note:
- For original commit, this fixes just the memcpy UB
- For current codes, this fixes to not skip ffb batch prepare
Fixes: 493a3b5cda ("venus: refactor batch submission fixup")
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28071>
(cherry picked from commit 8af267eb00)