Using the same buffer without a barrier actually can lead to data races as
drivers might not properly synchronize the content. Using the stream
uploader is a neat fix which prevents us from having to use a barrier but
still keep high throughput when launching kernels back-to-back.
Fixes: 5ff33f9905 ("rusticl: use real buffer for cb0 for drivers prefering")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27666>
(cherry picked from commit 8da8c6c2d8)
On pre-Valhall HW, the fragment shader metadata was part of the RSD
(renderer state descriptor), which was emitted at draw time, but
Valhall introduces a shader program descriptor containing only the
shader information, and this one is emitted at shader preparation
time.
If we don't add the FS state BO to batch, we might end up with a batch
being executed after the shader object has been destroyed, leading to
page faults when the GPU tries to access the shader program descriptor.
We make the panfrost_batch_add_bo() unconditional since it gracefully
handles the NULL case (which will happen on v7-).
Fixes: 087b63cb07 ("panfrost: Allow uploading fragment SPDs")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Antonino Maniscalco <antonino.maniscalco@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28926>
(cherry picked from commit 2cc317763c)
Currently ffmpeg has a bug in VAAPI AV1 decode that in some cases
it submits the same slice data buffer as many times as there is tiles.
However, in other cases it behaves correctly and all slice data buffers
contain different parts of bitstream to decode which this change breaks.
Now that the va frontend is passing correct offsets, this fixes decoding
AV1-TEST-VECTORS/av1-1-b8-22-svc-L1T2 with ffmpeg.
This reverts commit e6701f7231.
Cc: mesa-stable
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28960>
(cherry picked from commit 88dfe04b08)
The slice parameter data offset refers to offset in the submitted data buffer.
When multiple slice data buffers are submitted, the offsets of all slices needs
to be adjusted to be based from the start of the first data buffer.
Cc: mesa-stable
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28960>
(cherry picked from commit 6746d4df6e)
D16 rounds towards zero for fp32 -> fp16, but for fixed point it rounds to
nearest even in fp16. MIMG without D16 also rounds to nearest even, but in fp32.
This means D16 and f2f16_rtz(tex@32) can produce different results.
Sadly this also means we can never use d16 if fp16 rounding isn't undefined.
Cc: mesa-stable
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28730>
(cherry picked from commit 3a35522c8a)
Fixes fs-uint-to-float-of-extract-int8.shader_test and
fs-uint-to-float-of-extract-int16.shader_test added by piglit!883.
No shader-db or fossil-db changes on any Intel platform.
v2: Expand the comment explaining the potential problem. Suggested by
Caio.
Fixes: 29ce110be6 ("i965/fs: Remove extract virtual opcodes.")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27891>
(cherry picked from commit bf5d82654a)
When a job is submitted to the flush_queue the resource dt_idx is reset,
and if a readback is requested then we have to make sure that the
corresponding kopper_preset has finished before we can acquire the image
for readback, so wait for the according fence in this case.
This fixes the validation error UNASSIGNED-Threading-MultipleThreads-Write
triggered by piglit "read-front" lavapipe.
Fixes: 8ade5588e3
zink: add kopper api
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28127>
(cherry picked from commit 811ed62865)
if swapchain creation fails (e.g., insane cts swapchain configs), the
swapchain gets demoted to a non-window image that is still accessed by
the frontend. this image should not ever hit corresponding zink entrypoints
for swapchain-only images, which requires a flag to test swapchain-edness
cc: mesa-stable
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28904>
(cherry picked from commit a50c17802a)
optimized pipeline compile jobs may still be ongoing during ctx
destroy, and these must complete too or else crashes will occur
fixes shutdown crash with dEQP-EGL.functional.sharing.gles2.multithread.simple.images.texture_source.teximage2d_render
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28900>
(cherry picked from commit bd1a3921d1)
it's possible for a shader to be precompiling its separate shader variants
during destruction, which requires that the programs set be iterated
under lock in order to prune every new variant as it is created without
crashing
fixes crashes in spec@arb_separate_shader_objects@400 combinations.*
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28900>
(cherry picked from commit f18a1d3a31)
This array has 3 components, because it's meant to hold the X, Y and Z
components of the work-group size sysval. However, mir_pick_ubo assumes
vec4 for the push-uniforms, which ends up promoting this to 4
components.
So let's make sure we don't write that last component. It's not going to
do anything good.
In practice, this leads to the viewport descriptor being smashed, which
doesn't actually do any real-world harm, because this only happens in
compute batches where that descriptor is unused. However, writing
outside of arrays is undefined behavior, so we should fix it regardless.
Fixes: 5006167061 ("panfrost: Hook-up indirect dispatch support")
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28856>
(cherry picked from commit 186f7fa915)
There're many cases in which the ring submissions must succeed. We don't
worry about real oom since things would fail earlier. For simulated oom
from random intentional allocs, there isn't robust way to fail those
must succeeds. e.g. the commands that don't have return codes or valid
error return struct defaults. So real oom propagation is still at best
effort.
Cc: mesa-stable
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28914>
(cherry picked from commit 3e16d25d1a)
We were accidentally leaving XY_BLOCK_COPY_BLT's Source and Destination
MOCS fields set to 0 (Error: Reserved for Non-Use) on Gfx12.0 systems.
This was causing assert fails in debug builds, since we try to ensure
that we don't do that. In theory, MOCS 0 is supposed to be equivalent
to MOCS 2 (all the caching), but...we probably ought to use MOCS 3
(uncached). Every Gfx12.5+ platform requires it, so although there
isn't a note about Gfx12.0 needing that, it's possible that it does.
We're currently only using the blitter for DRI PRIME blits on Gfx12.0,
anyway, and I think we're flushing all the caches regardless.
This bug was somewhat obscure to hit:
- You need a hybrid graphics system with Gfx12.0 and some other GPU
- You have to be using "reverse PRIME", i.e. rendering on the integrated
GPU and displaying on the discrete one. This is not the common case.
- You have to be using a debug build.
No observable performance delta in GfxBench5 Car Chase (an arbitrary
program) when rendering on Alderlake GT1 and displaying on an Arc A770.
Fixes: 194afe8416 ("anv/iris/blorp: use the right MOCS values for each engine")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28894>
(cherry picked from commit e6fb3ba037)
This was missing and this caused test failures for formats different
than VK_FORMAT_R8_UINT which is the only one supported for FSR.
Fixes recent
dEQP-VK.api.info.unsupported_image_usage.*.fragment_shading_rate_attachment.*.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28893>
(cherry picked from commit e8d94536d2)
Bspec 57023: RENDER_SURFACE_STATE:: Shader Channel Select Red
"Render Target messages do not support swapping of colors with
alpha. The Red, Green, or Blue Shader Channel Selects do not
support SCS_ALPHA. The Shader Channel Select Alpha does not support
SCS_RED, SCS_GREEN, or SCS_BLUE."
Cc: mesa-stable
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28791>
(cherry picked from commit 2d8686ccd5)
We track stencil clears and writes to optimize them. Unfortunately, the
code for doing this tracks the whole resource, not individual layers or
levels within the resource, which can result in incorrect output when
different levels or layers are accessed. Modified to optimize only the first
layer/level; this will handle the common case of a single stencil texture
while allowing arrays or mipmaps to still work (albeit slightly slower).
The original optimization was introduced in a2463ec271 ("panfrost:
Constant stencil buffer tracking") but the code has been reformatted
since then, so this change won't apply as-is that far back (although it's
fairly obvious how to apply it by hand).
Fixes: a2463ec271 ("panfrost: Constant stencil value tracking")
Signed-off-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28832>
(cherry picked from commit dae6b6a23d)
Since the group is created from the onset, we have to make
sure that four or eight src values don't have a readport
conflict, so force a pre-loading of the values to registers
evenly distributed over the channels and let copy-propagation
take care of cleaning up un-neccesary moves.
Fixes: 79ca456b48
r600/sfn: rewrite NIR backend
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28840>
(cherry picked from commit 07995b98a8)
For integer types, the signedness is determined by flags on the muladd
instruction. The types of the sources play no role. Previously we were
using the signedness of the type and ignoring the mask.
Adjust the types passed to the dpas_intel intrinsic to match.
Fixes various
dEQP-VK.compute.*.cooperative_matrix.khr_*.matrixmuladd_cross.* tests on
different Intel platforms. Some platforms had failing tests, and some
platforms failed EU validation before the tests could fail.
Fixes: 6b14da33ad ("intel/fs: nir: Add nir_intrinsic_dpas_intel")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28822>
(cherry picked from commit 2ce558d928)
The pointer "xfb" is allocated with a clone of "so->nir" and lost
without further processing.
The function panfrost_shader_compile() was the one processing "xfb".
The call of this function was removed with the commit 40372bd720.
This makes "xfb" not required anymore.
For instance, this issue is triggered on a Mali-T820 with
"piglit/bin/arb_transform_feedback2-change-objects-while-paused -auto":
Indirect leak of 32776 byte(s) in 1 object(s) allocated from:
#0 0xf78f30a6 in malloc (/usr/lib/libasan.so.6+0x840a6)
#1 0xee9cd4ee in ralloc_size ../src/util/ralloc.c:118
#2 0xee9cf7ae in create_slab ../src/util/ralloc.c:801
#3 0xee9cf7ae in gc_alloc_size ../src/util/ralloc.c:840
#4 0xef74ab82 in nir_undef_instr_create ../src/compiler/nir/nir.c:888
#5 0xef76212c in clone_ssa_undef ../src/compiler/nir/nir_clone.c:328
#6 0xef76212c in clone_instr ../src/compiler/nir/nir_clone.c:438
#7 0xef7642d8 in clone_block ../src/compiler/nir/nir_clone.c:501
#8 0xef7642d8 in clone_cf_list ../src/compiler/nir/nir_clone.c:555
#9 0xef7657dc in clone_function_impl ../src/compiler/nir/nir_clone.c:632
#10 0xef766cb8 in nir_shader_clone ../src/compiler/nir/nir_clone.c:743
#11 0xf007673e in panfrost_create_shader_state ../src/gallium/drivers/panfrost/pan_shader.c:434
#12 0xeeb6766c in st_create_common_variant ../src/mesa/state_tracker/st_program.c:781
#13 0xeeb71d1c in st_get_common_variant ../src/mesa/state_tracker/st_program.c:834
#14 0xeeb72ea2 in st_precompile_shader_variant ../src/mesa/state_tracker/st_program.c:1320
#15 0xeeb72ea2 in st_finalize_program ../src/mesa/state_tracker/st_program.c:1421
#16 0xef3806ec in st_link_glsl_to_nir ../src/mesa/state_tracker/st_glsl_to_nir.cpp:748
#17 0xef3806ec in st_link_shader ../src/mesa/state_tracker/st_glsl_to_nir.cpp:984
#18 0xef2992f6 in link_program ../src/mesa/main/shaderapi.c:1336
#19 0xef2992f6 in link_program_error ../src/mesa/main/shaderapi.c:1445
Fixes: 40372bd720 ("panfrost: Implement a disk cache")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28743>
(cherry picked from commit 4f5e9a21c5)
panfrost_initialize_surface is called when a surface is written to,
and marks that surface as valid. If the surface is a depth buffer
with a separate stencil, that separate stencil should also be marked
as valid; otherwise, readpixel will skip reading it (and hence the
stencil data will be read as uninitialized). This only affects
DEPTH32F_STENCIL8 formats.
Cc: mesa-stable
Signed-off-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28738>
(cherry picked from commit c939111f3f)
Remove a premature optimization. When PIPE_MAP_DISCARD_WHOLE_RESOURCE
is set we were setting create_new_bo, and then if that was set we skipped
a set of tests which if passed would cause a panfrost_flush_writer.
In fact we need that flush in some cases (e.g. when any batch is
reading the resource). Moreover, we should sometimes copy the resource
(set the copy_resource flag) and that again was being skipped if
create_new_bo was initially true due to PIPE_MAP_DISCARD_WHOLE_RESOURCE
being set.
Cc: mesa-stable
Signed-off-by: Eric R. Smith <eric.smith@collabora.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28406>
(cherry picked from commit e3d123b7a6)
We take advantage of the needs_quad_helper_invocation information to
only enable the PER_QUAD TMU access on Fragment Shaders when it is needed.
PER_QUAD access is also disabled on stages different to fragment shader.
Being enabled was causing MMU errors when TMU was doing indexed by vertexid
reads on disabled lanes on vertex stage. This problem was exercised by some
shaders from the GTK new GSK_RENDERER=ngl that were accessing a constant buffer
offset[6], but having PER_QUAD enabled on the TMU access by VertexID was
doing hidden incorrect access to not existing vertex 6 and 7 as TMU was
accessing the full quad.
cc: mesa-stable
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28740>
(cherry picked from commit 97f5721bfc)
The device->using_sparse variable is only used at cmd_buffer_barrier()
to decide if we need to apply the heavier-weight flushes that are only
applicable to sparse resources. The big problem here is that we need
to apply the flushes to the non-image and non-buffer memory barriers,
so we were trying to limit those only to applications that ever submit
a sparse resource to the sparse queue.
The reason why we were applying this only to devices that ever
submitted sparse resources is that dxvk games have this thing where
during startup they create and then delete tiny sparse resources, so
switching device->using_sparse to true at resource creation would make
basically every dxvk game start applying the heavier-weight
workaround.
The problem with all that is that even if an application creates a
sparse resource but doesn't ever bind them, the resource should still
behave as an unbound resource (because they are bound with a NULL
bind), so the flushes affecting them should happen. This case is
exercised by vkd3d-proton/test_buffer_feedback_instructions_sm51.
In order to satisfy all the above cases and only really apply the
heavier-weight flushes to applications actually using sparse
resources, let's just count the number of sparse resources that
currently exist and then apply the workaround only if it's not zero.
That covers the dxvk case since dxvk deletes the resources as soon as
they create, so num_sparse_resources goes back to 0.
Testcase: vkd3d-proton/test_buffer_feedback_instructions_sm51
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10960
Fixes: 6368c1445f ("anv/sparse: add the initial code for Sparse Resources")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28724>
(cherry picked from commit 95dc34cd97)
Since DGC preprocessing for IBO is supported, the driver generates
an indexed indirect draw but SQTT markers were missing and this
introduced complete non-sense in RGP captures.
Fixes: e59a16bbb8 ("radv: use an indirect draw when IBO isn't updated as part of DGC")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28710>
(cherry picked from commit 4586451b2d)
I'm not sure why zink would want dri2 here instead of kopper,
I'm sure it's some side effect of something else, let zink
use the kopper paths.
This fixes:
dEQP-GL45-ES3.info.vendor
on zink on nvk with GL cts using EGL.
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Fixes: 12a47b84b7 ("egl/dri2: trigger drawable invalidation from surface queries for zink")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28707>
(cherry picked from commit 223aedfa5d)
Indeed, the pointer processed by r300_upload_index_buffer() was not the right one.
This is the reason why "deqp-gles2 --deqp-case=dEQP-GLES2.functional.draw.draw_elements.indices.user_ptr.index_byte"
was failing (the logs are below). This change corrects this issue and makes the related deqp tests work properly.
This change considers that r300_upload_index_buffer() sets indexBuffer to NULL. The indexBuffer resource
should be properly freed once the buffer is processed. This is required to avoid another refcnt imbalance
(another kind of memory leak).
==9962==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60200000721f at pc 0x7fd57b54a9a0 bp 0x7fffd2c39290 sp 0x7fffd2c38a40
READ of size 30 at 0x60200000721f thread T0
#0 0x7fd57b54a99f in __interceptor_memcpy (/usr/lib64/libasan.so.6.0.0+0x3c99f)
#1 0x7fd570d10528 in u_upload_data ../src/gallium/auxiliary/util/u_upload_mgr.c:333
#2 0x7fd57114142b in r300_upload_index_buffer ../src/gallium/drivers/r300/r300_screen_buffer.c:44
#3 0x7fd57113943c in r300_draw_elements ../src/gallium/drivers/r300/r300_render.c:632
#4 0x7fd57113bbc4 in r300_draw_vbo ../src/gallium/drivers/r300/r300_render.c:840
#5 0x7fd570d212e2 in u_vbuf_draw_vbo ../src/gallium/auxiliary/util/u_vbuf.c:1487
#6 0x7fd56fceb873 in _mesa_validated_drawrangeelements ../src/mesa/main/draw.c:1709
#7 0x7fd56fcf28c5 in _mesa_DrawElementsBaseVertex ../src/mesa/main/draw.c:1852
Fixes: 330d0607ed ("gallium: remove pipe_index_buffer and set_index_buffer")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28523>
(cherry picked from commit 2b6993cb71)
Workaround states that if Destination Alpha Blend
Factor==BLENDFACTOR_ZERO, instead use BLENDFACTOR_CONST_ALPHA with the
constant alpha set to 0.
We had typo while setting the DestinationAlphaBlendFactor, use
BLENDFACTOR_CONST_ALPHA instead of BLENDFACTOR_CONST_COLOR.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28640>
(cherry picked from commit 7cc604ed1b)
As we were checking the bind count after decreasing the ref count,
we could hit a situation where the worker thread decreases the bind count
just after the ref count is decreased, but before the bind count is checked
which would lead to both threads calling the resource dtor.
Cc: mesa-stable
Signed-off-by: Axel Davy <davyaxel0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28232>
(cherry picked from commit d6044cf857)
The BSpec page "Flush Types" (46213) says the following about the Tex
Invalidate bit:
"Requires stall bit ([20] of DW) set for all GPGPU Workloads."
For newer platforms, this is documented in the description of the
texture invalidation bit in the PIPE_CONTROL page (56551):
"CS Stall bit in PIPE_CONTROL command must be always set for GPGPU
workloads when Texture Cache Invalidation Enable bit is set"
Iris had it only for GFX_VER 9 and 11, while Anv had it missing for
everything.
Please notice that this patch includes a revert of 397e728ef4.
Fixes: 397e728ef4 ("iris: Drop GPGPU Tex Invalidate restriction for TGL+")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28608>
(cherry picked from commit cf7e1f3817)
The new GTK+ GL renderer is extensively using instanced rendering. SVGA
driver was incorrectly detecting the instanced draws by only checking
whether the instance count was greater than 1. Base instance has to
be also checked to make sure that the draw correctly offsets the vertex
buffer.
Fix instanced draw detection by checking both the instance count and
the base instance. Fixes the new GTK+ 4 GL renderer.
Signed-off-by: Zack Rusin <zack.rusin@broadcom.com>
Fixes: ccb4ea5a43 ("svga: Add GL4.1(compatibility profile) support in svga driver")
Reviewed-by: Neha Bhende <neha.bhende@broadcom.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28616>
(cherry picked from commit 955444e068)
When trying to increase the height alignment to unlock multi-pipe resolve for
better performance we need to be careful to not overstep the source dimensions
as this would cause the blit to be rejected.
Do so and also rearrange the code a bit to make it more obvious what is being
done.
Fixes: 797454edfc ("etnaviv: rs: fix blits with insufficient alignment for dual pipe operation")
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28598>
(cherry picked from commit 2964812aac)
copy_image calls blit now for multisampled images, including
depth. But blit explicitly uses only PIPE_MASK_RGBA, so it is
incapable of copying depth buffers.
This patch checks the destination format and uses PIPE_MASK_ZS if
it is a depth or stencil. Ideally we would simply use PIPE_MASK_RGBAZS
always, but not all drivers actually handle getting this mask
(they probably should, but that's another story).
The change to copy_image was in 5027b5aa2, so in some sense this
patch "fixes" that. In fact though the issue wasn't in the copy_image
change, it was always latent in blit().
Fixes: 5027b5aa28 ("gallium: stop calling resource_copy_region for multisampled copy_image")
Signed-off-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28585>
(cherry picked from commit 0cb852050d)
When has_vm_control is supported it takes a different code path and
creates one context per engine and in this code path we were not
setting the protected context flag.
The lack of this is not causing any test to fail in our CI but it is
better do what we are supposed to do.
Fixes: fd40134487 ("anv: allow protected GEM context creation")
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28299>
(cherry picked from commit 77c004f7ca)
some gnome tests are seeing this:
==4579== Invalid read of size 4
==4579== at 0x161B28FB: UnknownInlinedFun (simple_mtx.h:106)
==4579== by 0x161B28FB: st_save_zombie_sampler_view (st_context.c:213)
==4579== by 0x161D762A: st_texture_release_all_sampler_views.part.0 (st_sampler_view.c:272)
==4579== by 0x161D7CB6: st_texture_release_all_sampler_views (st_sampler_view.c:258)
==4579== by 0x161D7CB6: st_delete_texture_sampler_views (st_sampler_view.c:292)
==4579== by 0x16191B9E: _mesa_delete_texture_object (texobj.c:523)
==4579== by 0x16191CDC: _mesa_reference_texobj_ (texobj.c:637)
==4579== by 0x1619632F: _mesa_reference_texobj (texobj.h:92)
==4579== by 0x1619632F: _mesa_free_texture_data (texstate.c:1114)
==4579== by 0x162EEE1D: _mesa_free_context_data (context.c:1155)
==4579== by 0x161B3C0F: st_destroy_context (st_context.c:999)
==4579== by 0x160F155A: dri_destroy_context (dri_context.c:277)
==4579== by 0x1603C371: dri2_destroy_context (egl_dri2.c:1592)
==4579== by 0x1602EBF5: eglDestroyContext (eglapi.c:918)
==4579== by 0x502DF3C: gdk_gl_context_dispose (gdkglcontext.c:211)
==4579== Address 0x34dc9b08 is 7,016 bytes inside an unallocated block of size 10,336 in arena "client"
==4579==
It appears we destroy the mutex and zombie objects, but freeing
context data seems to add them back.
This might not be the complete answer.
Cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28565>
(cherry picked from commit f8e48b561e)
We need invalidate CCHE when we optimize out an invalidation of UCHE,
for example a storage image write to texture read. We missed this
earlier because of the blob's tendency to always over-flush, but the
blob does use this when building acceleration structures.
Fixes: 95104707f1 ("tu: Basic a7xx support")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28445>
(cherry picked from commit fb1c3f7f5d)
this breaks interfaces where the consumer reads its input indirectly
and only some of the components are written by clobbering all
the components, even if the unwritten components are never accessed
Fixes: 459b49a174 ("zink: add a new linker pass to handle mismatched i/o components")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28466>
(cherry picked from commit 460cd99ea5)
A last memory leak related to constants_remap_table is happening.
This memory leak is triggered by two deqp-gles2 tests.
For instance, this issue is triggered with
"deqp-gles2 --deqp-case=dEQP-GLES2.functional.uniform_api.random.13":
Direct leak of 336 byte(s) in 1 object(s) allocated from:
#0 0x7f1b4a5de7ef in __interceptor_malloc (/usr/lib64/libasan.so.6+0xb17ef)
#1 0x7f1b401a2cdf in rc_remove_unused_constants ../src/gallium/drivers/r300/compiler/radeon_remove_constants.c:101
#2 0x7f1b40185386 in rc_run_compiler_passes ../src/gallium/drivers/r300/compiler/radeon_compiler.c:476
#3 0x7f1b40185625 in rc_run_compiler ../src/gallium/drivers/r300/compiler/radeon_compiler.c:498
#4 0x7f1b401c14d2 in r3xx_compile_fragment_program ../src/gallium/drivers/r300/compiler/r3xx_fragprog.c:172
#5 0x7f1b401b669a in r300_translate_fragment_shader ../src/gallium/drivers/r300/r300_fs.c:516
#6 0x7f1b401baf73 in r300_pick_fragment_shader ../src/gallium/drivers/r300/r300_fs.c:592
#7 0x7f1b40128db7 in r300_create_fs_state ../src/gallium/drivers/r300/r300_state.c:1071
#8 0x7f1b3e67799d in st_create_fp_variant ../src/mesa/state_tracker/st_program.c:1073
#9 0x7f1b3e680285 in st_get_fp_variant ../src/mesa/state_tracker/st_program.c:1119
#10 0x7f1b3e6812fa in st_precompile_shader_variant ../src/mesa/state_tracker/st_program.c:1284
#11 0x7f1b3e6812fa in st_finalize_program ../src/mesa/state_tracker/st_program.c:1363
#12 0x7f1b3f13d501 in st_link_glsl_to_nir ../src/mesa/state_tracker/st_glsl_to_nir.cpp:754
#13 0x7f1b3f13d501 in st_link_shader ../src/mesa/state_tracker/st_glsl_to_nir.cpp:990
#14 0x7f1b3efeef75 in link_program ../src/mesa/main/shaderapi.c:1336
#15 0x7f1b3efeef75 in link_program_error ../src/mesa/main/shaderapi.c:1445
Fixes: 29df85788a ("r300: fix constants_remap_table memory leak")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28522>
(cherry picked from commit 24a5165cdf)
For BROADCAST and MOV_INDIRECT, required_exec_type was returning
brw_int_type(type_sz(t), false), which is an unsigned type. However,
get_exec_type(inst) returns the original type for either Q or UQ.
This meant that has_invalid_exec_type would detect a mismatch and
trigger lowering.
That lowering would insert new 64-bit MOVs, which would need to be
lowered on platforms which don't support Q/UQ. Except, we already
ran that lowering pass earlier. So, the unlowered Q/UQ MOVs would
reach the software scoreboarding pass, and trigger failures in the
inferred_exec_pipe() function, as no pipe is available to handle
64-bit integer operations.
It turns out that we don't need the region lowering pass to do
anything for these opcodes. The generator code for both BROADCAST
and MOV_INDIRECT already handle decomposing Q/UQ operations into
32-bit MOVs when they're not supported. And, it also implicitly
converts to integer types, even for floating point sources. The
inferred_exec_pipe function already special cases them to note
that they'll always be handled on the integer pipe, so that matches.
Just drop the region lowering code for these opcodes.
Cc: mesa-stable
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28458>
(cherry picked from commit 7a24f29fbb)
We are overriding the type to Q/UQ, so we need to split to two MOVs
if 64-bit integer math is not supported. For reference, Meteorlake
does support 64-bit floats but would still not work correctly here.
See also brw_broadcast(), which does similar indirects but correctly
checks has_64bit_int instead of has_64bit_float.
Cc: mesa-stable
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28458>
(cherry picked from commit a90edad9f7)
==134077== 96 bytes in 1 blocks are definitely lost in loss record 1 of 3
==134077== at 0x4840808: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==134077== by 0x6D6F690: vk_default_alloc (vk_alloc.c:26)
==134077== by 0x52EEEBE: vk_alloc (vk_alloc.h:48)
==134077== by 0x52EEEEE: vk_zalloc (vk_alloc.h:56)
==134077== by 0x52EF47E: xe_exec_process_syncs (anv_batch_chain.c:132)
==134077== by 0x52EF8F6: xe_execute_trtt_batch (anv_batch_chain.c:215)
==134077== by 0x5301670: anv_queue_submit_trtt_batch (anv_batch_chain.c:1697)
==134077== by 0x603D135: gfx125_write_trtt_entries (genX_cmd_buffer.c:6091)
==134077== by 0x5370B44: anv_sparse_bind_trtt (anv_sparse.c:595)
==134077== by 0x5370CFC: anv_sparse_bind (anv_sparse.c:629)
==134077== by 0x5370E6E: anv_init_sparse_bindings (anv_sparse.c:670)
==134077== by 0x5328037: anv_CreateBuffer (anv_device.c:5071)
Note to backporters: this is only for when xe.ko is being used and
ANV_SPARSE_USE_TRTT=1 is exported. This is not the regular code path.
Fixes: 18bd00c024 ("anv/trtt: don't wait/signal syncobjs using the CPU anymore")
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28455>
(cherry picked from commit 38af7254e2)
Xe KMD don't refcount anything, so resources could be freed while they
are still in use if we don't wait for exec_queue to be idle.
This issue was found with Xe KMD error capture, VM was already
destroyed when it attemped to capture error state but it can also
happen in applications that did not hang.
This fixed the '*ERROR* GT0: TLB invalidation' errors when running
piglit all test list.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27500>
(cherry picked from commit 665d30b544)
When LLVM 18 is used, pass the no-verify-fixpoint option when
running the instcombine pass. Otherwise LLVM may abort with an
error.
The background here is that this option is enabled by default for
testing purposes, because instcombine is normally only explicitly
invoked like this inside tests. If it is used in an actual
production pipeline, the no-verify-fixpoint option needs to be
enabled.
This should fix the issue reported at
https://bugzilla.redhat.com/show_bug.cgi?id=2268800.
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28101>
(cherry picked from commit 99f0449987)
When tu_shader object is destroyed through vk_pipeline_cache, the relevant
destroy callback should relay to the general tu_shader_destroy function
that will also clean up owned resources.
During shader creation, the ir3_shader object should be destroyed once the
shader variants are retrieved. Since those variants are owned by tu_shader
they should be freed up in tu_shader_destroy.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Fixes: a03525d8db ("tu: Split program draw state into per-shader states")
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27847>
(cherry picked from commit 3ee81ffe14)
It was by pure luck that all sources (and the result) of nir_dpas_intel
had the same number of components. It is possible to support matrix
sizes where the accumlator matrix and the result matrix are larger
(e.g., 16x8 * 8x16 = 16x16).
This breaks all of the assumptions of NIR's infrastructure for code
generating intrinsics. Fix the by making the accumulator matrix be the
first source. The accumulator and the result will always have the same
dimensions (due to rules of matrix multiplication) and the same type
(due to restructions of the cooperative matrix extension). This forces
them to have the same number of components.
This doesn't fix all the potential problems. NIR expects that all
0-sized sources will have the same number of components. This just
ensures that the result has the correct number of components.
Fixes: 6b14da33ad ("intel/fs: nir: Add nir_intrinsic_dpas_intel")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28404>
(cherry picked from commit a8115221e5)
If the destination was the accumulator but is no longer, having the flag
set is not correct. On Xe2 this also causes a validation error.
v2: Reword the comment to be more clear. Suggested by Jordan.
Fixes: efa4e4bc5f ("intel/fs: Introduce regioning lowering pass.")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28404>
(cherry picked from commit be4fa59a72)
This reverts commit be8b7980e6.
Store the cache entry so that the fast path picks up the optimized
pipeline when its available from a background optimized_compile_job().
Observed traces where it would take the fast path back and forth using
an unoptimized pipeline and never pick up the optimized pipeline leading
to >50% fps drop.
Signed-off-by: Juston Li <justonli@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28440>
(cherry picked from commit d6978b1af2)
opt_copy_propagation() can sometimes propagate FIXED_GRF sources into
SHADER_OPCODE_SENDs as the message payload. For example, GS input
reads, which simply take a URB handle and have the offset in the
descriptor. For non-VGRFs, there isn't a payload to split, so just
skip past such send messages.
Fixes: 589b03d02f ("intel/fs: Opportunistically split SEND message payloads")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28067>
(cherry picked from commit ba11127944)
External images translate to 2D images in ntv, so we will have to emit
OpImageQuerySizeLod instead of OpImageQuerySize (thanks Faith for
pointing that out). This quells
VUID-VkShaderModuleCreateInfo-pCode-08737
Image must have either 'MS'=1 or 'Sampled'=0 or 'Sampled'=2
%32 = OpImageQuerySize %v2int %31
triggred by piglit
spec@oes_egl_image_external_essl3@oes_egl_image_external_essl3
on Zink.
Fixes: 3f783a3c50
zink: omit Lod image operand in ntv when not using an image texture dim
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28389>
(cherry picked from commit b6c1390354)
These are both handled by inserting them directly at the top of the
nir_function_impl. However, if the cursor is already at the top, it
never gets updated so we end up inserting other stuff after the newly
inserted undef or decl_reg. It's an odd edge case to be sure but I hit
it with my new NIR CF pass for NAK.
Fixes: 1be4c61c95 ("nir/builder: Add a helper for creating undefs")
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28300>
(cherry picked from commit a782809f81)
When shaders might read metadata (DCC) this must be flushed.
VK_ACCESS_2_MEMORY_READ_BIT includes all READ bits that are relevant.
I think this issue has been uncoverd since vkd3d-proton d1425ee4
("vkd3d: Use VK_ACCESS_MEMORY_{READ,WRITE}_BIT where appropriate")
because RADV used to be missing VK_ACCESS_2_MEMORY_{READ,WRITE} in the
past and vkd3d-proton added a special workaround that has been removed.
This fixes some DCC corruption in WWE 2K24.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10774
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28332>
(cherry picked from commit 585b4c5a01)
in a sequence like:
* resize A
* clear
* resize B
* clear
* resize C
* clear
for a swapchain resource, the geometry for a given op after the resize
may desync for the op with which it was executed, but this is fine
since the underlying swapchain object will have to be re-created anyway
fixes#10827
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28214>
(cherry picked from commit ee13512a62)
For instance, this issue is triggered with
"piglit/bin/glslparsertest tests/spec/arb_bindless_texture/compiler/images/arith-bound-image.frag pass 3.30 GL_ARB_bindless_texture GL_ARB_shader_image_load_store":
Direct leak of 176 byte(s) in 1 object(s) allocated from:
#0 0x7f84c3fbe9a7 in calloc (/usr/lib64/libasan.so.6+0xb19a7)
#1 0x7f84ba7e0801 in ac_nir_translate ../src/amd/llvm/ac_nir_to_llvm.c:4391
#2 0x7f84ba53fdf4 in si_llvm_translate_nir ../src/gallium/drivers/radeonsi/si_shader_llvm.c:759
#3 0x7f84ba542bb7 in si_llvm_compile_shader ../src/gallium/drivers/radeonsi/si_shader_llvm.c:836
#4 0x7f84ba337b8e in si_compile_shader ../src/gallium/drivers/radeonsi/si_shader.c:2874
#5 0x7f84ba43a7c1 in si_init_shader_selector_async ../src/gallium/drivers/radeonsi/si_state_shaders.cpp:3176
#6 0x7f84b81c3448 in util_queue_thread_func ../src/util/u_queue.c:309
#7 0x7f84b821ea6a in impl_thrd_routine ../src/c11/impl/threads_posix.c:67
#8 0x7f84c2fea38a (/lib64/libc.so.6+0x8438a)
Direct leak of 136 byte(s) in 1 object(s) allocated from:
#0 0x7f84c3fbff57 in operator new(unsigned long) (/usr/lib64/libasan.so.6+0xb2f57)
#1 0x7f84b1a5f749 in LLVMCreateBuilderInContext (/usr/local/lib64/libLLVM-17.so+0xc84749)
#2 0x7f84ba7817b0 in ac_llvm_context_init ../src/amd/llvm/ac_llvm_build.c:54
#3 0x7f84ba542b7a in si_llvm_context_init ../src/gallium/drivers/radeonsi/si_shader_llvm.c:120
#4 0x7f84ba542b7a in si_llvm_compile_shader ../src/gallium/drivers/radeonsi/si_shader_llvm.c:832
#5 0x7f84ba337b8e in si_compile_shader ../src/gallium/drivers/radeonsi/si_shader.c:2874
#6 0x7f84ba43a7c1 in si_init_shader_selector_async ../src/gallium/drivers/radeonsi/si_state_shaders.cpp:3176
#7 0x7f84b81c3448 in util_queue_thread_func ../src/util/u_queue.c:309
#8 0x7f84b821ea6a in impl_thrd_routine ../src/c11/impl/threads_posix.c:67
#9 0x7f84c2fea38a (/lib64/libc.so.6+0x8438a)
Indirect leak of 176 byte(s) in 1 object(s) allocated from:
#0 0x7f84c3fbe7ef in __interceptor_malloc (/usr/lib64/libasan.so.6+0xb17ef)
#1 0x7f84b81b9b3f in ralloc_size ../src/util/ralloc.c:118
#2 0x7f84b81b9fee in rzalloc_size ../src/util/ralloc.c:152
#3 0x7f84b81b9fee in rzalloc_array_size ../src/util/ralloc.c:232
#4 0x7f84b81b05c7 in _mesa_hash_table_init ../src/util/hash_table.c:163
#5 0x7f84b81b05c7 in _mesa_hash_table_create ../src/util/hash_table.c:186
#6 0x7f84ba7e06ae in ac_nir_translate ../src/amd/llvm/ac_nir_to_llvm.c:4381
#7 0x7f84ba53fdf4 in si_llvm_translate_nir ../src/gallium/drivers/radeonsi/si_shader_llvm.c:759
#8 0x7f84ba542bb7 in si_llvm_compile_shader ../src/gallium/drivers/radeonsi/si_shader_llvm.c:836
#9 0x7f84ba337b8e in si_compile_shader ../src/gallium/drivers/radeonsi/si_shader.c:2874
#10 0x7f84ba43a7c1 in si_init_shader_selector_async ../src/gallium/drivers/radeonsi/si_state_shaders.cpp:3176
#11 0x7f84b81c3448 in util_queue_thread_func ../src/util/u_queue.c:309
#12 0x7f84b821ea6a in impl_thrd_routine ../src/c11/impl/threads_posix.c:67
#13 0x7f84c2fea38a (/lib64/libc.so.6+0x8438a)
Indirect leak of 176 byte(s) in 1 object(s) allocated from:
#0 0x7f84c3fbe7ef in __interceptor_malloc (/usr/lib64/libasan.so.6+0xb17ef)
#1 0x7f84b81b9b3f in ralloc_size ../src/util/ralloc.c:118
#2 0x7f84b81b9fee in rzalloc_size ../src/util/ralloc.c:152
#3 0x7f84b81b9fee in rzalloc_array_size ../src/util/ralloc.c:232
#4 0x7f84b81b05c7 in _mesa_hash_table_init ../src/util/hash_table.c:163
#5 0x7f84b81b05c7 in _mesa_hash_table_create ../src/util/hash_table.c:186
#6 0x7f84ba7e06e4 in ac_nir_translate ../src/amd/llvm/ac_nir_to_llvm.c:4382
#7 0x7f84ba53fdf4 in si_llvm_translate_nir ../src/gallium/drivers/radeonsi/si_shader_llvm.c:759
#8 0x7f84ba542bb7 in si_llvm_compile_shader ../src/gallium/drivers/radeonsi/si_shader_llvm.c:836
#9 0x7f84ba337b8e in si_compile_shader ../src/gallium/drivers/radeonsi/si_shader.c:2874
#10 0x7f84ba43a7c1 in si_init_shader_selector_async ../src/gallium/drivers/radeonsi/si_state_shaders.cpp:3176
#11 0x7f84b81c3448 in util_queue_thread_func ../src/util/u_queue.c:309
#12 0x7f84b821ea6a in impl_thrd_routine ../src/c11/impl/threads_posix.c:67
#13 0x7f84c2fea38a (/lib64/libc.so.6+0x8438a)
Indirect leak of 128 byte(s) in 1 object(s) allocated from:
#0 0x7f84c3fbe7ef in __interceptor_malloc (/usr/lib64/libasan.so.6+0xb17ef)
#1 0x7f84b81b9b3f in ralloc_size ../src/util/ralloc.c:118
#2 0x7f84b81b046c in _mesa_hash_table_create ../src/util/hash_table.c:182
#3 0x7f84ba7e06e4 in ac_nir_translate ../src/amd/llvm/ac_nir_to_llvm.c:4382
#4 0x7f84ba53fdf4 in si_llvm_translate_nir ../src/gallium/drivers/radeonsi/si_shader_llvm.c:759
#5 0x7f84ba542bb7 in si_llvm_compile_shader ../src/gallium/drivers/radeonsi/si_shader_llvm.c:836
#6 0x7f84ba337b8e in si_compile_shader ../src/gallium/drivers/radeonsi/si_shader.c:2874
#7 0x7f84ba43a7c1 in si_init_shader_selector_async ../src/gallium/drivers/radeonsi/si_state_shaders.cpp:3176
#8 0x7f84b81c3448 in util_queue_thread_func ../src/util/u_queue.c:309
#9 0x7f84b821ea6a in impl_thrd_routine ../src/c11/impl/threads_posix.c:67
#10 0x7f84c2fea38a (/lib64/libc.so.6+0x8438a)
Indirect leak of 128 byte(s) in 1 object(s) allocated from:
#0 0x7f84c3fbe7ef in __interceptor_malloc (/usr/lib64/libasan.so.6+0xb17ef)
#1 0x7f84b81b9b3f in ralloc_size ../src/util/ralloc.c:118
#2 0x7f84b81b046c in _mesa_hash_table_create ../src/util/hash_table.c:182
#3 0x7f84ba7e06ae in ac_nir_translate ../src/amd/llvm/ac_nir_to_llvm.c:4381
#4 0x7f84ba53fdf4 in si_llvm_translate_nir ../src/gallium/drivers/radeonsi/si_shader_llvm.c:759
#5 0x7f84ba542bb7 in si_llvm_compile_shader ../src/gallium/drivers/radeonsi/si_shader_llvm.c:836
#6 0x7f84ba337b8e in si_compile_shader ../src/gallium/drivers/radeonsi/si_shader.c:2874
#7 0x7f84ba43a7c1 in si_init_shader_selector_async ../src/gallium/drivers/radeonsi/si_state_shaders.cpp:3176
#8 0x7f84b81c3448 in util_queue_thread_func ../src/util/u_queue.c:309
#9 0x7f84b821ea6a in impl_thrd_routine ../src/c11/impl/threads_posix.c:67
#10 0x7f84c2fea38a (/lib64/libc.so.6+0x8438a)
SUMMARY: AddressSanitizer: 920 byte(s) leaked in 6 allocation(s).
Fixes: d92d35c9db ("ac/llvm: add a return value to ac_nir_translate")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28099>
(cherry picked from commit 0fd907fc7b)
The vma_samplers vma heap is initialized unconditionally. Don't use
device->physical->indirect_descriptors as a condition on whether to
free it or not.
From my TGL machine:
==373617== 32 bytes in 1 blocks are definitely lost in loss record 1 of 1
==373617== at 0x48459F3: calloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==373617== by 0x6926DC0: util_vma_heap_free (vma.c:339)
==373617== by 0x6925ED3: util_vma_heap_init (vma.c:53)
==373617== by 0x5334EDA: anv_CreateDevice (anv_device.c:3404)
==373617== by 0x685593A: vk_tramp_CreateDevice (vk_dispatch_trampolines.c:78)
==373617== by 0x48A6D56: terminator_CreateDevice (loader.c:5833)
==373617== by 0x9C2293F: vulkan_layer_chassis::CreateDevice(VkPhysicalDevice_T*, VkDeviceCreateInfo const*, VkAllocationCallbacks const*, VkDevice_T**) (chassis.cpp:497)
==373617== by 0x48B0690: loader_create_device_chain (loader.c:4937)
==373617== by 0x48B1327: loader_layer_create_device (loader.c:4317)
==373617== by 0x48B8D79: vkCreateDevice (trampoline.c:1004)
==373617== by 0x10CC7A: MyApp::MyApp(int, bool) (sparse.cpp:608)
==373617== by 0x1201E8: main (sparse.cpp:6025)
Fixes: 7c76125db2 ("anv: use 2 different buffers for surfaces/samplers in descriptor sets")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28303>
(cherry picked from commit 6ec1e322f0)
The hand rolled etnaviv conversion functions were able to handle
negative input values when converting to fixpoint. By replacing
them with U_FIXED all negative values are clamped to zero, which
breaks usages where negative inputs are valid, like lodbias.
Fixes: 8bce68edf5 ("etnaviv: switch to U_FIXED(..) macro")
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28224>
(cherry picked from commit 4b8981e471)
Fixes validation error:
VUID-VkShaderModuleCreateInfo-pCode-08737
AtomicFAddEXT: expected Pointer to point to a value of type Result
Type
%51 = OpAtomicFAddEXT %float %49 %uint_1 %uint_0 %50
when running
spec@nv_shader_atomic_float@execution@ssbo-atomicadd-float
Fixes: 9f6be8effb
zink: store and use alu types for ntv defs
v2: Fix commit message (Mike)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28243>
(cherry picked from commit 50a6c5d5fa)
Since we split misaligned attributes, we could overwrite one of these
VGPRs in the middle of loading the attribute.
For example:
v_add_u32_e32 v4, vcc, s7, v1
s_waitcnt lgkmcnt(0)
buffer_load_dword v4, v4, s[32:35], 0 idxen
buffer_load_dword v5, v4, s[32:35], 0 idxen offset:4
can overwrite the vertex index in the load of the first component.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27920>
(cherry picked from commit ec892c4d2b)
The stride given in the shader is in number of elements of the of the
type pointed by the given pointer, which may not match the matrix own
element type.
Since we cast the pointer to match the element type, the stride needs to
be ajusted accordingly.
v2:
- Fix mismatching bit-width in matrix element type and pointer type (Caio)
- Do the stride calculation in one place
Fixes dEQP-VK.compute.pipeline.cooperative_matrix.khr_*.multicomponent.*
Fixes: 3a35f8b29b ("intel/cmat: Lower cmat_load and cmat_store")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10820
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27903>
(cherry picked from commit 446f652cde)
The `stride` and `offset` attributes are meaningful for the "virtual"
register files (VGRFs, UNIFORMs and ATTRs). Accumulator is an ARF so
validation should check `hstride` (part of the <V,W,H> triple) and `subnr`
instead.
Fixes: 12d7aaf2b8 ("intel/compiler: add more validation for acc register usage")
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28059>
(cherry picked from commit e324fbbe68)
This ensure the region triple <V,W,H> is set correctly, in this case the
desired region is a sequential like <8,8,1>. Without the helper the
sequence we get is <0,1,0> -- which the generator currently partially
adjusts when emitting code, but is not sufficient when doing validation
earlier.
The code generated code is slightly modified. From crucible test
func.shader.subtractSaturate.uint in the fragment shader for SIMD8, the
diff looks like
```
mov(8) acc0<1>UD g21<8,8,1>UD { align1 1Q $0.dst };
-add.sat(8) g22<1>UD -acc0<0,1,0>UD g16<8,8,1>UD { align1 1Q @1 $0.dst };
+add.sat(8) g22<1>UD -acc0<8,8,1>UD g16<8,8,1>UD { align1 1Q @1 $0.dst };
```
Note that without the patch generator adjusted the hstride for acc0 used
as destination (see brw_set_dest), but kept the src region as is. For
the source, it is not clear to me why the <0,1,0> would work correctly
here since it is a scalar, but using <8,8,1> it is correct.
Fixes: 58907568ec ("intel/fs: Add SHADER_OPCODE_[IU]SUB_SAT pseudo-ops")
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28059>
(cherry picked from commit db8022dc4d)
It was correct for the parameters that the driver was using, but incorrect
for other parameters.
1. The address computation must multiply the workgroup size (wave size)
by num_mem_ops to fix the case when num_dwords_per_thread > 4.
2. nir_load_ssbo shouldn't set the number of components to 4 when
num_dwords_per_thread < 4.
Fixes: 6584088cd5 - radeonsi: "create_dma_compute" shader in nir
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28119>
(cherry picked from commit e99765df08)
this should yield more consistent results and avoid weird cases where
various formats are queried for things they don't support and won't use
Fixes: 9a412c10b7 ("zink: set all usage flags when querying sparse features")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28115>
(cherry picked from commit 8fa413fef0)
only certain formats are required to have the storage bit, so be more
tolerant of failure in the case where drivers actually check flags
and reject storage usage when it's actually unsupported
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28115>
(cherry picked from commit 61e5b6ad9d)
the existing guesswork during format selection for teximage is
accurate most of the time, but it's not accurate all of the time.
GL/ES each have a set of sized formats that are required to be
color renderable, and so any time one of these is allocated as a
texture, it MUST have the rendertarget usage bit attached so that
it can later be bound as a framebuffer attachment
an alternative might be to relax this and then try to do migration
to a different format/buffer later if necessary, but that's hard and
probably not actually as useful
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28055>
(cherry picked from commit 0f66589c2a)
This fixes an issue hit by one of darktable's kernels, where the sampler
argument got assigned the location of a dead kernel parameter turning it
into a zombie and leading us to trash the kernel input buffer's layout.
Fixes: 25b8a34b48 ("rusticl/kernel: inline samplers")
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28121>
(cherry picked from commit 2df640c4f6)
If you look at the sampler message header on Gfx9+, you'll see that we
mostly only use 2 dwords (dw2 & dw3). DW2 has a bunch of sampler
parameters, DW3 is the sampler handle.
On Gfx9 we can micro optimize by copying r0 into the header because
the HW mostly doesn't care about other DWs. We just have to clear dw2
on non VS/FS stages.
On Gfx11+, we always have to do a careful copy of the r0.3 bits to
mask out the bottom unrelated bits. So there, just clearing the entire
header makes more sense.
On Xe2+, the dw4 of the header references the sampler feedback surface
handle and bit0 is a boolean to know whether to use that surface or
not. So it *REALLY* matters to have that as 0. If we copy r0, we'll
get random bits in dw4, leading to enable that surface.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28082>
(cherry picked from commit 75c6ad9907)
With RADV, when VS/TES and FS are compiled separately, the PrimitiveId
is exported unconditionally because it's not possible to know if the
FS reads it or not. This happens with fast-link GPL and shader object.
Though, the PrimitiveID should be ignored when it's implicitly exported
because otherwise the stream output LDS offset is incorrect.
This fixes a bunch of failures with transform feedback and Zink/RADV
when shader object is enabled on RDNA3.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27981>
(cherry picked from commit d12984edb8)
The algorithm used to rendering smooth lines worked under the assumption
that line coords were in the [0, 1] range. This was correct when using
an orthogonal projection, but not when using a perspective projection.
With a perspective projection (where the value for 1/Wc set in the VPM
is not 1.0), line coords values are also affected by this projection, so
the values are not in this range.
To deal with this, we normalize the line coords using the Wc value so
the range becomes [0, 1], and the smooth line rendering works as
expected.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10496
Fixes: ee4d51f8b2 ("v3d: Add a lowering pass for line smoothing")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28072>
(cherry picked from commit 69fbd5cb90)
The previous version had an optimization where, instead of actually
waiting on the FALCON to return, it would just do a bunch of nops in
some cases. This seems broken at least on Turing+ and results in
registers not ending up with the right values. It only really shows up
when you set two registers back-to-back in which case the second
SET_PRIV_REG may mess up the first.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27927>
(cherry picked from commit 0ed7bce8e5)
The driver is written that we should support ETNA_NUM_VARYINGS and reporting
a bigger number will cause some troubles. I had a quick look at galcore's
hw database and there are entries that report a higher value.
So I think what we want is to the minimum value of what kernel driver reports
and what the gallium driver should be able to handle.
Fixes: 84816c22e4 ("etnaviv: ask kernel for max number of supported varyings")
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27923>
(cherry picked from commit 93255abe30)
As per spec, any colors, or color components, associated with a fragment
that are not written by the fragment shader are undefined.
So we might as well just write vec4(1.0) to output, since HW doesn't allow
us to have an empty FS.
Backport-to: 23.3
Backport-to: 24.0
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24855>
(cherry picked from commit 6998c48f77)
the ES spec imposes additional requirements for copy commands,
specifically that the formats have matching component sizes
the existing check used the driver's internal formats to check
for a match, which is broken since the spec requires the match be
between the passed internalFormat and the buffer's effective internal
format (i.e., this has no relation to what the driver supports)
fixes KHR-GLES3.copy_tex_image_conversions.forbidden* on a bunch of drivers
cc: mesa-stable
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28030>
(cherry picked from commit 2cd192f879)
Batches must be ignored if batch count is zero, so all batch inspections
have to be gated behind batch count. For memcpy, it's UB if either src
or dst is NULL even when size is zero.
Side note:
- For original commit, this fixes just the memcpy UB
- For current codes, this fixes to not skip ffb batch prepare
Fixes: 493a3b5cda ("venus: refactor batch submission fixup")
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28071>
(cherry picked from commit 8af267eb00)
This fixes the validation error
VUID-VkShaderModuleCreateInfo-pCode-08737
triggered by piglit:
spec@arb_gpu_shader5@execution@built-in-functions@fs-interpolateatsample-block-array:
GLSL.std.450 InterpolateAtSample: expected Sample to be 32-bit integer
%47 = OpExtInst %float %1 InterpolateAtSample %45 %float_0
Fixes: 9f6be8effb
zink: store and use alu types for ntv defs
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28043>
(cherry picked from commit b7d6d90dab)
We had a "Don't read out-of-bounds" sanity check for creating an alpha
when ATEST was needed, but that check happened only after we already
did a bi_extract(), which meant that the bi_extract could get into
trouble and assert() when there weren't enough components. Fixed by
re-arranging the calculation.
Signed-off-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund>@collabora.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28045>
(cherry picked from commit 0e1862a2ab)
The code path for emitting tessellation commands when the TES needed
scratch space was failing to emit 3DSTATE_TE, and instead only emitting
3DSTATE_DS. This meant that you could get HS and DS enabled with
tessellation itself turned off, which is utter nonsense and would
cause a GPU hang.
Alchemist and later takes a different path and don't take this bug,
but all earlier hardware would hit it. Discovered while working on
compiler changes that caused a single piglit test to spill minorly,
and thus break entirely.
Fixes: 4256f7ed58 ("iris: Fill out scratch base address dynamically")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28032>
(cherry picked from commit 9e5fd49cbe)
If the app requests a swizzle on the shadow sampler which doesn't just
return the red channel or literal 0s/1s, we'll crash attempting to build
the result vector. Use something that's probably valid.
Cc: mesa-stable
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28001>
(cherry picked from commit cda6877cb6)
We precompile static state and count it as dynamic, so we have to
manually clear bitset that tells which dynamic state is set, in order to
make sure that future dynamic state will be emitted. The issue is that
framework remembers only a past REAL dynamic state and compares a new
dynamic state against it, and not against our static state masquaraded
as dynamic.
Example:
- Set dynamic state S with value A
- Bind pipeline with dynamic state S
- Draw
- Bind pipeline with static state S with value B
- Draw
- Set dynamic state S with value A
- Bind pipeline with dynamic state S
- Draw
Previously, at the last draw the dynamic state S was not dirty and
current dynamic state was equal to the past dynamic state, so
it was not emitted, while GPU used value B from static pipeline.
This fix, at the point of static pipeline binding, clears the
bitset which tells that dynamic state S was previously set.
This forces the next dynamic state to be re-emitted.
Fixes broken rendering in Arma 3, and probably some other
games running through DXVK.
Fixes: 97da0a7734
("tu: Rewrite to use common Vulkan dynamic state")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27961>
(cherry picked from commit a76fcebfc0)
There are multiple problems currently :
- blorp blitter commands overwrite the protection value coming from
the driver
- anv & iris are using render target MOCS for compute commands
Driver already have the ability to pass the MOCS values so we choose
to stick to that in this change. But now the driver need to select the
right MOCS depending on the engine the commands are going to run onto.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27956>
(cherry picked from commit 194afe8416)
ZINK_BIND_RESOURCE_DESCRIPTOR and ZINK_BIND_SAMPLER_DESCRIPTOR are
always used together, so that we can replace these two values with
ZINK_BIND_DESCRIPTOR and use only one bit to represent the value.
With that we can also remove the aliasing of ZINK_BIND_DESCRIPTOR with
PIPE_BIND_CONST_BW.
Fixes: 13c6ad0038
zink: use a single descriptor buffer for all non-bindless types
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28016>
(cherry picked from commit 8e239dda41)
technically this needs to be MUCH higher since there's no limitation
on the number of semaphore waits that can be submitted, but this is
enough to handle zink usage
fixes KHR-GL46.sparse_buffer_tests.BufferStorageTest
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27992>
(cherry picked from commit 9a53e3b1fd)
I started seeing
ACO ERROR:
In file ../src/amd/compiler/aco_validate.cpp:98
Operand and Definition types do not match: s1: %44 = p_parallelcopy %158
test_basic: ../src/amd/compiler/aco_interface.cpp:85: void validate(aco::Program*):
Assertion `is_valid' failed.
since commit 52ee4cf229 ("nir/builder: Teach nir_pack_bits and
nir_unpack_bits about 32_4x8").
Fixes: e0d232c2fc ("aco: implement nir_op_pack_32_4x8"). I
Suggested-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27972>
(cherry picked from commit 3d4dfae7eb)
Indeed, main_shader_part_ngg_es was not freed.
For instance, this issue is triggered on a radeonsi/gfx10 gpu with
"piglit/bin/arb_gpu_shader5-tf-wrong-stream-value -auto -fbo":
Direct leak of 1464 byte(s) in 1 object(s) allocated from:
#0 0x7f17904b99a7 in calloc (/usr/lib64/libasan.so.6+0xb19a7)
#1 0x7f1785d65ac2 in si_init_shader_selector_async ../src/gallium/drivers/radeonsi/si_state_shaders.cpp:3132
#2 0x7f1783af67d8 in util_queue_thread_func ../src/util/u_queue.c:309
#3 0x7f1783b51dfa in impl_thrd_routine ../src/c11/impl/threads_posix.c:67
#4 0x7f178f69d38a (/lib64/libc.so.6+0x8438a)
Indirect leak of 2024 byte(s) in 1 object(s) allocated from:
#0 0x7f17904b97ef in __interceptor_malloc (/usr/lib64/libasan.so.6+0xb17ef)
#1 0x7f1785d5443a in read_chunk ../src/gallium/drivers/radeonsi/si_state_shaders.cpp:221
#2 0x7f1785d62cf5 in si_load_shader_binary ../src/gallium/drivers/radeonsi/si_state_shaders.cpp:293
#3 0x7f1785d65255 in si_shader_cache_load_shader ../src/gallium/drivers/radeonsi/si_state_shaders.cpp:423
#4 0x7f1785d65ef9 in si_init_shader_selector_async ../src/gallium/drivers/radeonsi/si_state_shaders.cpp:3169
#5 0x7f1783af67d8 in util_queue_thread_func ../src/util/u_queue.c:309
#6 0x7f1783b51dfa in impl_thrd_routine ../src/c11/impl/threads_posix.c:67
#7 0x7f178f69d38a (/lib64/libc.so.6+0x8438a)
Fixes: 8f72f137ad ("radeonsi/gfx10: add as_ngg variant for TES as ES to select Wave32/64")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27958>
(cherry picked from commit f93f215898)
Commit 90e364edb0 contained a typo in the glsl_dvec4_type() helper,
instead returning a glsl_ivec4_type. As an ivec4 is 2x smaller than
a dvec4, this also broke piglit sanity on crocus/hsw.
This also fixes the dvec2 helper, though it has not been specifically
tested anywhere.
Fixes: 90e364edb0 ("compiler/types: Add a few more helpers to get builtin types")
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27917>
(cherry picked from commit f9acfeeb59)
For instance, this issue is triggered with
"piglit/bin/object-namespace-pollution glBitmap program -auto -fbo":
Direct leak of 112 byte(s) in 7 object(s) allocated from:
#0 0x7f472540e7ef in __interceptor_malloc (/usr/lib64/libasan.so.6+0xb17ef)
#1 0x7f471a9ce18f in rc_remove_unused_constants ../src/gallium/drivers/r300/compiler/radeon_remove_constants.c:101
#2 0x7f471a9b0836 in rc_run_compiler_passes ../src/gallium/drivers/r300/compiler/radeon_compiler.c:476
#3 0x7f471a9b0ad5 in rc_run_compiler ../src/gallium/drivers/r300/compiler/radeon_compiler.c:498
#4 0x7f471a9ec862 in r3xx_compile_fragment_program ../src/gallium/drivers/r300/compiler/r3xx_fragprog.c:172
#5 0x7f471a9e1ab2 in r300_translate_fragment_shader ../src/gallium/drivers/r300/r300_fs.c:516
#6 0x7f471a9e6303 in r300_pick_fragment_shader ../src/gallium/drivers/r300/r300_fs.c:591
#7 0x7f471a9544fe in r300_create_fs_state ../src/gallium/drivers/r300/r300_state.c:1073
#8 0x7f4718f2ebe5 in st_create_fp_variant ../src/mesa/state_tracker/st_program.c:1070
#9 0x7f4718f374b5 in st_get_fp_variant ../src/mesa/state_tracker/st_program.c:1116
#10 0x7f4718f38273 in st_precompile_shader_variant ../src/mesa/state_tracker/st_program.c:1281
#11 0x7f4718f38273 in st_finalize_program ../src/mesa/state_tracker/st_program.c:1345
#12 0x7f4718f389e9 in st_program_string_notify ../src/mesa/state_tracker/st_program.c:1378
#13 0x7f47199d9f99 in set_program_string ../src/mesa/main/arbprogram.c:413
Fixes: 1c2c4ddbd1 ("r300g: copy the compiler from r300c")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27957>
(cherry picked from commit 29df85788a)
Do not continue and call drmIoctl on an invalid file descriptor.
Fix defect reported by Coverity Scan.
Argument cannot be negative
The negative argument will be interpreted as a very large unsigned value.
CID: 1544377
Cc: mesa-stable
Signed-off-by: Corentin Noël <corentin.noel@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27788>
(cherry picked from commit b6962bbfc8)
When lower_simd_width() encounters an instruction that needs a larger
SIMD, for example SHADER_OPCODE_TXS_LOGICAL in Gfx4 needs at least
SIMD16. In this case the builder needs to be at least as large as
max_width, otherwise the group() setup will assert.
Turns out this did not assert before "by accident", since it was
relying on the default fs_visitor builder that had a dispatch width of 64,
a bogus placeholder value, expected not to be used.
However, when we changed the code to remove that builder (and the bogus
value), we created a new builder in the pass shader dispatch_width --
which work fine except in the case where we want to "lower" the SIMD above
the shader dispatch width. The fix is to also consider the already
calculated max_width when creating the builder.
Fixes: 5b8ec015f2 ("intel/compiler: Don't use fs_visitor::bld in remaining places")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10338
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27782>
(cherry picked from commit 337641cfcc)
Xe KMD also checks if cpu_caching caching set during bo creationg
matches with caching of the PAT index set in the VM unbind.
This was being unnoticed until now by luck and lack of testing in MTL.
So here always setting PAT index for all VM operations that has a bo
associated.
Fixes: eb18a92ef9 ("iris: Fill PAT fields in Xe KMD gem_create and vm_bind uAPIs")
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27893>
(cherry picked from commit 963c08b623)
Let's say you have an image in R32_UINT format, a view is created in
R32_SFLOAT and used as color attachment.
When resolving the attachment, our current code uses the image format
(R32_UINT in this case). But resolve mode might apply only to SFLOAT,
so we currently run into an assert in blorp.
We should instead use the view format. There is an exception for
depth/stencil view because the format we want to resolve is actually
the depth/stencil format, not just the depth or stencil aspect.
This fixes vkd3d-proton's test_multisample_resolve_formats.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27875>
(cherry picked from commit 5a7e58a430)
If monolithic shaders were inlined, there might not be a radv_shader
associated with some stages. Zero out the shader allocation info in that
case, the shader will get identified by hash instead.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27890>
(cherry picked from commit b588cb29a3)
The stw_device and its screen are set up independently. It's possible
to have a device without a screen if the DLL is loaded but never
called into, since DllMain for PROCESS_ATTACH sets up the stw_device,
but the screen is initialized later on the first call to get pixel
formats. If the DLL is loaded and then unloaded, don't crash.
Cc: mesa-stable
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27892>
(cherry picked from commit f96d31bc8a)
Found by inspection. Original code was returning the size instead of the
number of levels. This was probably an over zealous search-and-replace
when PIPE_CAP_MAX_TEXTURE_2D_LEVELS was changed to _SIZE.
Reviewed-by: David Heidelberg <david.heidelberg@collabora.com>
Fixes: 0c31fe9ee7 ("gallium: Redefine the max texture 2d cap from _LEVELS to _SIZE.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27800>
(cherry picked from commit 1b890825f6)
this is a bandaid fix that allows users (zink) to actually call the
functions intended to be called. the real fix would be to figure out
which extensions are enabled on the device and then only GPA the
functions associated with those extensions
that's too hard though so I'm slapping some flex tape on it
cc: mesa-stable
Acked-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27834>
(cherry picked from commit 5d91db9666)
This extension has been broken ever since the initial commit. It created
an XRGB DRIImage for the driver to render to, so whilst the presentation
was opaque, the buffer also completely lacked an alpha channel.
Fix it by making sure we only modify the FourCC we send to the Wayland
server when creating a buffer.
Closes: mesa/mesa#5886
Fixes: 9aee7855d2 ("egl: implement EGL_EXT_present_opaque on wayland")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27709>
(cherry picked from commit 9ea9a963aa)
When we bind a descriptor set with dynamic descriptors, we can't ignore
dynamic descriptors in previously-bound higher descriptor sets. For
example, assume we have descriptor sets A and B, each of which has one
dynamic storage buffer, and we do:
CmdBindDescriptorSets(firstSet=1, descriptorSetCount=1, A)
CmdBindDescriptorSets(firstSet=0, descriptorSetCount=1, B)
and in the first CmdBindDescriptorSets the pipeline layout includes a
descriptor set layout compatible with B in set 0. Then, following
"Pipeline Layout Compatibility," set 0 is disturbed:
When binding a descriptor set to set number N, a previously bound
descriptor set bound with lower index M than N is disturbed if the
pipeline layouts for set M and N are not compatible for set M.
Otherwise, the bound descriptor set in M is not disturbed
When it's disturbed, it's effectively turned into a set with 1 undefined
dynamic storage buffer:
When a descriptor set is disturbed by binding descriptor sets, the
disturbed set is considered to contain undefined descriptors bound
with the same pipeline layout as the disturbing descriptor set.
This disturbed set is compatible with B, so in the second
CmdBindDescriptorSets this clause doesn't apply:
If, additionally, the previously bound descriptor set for set N was
bound using a pipeline layout not compatible for set N, then all
bindings in sets numbered greater than N are disturbed.
and A remains valid to access. The code before 88db7364 worked only if
the pipeline layout when binding B contained a descriptor layout
compatible with A in set 1, because it used the pipeline layout's total
size when allocating the internal dynamic descriptors array, but that
isn't actually a requirement, so the previous code was already broken.
After 88db7364 we only allocate as much space as required by the current
descriptors being bound, because I misread the rules here, which made it
more broken and broke 3DMark Wildlife Extreme that does something like
this.
In order to properly fix this we need to keep track of the maximum ever
seen dynamic descriptor size, similar to what we already do for
descriptor sets, and use that. We have no idea what needs to be
preserved when binding a descriptor set with dynamic descriptors, so we
have to be conservative.
Fixes: 88db7364 ("tu: Rework dynamic offset handling")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27750>
(cherry picked from commit db0291c235)
Previously we were optimistic and tied this to certain format but wa
description lists other formats and bspec clearly disallows the usage.
Issue can be seen with different 16bpp tests, effect looks a bit like
dithering pattern but it is not, it is just rep16 failing.
Fixes:
GTF-GL46.gtf42.GL3Tests.texture_storage.texture_storage_texture_as_framebuffer_attachment
on DG2 and MTL, some 565 EGL tests on Android and internal issue on game
that displays a dither like pattern on the background while it's not
supposed to do that.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10646
Cc: mesa-stable
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27794>
(cherry picked from commit 1a4f220c29)
For instance, this issue is triggered with
"piglit/bin/ext_framebuffer_multisample-accuracy all_samples color depthstencil -auto -fbo":
Direct leak of 1160 byte(s) in 1 object(s) allocated from:
#0 0x7fbe8897d7ef in __interceptor_malloc (/usr/lib64/libasan.so.6+0xb17ef)
#1 0x7fbe7e7abfcc in rc_constants_copy ../src/gallium/drivers/r300/compiler/radeon_code.c:47
#2 0x7fbe7e7ec902 in r3xx_compile_fragment_program ../src/gallium/drivers/r300/compiler/r3xx_fragprog.c:174
#3 0x7fbe7e7e1b22 in r300_translate_fragment_shader ../src/gallium/drivers/r300/r300_fs.c:516
#4 0x7fbe7e7e6373 in r300_pick_fragment_shader ../src/gallium/drivers/r300/r300_fs.c:591
#5 0x7fbe7e75456e in r300_create_fs_state ../src/gallium/drivers/r300/r300_state.c:1073
#6 0x7fbe7cd2ebe5 in st_create_fp_variant ../src/mesa/state_tracker/st_program.c:1070
#7 0x7fbe7cd374b5 in st_get_fp_variant ../src/mesa/state_tracker/st_program.c:1116
#8 0x7fbe7cd38273 in st_precompile_shader_variant ../src/mesa/state_tracker/st_program.c:1281
#9 0x7fbe7cd38273 in st_finalize_program ../src/mesa/state_tracker/st_program.c:1345
#10 0x7fbe7d798ca8 in st_link_glsl_to_nir ../src/mesa/state_tracker/st_glsl_to_nir.cpp:724
#11 0x7fbe7d798ca8 in st_link_shader ../src/mesa/state_tracker/st_glsl_to_nir.cpp:952
#12 0x7fbe7d6790d5 in link_program ../src/mesa/main/shaderapi.c:1336
#13 0x7fbe7d6790d5 in link_program_error ../src/mesa/main/shaderapi.c:1447
...
SUMMARY: AddressSanitizer: 2528456 byte(s) leaked in 1057 allocation(s).
Fixes: 54f6e72b27 ("r300: better register allocator for vertex shaders")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27792>
(cherry picked from commit 4d00edda00)
readback should trigger on the current backbuffer, not the most recently
presented buffer. if e.g., a clear is only triggered through glFlush,
this clear should be read back rather than the contents of the last-presented
buffer
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27553>
(cherry picked from commit d2ed77072c)
the previous code could recycle a currently-submitting state by hitting
a race condition where zink_screen_check_last_finished(batch_id) returned
true because batch_id was 0
this can no longer recycle the current batch, but the race should still be
eliminated for consistency: check 'submitted' since this guarantees batch_id
is valid
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27729>
(cherry picked from commit 3283415bbd)
The EXT_texture_format_BGRA8888 spec clearly defines GL_BGRA as a
color-renderable format, so we need to support it here as well.
This has been broken since the day support for the extension was added.
Oh well, let's fix it up!
Fixes: 1d595c7cd4 ("gles2: Add GL_EXT_texture_format_BGRA8888 support")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27720>
(cherry picked from commit 3b23e9d89d)
With TES, the primitive ID is an input variable but it's considered a
sysval by SPIRV->NIR. Though, its value is greater than
VARYING_SLOT_VAR0 which means its location was adjusted by mistake.
This fixes compiling a tessellation evaluation shader in debug build
with Enshrouded.
Fixes: dfbc03fa88 ("spirv: Fix locations for per-patch varyings")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27413>
(cherry picked from commit 78ea304a06)
We need to allocate "shared size" bytes for each workgroup but
we were incorrectly multiplying by the number of workgroups in
each supergroup instead, which would typically cause us to allocate
less memory than actually required.
The reason this issue was not visible until now is that the kernel
driver is using a large page alignment on all BO allocations and
this causes us to "waste" a lot of memory after each allocation.
Incidentally, this wasted memory ensured that out of bounds
accesses would not cause issues since they would typically land
in unused memory regions in between aligned allocations, however,
experimenting with reduced memory aligments raised the issue,
which manifested with the UE4 Shooter demo as a GPU hang caused
by corrupted state from out of bounds memory writes to CS
shared memory.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27675>
(cherry picked from commit 1880e7cfed)
Indeed, vertex_buffer was not properly freed.
For instance, this issue is triggered with:
"piglit/bin/fcc-read-after-clear blit rb -auto -fbo"
while setting GALLIUM_REFCNT_LOG=refcnt.log.
Fixes: 8a963d122d ("r300g/swtcl: don't do stuff which is only for HWTCL")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27678>
(cherry picked from commit 3b90c46bdf)
There are a couple mistakes here :
- using a bitfield as an index to generate a bitfield...
- in anv_nir_push_desc_ubo_fully_promoted(), confusing binding
table access of the descriptor buffer with actual descriptors
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: ff91c5ca42 ("anv: add analysis for push descriptor uses and store it in shader cache")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27504>
(cherry picked from commit cf193af762)
according to spec, these should return NONE if the format is
not supported for a given texture target, but mesa was incorrectly
returning a hardcoded value for all cases without checking the driver
instead, check whether the driver can create a texture for a given
format to correctly handle this non-support case
cc: mesa-stable
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27621>
(cherry picked from commit 893780b362)
Left-shifting by 11*8 or 14*8 is undefined. This fixes many
dEQP-VK.query_pool.statistics_query.* failures (but not pre-existing
flakes) for release builds using clang.
Fixes: 48aabaf225 ("radv: do not harcode the pipeline stats mask for query resolves")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27651>
(cherry picked from commit ec5d0ffb04)
this is the case where:
* a batch A is submitted
* a no-op flush occurs
* the frontend gets the fence from already-flushed batch A
* zink recycles batch A
* the frontend waits on fence A
fixes#10598
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27623>
(cherry picked from commit fb2ae7736f)
now that zink_gfx_lib_cache::stages_present exists (and is correct),
this value can be used directly to effect cache eviction instead of depending
on the prog->stages_present value, which may not even be the same prog that
owns a given zink_gfx_lib_cache instance
this fixes the case where a shader used in multiple progs with differing shader
masks would never have all its gpl pipelines freed
fixes leaks with caselist:
KHR-Single-GL46.arrays_of_arrays_gl.InteractionUniformBuffers1
KHR-Single-GL46.subgroups.quad.framebuffer.subgroupquadbroadcast_3_float_vertex
Fixes: d786f52f1f ("zink: prevent crash when freeing")
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27358>
(cherry picked from commit e8ce53a33d)
some passes (e.g., opt_shrink_vector) operate on the assumption that
sparse tex ops have a certain number of components and then remove components
and unset the sparse flag if they can optimize out the sparse usage
zink's sparse ops do not have the standard number of components, which
causes such passes to make incorrect assumptions and tag them as
not being sparse, which breaks everything
fix#10540
Fixes: 0d652c0c8d ("zink: shrink vectors during optimization")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27414>
(cherry picked from commit 2085d60438)
This works around some Unity engine behaivor with ANGLE-on-Venus, when
cmd pools are created on main thread once while the render thread only
does descriptor pool creation for set allocations during recording time.
This change also explicitly forces async pipeline create for threads
creating the device instead of implicitly via feedback cmd pool create.
This ensures intended behavior when feedback is disabled.
Fixes: d17ddcc847 ("venus: dispatch background shader tasks to secondary ring")
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27347>
(cherry picked from commit 1718980e85)
The runtime is turning GENERAL layouts into FEEDBACK_LOOP ones when it
detects feedback loops in a render pass. This is breaking drivers that
would like to use a different HW layout for those 2 layouts because if
the application inserts barrier in the render pass, the barriers the
driver sees are inconsistent.
This could lead to barrier of this type :
- GENERAL -> FEEDBACK_LOOP (runtime)
- GENERAL -> GENERAL (app)
- FEEDBACK_LOOP -> GENERAL (runtime)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23523>
(cherry picked from commit 76cf391255)
this logic relies on constant indexing for compact arrays, but this is
frequently not the case for compact array builtins (e.g., gl_TessLevelOuter).
the usual strategy of lowering to temps isn't viable in TCS, which means
io lowering has to be able to handle indirect access to these builtins
without crashing
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27534>
(cherry picked from commit 9e2c7314f2)
The round up in 'next_address_8kb = DIV_ROUND_UP(push_constant_kb, 8)'
was not decreasing the amount of URB available for Mesh and Task, what
could cause an over allocation of URB.
There was also no minimum entries enforcement for Mesh and Task, what
could cause 0 r.mesh_entries to be set in a case where tue_size_dw is
90% > than mue_size_dw. Same for r.task_entries when Task is enabled.
Also adding a few more asserts to help debug.
This fixes at least dEQP-VK.mesh_shader.ext.properties.mesh_payload_size
in LNL but it has potential to fixes other Mesh tests as well.
Cc: mesa-stable
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27555>
(cherry picked from commit d0fba810b3)
When doing query result copies in 3D mode, we're flushing the render
target cache, but the shader writes go through the dataport.
Fixes flakes/fails in piglit with shader query copies forced with Zink :
$ query_copy_with_shader_threshold=0 ./bin/arb_query_buffer_object-coherency -auto -fbo
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: b3b12c2c27 ("anv: enable CmdCopyQueryPoolResults to use shader for copies")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26797>
(cherry picked from commit c53a4711cb)
When bitrate or fps change is detected, only update rate control
parameters instead of completely reinitializing encode session.
This fixes an issue where if application changed bitrate or fps often,
the output bitrate would significantly overshoot the target bitrate in some
cases. In other cases, the output bitrate would be extremely low instead.
Cc: mesa-stable
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27548>
(cherry picked from commit 8d44a11508)
It is possible to free memory backing images before images are
destroyed :
VkFreeMemory:
"Memory can be freed whilst still bound to resources, but those
resources must not be used afterwards."
The spec leaves us the option to keep a reference on the associated
memory and free it only when all the bound resources have been
destroyed. Here we choose to free memory immediately.
One particular test in the CTS
(dEQP-VK.synchronization.internally_synchronized_objects.pipeline_cache_graphics)
does the following :
imgA = vkCreateImage()
imgB = vkCreateImage()
memA = vkAllocateMemory()
vkBindImageMemory(imgA, memA) # Aux mapping with ref count = 1
vkFreeMemory(memA) # Aux mapping removed, ref count = 0
memB = vkAllocateMemory() # Same address as memA
vkBindImageMemory(imgB, memB)
vkDestroyImage(imgA) # Removes the mapping of imgB-memB
vkQueueSubmit() # hang with pagefault in AUX-TT
The solution implemented in this change is to not do anything AUX-TT
related in vkFreeMemory(). This soluation has some consequences,
because a virtual memory address range freed and reallocated cannot be
rebound in the AUX-TT until all the associated resources have released
their AUX-TT mapping (to bring back the AUX-TT refcount of the range
to 0). This should still be better than keeping the memory allocated
through refcounting of the anv_bo.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 7b87e1afbc ("anv: track & unbind image aux-tt binding")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10528
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27566>
(cherry picked from commit e0b4dfbbda)
If a program does two blits in a row, we internally do a sequence of
operations that involves binding vb0.
Previously, the vb0 state after each operation would look something like:
| operation | cmd->state.gfx.vb0 | hardware | save->vb0 |
| ---------------------------- | ------------------ | --------- | --------- |
| | user | user | |
| nvk_meta_begin() | user | user | user |
| BindVertexBuffers(internal0) | internal0 | internal0 | user |
| nvk_meta_end() | internal0 | user | |
| nvk_meta_begin() | internal0 | user | internal0 |
| BindVertexBuffers(internal1) | internal1 | internal1 | internal0 |
| nvk_meta_end() | internal1 | internal0 | |
That is, CmdBindVertexBuffers() would update cmd->state.gfx.vb0, but
nvk_meta_end() would not. This meant that the last operation would bind a
driver-internal buffer instead of the original value that the user set.
This change fixes the issue by tracking cmd->state.gfx.vb0 in
nvk_cmd_bind_vertex_buffer(), which both CmdBindVertexBuffers() and
nvk_meta_end() call into.
After this commit, the state looks like:
| operation | cmd->state.gfx.vb0 | hardware | save->vb0 |
| ---------------------------- | ------------------ | --------- | --------- |
| | user | user | |
| nvk_meta_begin() | user | user | user |
| BindVertexBuffers(internal0) | internal0 | internal0 | user |
| nvk_meta_end() | user | user | |
| nvk_meta_begin() | user | user | user |
| BindVertexBuffers(internal1) | internal1 | internal1 | user |
| nvk_meta_end() | user | user | |
To test this commit, build gtk4 commit 87b66de1, run:
GSK_RENDERER=vulkan gtk4-demo --run=image_scaling
then select trilinear filtering in the dropdown and check for rendering
artifacts.
Fixes: e1c66501 ("nvk: Use vk_meta for CmdClearAttachments")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27559>
(cherry picked from commit d98ff2cc4a)
SPECviewperf creo-03 needs GL_EXT_shader_image_load_store in order for
its shaders to compile but we don't support a few corner cases that
didn't make it into the ARB variant. It seems to run fine with an
override, so just do that for now.
Cc: mesa-stable
Acked-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27429>
(cherry picked from commit 24d3c83212)
It can be the case that a collect and one of its sources are assigned
to non-overlapping parts of the same merge set, for example:
ssa_1 = ...
ssa_2 = ...
ssa_3 = ...
ssa_4 = collect ssa_1, ssa_2 (kill), ssa_3
... = ssa_4 (kill)
ssa_5 = collect ssa_1, ssa_3
... = ssa_1 (kill)
... = ssa_3 (kill)
... = ssa_5 (kill)
If we merge the first collect first, we get a merge set:
ssa_1 (offset 0)
ssa_2 (offset 2)
ssa_3 (offset 4)
ssa_4 (offset 0)
Now, we decide to merge ssa_1 and ssa_5:
ssa_1 (offset 0)
ssa_2 (offset 2)
ssa_3 (offset 4)
ssa_4 (offset 0)
ssa_5 (offset 0)
ssa_3 cannot become a child of ssa_5 in the interval tree, just like a
source not in the same merge set, so we should not remove it and then
reinsert it assuming that RA will make it a child of ssa_5.
This fixes an RA validation error in Farming Simulater.
Fixes: 0ffcb19 ("ir3: Rewrite register allocation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27497>
(cherry picked from commit aeed5fd98d)
If dual blending is enabled, only 1 output is supported. Multiple
outputs confuse the write combining pass in this case, leading to
incorrect output and/or an assert failure in emit_fragment_store.
The fix is straightforward, just skip the speculative emitting of
multiple outputs in the case where dual source blending is enabled.
This also adds an extra sanity check in `pan_nir_lower_zs_store` to
check for only one blend store being present.
Fixes: c65a9be421 ("panfrost: Preprocess shaders at CSO create time")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9487
Co-Authored-By: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26474>
(cherry picked from commit 49c1b404e5)
The CTS image allocation sometimes doesn't try to allocate a complete
DPB, but the amdgpu kernel module checks for this, so always make
the DPB max sized on uvd instances.
Fixes part of video decode on Fiji/Polaris
Cc: mesa-stable
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27186>
(cherry picked from commit df9bc11589)
As per this optimisations description:
"Takes assignments to variables that are dereferenced only
once and pastes the RHS expression into where the variables
dereferenced."
However the optimisation is run at compile time before multiple
shaders from the same stage could have been pasted together.
So this optimisation can incorrectly assume a global is only
referenced once since it cannot see the other pieces of the
shader stage until link time.
Here we skip the optimisation if the variable is a global. We
could change it to only run at link time however this
optimisation is only run at link time if we are being forced
to use GLSL IR to inline a function that glsl to nir cannot
handle and this will also be removed in a future patchset.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10482
Fixes: d75a36a9ee ("glsl: remove do_copy_propagation_elements() optimisation pass")
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27351>
(cherry picked from commit bc0178af57)
Similar to what was done for Wayland in 58f90fd03f:
the glthread unmarhsal thread needs to be idle to avoid
concurrent calls to get_back_bo.
Also the existing code flushed after setting dri2_surf->back
to NULL so a new back buffer was always allocated by the
glthread flush:
|---------------> dri2_drm_swap_buffers
| get_back_bo (back=0x55eb93c6c488) > # First get_back_bo call
| get_back_bo (back=0x55eb93c6c488 age: 0)<
| # dri2_surf->back = NULL
|-----> FLUSH
| get_back_bo (back=nil) > # Another get_back_bo call
| get_back_bo (back=0x55eb93c6c4c8 age: 3)<
|-----< FLUSH
|---------------< dri2_drm_swap_buffers
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10437
Cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27143>
(cherry picked from commit 6f47e87a60)
GL ARB_sparse_buffer allows unbound regions in buffers.
VK sparseBinding insists all regions must be bound before first use.
This means we need to use sparseResidencyBuffer to back GL
sparse buffers to get the same semantics.
Fixes GL and piglit sparse buffer tests on zink/nvk.
Fixes: c90246b682 ("zink: implement sparse buffer creation/mapping")
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27404>
(cherry picked from commit ff50e80574)
LLVM_LIB_DIR is a variable used for runtime compilations.
When cross compiling, LLVM_LIB_DIR must be set to the
libclang path on the target. So, this path should not
be retrieved during compilation but at runtime.
dladdr uses an address to search for a loaded library.
If a library is found, it returns information about it.
The path to the libclang library can therefore be
retrieved using one of its functions. This is useful
because we don't know the name of the libclang library
(libclang.so.X or libclang-cpp.so.X)
v2 (Karol): use clang::CompilerInvocation::CreateFromArgs for dladdr
v3 (Karol): follow symlinks to fix errors on debian
Fixes: e22491c832 ("clc: fetch clang resource dir at runtime")
Signed-off-by: Antoine Coutant <antoine.coutant@smile.fr>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by (v1): Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25568>
(cherry picked from commit 445aacb421)
If the lane from which the hardware writes the unifa address
is disabled, then we may end up with a bogus address and invalid
memory accesses from follow-up ldunifa.
Instead of always disabling unifa loads in non-uniform control
flow we can try to see if the address is prouced from a nir
register (which is the only case where we do conditional writes
under non-uniform control flow in ntq_store_def), and only
disable it in that case.
When enabling subgroups for graphics pipelines, this fixes a
GMP violation in the simulator with the following test
(which has non-uniform control flow writing unifa with lane 0
disabled, which is the lane from which the unifa takes the
address):
dEQP-VK.subgroups.ballot_broadcast.graphics.subgroupbroadcastfirst_int
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27211>
(cherry picked from commit 5b269814fc)
c->execute is 0 (not the block index) for lanes currently active
under non-uniform control flow.
Also this simplifies a bit the instructions we emit for flag
generation, both for uniform and non-uniform control flow.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27211>
(cherry picked from commit 7bdc8898b1)
If the ELSE block is cheap then we don't emit the branch instruction
but we still want to generate the flags, since these are setting
the flags for the THEN block too.
Fixes: e401add741 ("broadcom/compiler: skip jumps in non-uniform if/then when block cost is small")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27211>
(cherry picked from commit 29d4924e5e)
descriptor buffer uses mapped buffers. mapping/unmapping buffers
uses a ctx in the function params, but at this time there is no ctx.
since the ctx is not actually used for unmapping descriptor buffers,
this can instead use a special buffer unmap function to avoid invalid access
Fixes: b06f6e00fb ("zink: fix heap-use-after-free on batch_state with sub-allocated pipe_resources")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27344>
(cherry picked from commit 0a97d1ebfa)
this is already implied since the buffers must be BAR-allocated,
but it ensures the context isn't accessed during unmap
Fixes: b06f6e00fb ("zink: fix heap-use-after-free on batch_state with sub-allocated pipe_resources")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27344>
(cherry picked from commit c900cca96c)
../src/compiler/nir/nir_builder.h: In function ‘nir_build_deref_follower’:
../src/compiler/nir/nir_builder.h:1607:1: error: control reaches end of non-void function [-Werror=return-type]
1607 | }
Fixes: 4a4e175738
nir: Support deref instructions in lower_var_copies
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27345>
(cherry picked from commit 0ab3b3c641)
../src/compiler/nir/nir_lower_int64.c: In function ‘lower_int64_intrinsic’:
../src/compiler/nir/nir_lower_int64.c:1347:1: error: control reaches end of non-void function [-Werror=return-type]
1347 | }
Fixes: bf7a114246
nir/lower_int64: Add lowering for some 64-bit subgroup ops
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27345>
(cherry picked from commit 80a1b91601)
../src/amd/vulkan/radv_sampler.c: In function ‘radv_tex_wrap’:
../src/amd/vulkan/radv_sampler.c:50:1: error: control reaches end of non-void function [-Werror=return-type]
50 | }
| ^
../src/amd/vulkan/radv_sampler.c: In function ‘radv_tex_compare’:
../src/amd/vulkan/radv_sampler.c:76:1: error: control reaches end of non-void function [-Werror=return-type]
76 | }
| ^
Fixes: 4de305cb8a
radv: move sampler related code to radv_sampler.c
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27345>
(cherry picked from commit ca47138fb1)
Memory can be free before images it is bound to. When unmapping the
CCS range in the AUX-TT, we cannot rely on the anv_bo::offset field
because the anv_bo might have been freed.
Just save the mapping address/size and use those values at unmapping
time.
Fixes an assert on CI with :
dEQP-VK.synchronization.internally_synchronized_objects.pipeline_cache_graphics
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: e519e06f4b ("anv: add missing alignment for AUX-TT mapping")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27304>
(cherry picked from commit 9d31680e79)
- no need to flush anything before as we're working on a clean
buffer (SI_OP_SKIP_CACHE_INV_BEFORE)
- L2 must be flushed after the job to avoid rendering artifacts.
Instead of setting it manually, use SI_OP_SYNC_AFTER +
SI_COHERENCY_NONE.
Fixes: 1a99f50c7f ("radeonsi: use a compute shader to convert unsupported indices format")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27095>
(cherry picked from commit cce5920025)
This fixes#9807 but I don't understand why.
Emitting cache flushes before VGT_PRIMITIVE_TYPE is what makes
the problem go away but changing the order in si_draw() is clearer.
The only cases where sctx->flags is modified in si_emit_draw_registers
is handled using si_emit_cache_flush_direct so we can move cache
flushing up without any addtional conditionals.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9807
Fixes: 1e4b539042 ("radeonsi: handle deferred cache flushes as a state (si_atom)")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27095>
(cherry picked from commit 0e16da89fe)
Buffers that are not dedicated can also be used for CCS mapped images,
so they need to be aligned to the AUX-TT requirements.
GTK+ is running into such case where it creates an image with a CCS
modifier. When requesting the alignment through
vkGetImageMemoryRequirements() the 64KB/1MB alignment is returned, but
the binding fails with an assert because the VkDeviceMemory has not
been aligned to the AUX-TT requirement and we cannot disable CCS since
the modifier requires it.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 4cdd3178fb ("anv: Meet CCS alignment reqs with dedicated allocs")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10433
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27258>
(cherry picked from commit e519e06f4b)
When updating an AFBC-packed resource, we need to make sure it is
legalized before blitting the staging resource to it. We can't rely
on the blit to properly convert the resource as it will result in
blit recursion and a crash.
If the whole texture is updated however, there is no need to unpack
as the content can be discarded. Just create a new BO with the right
format.
Signed-off-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 33b48a5585 ("panfrost: Add debug flag to force packing of AFBC textures on upload")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27208>
(cherry picked from commit 1aa832e5f5)
There might be a more efficient path when legalizing a resource if
we don't need to worry about its content. For example, it doesn't
make sense to copy the resource content when converting the modifier
if the resource content is discarded anyway.
Signed-off-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Fixes: 33b48a5585 ("panfrost: Add debug flag to force packing of AFBC textures on upload")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27208>
(cherry picked from commit ee77168d57)
The hardware uses the lane index for per-vertex TCS output reads rather
than the vertex index. Fortunately, it's a pretty easy calculation to
go from one to the other.
Fixes: abe9c1fea2 ("nak: Add NIR lowering for attribute I/O")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27284>
(cherry picked from commit 99ef70d8aa)
VK_ACCESS_2_SHADER_STORAGE_READ_BIT specifies read access to a
storage buffer, physical storage buffer, storage texel buffer, or
storage image in any shader pipeline stage.
Any storage buffers or images written to must be invalidated and
flushed before the shader can access them.
This fixes the following tests on LNL:
- dEQP-VK.synchronization2.op.single_queue.barrier.write\*_specialized_access_flag
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27212>
(cherry picked from commit 3e93ccbc1b)
No driver supports urol/uror on all bit sizes. Intel gen11+ only for 16
and 32 bit, Nvidia GV100+ only for 32 bit. Etnaviv can support it on 8,
16 and 32 bit.
Also turn the `lower` into a `has` option as only two drivers actually
support `uror` and `urol` at this momemt.
Fixes crashes with CL integer_rotate on iris and nouveau since we emit
urol for `rotate`.
v2: always lower 64 bit
Fixes: fe0965afa6 ("spirv: Don't use libclc for rotate")
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by (Intel and nir): Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: David Heidelberg <david.heidelberg@collabora.com>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27090>
(cherry picked from commit f2b7c4ce29)
When building the CFG the instructions are taken of the list in
fs_visitor and added to the lists inside each block. The single
"exec_node" in the instruction is used for those memberships.
In the case the pass rebuilt the CFG, it had no instructions, so
calculate_cfg() had nothing to work with. For now fix the bug by
pulling all the instructions back to the original list.
We can do better here, but punting until upcoming work on
CFG itself.
Issue found in an unpublished CTS test. Small reproduction in our
unit tests now enabled.
Fixes: 65237f8bbc ("intel/fs: Don't add MOV instructions to DO blocks in combine constants")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27131>
(cherry picked from commit 4dbf9181cd)
We're missing the ISA code in renderdoc. You can reproduce with the
Sascha Willems graphics pipeline demo.
The change is large here because we have to fix a confusion between
anv_shader_bin & anv_pipeline_executable. anv_pipeline_executable is
there as a representation for the user and multiple
anv_pipeline_executable can point to a single anv_shader_bin.
In this change we split the anv_shader_bin related logic that was
added in anv_pipeline_add_executable*() and move it to a new
anv_pipeline_account_shader() function.
When importing RT libraries, we add all the anv_pipeline_executable
from the libraries.
When importing Gfx libraries, we add the anv_pipeline_executable only
if not doing link time optimization.
anv_shader_bin related properties are added whenever we're importing a
shader from a library, compiling or finding in the cache.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 3d49cdb71e ("anv: implement VK_EXT_graphics_pipeline_library")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26594>
(cherry picked from commit 58c9f817cb)
This fixes VUID-vkCmdDraw-None-08600 violation when running gpl cts:
dEQP-VK...graphics_library.misc.bind_null_descriptor_set.*, where the
final pipeline layout is falsely dropped, leading to incompatible with
the pipeline layout of the bound descriptor set.
Fixes: a65ac274ac ("venus: Do pipeline fixes for VK_EXT_graphics_pipeline_library")
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27054>
(cherry picked from commit 80a5df16fe)
The panfrost driver now makes an ioctl to retrieve some new memory
parameters, and DRM_PANFROST_PARAM_MEM_FEATURES is required (does not
default in the caller). This caused drm-shim to stop working. This
patch adds some defaults to get drm-shim working again.
Signed-off-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Fixes: 91fe8a0d28 ("panfrost: Back panfrost_device with pan_kmod_dev object")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27162>
(cherry picked from commit a50b2f8f25)
GFX10 has a hw bug and it can't handle 0-sized index buffer. The
non-indirect draw path was fine but not the indirect path where RADV
emits the index buffer.
This fixes flakes with dEQP-VK.*maintenance6* on NAVI14, and possibly
GPU hangs if there is an indirect draw with a valid index buffer right
before because it would re-use the same index buffer.
Fixes: db9816fd66 ("radv: add support for NULL index buffer")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27142>
(cherry picked from commit 783e3c096f)
On chordal graphs, a greedy coloring can be done in a way that never uses
more colors than are required for the largest clique. However, since we
have vector values and force phi resources into the same spill slots, the
interference graphs are not chordal, and thus, this assumption doesn't hold.
Use twice as many spill slots as upper bound.
Totals from 10 (0.01% of 79242) affected shaders: (GFX11)
MaxWaves: 52 -> 54 (+3.85%)
Instrs: 271386 -> 271779 (+0.14%)
CodeSize: 1362544 -> 1365432 (+0.21%)
VGPRs: 2536 -> 2532 (-0.16%)
SpillVGPRs: 778 -> 818 (+5.14%)
Scratch: 73472 -> 76800 (+4.53%)
Latency: 3331718 -> 3328798 (-0.09%); split: -0.14%, +0.05%
InvThroughput: 1665860 -> 1643350 (-1.35%); split: -1.40%, +0.05%
VClause: 3292 -> 3329 (+1.12%); split: -0.06%, +1.18%
Copies: 46082 -> 46257 (+0.38%)
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27011>
(cherry picked from commit e3098bb232)
If someone were to remove the libraries that are needed for these,
`default` would simply not enable these tests, and the only thing we
could notice is that test jobs would suddenly take less time to run.
Instead, let's have a check to make sure dEQP's cmake has detected
everything and enabled these platforms.
Reviewed-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27041>
(cherry picked from commit 27a1b4e4f3)
DOOM Eternal builds acceleration structures with inactive primitives and
tries to make them active in later AS updates. This is disallowed by the
spec and triggers a GPU hang. Fix the hang by working around the bug.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27034>
(cherry picked from commit a9831caa14)
Commit 73eecffabd ("panvk: Use the vk_pipeline_layout base struct")
reworked the panvk logic to use vk_pipeline_layout, which contains the
number of descriptor set layout referenced by a pipeline layout, thus
deprecating panvk_pipeline_layout::num_sets.
Make panvk_fill_non_vs_attribs() use vk_pipeline_layout::set_count
instead of panvk_pipeline_layout::num_sets and kill the latter so we
can't introduce new users.
Fixes: 73eecffabd ("panvk: Use the vk_pipeline_layout base struct")
Cc: mesa-stable
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Constantine Shablya <constantine.shablya@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27107>
(cherry picked from commit b18bfed2c5)
'debian/x86_64_build' job needs 'debian/x86_64_build-base' job, but 'debian/x86_64_build-base' is not in any previous stage
Fixes: f298a0e709 ("ci: make sure we evaluate the python-test rules first")
Fixes: 2c9fdaa830 ("ci: fix python-test dependency error on merge requests")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27042>
(cherry picked from commit 2ce0b5ab0a)
For instance, this issue is triggered with
vs-to-fs-overlap.shader_test -auto -fbo:
Direct leak of 24 byte(s) in 1 object(s) allocated from:
#0 0x7fe64f58e9a7 in calloc (/usr/lib64/libasan.so.6+0xb19a7)
#1 0x7fe642ca2839 in _mesa_symbol_table_ctor ../src/mesa/program/symbol_table.c:286
#2 0x7fe642ff003d in gl_nir_cross_validate_outputs_to_inputs ../src/compiler/glsl/gl_nir_link_varyings.c:728
#3 0x7fe642d7c7d8 in gl_nir_link_glsl ../src/compiler/glsl/gl_nir_linker.c:1357
#4 0x7fe642be6931 in st_link_glsl_to_nir ../src/mesa/state_tracker/st_glsl_to_nir.cpp:562
#5 0x7fe642be6931 in st_link_shader ../src/mesa/state_tracker/st_glsl_to_nir.cpp:944
#6 0x7fe642acab55 in link_program ../src/mesa/main/shaderapi.c:1336
#7 0x7fe642acab55 in link_program_error ../src/mesa/main/shaderapi.c:1447
#8 0x7fe6424aa389 in _mesa_unmarshal_LinkProgram src/mapi/glapi/gen/marshal_generated2.c:1911
#9 0x7fe641fd912b in glthread_unmarshal_batch ../src/mesa/main/glthread.c:139
#10 0x7fe641f48d48 in util_queue_thread_func ../src/util/u_queue.c:309
#11 0x7fe641fa442a in impl_thrd_routine ../src/c11/impl/threads_posix.c:67
Fixes: 7d1948e9b5 ("glsl: implement cross_validate_outputs_to_inputs() in nir linker")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27071>
(cherry picked from commit bacace8634)
Vivante hardware handles 64bpp render targets and samplers in a odd way
by splitting the buffer and using a pair of texture samplers or a pair
of MRT outputs to access those resources. This isn't implemented in the
driver right now, so we should not advertise support for those formats.
CC: mesa-stable
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26982>
(cherry picked from commit e481c1269c)
Prior to 06b526de, the mesa format was used for these completeness checks.
That was to address the case where a *different* internal format selected
the *same* mesa format, and the texture shouldn't be considered compatible.
But this didn't address the case where the *same* internal format selected
a *different* mesa format, e.g. because the type passed to the TexImage
API was different.
An old WGL demo app called TexFilter.exe tries to redefine a mipped RGBA16
texture as RGBA8. This incorrect logic caused Mesa to try to copy the RGBA16
data from the smaller mips into the newly created RGBA8 data, because it
thought that the texture was still mip-complete, despite the format changing.
Cc: mesa-stable
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27023>
(cherry picked from commit 4cb9c77e8e)
ring_seqno_valid indicates a successful ring cmd submission, and can be
used to avoid invalid reply decoding due to failed submit alloc.
Otherwise, the garbled VkResult will mislead into initialization failure
instead of oom.
Below cts failure is fixed:
dEQP-VK.api.device_init.create_instance_device_intentional_alloc_fail.basic
Fixes: ec131c6e55 ("venus: use instance allocator for ring allocs")
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27026>
(cherry picked from commit ecd50e70d4)
The max waves for RT prolog need to be recalculated after merging the
resource usage of all shaders invoked from it.
Note that there is no need to panic, as the info was only used to
calculate maximum scratch size and with the RT prolog being low
footprint, this likely only caused overestimation rather than
underestimation.
Fixes: 533ec9843e ("radv: Precompute shader max_waves.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26998>
(cherry picked from commit 63827751e1)
This was broken when I added texcoord support, the problem is that we
failed to properly count the number of used fs inputs and thus we failed
to make the proper decision when to reuse the color varying slot
Also fix the error messages, they were incorrect after the rewrite as
well. This fixes a bunch of piglits.
Fixes: d4b8e8a481
Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Filip Gawin <filip.gawin@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27003>
(cherry picked from commit 53c17d85ab)
This change adds a new venus feature: TLS ring
- co-owned by TLS and VkInstance
- initialized in TLS upon requested
- teardown happens upon thread exit or instance destroy
- teardown is split into 2 stages:
1. one owner locks and destroys the ring and mark destroyed
2. the other owner locks and frees up the tls ring storage
TLS ring supercedes the prior secondary ring and enables multi-thread
shader compilation and reduces the loading time of ROTTR from ~110s to
~21s (native is ~19s).
TLS ring is in fact a synchronous ring by design, and can be used to
redirect all exisiting synchronous submissions trivially. e.g. upon any
vn_call_*, request a TLS ring, wait for deps and then submit.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26838>
This is to prepare for a new multi-ring design. A preview is as below:
- primary ring will migrate to be asynchronous only
- synchronous commands will be via thread local rings
- pipeline creations will be synchronous and dispatched to thread local
rings unless being forced to be async on primary ring
- perf option no_multi_ring is made generic to force a single ring
Pipeline cache retrieval is temporarily moved back to primary ring, but
will be moved to thread local later since it's a synchronous command.
The dependency resolving will follow the same with pipeline create with
detailed rationale later.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26838>
Similar idea to buffer memory requirements cache but CreateImage has
many more params that may affect the memory requirements.
Instead of a sparse array, generate a SHA1 hash of all the relevant
VkImageCreateInfo params including relevant pNext structures and use
part of the hash as a key to a hash table that stores the cache entries.
Signed-off-by: Juston Li <justonli@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26118>
The stack/scratch blocks need to be aligned to greater than the page
size in some cases. Add support for allocating BOs with a given
alignment in GPU VA space.
To avoid having to touch all callers, this adds a new function
agx_bo_create_aligned() for this purpose.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26963>
Detroit: Become Human has an optimization issue where the same BO is
mapped and unmapped on a per-frame basis, which leads to massive
overhead in the kernel, both creating the mapping and taking page
faults.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26942>
dEQP-VK.pipeline.monolithic.cache.*
Test run totals:
Passed: 773/773 (100.0%)
Failed: 0/773 (0.0%)
Not supported: 0/773 (0.0%)
Warnings: 0/773 (0.0%)
Waived: 0/773 (0.0%)
Timing these test:
Before:
real 0m11,304s
user 0m9,442s
sys 0m0,477s
After:
real 0m3,470s
user 0m1,962s
sys 0m0,504s
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25550>
Those engines don't have PIPE_CONTROL so we can't do
ANV_TIMESTAMP_CAPTURE_AT_CS_STALL but we can support measurement
by changing the capture type to ANV_TIMESTAMP_CAPTURE_TOP/END_OF_PIPE
Right now this issue is only reproduced in Xe KMD without setting
any special parameters(other than INTEL_MEASURE) because Xe KMD allows
the usage of copy engine while i915 can't due TRTT restrictions.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26882>
EXT_color_buffer_float makes both 16 and 32 bit floating-point texture
formats color-renderable. We can't just unconditionally report that, we
need to check for support.
The RGB formats are a bit special under this extension, because it's not
specified as color-renderable. However, because the RGBA formats *are*
specified as color-renderable, and the state-tracker can emulate the RGB
formats with the RGBA ones, we don't need to test for that here.
While we're at it, move EXT_color_buffer_half_float to its correct
sorted position.
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26870>
Demoting means that we don't execute any writes to memory but
otherwise the invocation continues to execute. Particularly,
subgroup operations and derivatives must work.
Our implementation of discard does exactly this by using
setmsf to prevent writes for the affected invocations, the
only difference for us is that with discard/terminate we
want to be more careful with emitting quad loads for tmu
operations, since the invocations are not supposed to be
running any more and load offsets may not be valid, but with
demote the invocations are not terminated and thus we should
emit memory reads for them to ensure quad operations and
derivatives from invocations that have not been demoted still
work.
Since we use the sample mask to implement demotes we can't tell
whether a particular helper invocation was originally such
(gl_HelperInvocation in GLSL) or was later demoted
(OpIsHelperInvocationEXT added with SPV_EXT_demote_to_helper_invocation),
so we use nir_lower_is_helper_invocation to take care of this.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26949>
There are two "Src1.Length" with different formats in "send" description
in the PRMs. One is part of ExMsgDesc, is relevant for LSC SFIDs, and
exists if [ExDesc.IsReg]==false. The other is just a 5-bit immediate,
is relevant for other SFIDs too, and exists if ([ExDesc.IsReg]==true)
AND ([ExBSO]==true).
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25657>
Having the reg set with predication disabled shouldn't cause any problems
during the execution. But when decompiling such instruction the flag won't
be shown in the output, so the recompiling will cause
functionally-identical but binary-different code. Fixing this makes
disasm/asm testing easier.
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25657>
If we legalize AFBC late, we end up in a situation while we might need
to do a blit while inside a previous blit operation, but u_blitter
state isn't saved recursively, and that leads to crashes.
This patch solves this issue by splitting panfrost_blit into two
functions and legalizing AFBC early.
Signed-off-by: Italo Nicola <italonicola@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24942>
Due to some issues with GCC and this warning in very long files, we
disabled it when compiling NIR. Unfortunately by design Meson doesn't
allow us to set flags per source file.
The warning is still enabled in clang. but it is less commonly
used during development. To avoid missing catching those warnings,
add -Werror=misleading-indentation to the GitLab CI debian-clang build.
See https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25315 for
more context. This patch is a transcription of what Eric Engestrom
suggested, except only targetting C flags (since we only disable them
for C in NIR build).
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26938>
When a file is too large, -Wmisleading-indentantion will give the warning
below, that we can't prevent from a #pragma:
```
src/compiler/nir/nir_opt_algebraic.c: In function ‘nir_opt_algebraic’:
src/compiler/nir/nir_opt_algebraic.c:1469069: note: ‘-Wmisleading-indentation’ is disabled from this point onwards, since column-tracking was disabled due to the size of the code/headers
1469069 | nir_foreach_function_impl(impl, shader) {
|
src/compiler/nir/nir_opt_algebraic.c:1469069: note: adding ‘-flarge-source-files’ will allow for more column-tracking support, at the expense of compilation time and memory
```
See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89549 for details.
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25315>
This removes output like
```
CS SIMD16 shader: 2790 inst, 0 loops, 24804 cycles, 166:106 spills:fills, 35 sends,
scheduled with mode top-down, Promoted 1 constants, compacted 44640 to 41424 bytes.
```
from the default builds. Like other debug output in intel_clc, they can
re-enabled with INTEL_DEBUG=cs.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26939>
mesa/main/draw.c calls threaded_context to add a draw call, but the caller
fills it manually.
This way we don't have to fill pipe_draw_info in a local variable and later
copy it to tc_batch. tc_batch is filled from draw.c directly.
It also eliminates a few conditional jumps thanks to assumptions we can make
in DrawElements but not tc_draw_vbo.
This decreases the overhead of the GL frontend thread by 1.1%, which has
CPU usage of 26%, so it decreases the overhead for that thread by 4.2%.
(1.1 / 26)
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26584>
Instead of a series of if-else statements, use a function table containing
all the draw variants, which is indexed by:
[is_indirect * 8 + index_size_and_has_user_indices * 4 +
is_multi_draw * 2 + non_zero_draw_id]
This decreases the overhead of tc_draw_vbo by 0.7% (from 4% to 3.3%)
for the GL frontend thread in VP2020/Catia1, which has CPU usage of 26%,
so it decreases the overhead for that thread by 2.7%. (0.7 / 26)
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26584>
flags=0 is used for e.g., glFenceSync, which apps use to insert sync points
to determine when all prior work has completed. eliding these flushes into no-ops
is fine for all scenarios except when the last op was a present, in which
case the no-op (previous) fence will not sync as expected for the present and
graphical artifacts will result
in the future, this may be changed back to the previous behavior if/when presentation
gains timeline semaphore capabilities by providing the last timeline id
as a fence instead of the last batch
fixes#10386
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26935>
The built-in tiled-to-tiled copy packet doesn't support copying
between images that don't meet certain criteria such as alignment,
micro tile format, compression state etc.
To work around this, we copy the image piece by piece to a
temporary buffer that we know is supported,
and then copy it to the intended destination.
The implementation assumes that at least one pixel row of the
image fits into the temporary buffer, and will try to copy as
many rows as fit.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26913>
When either of the images is linear then the implementation can
use the same packets as used by the buffer/image copies.
However, tiled to tiled image copies use a separate packet.
Several variations of tiled to tiled copies are not supported
by the built-in packet and need a scanline copy as a workaround,
this will be implemented by an upcoming commit.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26913>
Future compilers will fail compilation due to the C type error:
…/testfile.c: In function 'main':
…/testfile.c:12:30: error: passing argument 2 of 'strtod_l' from incompatible pointer type
12 | double d = strtod_l(s, end, loc);
| ^~~
| |
| char *
/usr/include/stdlib.h:416:43: note: expected 'char ** restrict' but argument is of type 'char *'
416 | char **__restrict __endptr, locale_t __loc)
| ~~~~~~~~~~~~~~~~~~^~~~~~~~
…/testfile.c:13:29: error: passing argument 2 of 'strtof_l' from incompatible pointer type
13 | float f = strtof_l(s, end, loc);
| ^~~
| |
| char *
/usr/include/stdlib.h:420:42: note: expected 'char ** restrict' but argument is of type 'char *'
420 | char **__restrict __endptr, locale_t __loc)
| ~~~~~~~~~~~~~~~~~~^~~~~~~~
This means that the probe no longer tests is objective and always
fails.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26927>
Pipeline is unable to run due to the following error:
'python-test' job needs 'debian/x86_64_build' job, but 'debian/x86_64_build' is not in any previous stage
python-test job has the following rule:
- changes:
- bin/ci/**/*
so add it as well to debian/x86_64_build
Signed-off-by: Helen Koike <helen.koike@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26802>
This decreases the time spent in amdgpu_cs_submit_ib from 15.4% to 8.3%
in VP2020/Catia1, which is a decrease of CPU load for that thread by 46%.
Overall, it increases performance by a small number in CPU-bound benchmarks.
The biggest improvement I have seen is VP2020/Catia2, where it increases
FPS by 12%.
It no longer stores pipe_fence_handle references inside amdgpu_winsys_bo.
The idea is to have a global fixed list of queues (only 1 queue per IP
for now) where each queue generates its own sequence numbers (generated
by the winsys, not the kernel). Each queue also has a ring of fences.
The sequence numbers are used as indices into the ring of fences, which
is how sequence numbers are converted to fences.
With that, each BO only has to keep a list of sequence numbers, 1 for each
queue. The maximum number of queues is set to 6. Since the system can
handle integer wraparounds of sequence numbers correctly, we only need
16-bit sequence numbers in BOs to have accurate busyness tracking. Thus,
each BO uses only 12 bytes to represent all its fences for all queues.
There is also a 1-byte bitmask saying which sequence numbers are
initialized.
amdgpu_winsys.h contains the complete description. It has several
limitations that exist to minimize the memory footprint and updating of
BO fences.
Acked-by: Yogesh Mohan Marimuthu <yogesh.mohanmarimuthu@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26643>
We would compute the unique IDs for 1000 slab entries and then only use
a few, wasting the IDs. Assign the IDs only when we actually need to
return a new buffer.
This decreases the number of collisions we get in amdgpu_lookup_buffer,
and thus the number of times we have to search in the BO list.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26643>
../src/amd/common/ac_rgp.c:119:48: warning: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Wsingle-bit-bitfield-constant-conversion]
119 | header->flags.is_semaphore_queue_timing_etw = 1;
| ^ ~
Fixes: ed0c852243 ("radv: add initial SQTT files generation support")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26839>
Commit 659bace01a ("egl: add a few trailing commas") added a few
trailing commas to the last item in a struct to avoid that commit
e85983d772 ("egl: re-format using clang-format") moves the closing
brace into the previous line.
The wl_callback_listener was missing the comma and the brace was moved
to the previous line, which makes it harder to read the code.
Add the comma and fix the formatting.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26902>
Currently the shader cache is broken when running under flatpak-spawn
because the shader cache's parent directory will not exist. For example,
the shader cache directory might be:
/home/mcatanzaro/.var/app/org.gnome.Epiphany.Devel/cache
If /home/mcatanzaro/.var/app/org.gnome.Epiphany.Devel/ does not already
exist, we fail. Let's create the directories recursively, as if by
'mkdir -p', rather than just fail.
Fixes#8294
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25925>
INTEL_NEEDS_WA macros are valid when a workaround applies to all
platforms which have the GFX_VERx10 versions for the workaround.
Some workarounds were fixed at a stepping after the platform release.
If a workaround applies partially to any platform, then GFX_VERx10
cannot be used to correctly apply the workaround.
This change invalidates INTEL_NEEDS_WA_16014538804 and
INTEL_NEEDS_WA_22014412737, which were fixed for MTL platforms at
stepping b0. The run-time checks were already present for all uses of
these macros. Updating the poisoned macros to INTEL_WA_{num}_GFX_VER
compiles out the run-time checks on platforms where they cannot apply.
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26898>
glcpp mingw failing in the following way:
```
49/56 mesa:compiler+glcpp / glcpp test (unix) FAIL 3.39s exit status 1
50/56 mesa:compiler+glcpp / glcpp test (oldmac) FAIL 3.41s exit status 1
51/56 mesa:compiler+glcpp / glcpp test (bizarro) FAIL 3.42s exit status 1
52/56 mesa:compiler+glcpp / glcpp test (windows) FAIL 3.45s exit status 1
```
The test failed because on mingw, the stderr will comes after stdout,
but all the expect files, the stderr is coming first,
so we flush(stderr) first to makesure stderr out before stdout
The failing example:
039-func-arg-obj-macro-with-comma: FAIL
---
+++
@@ -1,3 +1,5 @@
+0:12(21): preprocessor error: Error: macro foo invoked with 2 arguments (expected 1)
+
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26778>
Because we use unaligned dispatches, 1D launches only use 8 threads per
wave. Converting to 2D and fixing up launch IDs in the prolog
significantly increases occupancy.
Gives ~30% uplift in Ghostwire Tokyo.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26105>
zink_bo_create can run into a heap-use-after-free when the bo is still
referencing an batch_state from an older destroyed context. In order to
fix this, every context gives back their batch_states to the zink, where
they can be reused from for new contexts.
Cc: mesa-stable
Suggested-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26889>
This is safe to do in all circumstances due to the age of the hardware.
(we don't have UBOs, just constant registers with automatic OOB checks)
R500 hardware doesn't have standard adress register in fragment shaders
and while we have the loop register which we in theory can use for indirect
access, this is currently not possible to wire through NIR. So anytime
there is an indirect uniform array access in a loop, we end with a if
ladder with size depending on the size of the uniform array. The two worst
behaving apps here are glamor and some GTK shaders, both of which are
sometimes ending over the 512 instructions limit. Flattening the if
ladders helps a LOT, so we can get into the instruction limit in most
cases (all glamor shaders are OK now). So just enable the flattening by
setting all load_ubo_vec4 with ACCESS_CAN_SPECULATE.
Shader-db RV530:
total instructions in shared programs: 128762 -> 128440 (-0.25%)
instructions in affected programs: 540 -> 218 (-59.63%)
helped: 3
HURT: 0
total temps in shared programs: 17543 -> 17550 (0.04%)
temps in affected programs: 11 -> 18 (63.64%)
helped: 0
HURT: 3
total cycles in shared programs: 196984 -> 196657 (-0.17%)
cycles in affected programs: 592 -> 265 (-55.24%)
helped: 3
HURT: 0
LOST: 0
GAINED: 7
No changes for R300/R400 because there we don't have control flow
anyway.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6366
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26877>
Shader-db RV370:
total instructions in shared programs: 82071 -> 82155 (0.10%)
instructions in affected programs: 792 -> 876 (10.61%)
helped: 0
HURT: 12
total temps in shared programs: 12775 -> 12778 (0.02%)
temps in affected programs: 27 -> 30 (11.11%)
helped: 0
HURT: 3
total cycles in shared programs: 128403 -> 128499 (0.07%)
cycles in affected programs: 864 -> 960 (11.11%)
helped: 0
HURT: 12
The same regression for the few GTK shaders that happens with the R500
nir fcsel lowering also happens here due to the
nir_move_vec_src_uses_to_dest.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6126
Reviewed-by: Filip Gawin <filip.gawin@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26816>
We can't call nir_lower_bool_to_float too early, because some other
passes like nir_opt_peephole_select will blow up, but we can still do
some selected parts to enable some optimiazions at a later point
(like fcsel(a,b,0) into fmul), etc.
No change in shader-db with RV370 or RV530 at this point.
Reviewed-by: Filip Gawin <filip.gawin@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26816>
Right now this is done in the backend so move it up to NIR. Doing this
in the backend is easier, as at that time we can have a better idea
about when we hit the hardware limits of three different TMP sources,
however moving this to NIR allows for some optimizations. Specifically,
at this time if we decide we actually have to lower we still have the
info if we have plain fcsel for which we can save the comparison and
emit flrp only. During translation to TGSI all of fcsel, fcsel_gt, and
fcsel_ge translate to CMP so at that point the comparison is always needed.
Shader-db RV530:
total instructions in shared programs: 126057 -> 125823 (-0.19%)
instructions in affected programs: 11359 -> 11125 (-2.06%)
helped: 68
HURT: 12
total temps in shared programs: 17043 -> 17023 (-0.12%)
temps in affected programs: 459 -> 439 (-4.36%)
helped: 32
HURT: 12
total cycles in shared programs: 191604 -> 191294 (-0.16%)
cycles in affected programs: 11834 -> 11524 (-2.62%)
helped: 68
HURT: 12
The hurt shaders are some GTK shaders where there is some bad
interaction with nir_move_vec_src_uses_to_dest. This is known and might
be improved later by thweking the pass more.
Reviewed-by: Filip Gawin <filip.gawin@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26816>
We do ffloor by default for adress register load so no need to do it
explicitly. This needs to happen after int lowering, otherwise we get
ftrunc by default as a bonus. This is mostly for wined3d.
Shader-db RV370:
total instructions in shared programs: 82147 -> 82071 (-0.09%)
instructions in affected programs: 2772 -> 2696 (-2.74%)
helped: 32
HURT: 0
total cycles in shared programs: 128479 -> 128403 (-0.06%)
cycles in affected programs: 2813 -> 2737 (-2.70%)
helped: 32
HURT: 0
Shader-db RV530:
total instructions in shared programs: 126141 -> 126057 (-0.07%)
instructions in affected programs: 3170 -> 3086 (-2.65%)
helped: 36
HURT: 0
total cycles in shared programs: 191688 -> 191604 (-0.04%)
cycles in affected programs: 3222 -> 3138 (-2.61%)
helped: 36
HURT: 0
Reviewed-by: Filip Gawin <filip.gawin@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26816>
multisample textures don't have mipmaps, so store sample_stride
into mipmap offset 15 and store num_samples into last_level
We can't use mipmap_offset0 as arrays might still store some values
into it.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25398>
This is step one in a size reduction plan, this reduces lp_jit_texture
by making all the fields smaller in the struct and upsizing on the llvm
size. It goes from 280->264.
This isn't sufficient to get conformance, but it's a good step one.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25398>
If Mesa is executed under valgrind, fd_bo_init_common() calls
fd_bo_map() internally. For the heap (sub-block) allocator this causes a
segfault in fd_bo_map(), when this function tries to call the offset()
callback.
To prevent this from happening, preallocate fb->map before calling into
fd_bo_init_common(), stop calling VG_BO_ALLOC() if the memory map is
already initialised and disable the VG_BO_FREE call for the heap
allocator.
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26277>
If the shader memory has been allocated with the FD_BO_NOMAP and got
later allocated a memory chunk during fd_bo_upload(), this can result in
the valgrind splat when it tries to release the free and/or cache the
BO. To fix this issue, notify valgrind about newly mmaped shader memory.
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26277>
That enables testing and development of the Venus-based Vulkan HAL
on a wider range of Android systems - flavors of "Cuttlefish" are
of particular practical interest. At this point, only two gralloc
variants are supported: CrOS and IMapper v4. The fallback gralloc
and any gralloc adapter modules relying on it (GBM, QCOM) are out
of scope for Android Vulkan HAL now.
Signed-off-by: VladimirTechMan <VladimirTechMan@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26858>
This new pass aims to simplify loop control-flow by reducing the number
of break and continue statements. It also supersedes nir_opt_trivial_continues().
For this purpose, it implements 3 optimizations:
- opt_loop_terminator(), as previously
- opt_loop_merge_break_continue(), similar to opt_merge_breaks() incl. continues
- opt_loop_last_block(), a generalization of opt_if_loop_last_continue()
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24940>
Mirrors workaround done for Turnip.
A690 seem to have broken UBWC for depth/stencil, it requires
depth flushing where we cannot realistically place it, like between
ordinary draw calls writing read/depth. WSL blob seem to use ubwc
sometimes for depth/stencil.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26226>
A690 seem to have broken UBWC for depth/stencil, it requires
depth flushing where we cannot realistically place it, like between
ordinary draw calls writing read/depth. WSL blob seem to use ubwc
sometimes for depth/stencil.
Some tests that this fixes:
dEQP-VK.pipeline.monolithic.stencil.format.d24_unorm_s8_uint.states.fail_repl.pass_decw.dfail_inv.comp_never
dEQP-VK.api.image_clearing.core.partial_clear_depth_stencil_attachment.single_layer.d32_sfloat_s8_uint_separate_layouts_depth_64x11
dEQP-VK.api.image_clearing.dedicated_allocation.partial_clear_depth_stencil_attachment.single_layer.d16_unorm_33x128
dEQP-VK.glsl.builtin_var.fragdepth.point_list_d32_sfloat_s8_uint_no_depth_clamp
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26226>
The blend state is considered to be dynamic when no
VkPipelineColorBlendStateCreateInfo is passed in at pipeline creation.
Fixes:
VUID-VkGraphicsPipelineCreateInfo-renderPass-06055 when running
"Quern - Undying thoughts" when the GFX level enables blending.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26718>
Don't rely on the current default (which is 2048 bytes) buffer size for
blocks -- which ends up being too small for most shaders. Since we
already rely on value_id_bound to allocate an array of vtn_value, use
that to estimate a better value.
In addition to space for the array, we approximate the extra size of
extra data structures with the size of vtn_ssa_value, and skip it to the
next size (double it) to cover the CFG related allocations. This
results in only single system allocation necessary to back the temporary
data for the majority of the shaders.
Parsing code was slightly reordered so we can validate and read the
value_id_bound before the temporary allocator is created.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25279>
All the vtn_* structures and arrays are used only during the lifetime of
spirv_to_nir(); we don't need to free them individually nor steal
them out; and some of them are smaller than the 5-pointer header
required for ralloc allocations.
These properties make them a good candidate for using an
arena-style allocation.
Change the code to create a linear_parent and use that for all the vtn_*
allocation. Note that NIR data structures still go through ralloc,
since we steal them (through the nir_shader) at the end, i.e. they
outlive the parsing.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25279>
Fixes the build on OpenBSD, where major() is in sys/types and
sys/sysmacros.h does not exist. Also include sys/mkdev.h if
MAJOR_IN_MKDEV is defined.
Fixes: 6d60115be7 ("zink: Fix enumerate devices when running compositor")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26735>
Yet another iteration on the same YUV surfaces.
The change from 87ecfdfbf0 has 2 odd things:
* it's using MAX2(original value, new value) but the point of updating
surf_slice_size / surf_size is to make it correct relative to the new
value of surf_pitch
* it's multiplying surf_pitch (= number of elements per row) by height (ok)
by surf->bpe (= number of bytes per element) by surf->blk_w (= number of
"horizontal" pixels in an element) so the end unit doesn't make sense.
Fix this by computing a reasonnable value based on unit: the surf_slice_size
is the number of elements per row (surf_pitch) x number of bytes per element
(bpe) x number of rows.
This makes the expected size correct and thus fixes users of eglCreateImageKHR,
like the issue #6131.
I tested a bunch of gst pipelines and ffmpeg scripts on various files I have
and didn't notice any issues (on gfx10.3 and gfx9).
Fixes: 87ecfdfbf0 ("ac/surface: adapt surf_size when modifying surf_pitch")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6131
Acked-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26693>
While running tests, it is be useful to have non-sequenced dumps of
certain buffers to see their contents from changes in the decompiled
CS. This introduces a function gpu_read_into_file(...) for specifying
a file to read a specific GPU buffer into after replaying the CS.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26465>
In newer traces, in any cases where instructions need to be executed
for both cases of a predicate, such as for GMEM/sysmem. The proprietary
driver emits the TRUE and FALSE body one after another with a NOP at
the end of the TRUE condition body so the CP skips over the FALSE body.
Currently, the NOP skips over all instructions in the ELSE body which
results in them not being decoded whatsoever. This commit checks if
we encounter any NOPs while in a conditional block and appropriately
parses out them out into their own ELSE scope when we do.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26465>
Due to the larger amount of conditional execution in newer traces
the flat view makes it hard to parse what might be conditionally
executed and what might now. This makes it easier to view by adding
a scope for conditionally executed commands which is indented to the
next level.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26465>
When the source destination index is a constant, we can avoid generating
a lot of the intermediate code. At the very least, this makes initial
NIR dumps much easier to read.
v2: Simplify tracking of dst_index. Suggested by Caio.
Suggested-by: Caio
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
Gfx12.5 (DG2) will use DPAS instructions to accelerate the
implementation. Earlier platforms will use equivalent discrete
instructions (basically subgroup operations). Gfx12 (Tigerlake) will use
DP4A for 8-bit integer matrix multiplication. Older platforms, which
lack DP4A, will use a suboptimal instruction sequence. There is plenty
of room for improvement here.
On DG2 (Gfx12.5) gets the following results from the CTS:
Test run totals:
Passed: 1642/13982 (11.7%)
Failed: 0/13982 (0.0%)
Not supported: 12340/13982 (88.3%)
Warnings: 0/13982 (0.0%)
Waived: 0/13982 (0.0%)
On DG2 (Gfx12.5) with forced lowering, Raptor Lake (Gfx12) and Ice Lake
(Gfx11):
Test run totals:
Passed: 1662/13982 (11.9%)
Failed: 0/13982 (0.0%)
Not supported: 12320/13982 (88.1%)
Warnings: 0/13982 (0.0%)
Waived: 0/13982 (0.0%)
The difference in the number of tests run is due to
saturatingAccumulation not being set on DG2 when DPAS is used. There is
a comment in "intel/dev: Advertise integer configs with
saturatingAccumulation too" that explains how this could be added should
the need arise.
v2: Prefix type names with INTEL_CMAT_. Suggested by Lionel.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
VUID-RuntimeSpirv-saturatingAccumulation-08983 says:
For OpCooperativeMatrixMulAddKHR, the SaturatingAccumulation
cooperative matrix operand must be present if and only if
VkCooperativeMatrixPropertiesKHR::saturatingAccumulation is VK_TRUE.
As a result, we have to advertise integer configs both with and without
this flag set.
v2: Prefix type names with INTEL_CMAT_. Suggested by Lionel.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
The commit is a little ugly. The definition of anv_fixup_subgroup_size
is moved before the added call site. In addition, the bit starting at
the "Cooperative matrix extension requires..." comment is added.
v2: Dramatic simplification of SIMD selection. Suggested by Caio.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
This flag tracks whether or not cooperative matrices are fully enabled
on the physica device (i.e., both the configs exist and the environment
varible is set). This is mainly to support a later commit "anv: Set
PIPELINE_SELECT systolic mode enable flag."
This could be squashed into "anv: Implement VK_KHR_cooperative_matrix."
I left it separate because we might go back to the previous method.
v3: Don't hide the extension behind an environment variable
(ANV_COOPERATIVE_MATRIX) now the we have a better solution for setting
PIPELINE_SELECT.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
v2: Rebase on moving lowering pass to src/intel/compiler.
v3: Don't hide the extension behind an environment variable
(ANV_COOPERATIVE_MATRIX) now the we have a better solution for setting
PIPELINE_SELECT.
v4: Prefix type names with INTEL_CMAT_. Suggested by Lionel. Also rebase
on f99e43d606 ("anv: switch to use runtime physical device properties
infrastructure").
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
Implements integer dot product lowering both with and without
DP4A. Implements half-float dot product lowering.
There are a couple FINISHME comments describing future optimizations.
v2: Add a brw_compiler::lower_dpas flag to track when the lowering
should be applied.
v3: Use is_null() instead of checking file != ARF. Suggested by Caio.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
v2: Add brw_ir_performance.cpp and brw_fs_generator.cpp changes. Fix
overlapping register allocation (via has_source_and_destination_hazard). Fix
incorrect destination register file encoding.
v3: Prevent lower_regioning from trying to "fix" DPAS sources.
v4: Add instruction latency information for scheduling and perf
estimates.
v5: Remove all mention of DPASW. Suggested by Curro and Caio. Update
the comment in fs_inst::has_source_and_destination_hazard. Suggested
by Caio.
v6: Add some comments near the src2 calculation in
fs_inst::size_read. Suggested by Caio.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
With this, a minimum test case passes:
void main()
{
coopmat<float16_t, gl_ScopeSubgroup, M, N, gl_MatrixUseA> matA;
coopmat<float, gl_ScopeSubgroup, M, N, gl_MatrixUseA> matR;
matA = coopmat<float16_t, gl_ScopeSubgroup, M, N, gl_MatrixUseA>(2.0);
matR = coopmat<float, gl_ScopeSubgroup, M, N, gl_MatrixUseA>(matA);
coopMatStore(matR, result, 0, N, gl_CooperativeMatrixLayoutRowMajor);
}
v2: Use nir_vec instead of explicit nir_vec{2,4}. Also fixes a typo in
one of the 4x8 cases.
v3: Use nir_pack_bits and nir_unpack_bits to dramatically simplify
coop_unary handling. This saved 67 lines of code.
v4: Allow packing factor 2 and packing factor 1 elements be stored in
16-bit integers.
v5: Massive update to the comment in lower_cooperative_matrix_unary_op
with some suggestions from Caio. Add a comment and assertion around
`nir_def *v[4]`. Suggested by Caio.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
Also splits off another funciton get_slice_type_from_desc that will be
used in future commits.
v2: Allow packing factor 2 and packing factor 1 elements be stored in
16-bit integers.
v3: Use glsl_base_type_get_bit_size.
v4: Adjust packing so that a single row fills an entire GRF.
v5: Add comment for get_packing_factor and some other cleanups
there. s/cooperative_matrix/cmat/. Tighten the validation of len in
gt_slice_from_desc. All suggested by Caio.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
This is just the skeleton of the implementation. Future commits will
fill it all in.
v2: Move to src/intel/compiler
v3 (idr): Use vecN instead of array[N] for slice type.
v4 (idr): Refactor lower_cooperative_matrix_load and
lower_cooperative_matrix_store into a single function.
v5 (idr): Remove old, verbose debug logging. Assert that entry is not
NULL in get_coop_type_for_slice. Use nir_component_mask(...) instead of
0xffff. s/cooperative_matrix/cmat/. All suggested by Caio.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
I put both R-b on this because, at this point, we've each done equal
parts authoring and reviewing.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
This function was recently simplified based on the idea that if a
modifier is not present, then the plane count should not exceed the
plane count of the resource's external format. This seems to be true
except for lowered images. We don't enable compression modifiers on
lowered images, so this case was not handled during the transition.
As an example of the lowering that may occur: PIPE_FORMAT_YVYU is a
single plane, subsampled format that the gallium layer lowers to two
planes/formats (R8G8_UNORM and B8G8R8A8_UNORM) if not natively supported
by the hardware.
Fixes the assert failure when running the piglit test case:
ext_image_dma_buf_import-sample_yuv -fmt=YVYU -auto
ext_image_dma_buf_import-sample_yuv:
../../src/gallium/drivers/iris/iris_resource.c:1384:
iris_resource_from_handle:
Assertion `main_res->aux.surf.row_pitch_B ==
plane_res->surf.row_pitch_B' failed.
Also, replaces it with a new one in case this fails again:
ext_image_dma_buf_import-sample_yuv:
../../src/gallium/drivers/iris/iris_resource.c:1381:
iris_resource_from_handle:
Assertion `isl_drm_modifier_has_aux(whandle->modifier)' failed.
Fixes: 79222e5884 ("iris: Simplify get_main_plane_for_plane")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26826>
In iris, if whandle->modifier is DRM_FORMAT_MOD_INVALID within
iris_resource_from_handle, isl_drm_modifier_plane_is_clear_color will
assert fail on non-existent modifier info. Update that function to
return early instead.
Fixes the assert failure when running the piglit test case:
ext_image_dma_buf_import-sample_yuv -fmt=YVYU -auto
ext_image_dma_buf_import-sample_yuv: ../../src/intel/isl/isl.h:2352:
isl_drm_modifier_plane_is_clear_color: Assertion `mod_info' failed.
Fixes: 81d132d5ea ("iris: Use helpers for generic aux plane importing")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26826>
The mechanism for selecting dispatch modes has changed from previous
platforms, add a new implementation brw_wm_state_simd_width_for_ksp()
using the new kernel dispatch controls.
[ Francisco Jerez: Split from a larger patch, handle multipolygon
dispatch, add additional comments. ]
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
This sets up the PS dispatch controls to a supported combination of
Kernel0/Kernel1 dispatch modes, initializing the polygon packing
controls to use a multipolygon dispatch mode if one was provided.
Rework:
* Jordan: Move into intel_update_ps_state()
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
Extend the pre-existing dual-SIMD8 compilation path in
brw_compile_fs() to attempt quad-SIMD8 and dual-SIMD16 compiles.
Instead of building every possible dispatch mode and then picking one
based on cycle-count heuristics, this attempts to only build a single
multipolygon kernel -- The different mulipolygon dispatch modes are
tried in the expected order of decreasing performance (quad-SIMD8,
dual-SIMD16 then dual-SIMD8), the first one that successfully compiles
without spills is taken as a simple heuristic, and no further
multipolygon builds are attempted.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
This is needed because the information stored on the ATTR file for
multipolygon fragment shaders isn't stored as a contiguous sequence in
the GRF, instead the ATTR source may be lowered by assign_urb_setup()
to use a <16;8,0> region, which reads 4 SIMD16 GRFs for a SIMD32
instruction, even though the result of fs_inst::size_read() is
expected to be 2 GRFs. Special case ATTR sources for multipolygon PS
shaders to calculate the number of physical GRFs that will actually be
read by the instruction after lowering, based on the number of
polygons processed by the instruction.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
This fixes a number of assumptions made by the multipolygon input
attribute handling code from assign_urb_setup() so it also works on
Xe2+, which has additional multipolygon dispatch modes (like SIMD4x8
and SIMD2x16) and uses a different more compact representation of the
plane parameters.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
The X and Y barycentric vectors are no longer interleaved in SIMD8
chunks (yay), so this is mostly a matter of disabling the
lower_barycentrics() pass and switching to a simpler implementation of
fetch_barycentric_reg() that simply calls fetch_payload_reg() instead
of the SIMD8 shuffling we had to do in previous generations.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
This extends fetch_payload_reg() to support fetching vector registers
like barycentrics stored on the payload as a contiguous sequence of
SIMD-wide vectors. In the SIMD32 case, both halves of the SIMD16
vector registers specified as regs[0] and regs[1] are zipped to
construct a single SIMD32-wide vector.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26606>
We only need it for indirect draws.
Improves performance on an i7-12700 and A770:
- Piglit's drawoverhead base case +150.639% +/- 2.86933% (n=15).
- gfxbench5 gl_driver2_off +19.7219% +/- 1.13778% (n=15)
- SPECviewperf2020 catiav5test1 +1.6831% +/- 0.552052% (n=10).
Cc: mesa-stable
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26806>
Whenever we use a BO in a batch, we need to find its corresponding exec
list entry, either to a) record that it's been used, b) update whether
it's being written, c) check for cross-batch implicit dependencies.
bo->index exists to accelerate these lookups. If a BO is used multiple
times by a batch, bo->index is its location in the list. Because the
field is global, and a BO can in theory be used concurrently by multiple
contexts, we need to double-check whether it's still there. If not, we
fall back to a linear search of all BOs in the list, looking to see if
our index was simply wrong (but presumably right for another context).
However, there's one glaringly obvious case that we missed here. If
bo->index is -1, then it's wrong for /all/ contexts, and in fact implies
that said BO has never been added to any exec list, ever. This is quite
common in fact: a new BO, never been used before, say from the BO cache,
or streaming uploaders, gets used for the first time.
In this case we can simply conclude that it's not in the list and skip
the linear walk through all buffers referenced by the batch.
Improves performance on an i7-12700 and A770:
- SPECviewperf2020 catiav5test1: 72.9214% +/- 0.312735% (n=45)
Cc: mesa-stable
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26806>
There is no need repeatedly downloading rust and d3d when building the docker locally
Download glext.h from github
Remove src directory and .git directory once compiling finished
Split piglit and depq compiling out
Clean middle files of piglit and depq
ci/msvc: Improve fetch source of spirv_samples_source
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26736>
packaging are installed because python 3.12 need it
Install of vulkan-sdk improved so that it's can be running in non-docker environment
Now vulkan-sdk have separate script so that it can be updated without update MSVC
THe choco installed packages is almost freeze to update, so split install of vulkan sdk
out of it for avoid update it when update VULKAN_SDK_VERSION on local computer
--params="/InstallDir:C:\python3" won't take effect, drop it for not misleading
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26736>
A few lines above, we disable pipelines for farm disables, but we were
missing a condition to run the container & build jobs when re-enabling
a farm, leading to invalid pipelines where test jobs for that farm are
created but the container & build jobs are missing.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26809>
These restrictions don't seem to be applicable anymore, and limiting
to SIMD8 wouldn't work since we're no longer building shaders with
that dispatch width.
[ Francisco: This one-liner change was squashed by Rohan Garg into a
previous version of my patch "Stop building SIMD8 programs", but it
makes more sense as a separate commit -- Formatted as a separate
patch. ]
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26605>
SIMD8 kernels are no longer able to utilize the ALUs efficiently,
since they have twice the vector width as previous platforms. However
even though there aren't many reasons to use it, SIMD8 is still
supported by the instruction set technically, and it will still be
used for some SIMD-lowering sequences.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26605>
Similar to other FS dispatch modes, attempt to build a dual-SIMD8
program if the regular SIMD8 program didn't spill and doubling the
amount of space for varyings doesn't cause us to go over the thread
payload limit. Dual-SIMD8 builds in combination with coarse pixel
shading are currently not handled.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
The copy would be discarded immediately. Until now we were relying on
DCE to eliminate these, but it seems like in some cases MOVs into the
null register emitted by lower_simd_width() are never eliminated,
likely because a lower_simd_width() call has been introduced close to
the bottom of optimize() which isn't follow by any additional DCE
passes.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
This fixes a number of regressions and hangs in multipolygon fragment
shaders that have FIND_LIVE_CHANNEL sequences which would otherwise
lead to access of a dead channel. Note that the failures don't seem
to be reproducible in simulation.
Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
Since the <8;8,0> regions they use in multipolygon mode could violate
regioning restrictions in some cases, depending on the execution type
of the instruction. Note that the assertion is removed from
try_copy_propagate() since a more accurate check is used within that
function than what fs_inst::can_change_types() can do.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
Main motivation is that for multipolygon PS shaders the i-th plane
parameter for the j-th input attribute will no longer necessarily be a
scalar, since different channels may be processing different polygons
with different input plane parameters, so simply taking a component()
of the result of interp_reg() will no longer work. Instead of
duplicating the multipolygon handling logic in every caller of
interp_reg(), fold the component() call into interp_reg() so we can
replace it with multipolygon-correct code more easily.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
Instead of treating fs_reg::nr as an offset for ATTR registers simply
consider different indices as denoting disjoint spaces that can never
be accessed simultaneously by a single region. From now on geometry
stages will just use ATTR #0 for everything and select specific
attributes via offset() with the native dispatch width of the program,
which should work on current platforms as well as on Xe2+. See
"intel/fs: Map all GS input attributes to ATTR register number 0." for
the rationale.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
Instead of treating fs_reg::nr as an offset for ATTR registers simply
consider different indices as denoting disjoint spaces that can never
be accessed simultaneously by a single region. From now on geometry
stages will just use ATTR #0 for everything and select specific
attributes via offset() with the native dispatch width of the program,
which should work on current platforms as well as on Xe2+. See
"intel/fs: Map all GS input attributes to ATTR register number 0." for
the rationale.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
Instead of treating fs_reg::nr as an offset for ATTR registers simply
consider different indices as denoting disjoint spaces that can never
be accessed simultaneously by a single region. From now on geometry
stages will just use ATTR #0 for everything and select specific
attributes via offset() with the native dispatch width of the program,
which should work on current platforms as well as on Xe2+. See
"intel/fs: Map all GS input attributes to ATTR register number 0." for
the rationale.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
Instead of treating fs_reg::nr as an offset for ATTR registers simply
consider different indices as denoting disjoint spaces that can never
be accessed simultaneously by a single region. From now on geometry
stages will just use ATTR #0 for everything and select specific
attributes via offset() with the native dispatch width of the program,
which should work on current platforms as well as on Xe2+. See
"intel/fs: Map all GS input attributes to ATTR register number 0." for
the rationale.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
The fs_reg::nr field currently has a somewhat inconsistent meaning for
ATTR registers depending on the shader stage. In geometry stages it
has a similar effect as fs_reg::offset except it's expressed in 32B
units instead of B units. In the PS however it's expressed in units
of logical scalar attributes (16B on present platforms), which isn't
currently handled correctly throughout the back-end since some places
assume 32B units in all cases.
The different format of the PS setup data in multi-polygon dispatch
modes would make its behavior even more irregular, which would be
worsened further (for both geometry and pixel stages) by the register
size changes coming up on Xe2, particularly in brw_ir_fs.h helpers
where neither the devinfo struct nor the shader stage are available.
Instead of treating it as an offset simply consider different
fs_reg::nr indices as denoting disjoint spaces that can never be
accessed simultaneously by a single region. From now on geometry
stages will just use ATTR #0 for everything and select specific
attributes via offset() with the native dispatch width of the program,
which should work on current platforms as well as on Xe2+.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
Add fields that track the number of polygons processed per PS SIMD
thread (note that this might be lower than the value that was
specified to the compiler via brw_compile_fs_params if compilation at
the desired polygon count wasn't possible), and the dispatch width of
the multi-polygon PS kernel.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26585>
Add support to use layout metadata to communicate modifier between
exporter and importer. This is required because EXT_external_objects
does not have modifier support, so when importing a memobj we must use
a back-channel to communicate the modifier with the exporter.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25945>
Add the `FARM: google` variable to the .google-freedreno-test job,
which is extended by devices managed by the Google farm. Add the
`FARM: collabora` variable to sc7180-trogdor-kingoftown,
sc7180-trogdor-lazor-limozeen and sm8350-hdk device types which
are available in Collabora farm.
This commit also adds DEVICE_TYPE variable for the .a306-test,
.a530-test and .a630-test jobs.
Signed-off-by: Vignesh Raman <vignesh.raman@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25807>
Unmarshal calls that "look ahead" in the batch use it. They expect
the next call ID to be equal to a specific GL call. NUM_DISPATCH_CMD
is not equal to any GL call (it's last_call_id + 1).
This was missed in the referenced commit, causing assertion failures.
Fixes: c3b95d1507 - glthread: add a marker at the end of batches indicating the end
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26786>
vk_physical_device_check_device_features should ignore unsupported
feature structs:
Any component of the implementation (the loader, any enabled layers,
and drivers) must skip over, without processing (other than reading
the sType and pNext members) any extending structures in the chain not
defined by core versions or extensions supported by that component.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10177
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26767>
When there are lots of inlined shaders, going over each one and checking
if the call index matches becomes expensive. Instead, use a
binary-search-like selection to skip most of the checks.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26380>
We can't query the actually required alignment from llvmpipe, but 16
bytes is insufficient. In particular, llvmpipe's pixel backend code for
depth/stencil will load/store 4 values at a time, which crashes
with d32s8x24 format if alignment is only 16 bytes (the color backend
does similar things but will use unaligned loads if alignment exceeds
16 bytes at least for now).
32 bytes would be enough, however the "ordinary" llvmpipe resource
layout code currently always aligns to at least 64 bytes, so use
this value as well.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26724>
The initial logic was to remember the place were SPI_SHADER_PGM_LO_*
are written, then assume that we can get the register offset because
the sequence would always be:
PKT3_SET_SH_REG
SPI_SHADER_PGM_LO_* register offset
VA low 32 bits value <- reg_va_low_idx
The problem is that this sequence isn't guaranteed, for instance we
can get this instead:
0 c0067600 |
1 00000046 |
2 003ffffd | SPI_SHADER_PGM_RSRC3_VS
3 00000020 | SPI_SHADER_LATE_ALLOC_VS
4 * 00002080 | SPI_SHADER_PGM_LO_VS
5 00000080 | SPI_SHADER_PGM_HI_VS
So the assert in si_state_draw.cpp would fail as well as the VA
update logic.
So instead remember which the SPI_SHADER_PGM_LO_* offset, and the low
32 bits of the VA in si_update_shaders.
Fixes: 8034a71430 ("radeonsi/sqtt: re-export shaders in a single bo")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26774>
Not expecting this to actually fix anything externally visible,
but reduces some invalid usage when the resulting vector is
not 16 elements long (e.g. the C/result matrix).
Fixes: 9df4703fbb ("radv: Add cooperative matrix lowering.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26768>
No hash updates as I didn't find a facility to do it in radeonsi
(even though there are flags like forcing fma32).
Note that we do this very late to avoid any optimizations that
might remove the dead stores. (Checked that LLVM doesn't remove
them, but it is admittedly potentially brittle)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26679>
Only shaders which explicitly allow shared memory are included for
now. The pass is very late to avoid optimizations removing the stores
and to ensure the clear gets added after MS outputs get loaded from LDS.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26679>
There is nothing to be done, as we lower this op unconditionally.
We need to add an arm in the match to not panic, though.
Fixes a panic in
dEQP-VK.binding_model.shader_access.primary_cmd_buf.sampler_immutable.vertex_fragment.multiple_contiguous_descriptors.2d
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26587>
The main motivation for doing this is that some tests and even the
st tracker linking code dump out the GLSL IR for debugging before
glsl_to_nir() is called expecting it to already be in its final
form. Moving these to the linker makes those assumptions true.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26755>
There is no need to have a separate loop to determine the first stage in
the shader program. Previously there were other users of this but since
this is the last remain user this patch changes the code to simply detect
the first stage directly.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26747>
Allow aux state tracking buffer created with different offsets,
in order to support importing images with drm modifiers. We
will always need to calculate the size of an imported fast
clear region because Vulkan spec defines:
VUID-VkImageDrmFormatModifierExplicitCreateInfoEXT-size-02267
For each element of pPlaneLayouts, size must be 0
Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Acked-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25651>
When `MESA_LOADER_DRIVER_OVERRIDE` is set to `zink` and the display
initialization fails, fallback to software rendering.
The error was reported in #10123 and it can be reproduced with:
$ MESA_LOADER_DRIVER_OVERRIDE=zink eglinfo
`eglinfo` would crash in `dri2_display_release()` because of
`assert(dri2_dpy->ref_count > 0)`.
After bisecting the error to commit 8cd44b8843 ("egl/glx: add
autoloading for zink"), I found out that, before this change, the
display was set to initialized even when `_eglDriver.Initialize(disp)`
failed:
disp->Options.Zink = env && !strcmp(env, "zink");
// disp->Options.Zink is true
if (!_eglDriver.Initialize(disp)) {
[...]
// Zink initialization has failed at this point
// However, success is set to true:
bool success = disp->Options.Zink;
if (!disp->Options.Zink && !getenv("GALLIUM_DRIVER")) {
[...]
}
// Software initialization is ignored because success is true
if (!success) {
[...]
}
}
// The display is set as initialized even though it shouldn't
disp->Initialized = EGL_TRUE;
Resolves: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10123
Fixes: 8cd44b8843 ("egl/glx: add autoloading for zink")
Signed-off-by: José Expósito <jexposit@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26184>
In the past, multiple writes to a single register were pretty common,
but since we've transitioned to NIR, and leave the IR in SSA form for
everything not captured in a phi-web, the pattern of generating new
temporary registers at each step is a lot more common.
This pass isn't nearly as useful now. Across fossil-db on Alchemist,
this affects only 0.55% of shaders, which fall into two cases:
- Coarse pixel shading pixel-X/Y setup. There are a few cases where
we write a partial calculation into a register, then have a second
instruction read that as a source and overwrite it as a destination.
While we could use a temporary here, it doesn't actually help with
register pressure at all, since there's the same amount of values
live at both instructions regardless. So while this pass kicks in,
it doesn't do anything useful.
- Geometry shader control data bits (5 shaders total). We track masks
for handling EndPrimitive in a single register across the program,
and apparently in some cases can split the live range. However, it's
a single register...only in geometry shaders...which use EndPrimitive.
None of them appear to be in danger of spilling, either. So this tiny
benefit doesn't seem to justify the cost of running the pass.
So, just throw it out. It's not worth keeping.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26343>
Use the CustomLogger class and CLI tool to create strutured logs
for poe scripts which are used by broadcom and nouveau jobs.
Renamed stage lint to code-validation and added python-test job
which runs the tests for structured and customer logger to ci.
Signed-off-by: Vignesh Raman <vignesh.raman@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25179>
This commit introduces the CustomLogger class, which provides methods
for updating, creating, and managing dut jobs and phases in a structured
log in below json format.
{
"_timestamp": "2023-10-05T06:16:42.603921",
"dut_job_type": "rpi3",
"farm": "igalia",
"dut_jobs": [
{
"status": "pass",
"submitter_start_time": "2023-10-05T06:16:42.745862",
"dut_start_time": "2023-10-05T06:16:42.819964",
"dut_submit_time": "2023-10-05T06:16:45.866096",
"dut_end_time": "2023-10-05T06:24:13.533394",
"dut_name": "igalia-ci01-rpi3-1gb",
"dut_state": "finished",
"dut_job_phases": [
{
"name": "boot",
"start_time": "2023-10-05T06:16:45.865863",
"end_time": "2023-10-05T06:17:14.801002"
},
{
"name": "test",
"start_time": "2023-10-05T06:17:14.801009",
"end_time": "2023-10-05T06:24:13.610296"
}
],
"submitter_end_time": "2023-10-05T06:24:13.680729"
}
],
"job_combined_status": "pass",
"dut_attempt_counter": 1
}
This class uses the existing StructuredLogger module,
which provides a robust and flexible logging utility supporting multiple
formats. It also includes a command-line tool for updating, creating,
and modifying dut jobs and phases through command-line arguments.
This can be used in ci job scripts to update information in structured logs.
Unit tests also have been added for the new class.
Currently, only LAVA jobs create structured log files, and this will be
extended for other jobs using this tool.
Signed-off-by: Vignesh Raman <vignesh.raman@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25179>
On Windows, the C runtime maintains an environment variable cache for
getenv. But apps and drivers are free to statically link the C runtime,
so you might get different environment variable caches between components.
Specifically, a test trying to putenv to update the environment won't have
its update reflected by the driver if the CRT is statically linked, unless
we go to the Win32 API directly.
Reviewed-by: Sil Vilerino <sivileri@microsoft.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26744>
These are the package versions currently shipped by Fedora. This allows
using system packages by setting
export MESON_PACKAGE_CACHE_DIR=/usr/share/cargo/registry/
Of course, other distros may place it somewhere else.
Ubuntu matches versions on syn and unicode-ident but is a tiny bit off
on quote and proc-macro2. However, given how far I was able to bump the
versions with only a tiny meson tweak to syn, I think it should work
with the Ubuntu versions as well.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26726>
They're enough of a special case that things are going to get confusing
when we start adding bit sizes to fmul/ffma. Let's make them a special
case so they can assert all their things.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26743>
The main change here is that we match on src2 first and then src1. This
lets make some of the src2 code common because src2 never moves around
if it's a register. This change also has another subtle effect: None
sources now work everywhere.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26743>
We need to be able to tell the difference between `F64` and other GPR
source types. In order for this to work, we also have to tighten up
some of the requirements round GPR and SSA sources.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26743>
For propagating OpDAdd, we need to check for negative zero because
negative zero is the no-op, not add with zero. We were also propagating
the upper and lower halves of fp64 sources wrong. While we're here, use
`let ... else` instead of an `if let` pattern a couple places.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26743>
This removes the last unknown flag from read/write instructions.
Because we now handle the write in CP_SET_DRAW_STATE more correctly when
emulating, we also have to update the control register definitions and
draw state emulation code to adjust.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26691>
It seems like starting with a6xx, the SQE has a special register space
for reading/writing the state of the processor itself, mainly used for
saving/restoring its state in preemption. Add support for disassembling
it, removing one of the unknown flags bits.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26691>
Turns out a5xx already had store, although not load. It was using the
high bit of the unknown flags for this.
Note that a6xx does use the high bit, and we fall back to not decoding
it at all here before properly decoding it in the next commit. Splitting
up the commits seems worth this small breakage.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26691>
ir3_nir_lower_preamble cannot handle 64b @load/store_preamble so we have
to make sure ir3_nir_opt_preamble will never produce them. Up to now,
nir_lower_locals_to_regs was run after preamble lowering so 64b locals
could still be around when lowering the preamble. This patch moves
running this pass, as well as ir3_nir_lower_64b_regs, to before the
preamble lowering.
Fixed Piglit tests:
- spec@arb_gpu_shader_fp64@execution@fs-indirect-temp-double-dst
- spec@arb_gpu_shader_fp64@execution@built-in-functions@fs-frexp-dvec4-variable-index
This patch has no impact on shader-db.
Note: a few cleanup passes used to be run after nir_lower_locals_to_regs
(nir_opt_algebraic, nir_opt_constant_folding) and after
ir3_nir_lower_64b_regs (nir_lower_alu_to_scalar, nir_copy_prop). As far
as I can tell, these are not necessary anymore when running the register
lowering earlier so this patch removes them.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26737>
Doing it at bind-time causes a 1.4% overhead (among all driver calls) in
Overwatch 2. !24502 mentions that it can be precomputed in case overhead
is a concern, so do it here.
max_waves is stored directly in the radv_shader struct, because
ac_shader_config conforms to LLVM ABI and we cannot add anything custom,
and radv_shader_info needs to be determined from NIR only.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26692>
Chain stored modifiers point to the mapping of the current feedback
shmem of the surface. The surface tracked feedback mapping will be gone
and replaced with new mapping during surface_dmabuf_feedback_done. There
are two issues here:
1. One issue is that the existing mapping is closed before been used to
compare against new modifiers in sets_of_modifiers_are_the_same.
2. The other issue is that when the chain is still optimal, the chain
persists while the mapping is still replaced with the one from the
new format table shmem.
This change makes a deep copy of the modifiers to store in the chain to
ensure the modifiers used for the current chain are immutable through
the chain lifecycle.
Cc: mesa-stable
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Leandro Ribeiro <leandro.ribeiro@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26618>
This commit adds etnaviv.xml which describes the used ISA. Quite some
time was spend to to get it into that shape by creating a big collection
of shader asm generated by the binary blob. These shaders are used as basis
for test driven development for this ISA specification.
The xml has some black spots but the internal disasm is used by
developers so that should be no problem. Some refinement will happen
during normal development.
This commit adds only disasm support.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20144>
The tessellation domain origin, type, prims and spacing are all pushed
together in SET_TESSELLATION_PARAMETERS. So to support domain origin as
dynamic we need to push all these together when the state is dynamically
changed or when a new tessellation shader is bound.
This is also needed for EXT_shader_object.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24872>
Some instructions have a minimum latency before another instruction can
execute. It's a little unclear exactly what the details on these are.
For things like OpBar it's probably something to do with when stuff
actually convergres. For MemBar, maybe we only need to wait before we
do something that also touches memory? Unclear and the few docs seem to
imply that it's a straight-up stall. For now, we model it as an
execution latency where nothing is allowed to happen until then.
The blob also inserts a NOP with a delay of 2 in these cases. It's not
entirely clear why but it's probably best we do the same. Theses
instructions tend to be pretty heavy-weight anyway so 2 cycles isn't
really going to cost much compared to the chances that we're missing
some subtle HW issues somewhere.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26716>
The clear color BO is sometimes getting unnecessarily zeroed. For
example, if the resource will be fast-cleared, the app may pick a
non-zero color, causing the initial memset to be unneeded. So, skip the
memset and mark the clear color as unknown if it has not been freshly
allocated. For now, we leave the memsets on imported dmabufs alone for
simplicity.
On non-small BAR ACM systems, this also allows internal resources using
compression to be created without executing an IOCTL for memory mapping.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26675>
On small-BAR ACM systems, we don't initialize the clear color at
resource creation time. This could cause an issue with FCV_CCS_E.
Because the rendering and sampling fields may not be in agreement, FCV
may generate fast-cleared blocks during rendering and the sampler may
interpret them with the wrong values.
Thankfully, the register bit which enables FCV on ACM is disabled by
default, so there is not an actually issue today. Regardless, this patch
implements the fix for this issue now. The next patch will actually skip
explicit clear color initialization at resource creation time. So, this
fix will be useful for other systems that do have FCV enabled today.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26675>
Fresh suballocations from fresh allocations have already been zeroed by
the kernel. So, make BO_ALLOC_ZEROED a no-op for them.
This introduces a new field, iris_bo::zeroed, which will be reused for
similar optimizations in the future.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26675>
Sync xe_drm.h with commit a8ff56e160bb ("drm/xe/uapi: Remove reset uevent for now").
This is the last xe_drm.h uAPI break.
The only relevant change for ANV and Iris is that now VM bind uAPI
is asynchronous only so I had to bring back the syncobj creation, wait
and destruction.
Is still in the Xe port TODO list to make VM binds truly asynchronous.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26699>
After all int64/double lowerings, there might still be 64b registers
left which ir3 currently doesn't handle. This only happens in a small
number of Piglit tests where those registers (or the variables they come
from) did not get DCE'd.
This patch handles 64b registers in ir3 by adding a NIR pass that does
the following:
- @decl_reg -> split in two 32b ones
- @store_reg -> unpack_64_2x32_split_x/y and two separate stores
- @load_reg -> two separate loads and pack_64_2x32_split
After this pass, the 64b vecs used for the original loads/stores are
still present and are also not handled yet by ir3. This patch removes
them by running nir_lower_alu_to_scalar and nir_copy_prop.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26175>
We only support the variablePointersStorageBuffer feature of variable pointers,
which basically ensures that pointers may only target one buffer. This means
that a particular pointer may change where it points within a given buffer but
it cannot change its value to point to some other buffer. This is a requirement
from us since we expect buffer indices on buffer loads and stores to be
constant, so we can't have a buffer load come through a pointer that may
be assigned to different buffers, since in that case the buffer index
would need to come from bcsel.
There is, however, a small complication: the spec still allows pointers to
be null, and NIR defines null pointers to use 0xffffffff for both the buffer
index and the offset, which will cause a problem in a scenario like this:
int *b = ...
if (cond) {
b = null;
discard;
}
ubo_load(b);
Here the buffer index for the ubo load may come from a bcsel choosing between
the null pointer (0xffffffff) and the valid address (let's say 0), so we don't
have a constant and we assert fail.
This change detects this scenario and upon finding it will rewrite the buffer
index on the null pointer branch of the bcsel to match that of the valid
branch so that later optimizations passes can remove the bcsel and we end up
with a constant index. This is fine because a null pointer dereference is
undefined behavior and it is not something we should see in valid applications.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26683>
Otherwise we may emit a load from an invalid offset from
a lane that was discarded.
This fixes an simulator assert from triggering when
executing:
dEQP-VK.spirv_assembly.instruction.terminate_invocation.terminate.no_null_pointer_load
That test emits a conditional kill and then a buffer load
which would have invalid offsets for the lines killed. Since
the buffer load is in uniform control flow we were incorrectly
emitting a full quad load, including disabled lanes which would
prompt the simulator to assert on invalid offsets being loaded
coming from the lanes that had been killed in the shader.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26683>
ROI (region of interest) feature implementation
in va.
It does not support ROI priority, and supports
qp delta, and the maximum number of supported
region is defined as 32, the region sequence implies
the priority, the lower the sequence number, the higher
the region priority, when region overlapping happened,
the higher priority region overwrites the lower
priority one.
And specifically for AV1, the adjust step will be
rounded by 5 when rate control is used, for example,
if qp_delta (q index) is 6, it will use 5, if
qp_delta is 8, it will use 10.
For AVC/HEVC (RC/CQP) and AV1 CQP mode, the
qp_delta granularity is 1.
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26659>
Just a few implementation details that are worth mentioning:
- Panfrost doesn't support explicit VM management. This means
panfrost_kmod_vm_create() must always be created with
PAN_KMOD_VM_FLAG_AUTO_VA and panfrost_kmod_vm_bind(op=map) must
always be passed PAN_KMOD_VM_MAP_AUTO_VA. The actual VA is assigned
at BO creation time, and returned through
drm_panfrost_create_bo.offset, or can be queried through
DRM_IOCTL_PANFROST_GET_BO_OFFSET for imported BOs.
- Evictability is hooked up to DRM_IOCTL_PANFROST_MADVISE.
- BO wait is natively supported through DRM_IOCTL_PANFROST_WAIT_BO.
The rest is just a straightforward translation between the kmod API and
the existing panfrost ioctls.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26357>
We have generic BO management and device management layers that
directly call kernel driver-specific ioctls. The introduction of
Panthor (the new kernel driver supporting CSF hardware) forces us to
abstract some low-level operations. This could be done directly in
pan_{bo,device,props}.{c,h}, but having the abstraction clearly defined
and separated from the rest of the code makes for a cleaner
implementation.
This is also a good way to get a low-level KMD abstraction that
we can use without pulling all sort of gallium-related details in,
which will be important for some refactoring we plan to do in panvk.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26357>
These should probably go on the instance but everything is tangled up
too badly right now. This at least moves them to some place where we
have them without a nouveau_ws_device. It's fine to do this because
debug flags are an environment variable and won't change across a run.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26615>
This breaks the dependency pass in two. The first pass builds a
dependency graph, including first-wait information for each barrier.
The second applies uses the newly constructed dependencies to place
barriers. This fixes at least two known bugs:
1. We were placing redundant write barriers. In the case where we did
a load, for example, we would add read barriers for the address and
write barriers for the result. In the fairly common case where the
result is used before someone tries to overwrite the address, we
don't actually need both barriers because a wait on the result
implies a wait on the sources.
2. There were a bunch of WaR cases which weren't being handled
correctly. In particular, when a variable-latency instruction read
a register and then a fixed fixed-latency instruction read it, the
fixed-latency read would replace the variable latency read. When we
then wrote that value with a fixed-latency instruction, we wouldn't
see the hazard. This commit fixes it by replacing the single last
use per reg with a Vec of uses in the case of reads.
This fixes all known 1.1 memory model fails.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26615>
Any kind of image copy with a conversion (between channel
size/order/content, or between tiling mode) seems liable to failure.
Since this seems like a general problem, just skip the entire battery of
tests until it can be systematically fixed.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26449>
We don't use these registers and since RDNA3 removed the explicit usage,
it is unlikely that we will properly support them in the future.
Removing the registers from the ACO IR prevents accidentally using them
without proper support.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26664>
Currently, the function handle_reset_query_cpu_job() starts to iterate
between the performance queries in the zero-index. This is not correct,
as we should start iterating the performance queries at first, which
is a index indicated by info->first.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26448>
The copy timestamp query user extension allows the creation of a CPU job
that copies the results of a timestamp query to a BO with the possibility
to indicate the timestamp availability with a availability bit.
By using the copy timestamp query user extension, it will be possible to
use the multisync user extension to synchronize this type of job, which
currently possible with the user space implementation without stalling.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26448>
Extend the UAPI to support the copy timestamp results user extension for the
CPU job. This user extension will allow the creation of a CPU job that
copies the results of a timestamp query to a BO with the possibility to
indicate the timestamp availability with a availability bit.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26448>
The reset timestamp user extension allows the creation of a CPU job that
resets a timestamp query by updating its value in the timestamp BO and
resetting the availability syncobj.
Using the reset timestamp user extension, it will be possible to use the
multisync user extension to synchronize this type of job, which is not
currently possible with the user space implementation without stalling.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26448>
Extend the UAPI to support the reset timestamp user extension for the
CPU job. This user extension will allow the creation of a CPU job that
resets a timestamp query by updating the timestamp BO and reseting the
timestamp's availability syncobj.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26448>
The timestamp query user extension allows the creation of a CPU job that
calculates the timestamp by updating its values in a BO with the appropriate
offsets and signalling the availability syncobjs.
The CPU job should be serialized so it only executes after all previously
submitted work has completed. This is accomplished by setting job->serialize
to V3DV_BARRIER_ALL before setting the multi-sync extension.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26448>
Extend the UAPI to support the timestamp query user extension for the
CPU job. This user extension will allow the creation of a CPU job that
calculates the query timestamp by updating a timestamp BO with its value
and signaling the availability syncobj.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26448>
A CPU job of type V3DV_JOB_TYPE_CPU_RESET_QUERIES is only created for
performance and timestamp queries. Occlusion queries are handled with a
compute job. Therefore, there is no need to handle occlusion queries in
the CPU job.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26448>
The indirect CSD user extension allows the creation of a CSD
job linked to a CPU job. When we submit the CPU job, the CPU job
will run when the indirect CSD dependency is completed, map the
indirect buffer to read the CSD dispatch parameters and reconfigure
the CSD job accordingly.
Using the indirect CSD user extension, allows us to use the multisync
user extension to synchronize this type of job, which is not currently
possible from user-space without stalling.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26448>
We will be introducing a new type of queue, a CPU queue. This queue will
be responsible for handling the CPU jobs, such as timestamp queries and
indirect CSD dispatch.
Therefore, add a CPU queue to the enum v3dv_queue_type and its
respective barrier.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26448>
Extend the UAPI to support CPU job in the kernel space using an user
extension design and also add support for the indirect CSD job
extension. This user extension will allow the creation of a CSD
job linked to a CPU job. The CPU job will wait for the indirect CSD job
dependencies and, once they are signaled, it will update the CSD job
parameters.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26448>
Currently, set_multisync() doesn't allow using external syncobjs as
in_syncs objects in the multisync extension. Add the possibility to use
external syncobjs as in_syncs. This will ease the synchronization of CPU
jobs, as they sometimes depend on external syncobjs, such as
query availability syncobjs.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26448>
With the support of CPU jobs by the kernelspace, now the CPU job
functions will also use the multisync extension. Therefore, move the
multisync functions to the beginning of the file to allow the CPU job
functions to call them.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26448>
With dynamic rendering, it's allowed to begin rendering with depth or
stencil only but still with a depth/stencil format. The test below
checks that unbound part of ds isn't modified, if depth is bound and
stencil not and vice versa.
This fixes a recent CTS
dEQP-VK.dynamic_rendering.primary_cmd_buff.basic.partial_binding_depth_stencil.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25350>
Reading the local root config file and then asking gitlab to evaluate it
in the context of some other version will cause issues if they are not
identical.
Instead, the local document should be a simple include of whatever is
the root config file at that commit.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26074>
Got a error state on DG2 with a jump to secondary. The secondary is
empty and padded with MI_NOOPs to workaround the CS prefetching.
According to the error state, the return jump address from the
secondary to the primary is 0x0. The ACTHD register value is 0x10, so
it seems that the command streamer indeed jumped to 0x0 and hanged on
a few dwords after that.
The return address should have been set edited by a previous
MI_STORE_DATA_IMM instruction. So it appears it did not complete in
time for the command stream to catch it. On Gfx12+ this can happend if
we do not set ForceWriteCompletionCheck.
This change also takes the opportunity to remove the padding MI_NOOPs
at the end of secondaries on Gfx12+ by using disabling the prefetching
just before jumping into secondaries and reenabling it at the
beginning of each secondary.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26665>
Allocate a ralloc sub-context which takes the u64 hash table as a parent
and attach a destructor to it so we can free the hash_key_u64 objects
that were allocated by _mesa_hash_table_u64_insert().
The order of creation of this sub-context is crucial: it needs to happen
after the _mesa_hash_table_create() call to guarantee that the
destructor is called before ht->table and its children are freed,
otherwise the _mesa_hash_table_u64_clear() call in the destructor leads
to a use-after-free situation.
Fixes: ff494361be ("util: rzalloc and free hash_table_u64")
Cc: stable
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26423>
When an entry exists, _mesa_hash_table_insert() updates the entry with
the new data/key pair, which causes a leak if the key has previously
been dynamically allocated, like is the case for hash_u64_key keys.
One solution to solve that is to do the insertion in two steps: first
_mesa_hash_table_search_pre_hashed(), and if the entry doesn't exist
_mesa_hash_table_insert_pre_hashed(). But approach forces us to do the
double-hashing twice.
Another approach is to extract the logic in hash_table_insert() that's
responsible for the searching and entry allocation into a separate helper
called hash_table_get_entry(), and keep the entry::{key,data} assignment
in hash_table_insert().
This way we can re-use hash_table_get_entry() from
_mesa_hash_table_u64_insert(), and lake sure we free the allocated
key if the entry was already present.
Fixes: 6649b840c3 ("mesa/util: add a hash table wrapper which support 64-bit keys")
Cc: stable
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26423>
add support to check jpeg crop decode cap and to set the crop
rectangle. the interface is avialble on libva 1.21.0 and higher.
v2: (Ruijing)
enclose the entire case block within VA_CHECK_VERSION
if attr unsupported set the return value to VA_ATTRIB_NOT_SUPPORTED
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26588>
A BO can be always resident by two ways:
1. Through kernel bookkeeping. The BO is created with
AMDGPU_GEM_CREATE_VM_ALWAYS_VALID and bo->is_local gets set to true.
2. Through the driver global BO list. On every submission, the global
BO list is added to the CS's BO list.
Until now, use_global_list reflected either 1. or 2. . This commit
changes it to reflect 2. only, and update callsites that checks for
residency to use a new helper.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26591>
meshShaderQueries has been recently disabled because it causes random
GPU hangs in CI, I'm still investigating it. But let's clean the CI
lists to avoid any confusion, I will re-introduce them if needed but
this issue can also be reproduced without mesh shaders.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26651>
A write mask based on the pipeline creation input is stored in scratch. Another
similar mask is also stored for the dynamic color_write_enable. These can then
be updated individually, and will be combined in MME macro before use.
Each attachment has a mask for rgba. The max number of attachments in 8 so
we can fit the write mask in a single 32bit scratch.
color_write_enable is a single bit per attachment. To make it easier to combine
in with the write mask it is stored in scratch with a separate rgba bits.
The layout of the both scratch values are:
Attachment index 88887777666655554444333322221111
Component abgrabgrabgrabgrabgrabgrabgrabgr
dEQP-VK.pipeline.monolithic.color_write_enable.*
Test run totals:
Passed: 576/576 (100.0%)
Failed: 0/576 (0.0%)
Not supported: 0/576 (0.0%)
Warnings: 0/576 (0.0%)
Waived: 0/576 (0.0%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26211>
Since dbbf566588 ("aco,ac/llvm,radeonsi: lower f2f16 to f2f16_rtz in nir")
radeonsi behavior changed and some of the core fp16 ops broke as a result.
We should explicitly specify the rounding mode until we add an gallium API
for drivers to advertize what they prefer.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26655>
Crysis 2 and 3 Remastered's RT shaders non-uniformly index into SSBO
descriptor arrays without specifying the NonUniformEXT qualifier on the
relevant access chains/load ops. This leads to artifacts around objects.
To add insult to injury, the game fails to provide a meaningful
applicationName/engineName in the Vulkan part of the DX11-Vulkan interop
solution used for RT. Both of these fields are set to "nvpro-sample"
(perhaps the code has been copied from NVIDIA's sample applications).
Therefore, fall back to executable name matching.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9883
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26391>
At this point this is more a header dependency due to inline functions,
so shuffle them around. The end goal is to allow fs_builder have a
reference to a fs_visitor (really a fs_shader).
Note the header is still included, a later patch will move the includes
to the call-sites.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
Use the "current bld" in nir_to_brw_state more widely, and also replace it
with an annotated version when applicable (to associate it with a NIR
instruction being lowered). After filling a block we reset it back to
the original value.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
Create a nir_to_brw_state struct that is valid only during the
NIR to backend translation and use it for nir_ssa_values array.
This removes some NIR specific handling out of the fs_visitor -- nowadays
effectively an fs_shader.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26323>
I noticed that my NVK build was always LTOing the library twice
(I managed to trace it to the vk_synchronization_helpers change)
This change fixes the double compilation/LTO issue (which should
definitely cut packaging times a bit) 🐸
Fixes: fe12c1c29e ("vulkan: Add some auto-generated synchronization helpers")
Reviewed-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26630>
We use SETMSF to implement discard, so we need to ensure that any
TMU writes after a SETMSF don't actually execute. We emit a TMU flush
before a discard but we also need to ensure that the QPU scheduler
honors this.
Fixes some tests in dEQP-VK.spirv_assembly.instruction.terminate_invocation.*
when we expose the extension that would otherwise fail because the
QPU scheduler would incorrectly move some image writes emitted after a SETMSF
before the SETMSF instruction.
Also fixes spec@arb_shader_atomic_counters@fragment-discard
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26631>
Once decl_reg is handled, src[0].ssa->divergent will be properly set, so
load_reg and load_reg_indirect do not need special treatment.
shader-db can run to completion on HSW, IVB, and SNB now. No other
testing was done.
v2: Refactor nir_intrinsic_load_reg and nir_intrinsic_load_reg_indirect
handling. Suggested by Daniel Schürmann.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: 4fd257d20f ("nir: Properly handle divergence for load_reg")
Fixes: 6dbb5f1e07 ("intel/fs: rerun divergence analysis prior to convert_from_ssa")
Closes: #10233
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26436>
These are hard limits and don't depend on wave size.
Accordingly, also update the usage in order to avoid
reporting unreasonable occupancy.
Totals from 192 (0.24% of 79330) affected shaders:
MaxWaves: 5814 -> 3072 (-47.16%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26521>
vkd3d-proton requires this memory bit in various cases (adding
it to the GTT memory type significantly reduces failures in the
vkd3d-proton test suite) 🐸
I'm not sure if Tegra supports this bit though so I'm not adding it
to the VRAM-less path (hopefully someone can provide an actual answer)
The next step would be adding HOST_VISIBLE bit to the VRAM type (this
will likely require more work and maybe even some KMD changes) which
would make gamescope work (Faith said that DXVK benefits from it too)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26621>
Currently, we have:
- cso_context_base is the base class, but cso_context is passed to functions
- cso_context is the derived class
Change:
- cso_context to become the base class, and is passed to functions
- cso_context_priv to become the derived class.
mesa/main will need to access the base class directly.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26619>
We want to be able to use cbufs for UBOs and descriptor buffers going
forward. This also cleans up alignments all over the code-base where
just kinda did whatever seemed like a good idea at the time. The result
is a lot more flexible and accurate.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26617>
fixes ./arb_shader_image_load_store-early-z. experimentally, an opaque pass type
works too but better match what the blob does.
also, I now have proof that sample_mask triggers occlusion query updates because
if you run it multiple times, you get >1 hits per fragment in a counting query
:p
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26614>
In the wise words of Mike Blumenkrantz, "I hate gl_PointSize and so can you".
The mesa/st lowering won't mesh well with vertex shader epilogues, and it falls
over in various circumstances. I am too tired to go against the grain, so let's
just pretend to be a normal gallium driver and trust in the rasterizer CSO,
lowering point size internally. This properly handles transform feedback without
any hacks, both GL and GLES behaviours, etc.
Fixes:
KHR-GL31.transform_feedback.capture_vertex_separate_test
gl-2.0-large-point-fs
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26614>
Compressed textures have two additional quirks that affect the tiling
code (but not the mip offsets): they get extra stride padding in some
cases for the large miptree, and the tile size is based on the POT size
and not the real size for the small miptree.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26614>
For compressed textures, mip levels > 0 can have additional stride
padding. This (in some cases) affects the tile stride calculation, so it
cannot be implicitly represented with the existing members.
Add an explicit array containing the stride, in elements, of each
miptree level. The tiling code uses this instead of the minified and
element-aligned width when computing tile addressing.
This commit should be a functional no-op.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26614>
For compressed textures, the POT miptree starting size is calculated
backwards (POT then minify instead of minify then POT).
In addition, the existing POT miptree start level code does not work for
compressed textures. Due to the extra block alignment requirement each
step of the way, we can no longer get away with nice log-based O(1)
math. Switch to a loop. This should be equivalent for uncompressed
textures, but yields different results with compression (element size >
1x1).
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26614>
This is the easy part, passes the piglits.
---
N.b.: this also includes a bug fix for ARB_base_instance that would be
nontrivial to extract out, so I'm backporting the whole feature for release. How
terrible, more features :-P
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26614>
fixes arb_shader_image_load_store-invalid case imageLoad/address bounds test/imageCube/rgba32f
this is also better codegen since it avoids the wacko division by 6. although it
creates a div by 6 in imageSize, that's better because that one is much more
likely to hoist to the preamble. probably should've done this from the start.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26614>
these cause robustness problems -- since the target type might not match the
shader for invalid apps -- and are a dubious microoptimization. can revisit
later. for now, fixes imageAtomic*/target mismatch test.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26614>
Most of these registers don't change, so we should not set them when they
don't. This reworks the rasterizer state to use a custom emit function and
eliminate redundant register changes. This required merging the poly_offset
state into the rasterizer state and change how the poly offset state is
updated.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26307>
Don't set do_update_shaders every time current_rast_prim changes, which can
be EVERY DRAW. Instead, just update the shader key bits and set
do_update_shaders only if any bits are different.
When we bind a new rasterizer state, do the same.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26307>
Because gcc was complaining:
../../src/intel/tools/intel_hang_viewer.cpp: In function ‘void display_hang_stats()’:
../../src/intel/tools/intel_hang_viewer.cpp:365:31: error: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 2 has type ‘std::vector<hang_bo>::size_type’ {aka ‘unsigned int’} [-Werror=format=]
365 | ImGui::Text("BOs: %lu", context.bos.size());
| ~~^ ~~~~~~~~~~~~~~~~~~
| | |
| long unsigned int std::vector<hang_bo>::size_type {aka unsigned int}
| %u
../../src/intel/tools/intel_hang_viewer.cpp:366:31: error: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 2 has type ‘std::vector<hang_exec>::size_type’ {aka ‘unsigned int’} [-Werror=format=]
366 | ImGui::Text("Execs %lu", context.execs.size());
| ~~^ ~~~~~~~~~~~~~~~~~~~~
| | |
| long unsigned int std::vector<hang_exec>::size_type {aka unsigned int}
| %u
../../src/intel/tools/intel_hang_viewer.cpp:367:31: error: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 2 has type ‘std::vector<hang_map>::size_type’ {aka ‘unsigned int’} [-Werror=format=]
367 | ImGui::Text("Maps: %lu", context.maps.size());
| ~~^ ~~~~~~~~~~~~~~~~~~~
| | |
| long unsigned int std::vector<hang_map>::size_type {aka unsigned int}
| %u
cc1plus: some warnings being treated as errors
I'm not sure if STL's size_type is defined by the spec to be anything
specific, but for the platforms we care about it seems to be size_t,
so change it to %z.
Fixes: 33fd93f3b1 ("intel/tools: hang viewer/editor")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26581>
The Blitter engine lacks support for 3 components color format so we can
just fallback to RCS companion command buffer for the blit operation.
Even though blitter supports 96-bit support it only supports linear
tiling. We can support other types of tiling by falling back to the RCS
companion command buffer.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26300>
There was a subtle bug related to CFG tracking. Namely, some branch
instructions may point *only* to the block after the DO instruction
for the loop. If the MOV instructions are in the DO block, the may not
have liveness properly tracked.
Like in !25132, having the MOV instructions in blocks that might
contain other instructions helps scheduling.
shader-db:
All Broadwell and newer Intel GPUs had similar results (Ice Lake shown)
total cycles in shared programs: 848577248 -> 848557268 (<.01%)
cycles in affected programs: 78256396 -> 78236416 (-0.03%)
helped: 361 / HURT: 18
fossil-db:
All Skylake and newer Intel GPUs had similar results (Ice Lake shown)
Totals:
Cycles: 15021501924 -> 15021372904 (-0.00%); split: -0.00%, +0.00%
Totals from 735 (0.11% of 656080) affected shaders:
Cycles: 676429502 -> 676300482 (-0.02%); split: -0.02%, +0.00%
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26439>
radv_init_metadata hits several assert failures when the image is
multi-planar. Make sure we use plane 0.
This change should make no difference in practice. Also, this is done
only to follow radeonsi. Since the opaque metadata is mainly for
validations and DCC, and we don't enable DCC for multi-planar images, we
probably don't need to call radv_query_opaque_metadata at all.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25964>
Do not report DCC modifiers for multi-planar formats. We don't support
DCC for them and drmFormatModifierPlaneCount had incorrect values.
Fix vkGetImageSubresourceLayout for multi-planar images with modifiers.
In that case, memory planes and format planes are equivalent.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25964>
This commit nak: implement SHL and SHR on SM50 caused a regression on
KHR-GL45.gpu_shader_fp64.* using zink.
This fixes the regression, by setting the wrap fields.
Fixes: 00be041ffc ("nak: implement SHL and SHR on SM50")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26586>
Summary:
- Add a perf option to force primary ring submission
- Let device own secondary ring(s) for ad-hoc spawn
- For threads where swapchain and command pool are created, track with
TLS to instruct ring dispatch.
- If the pipeline creation or cache retrieval happens on the background
threads not on the hot paths, force synchronous and dispatch to the
secondary ring after waiting for primary ring becoming current.
- If the pipeline creation or cache retrieval happens on the hot paths
threads, dispatch to the primary ring to avoid being blocked by those
tasks on the secondary ring.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26179>
At first, no behavior change in this CL.
The instance level helper for normal command submission is left to work
with the current venus protocol. Meanwhile, we leave the helper to
submit recorded command buffer inside instance to it can later redirect
to the primary ring.
We've internalized a few ring helpers that no longer need to be exposed.
Besides, indirect submission decision is on per-ring basis since the
ring buffer can vary later.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26179>
This change only moves the fields without changing the accessors. It's
better to let ring own its own upload cs encoder (which is backed by
shmem array) to avoid lock contention between indirect submissions
across rings.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26179>
Now we are able to break up the original lock to allow shmem alloc to be
outside the ring mutex, as long as the reply shmem set is still coupled
with ring submission.
Add and expose vn_instance_reply_shmem_alloc helper which will be used
by rings separately later.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26179>
This can be thread-safe only because we have dropped seeking command
stream offset, which requires comparing pool shmem to decide conditional
set stream.
This is to prepare for later sharing reply shmem pool across rings.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26179>
More considerations and details here:
- The seek is a bit lighter than set, since it assumes renderer side
resource being immutable. It does affect perf when Venus is still
making verbose synchronous calls at runtime (e.g. descriptor set,
buffer, device memory, etc).
- Seek still requires lock protection as the reply shmem must be
immutable before the seek and the followed cmd are committed to the
ring.
- Removing seek without doing set requires renderer change to always
bump the encoder end position according to what the original request
is instead of being ad-hoc upon what the host driver tells to write.
The overhead and extra complexity there isn't negligible.
- Further, removing seek requires each ring to track the prior reply
pool shmem in the multi-ring scenario. While the additional host side
resource lookup isn't costy as the number of resources is must less
than the vk object table.
- The nice thing is that we can make shmem pool thead safe to be more
easily shared across rings.
So we just drop it.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26179>
The ring wake up is no longer costy as the other notifies followed by
the initial call won't be blocked by ring cmd execution anymore
(without vkr side big context lock). Reducing the timeout can help cpu
bound scenarios.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26179>
When calculating the system memory heap size, we report only 3/4 of
the total RAM size (or 1/2 for systems with less than 4GB of RAM).
In the memory budget extension query, we were reporting 90% of the
available system memory. If most of the memory in the system is free,
this could result in the total heap size being 3/4 of RAM, but the
memory available being 9/10 of RAM. But if the application tried to
allocate the memory reported as "available", it would exceed the heap
size. This can confuse some applications.
This patch makes the memory budget query clamp the available RAM to
the heap size, so it will never report more available than the heap
can provide. Unfortunately, this means that we'll report only 67.5%
of system memory as available (3/4 * 9/10). We may want to adjust
this estimate in the future.
Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26553>
This was mainly useful for older Gen7.x GPUs with 32-bit PPGTT, which
are now supported by hasvk rather than anv. The remaining platforms
which anv supports have 36, 47, or 48-bit PPGTT, which imposes a 3/4
limit of 48GB, 96TB, and 192TB of memory.
The GPUs with 36-bit PPGTT are Elkhart Lake and Jasper Lake, which
appear to be Atom CPUs that have a maximum supported memory
configuration of 32GB or less, so this limit should not matter there.
Nor is a multi-TB limit likely to matter on our other parts.
Drop this check to simplify the heap and memory budget calculations.
Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26553>
The code treated 0x00 and 0xff the same, but they aren't,
port over the codegen code.
Fixes GTF-GL45.gtf40.GL3Tests.transform_feedback3.transform_feedback3_skip_components
with zink on nvk
v2: drop padding to 0, tests still pass.
Fixes: 30f01c47c2 ("nak: Translate XFB info")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26559>
The SET_STREAM_OUT_BUFFER_LOAD_WRITE_POINTER registers have an
8 dword stride, but the code is only adding one dword between them
then the MME is calling an illegal method.
This is the simple fix, otherwise I think we'd have to multiply
somehow in the MME which seems pointless.
Fixes KHR-GL45.transform_feedback* on zink on nvk.
Fixes: 5fd7df4aa2 ("nvk: Support for vertex shader transform feedback")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26558>
This isn't supposed to work nor does it match radeonsi but setting
AUTO_RESET_CNTL=0 by default for GFX9 chips is what gets it passing
linestrip CTS tests:
dEQP-VK.rasterization.primitives.dynamic_stipple.bresenham_line_strip
dEQP-VK.rasterization.primitives.dynamic_stipple_and_topology.bresenham_line_strip
dEQP-VK.rasterization.primitives.dynamic_stipple_and_topology.bresenham_line_strip_wide
dEQP-VK.rasterization.primitives.static_stipple.bresenham_line_strip
Signed-off-by: Juston Li <justonli@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24623>
This is for fixes the following error:
FAILED: src/vulkan/runtime/vk_synchronization_helpers.c src/vulkan/runtime/vk_synchronization_helpers.h
"C:\CI-Tools\msys64\mingw64\bin/python3.EXE" "../../src/vulkan/util/vk_synchronization_helpers_gen.py" "--xml" "../../src/vulkan/registry/vk.xml" "--out-c" "src/vulkan/runtime/vk_synchronization_helpers.c" "--beta" "false"
Traceback (most recent call last):
File "C:/work/xemu/mesa/src/vulkan/util/vk_synchronization_helpers_gen.py", line 213, in main
f.write(TEMPLATE_C.render(**environment))
UnicodeEncodeError: 'gbk' codec can't encode character '\xa9' in position 15: illegal multibyte sequence
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26515>
For working around improper usage of sparse in DOOM Eternal.
When fully explicit sync sparse binding is implemented, this path will
remain implicit sync to also deal with the improper semaphore usage.
radv_queue_submit_bind_sparse_memory will likely get a bool parameter to
control explicit / implicit sync in that case.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26464>
IMAD64 does not exist on SM50, so we're using IMUL instead for
nir_op_{i,u}mul_high and nir_op{i,u}mul_2x32_64. Longer-term we may want
to replace this with XMAD for better perf.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26114>
SHF.{L,R} is supported, but it seems to always write 0 to dst when the
shift value is a register. The only case in nak_from_nir that actually
uses the 64-bit shift is nir_op_isign, which has an immediate shift
value.
This also avoids the SHF.I32 issue, since the only usage is now SHF.I64.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26114>
For now, we're just using this in place of IAdd3x for 64-bit adds. IADD3
with carry flags is supported on SM50, but it works completely
differently from SM75. Longer-term we'll probably want to emit this in
all of the places that we're currently using IADD3.
Also need to hook the carry register up to calc_deps, but for now I'm
just using NAK_DEBUG=serial.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26114>
Rename it to set_src_i20 and fix the assert to allow negative signed
values. Also, add a new set_src_f20 helper with the correct semantics
for float immediates. Finally, get rid of some bogus shifting in the
ALU code.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26114>
Bindings are not necessarily globally unique, and even the location
where we were trying to read the binding value out of is a union, so
we could be trying to compare binding values against data for other
arg types.
Instead, use the arg metadata offset, which is globally unique and
outside of the union.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26555>
With this commit, NAK now takes on a more Cargo-like structure with
things split better into crates:
- bitview/
- lib.rs // Formerly bitview.rs
- nak/
- lib.rs // Only pulls stuff into the module
- api.rs // Formerly nak.rs
- ...
- nvfuzz/
- main.rs // Formerly nvfuzz.rs
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26546>
Batch bos are always allocated with ANV_BO_ALLOC_HOST_CACHED_COHERENT
so there is no need to do cflush calls.
But if we ever decide to change that anv_bo_needs_host_cache_flush()
will make sure cflush is called.
Outside of batch bos, this patch is also removing the
intel_flush_range() call from anv_QueuePresentKHR because
device->debug_frame_desc is offset of workaround_bo that is also
allocated as ANV_BO_ALLOC_HOST_COHERENT.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26457>
As suggested by Lionel, here adding ANV_BO_ALLOC_HOST_COHERENT
and with that ANV_BO_ALLOC_HOST_CACHED_COHERENT is now defined by
(ANV_BO_ALLOC_HOST_COHERENT | ANV_BO_ALLOC_HOST_CACHED).
In some callers of anv_device_alloc_bo() was necessary to add
ANV_BO_ALLOC_HOST_COHERENT as no other flag was set and that
was the default behavior up to now.
A change that could look not related is the removal of the
intel_flush_range() in anv_device_init_trivial_batch(), that was done
because trivial_batch_bo is HOST_COHERENT so no flush is necessary.
And it did not made sense to make it ANV_BO_ALLOC_HOST_CACHED_COHERENT
as it was never read in CPU.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26457>
- Adds new 'all' value to the video-codecs option
- Adds 'all_free' value to video-codecs and sets
it as default value for non patent-encumbered
codecs, restoring the behavior for these codecs
before existing as options in commit 7b22dd8bfd
Fixes: 7b22dd8bfd ("meson: add vp9 and av1 codec support options")
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26320>
This fields are required to be set because those are used by
XY_FAST_COLOR_BLT instruction.
Right now it is not set causing applications to abort because
DestinationMOCS is required to be non-zero.
This fixes at least piglit@ext_external_objects-vk-image-display on MTL.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26502>
On newer platforms (Arrowlake and above) we can issue a
EXECUTE_INDIRECT_DRAW that allows us to:
* Skip issuing mi load/store instructions for indirect parameters
* Skip doing the indirect draw unroll on the CPU side when the
appropriate stride is passed
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26178>
Replacement in try_eval_const_alu() doesn't work because the replacements
are always scalar. The callers also always give a scalar dest.
This is encountered when compiling a Redout shader under ASan.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Fixes: bc170e895f ("nir/loop_analyze: Use try_eval_const_alu and induction variable basis info")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26225>
It has been in a rough few days for that farm and it doesn't seem to be
able to handle the load.
Unfortunately we need to consider it down so that it stops blocking
merges.
All these jobs are redundant and a waste of resources:
- the containers have already been built & pushed in the merge pipeline
- the mesa build variants have already all passed
- the driver tests have already all passed
None of these jobs are doing anything useful in this pipeline, but it
costs a factor of 2x to our infrastructure, so let's remove them.
In other words, the only job left in the post-merge pipeline is the
`pages` job that deploys the update to the website.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26451>
RADV currently has two paths for PS epilogs:
- the first one is mostly used by GPL to compile fragment shader epilogs
as part of the graphics pipeline. It's supposed to be optimal because
fragment shader epilogs are compiled in the pipeline and eventually
cached.
- the second one (the "on-demand" path) is required when some dynamic
states are used because otherwise it's just impossible to compile the
fragment shader. These epilogs are compiled during cmdbuf recording
when all needed info are known, they are also cached in memory. This
is the main path for Zink.
Having two different paths isn't ideal for maintenance but there is
another problem. On RDNA3, alpha to coverage needs to be exported as
part of MRTZ when either depth/stencil/samplemask are exported. The
problem being that with GPL, the PSO multisample state can be NULL when
the frag shader lib is created, which means that we can't know if atc
needs to be exported or not, even if it's static. The solution seems to
to always use on-demand fragment shader epilogs for GPL on RDNA3.
So far, I think that switching to on-demand PS epilogs unconditionally
for GPL shouldn't hurt performance and that will simplify a lot of
things.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26398>
nvc0 does nvc0_hw_query_get(push, q, 0x00, 0x0d005002 | (q->index << 5));
which corresponds to sub report.
Just noticed in code review, don't know of any tests that will notice yet.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26513>
This doesn't do anything yet because it's a Vulkan 1.2 core feature and
not in any extension but it will apply when we enable Vulkan 1.2 in the
future or if someone uses MESA_VK_VERSION_OVERRIDE=1.2/1.3.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26530>
The standard block shapes (and by extension, the tiling formats they
require) are simply incompatible with getting a 2D view of a 3D image.
I couldn't find in the Vulkan spec anything related to what are the
expectations when trying to use both at the same time.
So here we "document" that this case is known non-standard. Please
notice that since we report residencyStandard3DBlockShape as true we
were actually supposed to support this case, but I can't see how this
would be possible, so set is_known_nonstandard_format to true so we
can avoid the assert() that comes right after.
Fixes the following when using Zink:
KHR-GL46.sparse_texture_tests.SparseTextureAllocation
Also "moves forward" the following test on Zink, so it now hits a
different assertion:
KHR-GL46.sparse_texture_tests.SparseTextureCommitment
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26454>
During vkGetPhysicalDeviceSparseImageFormatProperties(), check first
if the non-sparse version of the image is supported, and return in
case it's not.
On TGL, if we don't do that, we may hit the following assertion:
deqp-vk: ../../src/intel/isl/isl.c:2584: isl_surf_init_s: Assertion `!(info->usage & ISL_SURF_USAGE_CPB_BIT) || dev->info->has_coarse_pixel_primitive_and_cb' failed.
My TGL doesn't has_coarse_pixel_primitive_and_cb.
Fixes the following on TGL:
dEQP-VK.api.maintenance5.flags.sparse_image_format_props
dEQP-VK.api.maintenance5.flags.sparse_image_format_props2
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26454>
If the size here is not a multiple of the granularity (64kb) then
we'll miss our "pages" estimation by 1. We could fix this with
DIV_ROUND_UP() or by simply putting a "+1" there, but the upper layers
should now be preventing this case so let's just put the assertion
here.
Previously it was possible to hit this case with Zink by running
under certain conditions piglit/arb_sparse_buffer-basic.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26454>
From the spec:
"Resources can be bound at some defined (sparse block) granularity."
"The sparse block size in bytes for sparse buffers and
fully-resident images is reported as
VkMemoryRequirements::alignment. alignment represents both the
memory alignment requirement and the binding granularity (in bytes)
for sparse resources."
Not only the upper layer (the Spec) doesn't allow this, the lower
layers (both the vm_bind ioctl and TR-TT) also work on a granularity.
Just check for this case and return an error.
Before this check, what would happen was:
- for the vm_bind backend, the vm_bind ioctl would fail
- for the TR-TT backend, we'd understimate l1_binds_capacity and
fail an assertion, or we'd just silently bind 64kb instead of the
original size
Currently, some Zink tests such as piglit/arb_sparse_buffer-basic can
trigger this behavior, but we're working to fix Zink for this case
(and that commit may be merged before this one).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26454>
Adds a venus dri option to advertise support for multi-plane format
modifiers to Vulkan's common WSI. Otherwise, Venus will only support
modifiers with planeCount == 1 to ensure compatibility with Xwayland's
virgl-backed Glamor backend.
Signed-off-by: Ryan Neph <ryanneph@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26240>
Force the use of single-plane modifiers for tiled wsi images as long as
Venus is integrated with Virgl, which does not support non-format
compression metadata planes (e.g. Intel's CCS or AMD's DCC modifiers).
Signed-off-by: Ryan Neph <ryanneph@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26240>
For compositors that advertise modifier support, Vulkan common WSI
modifier support queries still fail in Venus on the Intel ANV driver.
This is due to the presence of VK_CREATE_IMAGE_ALIAS_BIT, without
accompanying wsi_image_create_info struct, which is implicitly excluded
from serialized messages over the venus-protocol.
By removing ALIAS_BIT, modifier queries begin to pass when the host
supports them.
Signed-off-by: Ryan Neph <ryanneph@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26240>
Without such, Xwayland gets back the implicit modifier token (INVALID)
when calling gbm_bo_get_modifier() for a dmabuf shared by the WSI layer.
Then mistakenly sends INVALID upon wl_buffer creation, rather than the
explicit modifier sent by WSI.
The logic of Xwayland's Glamor gbm backend is a bit circuitous, since
the modifier is sent by WSI alongside the dmabuf fd. Rather than use
that modifier directly when creating wl_buffer (via
zwp_linux_dmabuf_v1), Glamor first imports the dmabuf+modifier with
gbm_bo_import(), then uses the result of later gbm_bo_get_modifier().
Signed-off-by: Ryan Neph <ryanneph@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26240>
Instead of blatantly assuming with no assert that they're the same, add
a remap function. Also, be more careful about which enum we use where.
In the AST, we use GLenum and GL_FOO because we also need GL_ISOLINES.
When we translate to shader info, GS gets translated into mesa_prim
and tessellation gets translated into tess_primitive_mode which has
ISOLINES as a valid primitive value.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24821>
RADV compacts all descriptors for multi-planar images into one
combined image sampler, so it should be 96, and not eg. 192 for a two
planes format.
Fixes new CTS
dEQP-VK.binding_model.descriptor_buffer.ycbcr_sampler.*array.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26498>
Save a comparison, and move out the comparison to be more backend friendly.
Saves 2 instrs on AGX (as the remaining comparison now fuses with bcsel).
Results on AGX, all affected shaders in asphalt9.
total instructions in shared programs: 1813003 -> 1812611 (-0.02%)
instructions in affected programs: 119646 -> 119254 (-0.33%)
helped: 333
HURT: 0
Instructions are helped.
total bytes in shared programs: 11870344 -> 11867208 (-0.03%)
bytes in affected programs: 820888 -> 817752 (-0.38%)
helped: 333
HURT: 0
Bytes are helped.
and on Mali-G57:
total instructions in shared programs: 2677538 -> 2677205 (-0.01%)
instructions in affected programs: 206923 -> 206590 (-0.16%)
helped: 333
HURT: 0
Instructions are helped.
total cvt in shared programs: 14667.50 -> 14662.30 (-0.04%)
cvt in affected programs: 1953.64 -> 1948.44 (-0.27%)
helped: 333
HURT: 0
Cvt are helped.
total quadwords in shared programs: 1450664 -> 1450544 (<.01%)
quadwords in affected programs: 5064 -> 4944 (-2.37%)
helped: 15
HURT: 0
Quadwords are helped.
total threads in shared programs: 53282 -> 53309 (0.05%)
threads in affected programs: 27 -> 54 (100.00%)
helped: 27
HURT: 0
Threads are helped.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26489>
Otherwise, we can transition to exact before p_jump_to_epilog, then
transition to WQM again and then back to exact:
p_jump_to_epilog //transitions to exact
p_logical_end //transitions to wqm
p_end_wqm //transitions to exact
We rely on ssa elimination to clean most of this up.
fossil-db (navi21):
Totals from 1 (0.00% of 79330) affected shaders:
Instrs: 111 -> 110 (-0.90%)
CodeSize: 572 -> 568 (-0.70%)
Copies: 16 -> 15 (-6.25%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25440>
This creates a function that can be used by both pipelines and shaders.
Note that we cannot yet call tu_CreateShadersEXT directly inside the
pipeline due to things like pipeline feedback, multiview, and so on, but
further extensions will hopefully bring us closer to that ideal.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25679>
With shader objects, we won't have the pipeline layout available. This
means that the current way we implement dynamic offset descriptors in
combination with fast-linking and independent descriptor sets, where we
use the pipeline layout when fast-linking that has pre-computed offsets
for each descriptor set, won't work. Instead we need to piece together
the sizes of the descriptors in each descriptor set from the shaders.
This is already effectively what we do when we stitch together the
pipeline layout when fast-linking, but we need to make it work with just
the shaders.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25679>
OpBreak and OpBSsy aren't very SSA friendly as they require bar_in and
bar_out to be assigned the same register. We need to encure that RA
knows about this restriction. For now, we just special-case these two
instructions. In the future we may want a more generic mechanism for
this but it's not worth it for just two instructions.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26463>
The NIR intrinsics now take and return a barrier whenever one is
modified instead of modifying in-place. In NAK, we give the internal
instructions the same treatment and convert everything to use barrier
SSA values and RegRefs. In nak_from_nir, we move all barriers to/from
GPRs. We'll clean up the massive pile of OpBMov later.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26463>
OpBar and OpBSync both stall the thread until other threads get to that
point. These instructions must have .yld set. Also, warp barrier ops
don't support the usual instruction barrier mechanism so they should be
marked as having a fixed latency. It's unclear if the barrier file is
internally scoreboarded or if warp barrier ops just stall the whole
thread. In either case, this seems necessary.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26463>
In a scenario like:
CmdBindPipeline(stipple line enabled)
CmdDraw()
CmdBindPipeline(stipple line disabled)
CmdDraw()
The second draw wasn't resetting the stipple line state and this might
have caused issues, though it's uncovered by VK CTS.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26429>
Pre v10 Mali GPU were describing GPU jobs as a chain of job descriptors,
but new generations moved to a command stream based approach. The
pan_cs.{h,c} name was chosen based on the assumption this job chain
would replace the command stream we have on other GPUs, things will
become a lot more confusing now that we have a real command stream.
Let's rename these files before it happens. Given all the helpers in
there are either emitting descriptors, and calculating values to be
put in such descriptors, pan_desc.{c,h} sounds like an acceptable
name.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26356>
What pan_scoreboard manipulates is a job chain, how dependencies
between jobs is implemented is an implementation detail, and shouldn't
leak through the name.
Let's rename pan_scoreboard.h pan_jc.h, and prefix the functions
accordingly.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26356>
We never steal the NIR program or free it explicitly, and the state
tracker expects drivers to take ownership of the program object. Since
panfrost doesn't need to keep the original NIR shader around for
compute, let's just free it before returning.
Fixes: 40372bd720 ("panfrost: Implement a disk cache")
Cc: stable
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26424>
Enables 10 bpc / color depth 30 bit support under XOrg
with X11/GLX/OpenGL.
Successfully tested with RaspberryPi OS 11,
running X-Server 1.20, and also with Weston,
on a RaspberryPi 400 on top of current Mesa 24.0-devel.
Alejandro Piñeiro also performed a GLES CTS run
with successful result, citing him:
"Full GLES 31 CTS finished with 0 failres. So all ok"
Note that this commit was originally developed and
successfully tested by myself against Mesa 23.1-devel
from February 2023, and therefore should apply and work
cleanly against recent Mesa stable branches. One could
see this commit as a trivial compatibility fix against
X-Server 1.20 / modesetting-ddx 1.20, which is why I'm
also nominating this commit for the current 23.3 stable
branch, and also the 23.2 stable branch, so it may make
it into RaspberryPi OS 12. Thanks for the consideration.
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Backport-to: 23.2
Backport-to: 23.3
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26472>
It seems like when we increased the number of tests per shard, we
started overcommitting the Renoir runner, leading to load averages
higher than the 16 CPU threads could handle, while also running at
75-96% memory usage.
By dropping the concurrency to 16, we should be able to reduce this
memory usage while also reducing the execution time.
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26501>
As with the memory modifiers, put the . on the modifier rather than
having to do it as part of the print itself. Also, add printing of
accumulators but only if it's not a trivial accumulation.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26246>
The NIR lowering just clamps to [-1, 1] which should turn into two IMnMx
as opposed to the 4 instructions we're emitting now. We can maybe do
better than the NIR lowering for 64-bit but that seems unnecessary.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26246>
We had the unfortunate finding on a recent platform to learn that the
bindless sampler heap is not functioning as expected.
Nowhere in the documentation is the size of the heap written down. So
most people assumed that's the max number that we can program (4Gb).
The reality is that it's only 64Mb.
Though it is appearing like it's working properly for the whole 4Gb
range for most apps, this is only because the HW bounds checking
applied is broken. Instead of clamping anything beyong 64Mb, it's only
clamping the last 4Kb of each 64Mb region.
So this heap is useless for us to make a 4Gb region of both sampler &
surface states...
This change essentially turns off the bindless sampler heap on DG2+.
The only location where we can put SAMPLER_STATE elements is the
dynamic state heap. Unfortunately we cannot align the dynamic state
heap with the bindless surface state heap. So the solution is to
allocate sampler & surface states separately, each from the own heap
in the descriptor pool.
We now have to provide 2 sets of offsets for surfaces & samplers.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25897>
According to PRMs, to use self-modifying code correctly we have to
disable preparser before jumping to the generated commands, and re-enable
it with a first command in that buffer.
Old implementation did it wrong: for both inplace and inring generation
it disabled preparser before running the generation shader, had it
disabled during generation, and re-enabled it just before jumping to
the generated commands.
This usually didn't cause any trouble, because the generation shader and
generated draws are in different BOs, and the jump distance is greater than
the command FIFO depth. But we allocate them from the same pool,
so there are rare cases where the end of the BO with generation commands,
and the beginning of the BO with generated draws are adjacent. In such
cases, the wrong commands might be fetched.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10162
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26427>
bos allocated into IRIS_HEAP_DEVICE_LOCAL_PREFERRED can always be
mmaped because it is also backed to smem which is not the case for
IRIS_HEAP_DEVICE_LOCAL.
This fixes issues with small PCIe bar systems.
Fixes: 21170a58d8 ("iris: Split system memory heap into cached-coherent and uncached heaps")
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26455>
asahi will pass in 16bits, works fine if we convert before clamping. note we
don't try to be clever and make a smaller immediate because it would require
extra logic for negatives to make sure we don't have garbage in upper bits
(nir_validate checks that). do the simple, obviously correct thing.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26440>
Compute tunneling can considerably lower the latency of high-priority
compute work. Enabling it is beneficial in cases where high-priority
work is dispatched while the GPU is already busy with other work (e.g.
rendering on GFX). This is the case in VR compositors that dispatch
latency-sensitive compositing work to ACE while GFX is busy rendering
the next frame.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26462>
The Vulkan spec says:
"If samples is not VK_SAMPLE_COUNT_1_BIT, then imageType must be
VK_IMAGE_TYPE_2D, flags must not contain
VK_IMAGE_CREATE_CUBE_COMPATIBLE_BIT, mipLevels must be equal to 1..."
So, the source image is always 2D with no mipmaps.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26316>
I forgot to generate the relnotes when I did the release, and when
I generated it a couple days later the script picked `today()` instead
of the date on the tag (because it's supposed to be run _before_ tagging
the release), and I didn't notice right away.
Fixes: cad37be6c9 ("docs: add release notes for 23.3.0")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26475>
Previously, this callback could try to set the requested alignment to
NIR_ALIGN_MUL_MAX, which would then overflow the u16 value in the
struct. We don't actually need that much alignment though, and this
value only really matters if we needed to increase alignment anyway.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26347>
They have been causing hangs intermitently in CI for the past week,
until it finally caught my attention and forced me spend a couple of
hours bisecting the issue.
We'll re-introduce support for it when the issue is fixed.
Fixes: b975d4e800 ("radv: enable meshShaderQueries on GFX10.3")
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26450>
pipeline_is_dirty was never TRUE because it's emitted in the before
helper. This might fix bad interactions between DGC and RT because
they both use compute shaders and descriptor bindings need to be
re-emitted.
Found by inspection.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26417>
The Vulkan spec says multi-planar images can only be copied on a
per-plane basis. The COLOR_BIT to "all planes" expansion applies to
image memory barriers which is completely unrelated.
Remove the expansion logic to simplify the code. Add assertions to
clearly describe the invariant.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26364>
Otherwise, if a variable has multiple derefs in a shader, we'll crash
trying to remove it a second time. No idea how that can happen though,
seems derefs got sunk by opt_dead_cf.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26435>
v2 (idr): Fix expectations BottomBreakWithContinue. opt_predicated_break
will remove the IF and make the CONTINUE predicated.
v3 (idr): Temporarily disable the one test that fails.
v4 (idr): Free strings allocated by open_memstream. Fixes gitlab CI
failures in debian-testing-asan.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25216>
Imagine 3 blocks A, B, and C. A has a physical link to B, and B has a
logical link to C. Previous to this commit, if B were removed, A would
get a logical link to C. This is not correct.
This was specifically observed to occur when block A was a DO block and
B was the WHILE block. The DO block would have two logical successors,
and that is completely invalid.
v2: Assert that the links from A-to-B and B-back-to-A are the same
kind. Suggested by Caio.
v3: Assume the successor and predecessor lists are well formed. Use this
to simplify the logic. Suggested by Caio. Add checks to cfg_t::validate
to ensure the lists are well formed.
v4: Remove (now unused) bblock_link_invalid. Suggested by Curro.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25216>
The previous is_successor_of and is_predecessor_of checks prevented
creating a physical link when a logical link already existed. However, a
logical link could be added when a physical link already existed. This
change causes an existing physical link to be "promoted" to a logical
link.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25216>
When blitting resources, we need to specify the boxes in mip-level sized
coordinates. For the X and Y coordinates, missing this makes things
behave correctly, but only because we end up clipping away the excess
area.
However, for the Z coordinate of 3D textures, this will make us read
outside of the mip-chain during blitting, making us stumble and crash.
But let's fix what we do for all dimensions. And while we're at it,
rewrite the code a bit, so we don't end up computing any needless
values.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26077>
This reverts commit 70c9fc66ffab8cb85b37c74b507201097e16da85. We're now
handling non-DW-aligned UBO loads in NIR where we can handle it a bit
more completely, we don't need to carry the nak_from_nir code.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26412>
1) This better reflects the reality that we only have one timeline
of sparse binding changes.
2) Allows making it a threaded queue from the start in prep of
explicit sync stuff.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16935>
This changes allow us to support HOST_COHERENT, HOST_CACHED and
HOST_COHERENT + HOST_CACHED memory types for platforms that has
the PAT uAPI.
Be aware that Xe KMD will not be able to support cached only memory
types, anv_xe_physical_device_init_memory_types() will reflect that
but internal usage should not allocate
VK_MEMORY_PROPERTY_HOST_CACHED_BIT only memory, hence the assert
added.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25462>
Move it into nak_postprocess_nir, after lower_mem_access_bit_sizes.
Also, fix the callback for comparison and conversion ops.
For conversion ops, we don't want to lower any of them right now. We'll
need to lower some 64-bit conversions eventually but we'll figure out
those details when we get to implementing real 64-bit.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26348>
This provides a basic implementation of VK_AMD_buffer_marker: we can
write the 32-bit markers from within a command buffer. Unfortunately,
our hardware has several limitations that make this difficult to
implement well:
1. We don't have insight into when specific stages finish (i.e.
all geometry shaders are done, but pixel rasterization may
still be occurring).
2. We cannot perform pipelined writes of 32-bit values to arbitrary
memory locations. PIPE_CONTROL::Write Immediate Value would be
the obvious way to implement this, but it only supports 64-bit
values, and the extension doesn't allow us to do that. We instead
use MI_STORE_DATA_IMM to write 32-bit values, but this requires
hard stalls.
Despite those limitations, the extension may still be useful for tools
to debug GPU hangs. We hope to offer another extension in the future
which offers similar functionality but is more efficient on our GPUs.
v2: Updated by Lionel Landwerlin to fix a number of flushing and
cache coherency issues with these writes.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14924>
These were left unimplemented despite sparse support being added back to
venus in a55d26b5 ("venus: add back sparse binding support")
Same as vn_GetPhysicalDeviceSparseImageFormatProperties2, venus sparse
support requires queues that also support transfer so any sparse-only
queues are filtered out. If a device only supports sparse with
sparse-only queues, sparse features are disabled and these functions
return count of 0.
Fixes: a55d26b566 ("venus: add back sparse binding support")
Signed-off-by: Juston Li <justonli@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26297>
We actually trigger on whether or not NAK is used for everything. If
so, we claim 1.1, otherwise claim 1.0. We need NAK for subgroup ops and
other advanced shader features in later Vulkan versions.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26383>
Previous code was settting whatever priority was requested even if
Xe KMD would not allow it causing warnings in dmesg:
xe 0000:00:02.0: [drm:exec_queue_set_priority [xe]] Ioctl argument check failed at drivers/gpu/drm/xe/xe_exec_queue.c:235: value > xe_exec_queue_device_get_max_priority(xe)
xe 0000:00:02.0: [drm:xe_exec_queue_set_property_ioctl [xe]] Ioctl argument check failed at drivers/gpu/drm/xe/xe_exec_queue.c:912: ret
Now it will query the maximum allowed priority and set the priority
closed to what application requested.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26325>
A "dance" is required with this uAPI, first we need to ask KMD what is
the size of the giving query id, then memory needs to be allocated to
match that size and then query again with the memory address set and
at this time Xe KMD will copy the query data to memory.
This dance was being duplicated in xe_engine_get_info() and
anv_xe_physical_device_get_parameters() and the next patch will also
use it in Iris, so here adding it common/xe and re-using as much
as possible.
There is one more implementation of this function in intel/dev but
due to how libs are linked intel/dev can't depend on to intel/common.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26325>
This solves a problem when you have a big memory chunk of which some
regions are bound to images. If the image is destroyed, currently the
aux-tt mapping stays and prevent any new image aux-tt mapping within
that region, until the memory is freed.
This maps & unmaps the aux-tt region at respectively bind & destroy
time, so that the memory chunks can be map through aux-tt.
If there is aliasing of memory to 2 different images, then the first
one "wins" the aux mapping and gets compression support. The second
one doesn´t.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: ee6e2bc4a3 ("anv: Place images into the aux-map when safe to do so")
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26335>
To implement this feature, we need to do CPU side tracking of all
L3/L2/L1 entries. This does add a little bit of CPU allocations, but
the advantage is that the traversal of the page table tree is faster.
No more need for the linear seach of find_buffer().
With this feature, we can have multiple VkImage bind to the same main
memory address, as long as they share exact same mapping parameters.
The AUX mapping will be removed when the last VkImage is destroyed.
As previously, if the L1 mapping entry parameters don't match, the
mapping fails. Anv handles this nicely by just disabling AUX on the
image.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26335>
Once NIR code is lowered and a few optimization passes have run, there
might be flag register interactions between instructions quite far
away from one another.
In the following case :
f0 = and r0, r1
...
fs_interpolate r2, r3
...
if f0
...
endif
If we lower fs_inteporlate while using the f0 register, we completely
garble the value meant for the if block.
To fix this, emit the predication for fs_interpolate in brw_fs_nir.cpp
when doing the NIR translation to the backend IR. This will guarantee
that the flag register interactions are visible to the optimization
passes, avoiding the problem above.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 68027bd38e ("intel/fs: implement dynamic interpolation mode for dynamic persample shaders")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9757
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26306>
We don't care about patterns like
loop {
...
if (...) {
break;
} else {
...
}
...
}
In that case, we don't need to sync after the if because there's nothing
to re-converge. Every path except one will end up breaking out of it
anyway.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26382>
`\$` is interpreted before being passed to `re.search()`, but luckily
for us the escape is also invalid and because of that, python 3.12+
warns us about it.
Use a raw string instead, so that the `\` is passed untouched to
`re.search()`.
Fixes: aa04b47c6e ("intel/perf: add support for GtSlice/GtSliceXDualsubsliceY variables")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26355>
This is required in order to return the correct value for
`gbm_dri_bo_get_offset()` for e.g. the second plane of a NV12 image.
Use the newly introduced `util_resource` helper and, while on it, also
add support for `gbm_bo_get_plane_count()`.
Cc: mesa-stable
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26283>
This is required in order to return the correct value for
`gbm_dri_bo_get_offset()` for e.g. the second plane of a NV12 image.
Use the newly introduced `util_resource` helper and, while on it, also
add support for `gbm_bo_get_plane_count()`.
Cc: mesa-stable
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26283>
When allocating 2MB chunks of memory, the kernel can use 64K pages if
both the virtual and physical addresses are aligned to 2MB. While we
can't control the physical allocation, we can ensure the virtual address
we use with softpin meets the requirements.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25447>
16K was apparently a little unrealistic - Unigine Superposition has
individual shaders that are larger than 16K. Yikes. Moving to 64K
also puts shaders into the same cache bucket as other allocations.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25447>
On non-LLC systems, most system memory isn't coherent between the CPU
and GPU by default. However, we can enable snooping or 1-way coherency
at a performance cost. In the old days, we maintained one set of cache
buckets and toggled coherency as needed via I915_GEM_SET_CACHING. But
in the modern uAPI (GEM_CREATE_EXT_SET_PAT) we have to select coherency
up front at BO creation time. So this doesn't work out well anymore.
This patch splits system memory into two distinct heaps:
- IRIS_HEAP_SYSTEM_MEMORY_CACHED_COHERENT
- IRIS_HEAP_SYSTEM_MEMORY_UNCACHED
The flags_to_heap() function will select a heap for a given allocation
based on the coherency/scanout requirements, as well as the hardware
configuration (LLC integrated, non-LLC integrated, or discrete card).
Once a heap is selected, it completely determines the cacheability and
coherency settings. A given heap will always have the same mmap mode
and PAT index. This enables us to simplify a lot of code.
Because each heap also has its own bucket cache, we no longer have to
disable bucket caching based on flags, cacheability, coherency, mmap
modes, or so on, as all BOs in each cache have matching settings.
This effectively enables bucket-caching for non-LLC systems.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25447>
Originally we only had a single bucket cache, and it was the main
allocation mechanism. Later on, we added one for device-local memory,
and then a third one for device-local-preferred. Each time, we just
copy and pasted the fields, keeping them all as direct members of the
bufmgr struct. This is getting a bit unwieldy.
This patch introduces an iris_bucket_cache structure to contain the
list of buckets and the number of buckets. It then replaces the three
inline copies with a bufmgr->bucket_cache[heap] array. This lets us
drop a bunch of copy and pasted code in favor of a loop over heaps.
This will also make it easier to add more heaps.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25447>
Iris doesn't make any call to intel_flush_range*() functions so all BOs
created without BO_ALLOC_COHERENT are not coherent between CPU writes
and GPU reads.
A lot of places don't set BO_ALLOC_COHERENT not even command buffers
have it.
And this incoherency is causing most of tests to fail after
the patch that extracted("iris: Calculate iris_mmap_mode using
intel_device_info_pat_entry when possible") the mmap mode from the PAT
entry.
Before that patch MTL was creating BO with a WB PAT index but then
mmaping as WC.
So to fix this for now making the default PAT entry for Iris a WC one.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25447>
Turns out that besides the benefits from nir_move_vec_src_uses_to_dest
itself, it also creates new opportunities for vectorization. Enable it
for vertex shaders, there is a clear instruction win and the only
downside is some increased register presure. However this is mostly
concerning few Unigine Tropics and Sanctiary shaders where we go
11->14 or 10->13 used registers. According to the docs, the increased
register usage would only lower vertex processing concurency if we go
over 15 (R300) or 25 (R500) registers, so we should be safe here.
Fragment shaders are a mixed bag so leave them be for now.
Shader-db RV530
total instructions in shared programs: 129303 -> 128762 (-0.42%)
instructions in affected programs: 13887 -> 13346 (-3.90%)
helped: 99
HURT: 0
total temps in shared programs: 17355 -> 17543 (1.08%)
temps in affected programs: 730 -> 918 (25.75%)
helped: 4
HURT: 66
total cycles in shared programs: 197190 -> 196984 (-0.10%)
cycles in affected programs: 9998 -> 9792 (-2.06%)
helped: 65
HURT: 0
Shader-db RV370:
total instructions in shared programs: 84807 -> 84225 (-0.69%)
instructions in affected programs: 10203 -> 9621 (-5.70%)
helped: 92
HURT: 0
total temps in shared programs: 13036 -> 13231 (1.50%)
temps in affected programs: 787 -> 982 (24.78%)
helped: 4
HURT: 73
total cycles in shared programs: 133178 -> 132946 (-0.17%)
cycles in affected programs: 5911 -> 5679 (-3.92%)
helped: 58
HURT: 0
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25417>
* For now, move the visual mask flags into hgl_context.h
* This removes an un-needed dependency on GLView.h from glvnd
* Eventually, these need converted into normal EGL parameters
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26322>
vl_rbsp_fillbits may fill less than 32 bits if it removes emulation
prevention bytes, but will fill at least 16 bits. We need to call it
twice when reading more than 16 bits.
This fixes parsing H264 SPS packed header in va frontend when emulation
prevention bytes are at position where 32 bit values are read.
Cc: mesa-stable
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26276>
3b3b157961 removed setting the pipeline type
assuming that it was already set by nvk_pipeline_zalloc. That helper was
not actually being used, so this patch updates it to do so.
This does not fix anything as the NVK_PIPELINE_GRAPHICS = 0 anyway, so it
is just a code cleanup.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26329>
In commit 1396dc1c a new output field was added as a parameter, but this
is a problem since the signature of the function are not versionned.
The flush function didn't have a versionned output struct. So what I'm
proposing here is that if the version of the input argument is new enough
(bumped to 2 here), then we re-use the existing argument, which until now
was directly a pointer to GLsync, and instead use it as a pointer to a
versioned struct.
We're just changing one pointer type to another, so in C, this should
be fine AFAIK.
Fixes: 1396dc1c
Signed-off-by: Sylvain Munaut <tnt@246tNt.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26315>
When given (e.g.) 3x 16-bit components to store on a device that
isn't using native 16-bit loads and stores, we should be lowering
that into one 32-bit store and one masked store. Instead, the logic
here ends up returning that the best we can do is one 8-byte store,
which is clearly wrong. Stores should round down, loads should
round up.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26293>
Those are not reused, so this will be the first and only allocation, so
no need to use the "realloc" variants.
For the fs_reg arrays, there's currently no particular reason to keep
them uninitialized, so zero-initialize them too -- not ideal but better
than random values.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26302>
GFX11 uses pipeline statistics for mesh/task queries but on GFX10.3
they need to be emulated. Though the number of mesh/task shader
invocations would be copied to the pipeline statistics range to
simplify the implementation.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25950>
We now have a lot of feature toggles in fd_dev_info. Generate
env var options for all of them to quickly test whether feature
misbehaves or test its impact on the performance.
FD_DEV_FEATURES=%feature_name%=%value%:%feature_name%=%value%:...
e.g.
FD_DEV_FEATURES=has_fs_tex_prefetch=0:max_sets=4
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25939>
After introduction of A7XX it doesn't make sense to define base GPU
properties in A6xxGPUInfo. Now we move to a more clean definition:
- a6xx_base + a6xx_genX - for A6XX
- a7xx_base + a7xx_xxx - for A7XX, there is no sub-gens clearly
identifiable at the moment.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25939>
This reworks the intel_compute_pixel_hash_table_nway() pixel pipe
hashing table computation helper to handle cases where some pixel
pipes have processing power different from the others, this is helpful
for Gfx12.7+ platforms where there are pixel pipes with 1 DSS as well
as pixel pipes with 2 DSSes, which currently can lead to a serious
performance bottleneck in the pixel pipes with lower processing power.
In order to avoid such a load imbalance the
intel_compute_pixel_hash_table_nway() function will now take two pixel
pipe bitsets instead of one: Pixel pipes enabled on both bitsets will
appear with twice the frequency on the table as pixel pipes which only
appear on one bitset. See the comments below for more details on the
algorithm used to construct a pixel hashing table with the desired
properties.
With this change rendering performance improves by about 25% on a
fused MTL platform -- The list of specific configs this is expected to
show an improvement on is not included here since the list is rather
long and some of the configs may still be embargoed or may never be
productized, but in order to find out whether your Gfx12.7+ device
could be affected by this you can check the output of the
intel_dev_info tool from the Mesa tree and see if there are multiple
"pixel pipe" entries with different DSS count. That isn't expected to
occur on any DG2 configuration, only on MTL+ platforms, so this change
should have no effect at all on DG2 (it's easy to convince oneself
that it won't since for DG2 mask1 should equal mask2 so mask2 will be
set to zero at the beginning of intel_compute_pixel_hash_table_nway()
and the new swzx[] permutation will be set to the identity).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26266>
They're now printed as part of the op for textures and we've changed the
names to follow the PTX convention. For buffers and cube maps, I had to
come up with my own thing but I think the result is okay.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26291>
The new trait has separate fmt functions for dsts and the rest of the
op. There's a generic implementation of Display built on top of it
which just prints `{dsts} = {op}`. The new trait also has a default
implementation of fmt_dsts() which just walks the dsts using
dsts_as_slice() and auto-formats them all. This should be what we want
for 95% of cases and we can hand-implement the rest.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26291>
Prefix them with jm_ so we can easily specialize things for CSF.
panfrost_batch_submit_ioctl is renamed into jm_submit_jc() which
reflects the fact the ioctl() submits one job chain at a time, and
panfrost_batch_submit_jobs() is renamed jm_submit_batch() because it's
shorter and is descriptive enough.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26249>
First of all, we want to prefix those with jm_, so we can later add
CSF-specific job emission helpers. We also take this as an opportunity
to clarify what these helpers emit exactly: the draw section of a job
descriptor (suffixed with _draw) or the job descriptor itself (suffixed
with _job).
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26249>
In preparation of CSF support addition, we move any JM related bits out
of panfrost_batch into a panfrost_jm_batch struct that's embedded in
panfrost_batch inside an anonymous union. This way, we can easily
specialize things for CSF without polluting the panfrost_batch object
with CSF-specific fields.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26249>
Move the panfrost_emit_tile_map() call before any of the per-gen
calls in panfrost_batch_submit(). This is in preparation of moving
the per-gen batch submission logic to a single hook instead of
having multiple indirect calls, and given the tile map doesn't
depend on any of the states modified by the per-gen calls, moving
it before them shouldn't be an issue.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26249>
We turn the 10000 jobs limit into a 10000 draw limit because:
- on HW supporting IDVS a draw is just a single job, and having a common
limit for all HW is simpler
- draw_count < 10000 fits in the max jobs limit if we assume the worst
case scenario (3 jobs per draw)
- CI seems to be happy (no spurious timeouts after this change)
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26249>
Vulkan shaderdb stats with pattern dEQP-VK.image.*.with_format.*.*:
total instructions in shared programs: 35993 -> 33245 (-7.63%)
instructions in affected programs: 21153 -> 18405 (-12.99%)
helped: 394
HURT: 1
Instructions are helped.
total uniforms in shared programs: 8550 -> 7418 (-13.24%)
uniforms in affected programs: 5136 -> 4004 (-22.04%)
helped: 399
HURT: 0
Uniforms are helped.
total max-temps in shared programs: 6014 -> 5905 (-1.81%)
max-temps in affected programs: 473 -> 364 (-23.04%)
helped: 58
HURT: 0
Max-temps are helped.
total nops in shared programs: 1515 -> 1504 (-0.73%)
nops in affected programs: 46 -> 35 (-23.91%)
helped: 14
HURT: 2
Inconclusive result (%-change mean confidence interval includes 0).
FWIW, that one HURT on the instructions count is for just one
instruction.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25726>
Since v71, broadcom hw include specific packing/conversion
instructions, so this commit adds opcodes to be able to make use of
them, specially for image stores:
* pack_2x16_to_unorm_2x8 (on backend vftounorm8/vftosnorm8):
2x16-bit floating point to 2x8-bit unorm/snorm
* f2unorm_16/f2snorm_16 (on backend ftounorm16/ftosnorm16):
floating point to 16-bit unorm/snorm
* pack_2x16_to_unorm_2x10/pack_2x16_to_unorm_10_2 (on backend
vftounorm10lo/vftounorm10hi): used to convert a floating point to
a r10g10b10a2 unorm
* pack_32_to_r11g11b10 (on backend v11fpack): packs 2 2x16 FP into
R11G11B10.
* pack_uint_32_to_r10g10b10a2 (on backend v10pack): pack 2 2x16
integer into R10G10B10A2
* pack_4x16_to_4x8 (on backend v8pack): packs 2 2x16 bit integer
into 4x8 bits.
* pack_2x32_to_2x16 (on backend vpack): 2x32 bit to 2x16 integer
pack
For the latter, it can be easly confused with the existing
pack_32_2x16_split. But note that this one receives two 16bit integer,
and packs them on a 32bit integer. But broadcom opcode takes two 32bit
integer, takes the lower halfword, and packs them as 2x16 on a 32bit
integer.
Interestingly broadcom also defines a similar one that packs the
higher halfword. Not used yet.
Note that at this point we use agnostic names, even if we add a _v3d
suffix as they are only available for broadcom, in order to follow
current NIR conventions.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25726>
The code in `emit_load_var` that will attempt to read indirect inputs
expects the entire array of inputs to be there. Additionally the code
that populates `bld->inputs_array` will populate the array using the count
of `inputs_read`, without ensuring the inputs it copies are the ones read.
This change populates `bld->inputs_array` with the entire contents of bld->inputs
so indirect reads will always match up.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26153>
For Wayland wsi allocations, v3dv used the wl_drm protocol, which is now
being phased out in favor of dmabuf feedback.
wl_drm is used to figure out the display device (in v3dv assumed to be
vc4) and then to authenticate with the Wayland compositor in order to
allocate scanout-able buffers (in this case, dumb buffers) directly at
the display device.
Recent commit 88c03ddd34 changed the behavior of the wsi code, and
wl_drm is now passing the render device instead, which broke Wayland
wsi.
It turns out that the authentication code is not really needed and since
we would like to remove wl_drm usage and the master device is assumed to
be vc4 anyway, we can just remove some unneeded device-specific wsi code
and get Vulkan Wayland wsi back to work.
Fixes: 88c03ddd34 ("egl/drm: get compatible render-only device fd for kms-only device")
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26200>
Meson produces version scripts with an empty global node for disabled
drivers. This is reported as syntax error by the linker.
The root cause of the problem is that the version scripts are
accumulated in the out of foreach `pipe_loader_link_args` variable
although they should be only used once for their driver specific loader
library.
Fixes build errors when some of the drivers are disabled like on arm64
which disables i915 due to missing dependencies.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10166
Fixes: 667de678a0 ("gallium: Fix undefined symbols in version scripts")
Signed-off-by: Janne Grunau <j@jannau.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26268>
We can probably do slightly better than this if we take advantage of the
predicate destination in SHFL but not by much. All of the insanity is
still required (nvidia basically emits this), we just might be able to
save ourslves a few comparison ops.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26264>
Nothing should currently hit this path.
The next commit adds code to nir_pack_bits and nir_unpack_bits that can
lead to this path being hit.
v2: Change nir_u2uN(..., 8) to nir_u2u8(...). Suggested by Alyssa.
v3: Don't generate nir_extract_u8 if the driver has set
lower_extract_byte. These instructions were causing some problems for
dozen.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> [v2]
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> [v2]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24741>
Output offset for resolves was wrong, and we can't use a double for the
timestamp multiplier, because doubles might be emulated, and that emulation
is only handled by shaders that go through the GL frontend.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26210>
For guest-based blob allocations from hostmem (Host visible memory),
to make sure that virtio-gpu driver will send to the host the address
(offset in the region) of the allocated blob using RESOURCE_MAP_BLOB
command a flag VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT must be set.
Otherwise, if the upper layers didn't set it, host can't import memory
and guest allocation from Host visible memory region makes no sense.
Signed-off-by: Andrew D. Gazizov <andrew.gazizov@opensynergy.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26130>
In addition to the platform requirement (use_guest_vram), device memory
allocations from dedicated heap (guest_vram) are necessary only when:
1. VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT is set and it indicates that
memory is host visible and assumed to be accessed by CPU (vkMapMemory).
2. One of external memory handle types is set, that indicates memory
can be exported with external handle.
In other cases it's not necessary to create virtgpu_bo object in the
guest and enough just perform vkAllocateMemory on host side without
memory import from dedicated heap.
Reported-by: Yiwei Zhang <zzyiwei@chromium.org>
Signed-off-by: Andrew D. Gazizov <andrew.gazizov@opensynergy.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26130>
blob_id == 0 does not refer to an existing VkDeviceMemory and implies
a shmem allocation. So for guest_vram device memory allocations, 0 is
not a valid blob id and must be greater than 0.
Therefore, set vk_object_id as blob_id for guest_vram device memory
allocations. Considering that vk_object_id made from valid pointer, it
will be always greater than 0.
Signed-off-by: Andrew D. Gazizov <andrew.gazizov@opensynergy.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26130>
Now that we have sparse resources on Anv these tests are finally
running, but they're failing. We'll eventually fix them, but let's not
make Zink gatekeep the entirety of sparse resource on Anv.
v2: KHR-GL46.sparse_buffer_tests.BufferStorageTest was initially
reported as Crash by Mesa CI. On my second run in Mesa CI it gave me a
Timeout. On my machine it passes but takes about 4 minutes to finish,
so skip it entirely.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
Regarding supporting these formats, the spec says:
"A sparse image created using VK_IMAGE_CREATE_SPARSE_RESIDENCY_BIT
supports all non-compressed color formats with power-of-two element
size that non-sparse usage supports. Additional formats may also be
supported and can be queried via
vkGetPhysicalDeviceSparseImageFormatProperties.
VK_IMAGE_TILING_LINEAR tiling is not supported."
Regarding the formats themselves, the spec says:
"VK_FORMAT_B8G8R8G8_422_UNORM specifies a four-component, 32-bit
format containing a pair of G components, an R component, and a B
component, collectively encoding a 2×1 rectangle of unsigned
normalized RGB texel data. One G value is present at each i
coordinate, with the B and R values shared across both G values and
thus recorded at half the horizontal resolution of the image. This
format has an 8-bit B component in byte 0, an 8-bit G component for
the even i coordinate in byte 1, an 8-bit R component in byte 2,
and an 8-bit G component for the odd i coordinate in byte 3. This
format only supports images with a width that is a multiple of two.
For the purposes of the constraints on copy extents, this format is
treated as a compressed format with a 2×1 compressed texel block."
Since these formats are to be considered compressed 2x1 blocks and we
don't necessarily have to support non-compressed formats that
non-sparse support, we can claim them as not supported with sparse.
In addition to all of that, if you look at isl_gfx125_filter_tiling()
you'll see that we don't even support Tile64 for these formats, so
sparse residency (i.e., non-opaque image binds) doesn't really make
sense for them yet.
The Vulkan spec defines 4 other YCBCR "2x1 compressed" formats like
the ones we have in this commit, but we don't support them even
without sparse, so there's no reason to check them here.
A recent change in VK-GL-CTS made tests that use these formats go from
unsupported to failures:
7ecc7716a983 ("Do not use and check for STORAGE image support, when
it is not used in the test")
This commit "fixes" the following VK-GL-CTS failures (by making them
return NotSupported):
dEQP-VK.sparse_resources.image_block_shapes.2d.b8g8r8g8_422_unorm.samples_1
dEQP-VK.sparse_resources.image_block_shapes.2d.g8b8g8r8_422_unorm.samples_1
dEQP-VK.sparse_resources.image_block_shapes.2d_array.b8g8r8g8_422_unorm.samples_1
dEQP-VK.sparse_resources.image_block_shapes.2d_array.g8b8g8r8_422_unorm.samples_1
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
On i915.ko we don't have the vm_bind ioctl, so sparse requires TR-TT.
Unfortunately, on gfx < 20 TR-TT is not compatible with non-render
queues, so we have to disable those when sparse is enabled. Notice
that although we don't have TR-TT for non-render queues on gfx >= 20,
vm_bind is the default, and it doesn't have this restriction.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
We need to wait for the batches to complete before we return the BOs
to the pool. We were previously doing this completely synchronously,
which made the code unnecessarily wait. Now we have a timeline syncobj
that signals completion of the previous BOs, so sometimes we check
where we are in the timeline and then return the BOs that we know are
unused.
This, in addition to the previous patch that made us wait for the
other syncobjs through the execbuf ioctl instead of through the CPU,
makes TR-TT batches at least an order of magnitude faster. Still, I
don't think we'll notice any changes in games's FPS as they don't bind
sparse resources that often.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
When sparse images are being used, applications normally use
non-opaque binds and leave opaque binds just for the miptail part.
Since miptails are always at the end of the array layers, processing
the opaque binds after processing the non-opaque binds increases the
chance that anv_sparse_submission_add() will join the miptail bind
operation with the last non-opaque opreration, especially if the user
is trying to bind the last few non-miptail levels and the miptail in
the same vkQueueBindSparse opration.
In the real world this case does happen, so we're able to save a bind
operation every once in a while in Steam games.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
Move waiting/signaling to the backends so we can fix each backend
separately.
As I write this patch the vm_bind backend is back to using synchronous
vm_binds so we can't pass syncobjs to the synchronous vm_bind ioctl
anymore. We'll need more discussions and possibly some rework before
we go back to asynchronous vm_binds. This commit should allow us to
fix the TR-TT backend in the next commit and leave vm_bind for later.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
If we're going to move syncobj waiting/signaling down to the backend
we're going to need a queue to signal as lost in case those operations
fail.
In some places of the stack we don't have a queue available, such as
when we're creating or destroying resources. For those, for vm_bind
cases we don't use the queue for anything so passing it as NULL is
fine. For TR-TT we are already using device->trtt.queue.
For TR-TT specifically this also means we're going to start using the
actual queues from the call stack instead of trtt->queue, but that
shouldn't make any difference since we only ever have one queue.
Still, this is more technigally correct.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
Our ultimate goal is to have the backend functions deal with the wait
and signal syncobjs instead of waiting for them on the CPU inside
anv_queue_submit_sparse_bind_locked(). For that, we'll need waits and
signals parameters to be passed all the way to the backend functions
that actually make the submission, and this is what this patch does,
through struct anv_sparse_submission.
This patch just deals with passing the parameters to the functions,
nothing is using the new variables yet. There should be no functional
changes here. The goal here is to make code review easier.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
Currently, a single vkQueueBindSparse() call may lead to multiple bind
calls in the backend (either a vm_bind ioctl or a command submission
that updates the TR-TT page tables). These operations can be quite
slow so it's better for us if we try to emit as few of them as
possible.
On top of that, this gives our "just extend the last operation's size
if possible" code a little more chance to act and save us real time.
Our ultimate goal here is to also pass submit->waits and
submit->signals to the backend so we can avoid doing CPU waits, so
having a single call to the backend helps simplify things a little
too, and we just created the structure to carry these extra pointers
forward.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
Having it helped us printing the resource offset, which made debugging
some situations easier. The problem is that we want to rework the code
a little bit and we won't have a 'sparse' struct anymore to pass
around. Since it's all debug code drop it for now so it doesn't get in
the way of the rework. If we need it later we can find a way to add it
back, or we find another way to print the value.
Drive-by drop the DEBUG_SPARSE check that's already in the caller.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
Same as the L1 case, but this one deals with 64bit entry addresses and
pte addresses.
Consecutive L3/L2 writes are much rarer than L1 writes since they
require some pretty big buffers, but we can still those cases in the
wild. I just don't think any change will be noticeable though.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
TR-TT is a hardware feature supported by both i915.ko and xe.ko, which
means we can now finally have Sparse Resources on i915.ko and we also
have 2 options for xe.ko (and whatever is the best should be the
default).
In this patch we use batch commands to write the page tables and
forever keep them in device memory. We maintain a mirror of both the
L3 and and L2 tables because that helps us never having to read the
tables that are in device memory.
We still have some things to improve, but with this commit, workloads
that didn't work at all due to the lack of sparse resources should
at least run.
This is still all disabled by default in i915.ko, you can turn it on
by exporting ANV_SPARSE=1 before launching the applications. For
xe.ko, switch the default with ANV_SPARSE_USE_TRTT=1.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
Nothing should ever be reading them, they logically do not exist. So there's no
point validating them, especially when the validation in question is so useless
(just checking the bit width, without any semantic awareness). Yet now that we
support vec16, this loop is quite hot even on scalar ISAs, and rather
pointlessly so. Just remove it and bring the ALU src validation complexity to
O(# of channels in source) instead of O(max # of channels in NIR).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26084>
Profiling showed that maintaining this ssa_srcs set consumes ~3% of CTS time
with a debugoptimized build. Unfortunately, we really do benefit from getting
this coverage in CI. So rather than remove the validation, let's optimize the
data structure used so we can keep the coverage at a fraction of the cost.
The expensive piece is the pointer set, which is backed by a relatively
expensive hash table. It would be much cheaper to use an invasive set instead,
with a single "present" bit. We don't want to bloat nir_src for this, however
there's an easy solution: use a tagged pointer to steal a bit in the nir_src for
the job. We untag everything at the end of validation (and this meta-invariant
is asserted with an auxiliary counter), so while we mutate the IR while
validating, the mutations do not escape nir_validate.
We tag the parent pointer and not the def pointer, because it is dramatically
less used and therefore has far fewer disrupted call sites.
The M1 job is improved from 3:03 to 2:55 of deqp-runner reported time, which is
excellent.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26084>
Bit 8 in the descriptor is not encoding the primary/secondary shader
information. It's a per shader-type field.
For fragment shader descriptors, it describes what the coverage bitmask
contains for per-sample execution:
- DX-style: bits for all covered samples are set
- GL-style: only the bit for the sample the shader is executed on is set
For vertex shader, it encodes the warp limit we want to apply to the
shader execution.
Patch the existing code to match the new semantics.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26221>
We're currently updating the hitT value in the traversal result with
the hitT value from reportIntersection(), but this is not correct.
First the hitT value of reportIntersection() should update the
gl_RayTmaxEXT value (maps to brw_nir_rt_mem_ray_defs::t_far).
Second the hitT determined by traversal should only be updated if the
reportIntersection() hitT value has updated the gl_RayTmaxEXT and that
the new gl_RayTmaxEXT is smaller than the determined hitT value from
traversal.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Fixes: 303378e1dd ("intel/rt: Add lowering for combined intersection/any-hit shaders")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25146>
Add static _locked versions of both functions which can do the usual
goto cascade for error handling and wrap them in exported functions
which take a lock, call _locked, drop the lock, and return.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26242>
In environments where 3D acceleration is not available, like in a VM,
the behavior before commit 8cd44b8843 ("egl: add automatic zink
fallback loading between hw and sw drivers") was to fallback to swrast.
This was the output of `eglinfo` in that situation:
$ eglinfo
[...]
Wayland platform:
EGL driver name: swrast
OpenGL core profile renderer: llvmpipe (LLVM 17.0.4, 256 bits)
However, after commit 8cd44b8843 ("egl: add automatic zink fallback
loading between hw and sw drivers") Zink support is tested before
falling back to swrast.
Since the system doesn't support 3D acceleration, Zink + software
rendering is used instead of swrast causing issues like the ones
described in #10146.
In this case, `eglinfo` prints:
$ eglinfo
[...]
Wayland platform:
EGL driver name: zink
OpenGL core profile renderer: zink Vulkan 1.3(llvmpipe (LLVM 17.0.4,
256 bits) (MESA_LLVMPIPE))
This patch ensures that Zink + software rendering is used only when the
user opts-in by setting `LIBGL_ALWAYS_SOFTWARE` or `D3D_ALWAYS_SOFTWARE`
and swrast is used otherwise.
After the patch, the output of `eglinfo` is identical to the one before
the regression:
$ eglinfo
[...]
Wayland platform:
EGL driver name: swrast
OpenGL core profile renderer: llvmpipe (LLVM 17.0.4, 256 bits)
Resolves: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10146
Fixes: 8cd44b8843 ("egl: add automatic zink fallback loading between
hw and sw drivers")
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: José Expósito <jexposit@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26220>
Sparse backing buffers were destroyed immediately after issuing the
unbind call, which was against the Vulkan spec which requires the
destroy call to not happen before the unbind semaphore was signaled.
To tackle this, keep a reference against buffers we are unbinding within
the batch. This will keep the backing buffer long enough to not cause
use-after-free. As described in comments, we don't need to reference
every backing page used in the batch, as the resource usually keeps
references to them until they are unbound, which is now correctly
handled.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26171>
Fix defect reported by Coverity Scan.
Evaluation order violation (EVALUATION_ORDER)
write_write_typo: In block_size_bits = block_size_bits = ((surf.u.gfx9.swizzle_mode >= ADDR_SW_256KB_Z_X) ? 18 : 16),
block_size_bits is written twice with the same value.
Fixes: 44eaf50a34 ("ac/surface/tests: cosmetic changes")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26160>
When doing ASTC decoding, the image has format VK_FORMAT_ASTC_*, the
internal plane 1 has format VK_FORMAT_R8G8B8A8_UNORM, and the view has
format VK_FORMAT_R8G8B8A8_UINT. It does not need the override for
compressed formats.
Fixes: f97b449e9e ("radv: integrate meta astc compute decoder to radv")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26106>
We were previously setting "evict first" everywhere, which could
possibly be a perf issue on machines that are also running shaders
compiled with codegen, which sets "evict normal" on everything.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26201>
The firmware reports no MALL cache being present, which is wrong. We
later depend on correct L3 cache size values for choosing the attribute
ring size, so fall back to manually computing the size.
Fixes: 355242f055 ("ac/gpu_info: adjust attribute ring size for gfx11")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26147>
When lowering @load_constant to @load_ubo, the bit size is currently
hard-coded to 32. This causes validation errors when lowering a constant
with a 64b bit size.
This patch fixes this by setting the @load_ubo bit size correctly for
64b constants. This 64b load is later lowered to a 32b load by
ir3_nir_lower_64b_intrinsics.
Fixes Piglit test:
- spec@arb_gpu_shader_fp64@execution@fs-indirect-temp-double-src
This patch has no impact on shader-db.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26191>
Shader Local Memory is what NVIDIA calls it in the shader header docs as
well as the command stream headers. Better to be consistent even if it
gets my Intel brain confused. (Intel uses SLM for shared memory.)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26197>
At this point, we're fully trusting NAK to do its own lowering and we
only lower stuff in nvk_shader.c if it's relevant for Vulkan. This also
assumes that NAK is already doing the right thing everywhere.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26197>
This reverts commit 578e10e157.
The only reason for reallocating surfaces as interlaced (on drivers
that supports both progressive and interlaced) was deinterlacing
with postproc filter, but that now also supports interleaved surfaces.
With this change interlaced surfaces are no longer used on radeonsi.
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26174>
The code here handled stores of actual 3-byte values (8-bit, 3-component), but didn't
correctly handle stores of larger 8-bit vectors that were constrained by write mask to
just 3 bytes. In that case, the pad-to-vec4 step was unnecessary and problematic.
Seen in CL CTS test_basic vector_swizzle test group for char3 with CLOn12.
Fixes: c70d94a8 ("nir_lower_mem_access_bit_sizes: Support unaligned stores via a pair of atomics")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26034>
This gives FS I/O the same treatment as we did for vertex attributes in
that we now have a NIR intrinsic which pretty closely matches the
hardware and we lower to that before going into NAK. This gives us a
bit more control in the NIR.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26181>
Currently, shader part caching logic is duplicated between VS prolog and
PS/TCS epilogs. This commit introduces a common abstraction to
deduplicate the code.
Additionally, there are a few design decisions that diverts from the
current implementation:
1. A simple mutex is used instead of reader-writer lock. Prolog/epilog
constructions are serialized, removing the need to free duplicate
objects in case of a race.
2. A CS-local cache is used to quickly lookup an entry without holding a
lock. This eliminates locking in over 99% of cases.
3. A set is used to reduce number of allocations.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26028>
For 0x07030002 chip id different names are returned on different
phones: Adreno730v3 or Adreno725v1. Settle on 725 to disambiguate
them.
The only difference from base 730 is that it has conditional
execution of compute shader at the start of every command buffer.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25888>
They check for null, alignment, excessive size, and address space wrapping. If any of the checks
fails, `Err(CL_INVALID_VALUE)` is returned.
The caller still has to uphold the other requirements of the `from_raw_parts` fns.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26157>
Raw pointers have bad ergonomics and by using them we opt out of a lot of Rust safety guarantees.
The closure we create modifies the memory behind `svm_ptr`. Make that clear to the compiler by
representing it as slice. `pattern` could also be represented by a slice but then we'd create
overly generic code not exploiting the guarantees given to us be the OpenCL spec.
Namely that there's only a few possible sizes - all of them a power of two - and that `svm_ptr` is
aligned to that size.
Thus, represent `pattern` as one struct per possible size and have the compiler generate optimized
code paths for filling the buffer with each of them. There's one unsafe operation less and the
remaining ones as well as the casts have been documented in detail.
Based on that additional checks of the provided `size` have been added. While it's unlikely that
any application will ever run into them, the old pointer arithmetic already silently relied on
these properties.
Furthermore, since raw pointers are neither `Send` or `Sync` but the Rust types we now use are the
closure can now implement `Send` and `Sync`. That's one step toward marking `EventSig` `Send`.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26157>
Otherwise ctx->VertexProgram._VaryingInputs might not be up to date.
We can't do this in update_program because this breaks vbo_save_playback_vertex_list_gallium:
const GLbitfield enabled = node->enabled_attribs[mode];
_mesa_set_varying_vp_inputs(ctx, enabled); <-- update _VaryingInputs
if (ctx->NewState)
_mesa_update_state(ctx); <-- calls update_program, reverting the
change made above
Fixes: c97961a855 ("mesa: fix 38% decrease in display list performance of Viewperf2020/NX8_StudioAA")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9441
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25956>
On GFX10.3, VRS rates need to be copied to the HTILE buffer but in some
situations, like for mips, it's not always possible to enable HTILE.
In this case, we can fallback to our internal HTILE buffer and tweak
the depth/stencil registers to use this HTILE buffer.
This fixes a bunch of VRS crashes on GFX10.3.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26025>
Patch fxes ESO shadow pass ground corruption on Arc A750. In the colour
pass where the rendering corruption first appears, the depth resource
was used as a "PS - Texture". Immediately afterwards there's a Barrier
where it goes from
VK_IMAGE_LAYOUT_DEPTH_STENCIL_READ_ONLY_OPTIMAL =>
VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL
immediately following that there's a Clear from vkCmdBeginRendering
which appears to be a HiZ clear. Things work when using AUX_USAGE_HIZ
but AUX_USAGE_HIZ_CCS_WT (XXX: and AUX_USAGE_HIZ_CCS?) doesn't work.
current thinking is this is related to 14015264727 where we had to add
HDC and DC flushes to CCS and MCS fast clears. Maybe HiZ clears with
CCS also have similar problems? The docs don't appear to indicate that
but the docs were also wrong for color clears until recently...
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9277
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9444
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22717>
Now that NAK is the default for Turing+, we can just chalk any storage
image descriptor handle overruns up to codegen bugs. We could plumb
shader stages all the way through to here and only assert when codegen
is in use but that's a lot of work just for an assert.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
We weren't handling Zero. Also, we need to mask immediates or else the
encoder blows up. The hardware automatically masks them when they come
in as sources but when we get immediates, they're not guaranteed to fit
in the bitfield.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
Only consider copies with the right file. Otherwise, the spill chooser
will blow up looking for the next use. Also, we were marking
destinations as spilled when I ment to mark them as being in W. This
was a right mess. God thing to_cssa() was also broken so it wasn't
inserting nearly as many parallel copies as it was supposed to. 🙃
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
In the presence of nested loops, there is no guarantee that we can
handle back-edges in a single pass. Instead, use a bitset as a worlist
and repair until we're out of missing sources.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
It was converting instructions to/from Op which throws away predicates.
This meant that any instruction immediately after an OUT.EMIT would
loose its predicate (if any). If an OUT.EMIT comes right before a BRA,
this results in the branch always getting taken.
While reworking things, I also totally reformatted the pass to make it
more readable IMO.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
because
* ptxas refuses inputs with .cluster before sm90
* on sm90 ptxas encodes .cluster as .gpu
* nvdisasm calls this encoding on sm75 .SM
so I don't think we have an actual .cluster on any released gpus
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
This adds 4 NIR intrinsics for attribute I/O which match the Turing
hardware instructions as well as a lowering pass to lower
load/store[_per_vertex]_input/output to thewe intrinsics. This greatly
simplifies nak_to_nir.rs. Also, this pass is able to handle a bunch of
cases that the current code in nak_from_nir.rs can't:
- Misaligned access (i.e., a vec3 load at 0x0f4)
- Write masks on store[_per_vertex]_output
- Indirect load/store where we need to use AL2P to get physical
addresses, including scalarizing those cases
It also handles the casses where we need to use ISBERD on vertex indices
in the same pass. When we switch to this, we'll rip out the dedicated
per_vertex lowering pass.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
We could lower this to load_per_vertex_output in NIR but then it
confuses all sorts of NIR passes which assume load_*output only
happens in control shaders. We could also add a magic NIR intrinsic
but it's probably easier to just special-case this one.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
This commits implement load/store_per_vertex* and load_output which are
required for tessellation shaders. Because things are getting a bit
complicated, it's easier to combine all the attribute load/store ops in
a single case in the match.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
NVIDIA hardware doesn't take a vertex index for per-vertex I/O.
Instead, it takes an offset into the primitive. This has to be fetched
using a combination of SR_INVOCATION_INFO and the ISBERD instruction.
To keep things simple and allow for maximum CSE, we do the lowering in
NIR and patch the load/store_per_vertex_input/output intrinsic.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
We also need to call nir_lower_indirect_derefs. Even though we're not
lowering anything away, we need to call it so that it will lower clip
distance arrays. The pass always lowers compact array derfs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
Rookie mistake... The liveness algorithm propagates information from
later blocks to earlier blocks so if you run bottom-up it's exactly two
passes when there aren't loops. If you run top-down, it's quadratic in
the number of blocks.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
This adds mark_attrs_(read|written)() helpers to VtgIoInfo which take a
range of attribute addresses and mark the range as-needed. It adds a
similar mark_attr_read() helper to FragmentIoInfo which only marks a
single address and takes a PixelImap. This gets us down to only needing
to duplicate the address range if ladder twice. For VTG I/O, having it
take ranges will be more ergonamic when it comes time to handle non-
constant I/O offsets.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
Put the stuff in nak_shader_info in nak.h into a union so we can have
other stage info exported as well. Then plumb local_size and smem_size
through ShaderInfo like we do for a bunch of the FS things. While we're
shuffling things around, pull nak_shader_info::cs into a union.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
This time we map them to a consistent output address space like we do
for all other I/O and system values and do the remap in nak_from_nir.
This lets us know very precise usage information and more robustly build
the OMask in the shader header. We also handle location_frac now.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
That one's a bit magic because it exists to ensure that each output GPR
ends up in the right register at the end of the shader. We tried to
handle it as a simple lowering before but it's easier and safer to
handle it directly in RA and liveness.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
We re-order and re-arrange the whole thing by instruction type. Also,
instead of returning an Option<u32>, have a has_fixed_latench() method
to check the instruction and then get_dst_latench() to get the latency
from instruction launch to the given destination index being available.
This lets us handle predicates properly which have a different number of
cycles for some reason. Oh, it's now just as correct as the estimates
in nv50_ir_target_gm107.cpp.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
This is mostly a refactor but it does tuck in a three functional
changes:
1. We no lonter insert read barriers for sources that we immediately
overwrite in the instruction.
2. We track read and write barriers separately so we don't wait for
read-after-read hazards but do for write-after-read.
3. Wait on all barriers in branch instructions
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
Because of its crazy behavior around overflow, we don't want the full
IADD3 opcode to support any sort of source modifier propagation. This
makes us a new OpIAdd3X opcode which contains all the crazy and lets
IAdd3 remain the usual 32-bit integer thing everyone knows and loves.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
Instead of just taking live-in \ W, consider anything previously spilled
to be spilled. This lets us avoid a bunch of redundant spills because
we now allow spills to persist across blocks even if the value is in W.
In the loop header case, however, we still need to add in live-in \ W or
else we can end up in cases where a value is neither in W nor S.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
Conventional SSA (also called CSSA) requires phi nodes be isolated by
parallel copies such that there is no interference between SSA values.
This is required for many out-of-SSA algorithms and, in our case, a
prerequisite for spilling.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
For spilling, we want to be able to treat TLS as if it were a register
file. It unifies and makes everything easier. Also add support to
OpCopy to for copying between GPR and Mem. We cannot, however copy
directly from Mem to Mem.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
Instead of using OpMov or OpParCopy, use OpCopy directly. This reserves
OpParCopy for RA type things and OpMOv for actual codegen. We can also
drop OpMov from copy propagation now.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
This increases the size of RegRef to 32 bits but that's fine given that
it's usually in a union with SSARef which is 128 bits. The real
advantage is that it allows us to start treating memory as a register
type which will come in really useful for spilling.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
This gets rid of most of the per-block hash maps and replaces them all
with vectors. This is both simpler and should be quite a bit more
efficient (though block lookup isn't common).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
This renames CFG to CFG2 and moves to storing the blocks in a CFG
instead of a Vec. This should let us make a bunch of other data
structures drop to a vec instead of a hash map now that we can rely on
the CFG instead of BasicBlock::id.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
This is a pair of vectors but it acts a bit like a vector of a pair.
Everything is inserted, removed, iterated, etc. in tandem to ensure that
we never mismatch between the two vectors. However, because they're two
separate vectors, we ensure that it can be used to store Src and Dst and
keep everything contiguous.
Of particular interest is the new retain() method which is equivalent to
Vec::retain() which allows filtering a VecPair. This tricky operation
is performed three different times by DCE and will be needed in spilling
as well.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
This one is much faster to compute because we can use bitops for the
fixed-point data-flow algorithm rather than the clumsy walking of hash
sets. The faster version is sufficient for RA and checking for register
pressure. We only need to fall back to the slower version for spilling.
Thanks to traits, we can get some of the same behavior with both.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
One could argue that this should go in the function and not be an
on-demand analysis pass but we can do that later. For now, we just
break it out into a separate data structure.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
Instead of tracking pinned things in the register allocator, we split
the register allocator into RegAllocator and PinnedRegAllocator. The
RegAllocator struct only allows for very simple single-SSA allocations
and frees. It tracks locations of everything, what's used, etc. but
otherwise knows nothing about pinning or vectors.
The new PinnedRegAllocator struct wraps a RegAllocator by taking a
mutable reference to it. It provides support for pinning and all the
vector stuff. To destroy a PinnedRegAllocator, finish() is called which
re-places any evicted SSA values and populates an OpParCopy with any
needed copies. Because PinnedRegAllocator owns a mutable reference to
the RegAllocator, it's impossible to mix uses of PinnedRegAllocator and
RegAllocator. This ensures that, for as long as the pinned version
exists, nothing can be allocated which migh escape the pinning.
This fixes a bunch of corner cases when register pressure gets tight.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
Since the top value is always the default register, we don't need to
worry about overflowing a u8. What we do need to worry about is
register files with zero registers which is a thing pre-Turing for
uniform register files. Use num instead of max so we don't end up
subtracting 1 from 0.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
While coping the entire RA struct from the predecessor is probably
faster, it may contain values which are not live in the current block.
We could purge those but the iteration gets tricky with Rust. It's far
clearer and more rust-friendly to deal with it as an initialization
problem.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
Sometimes it's useful to have something that's per-register-file like we
do for register allocation. Since this is generally useful, add a
struct for it. In future, it might be neat to pull in the enum_map
crate which basically does this generically.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
If we walk the blocks forwards, we end up having a minimum O(n^2)
algorithm because everything has to propagate bottom to top. Looping
over the blocks bacwards ensures that all the liveness information is
propgated in the first pass in the absence of back edges.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
All passes allocate a Vec per Instr during map(). This is wasteful,
because most instances of map() produce a single instruction (by
mapping one instruction to another instruction) or no instructions at
all.
In such cases, they return an empty Vec, or a Vec with a single entry.
Rework the signatures so that a Vec is only when mapping one instruction
into many.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
Instead of I/D/FMov instructions, just use [DF]Add instead. For ineg, we
add a new INeg instruction which we can lower to IADD3 later. The
reason for this is that IAdd3 is complicated and makes detecting an ineg
rather annoying. Also, if we ever bring NAK up on older hardware, not
all hardware has IAdd3 and INeg will be lowerable everywhere.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
It is possible for a single scalar to show up in any number of vector
components, including twice in the same vector. Add a bit to the
legalization pass to deal with this possibility.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
Previously, we handled vectors by giving each SSAValue a number of
components which we assume matches in all uses. To deal with swizzles
and component selection, we had OpVec and OpSplit instructions to
convert vectors to/from scalars as-needed. This is fine as an SSA
representation but it leads to a lot of redundant values when it comes
time for assigning registers. There are strategies for dealing with
this such as ensuring that splits always kill the whole vector and then
re-combining into a new vector for later uses. It's possible by doing
this to ensure that each component only ever exists exactly once at the
cost of a LOT of vec/split instructions.
Another possible solution is to naievely emit vec/split but teach
liveness analysis and RA bout the duplicated values. Instead of RA
working on individual SSA values, it can work on equivalence classes of
components.
This takes a different (and currently novel to Mesa) approach of making
each SSAValue a single component but having an SSARef type which can
reference up to 4 SSAValues as a vector. Register allocation then works
on individual components and only ensures that the components of a
vector are contiguous when it's used as a vector. This isn't very
different from how it worked before. If anything, it's a bit more
straightforward now because the component/vector split uses the same
types as the rest of the IR and the SSAComp is gone.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
The previous signature for Instr::new() was too restrictive in that it
would create a new Instr from Op, but not from T: Into<Op>.
Fix that by introducing a generic version instead.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
In order to facilitate this, we add a little helper in C which uses
nir_ssa_scalar chasing to find an iadd of a thing and an immediate.
This should be reliable as long as we're not lowering iadd64.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
Instead of having Src::src_ref and Dst share a type, make them separate
enums. We're not really getting any benefit from them being the same
enum type anymore and this means we avoid things like immediates in
destinations. We can't make the type system do all our work for us but
this seems tractable.
While we're reworking things we also implement From<> for stuff instead
of having quite as many new_*() constructor variants.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
Just put modifiers on every source. It's too complicated to try and
make the type system work for us here. When the time comes that we
write a validator, we'll just validate them.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
Rework our helpers to take Src and Dst and handle zeros inside the
helpers. This is better because, at emit time, we actually have the
information about which register file to use.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
Use a struct to store the temporary data and the SM number and over-all
reduce the amount of stuff we have to pass around. Also, rename
everything from tu102 to sm75 which is a more apt description.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
Currently, all we do here is implement div_ceil() and next_multiple_of()
because these helpers are super useful but currently an experimental
Rust feature. This should be a drop-in for the feature so we can delete
it once the feature stabilizes with little to no effect.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
The nifty thing about the way NVIDIA hardware does inputs and outputs is
that it maps really well to how we do them in NIR. We can make the
driver_locations match exactly to the attribute address space. For
fragment shader outputs, we make them register numbers.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24998>
Some fields in schedule_node will need to be reset each time they are
used. The `cand_generation` needs to be back to zero, and both
`unblocked_time` and `parent_count` need to be back to their initial
values, which were pre-calculated.
Rename the initial data fields and add new ones for the temporary data.
Note the helper function is `per node` to allow it "tag along" with an
existing loops.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>
Pull run() and schedule_instructions() for fs, and pull a very
simplified version of those into a run() for vec4. Because of the
previous patches the duplication is small.
Since we are touching these, change run() implementations to use the
cfg from the existing reference to the visitor/shader instead of taking
one as argument.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25841>
set_foreach_remove assumes no entries have been removed. That assumption
only holds if no errors occur, since pipeline cache objects can get
removed if an error occurs during deserialization.
This fixes
dEQP-VK.api.device_init.create_instance_device_intentional_alloc_fail.basic
crashing on RADV.
Cc: mesa-stable
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26164>
Commit 50c29e1ffa ("anv: simplify buffer address+size loads from descriptor buffer")
is making use of AuxiliarySurfaceBaseAddress field to store buffer
lenght as it was not used but a LNL workaround will make use of it
so we need to bring back this non optimized version of
build_load_render_surface_state_address().
There is some conflicts so a simple revert do not works.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26152>
Enable intra-fresh in vcn encoders and support avc/hevc/av1 codecs.
Just if B frames is enabled or the number of temporal layers is
larger than 1, intra-refresh will be disabled, because it doesn't
support intra-refresh on B frames, and on sub-temporal layers.
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26133>
This is a prepration for adding intra-refresh
in vcn encoders. Intra-refresh is a feature for
smoothing out fluctuation in bitrate by replacing
a whole intra frame by several intra strips distributed
in several continous frames, it is also used in
suppressing error propagation situation.
Reviewed-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26133>
The main change is to use struct radv_vs_prolog_key directly instead of
the compressed representation to simplify an upcoming rework in prolog /
epilog caching. In doing so the state struct pointer was replaced with
an inline struct.
Care was also taken to pre-mask all the states with the active attribute
mask and other masks when it makes sense; this ensures that we don't
accidentally use information not hashed into the key during compilation.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26023>
This was broken as the field was never assigned to. This will also be
dropped from the upcoming prolog/epilog lookup rework, as it adds to
code complexity while the benefit of saving one hash table memory access
seems questionable.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26023>
On Steam Deck, shaders are pre-compiled for better performance (less
stuttering, less CPU usage, etc). But when a compiler fix needs to be
backported, there is currently no way to handle this properly.
This introduces 3 drirc options
radv_override_{graphics,compute,ray_tracing}_shader_version in order to
force the driver to re-compile pipelines when needed. By default, the
shader version is 0 for all pipelines.
When one drirc is set for a specific game, RADV will re-compile all
pipelines only once with the compiler fix included (because the
pipeline key would be different).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26094>
With DGC, push constants can be set from the cmdbuf (CmdPushConstants())
or from the indirect layout. Instead of always emitting inlined push
constants from the DGC shader, just update the ones that come from the
indirect layout and rely on cmdbuf updates for the other ones.
With that, it should be possible to preprocess push constants with
graphics when all can be inlined in shaders.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25935>
pipeline_flags was 64-bit yet only the first 4 bytes were hashed.
Luckily, the mask included no flag above the 32nd bit, so this was
technically working fine. Still, it's better to use explicit sizeof
constructs to be more resilient to accidental type changes.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26145>
spec@arb_shader_image_load_store@coherency will write to coherent
image in tess shader and read it in fragmant shader. There is a
geometry shader in between.
When lower ngg for the geometry shader, it will wait memory writes
before pos0 export if there's no param output to prevent fragment
shader start early and read any previous memory writes.
We need to update the memory writes info of GS with ES ones because
ES and GS is merged into one shader but when nir they are separated.
LLVM does not have this problem because it will add memory write
wait at the beginning of GS automatically.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26122>
The allocator passed to VkDevice won't be available once it is destroyed
and thefore it cannot be used to allocate `object_name` for instance
level objects such as `VkInstance` or `VkPhysicalDevice` or else there
would be no way of deallocating it when those objects are destroyed.
Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26085>
The current name doesn't cover all the tex related instructions and
in all usages, we already have a switch statement to dispatch
per instruction type, so is more natural to list the instructions we
care there.
In fs::is_send_from_grf() we can simply ignore them since the
instructions are either lowered directly to SEND (Gfx7+) or use
MRF (Gfx6-).
With this change, the fs_inst::size_read() generated code gets
simplified (the "tex" entries get added to the switch jump table
in gcc) and the default case loses the conditional handling tex.
This reduces shader compilation time, as illustrated by replaying
fossils (tested on my TGL laptop):
```
// Rise of the Tomb Raider (N=13)
Difference at 95.0% confidence
-1.32231 +/- 0.0170138
-4.37605% +/- 0.0563054%
(Student's t, pooled s = 0.0210159)
// Cyberpunk 2077 (N=7)
Difference at 95.0% confidence
-3.64 +/- 0.114993
-2.95188% +/- 0.0932544%
(Student's t, pooled s = 0.09873)
```
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25721>
v3d driver will implicitly clear the buffer's content on the first write
operation. This clearing operation is helpful for allocated buffers,
initializing them with zeros instead of having memory garbage.
Also, this avoids reading the buffer from the RAM to the GPU cache
before rendering, making the first write operation slightly faster.
The clearing operation should not happen for imported buffers where
the buffer may already contain valuable data and the user may want
to render into the buffer only partially.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Roman Stratiienko <r.stratiienko@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26136>
Android.mk rules for radeonsi are updated according to commit
0a56417 "meson: be able to build radeonsi without llvm"
cflag -DFORCE_BUILD_AMDGPU is required when building radeonsi with llvm support
based on android-x86 downstream LLVM fork that follows the AOSP llvm build rules.
Reviewed-by: Roman Stratiienko <r.stratiienko@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26049>
We have an optimization to try to swap regular live intervals with
killed sources when evicting them fails in order to make a contiguous
space for the destination to fit in, but this doesn't work when the
destination is early-clobber.
Fixes
dEQP-GLES31.functional.synchronization.inter_invocation.image_atomic_read_write
on a650+.
Fixes: d4b5d2a ("ir3/ra: Use killed sources in register eviction")
Closes: #8886
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26004>
We previously used the 2d path for single-sampled copies which seems to
canonicalize NaNs when the source format is a 16-bit floating point
format, likely because it implicitly converts to 32 bits. The current 3d
path also implicitly converts and has the same problem. Add a new shader
variant for half-float copies and switch to using it.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26042>
With the removal of DRM_IOCTL_XE_MMIO xe_gem_read_render_timestamp()
was always returning false but with DRM_XE_DEVICE_QUERY_ENGINE_CYCLES
it can be re implemented making use of
xe_gem_read_correlate_cpu_gpu_timestamp().
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24591>
Extended math instructions are now synchronized as in-order
instructions like other ALU operations, which is more efficient than
the out-of-order tracking we had to do in previous generations, and
avoids false dependencies introduced due to SBID aliasing.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25514>
Xe2 hardware has a "long" EU pipeline specifically for FP64
instructions, so these are handled as in-order instructions which
require RegDist synchronization. 64-bit integer instructions are now
handled by the normal integer pipeline, so the existing special-casing
inherited from ATS needs to be disabled.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25514>
This implements the extended 10-bit encoding of the software
scoreboard information used by Xe2 platforms. The new encoding is
different enough that there are few opportunities for sharing code
during translation to machine code, but the high-level tgl_swsb
representation remains roughly the same.
Among other changes the 10-bit SWSB format provides 5 bits worth of
SBID tokens (though they're only usable in large GRF mode) instead of
4 bits, the extended math pipeline is handled as an in-order (RegDist)
pipeline instead of as an out-of-order one, and the dual-argument
encodings support additional combinations of RegDist and SBID
synchronization modes. A new encoding is introduced for preventing
the accumulator hardware scoreboard from being updated, but this is
currently not needed.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25514>
The workaround applies specifically to Cube and Cube Arrays, so we can
still apply the optimization for the others.
Ideally we would like to pull opt_zero_samples logic into the lowering
sends -- to avoid adding a bit to communicate between passes. However
the texture coordinates for the LOGICAL backend instructions, which
are a common target for the optimization, are combined into offsets over
a single VGRF, so we can't easily identify the constant cases. The
copy-prop pass make this more visible for opt_zero_samples.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25742>
Inadvertently, because of a sequence of changes elsewhere, this pass
ended up not having any effect:
- Before Gfx5 the optimization is not applicable.
- On Gfx5-6 it doesn't apply because it sampler operations don't
currently use LOAD_PAYLOAD, but write the MOVs directly. Not clear to
me whether they ever did.
- On Gfx7+ it doesn't apply anymore because now the logical sampler
operations are now lowered directly to SENDs, and the is_tex() check
would skip them.
Since the LOAD_PAYLOAD implementation applies for Gfx7+ only, rework the
pass to work again by handling SEND instructions. To make the pass
easier, the optimization will happen before opt_split_sends() so only
one LOAD_PAYLOAD needs to be cared for.
Update the code to accept BAD_FILE sources in addition to zeros, these
are added in some cases as padding and effectively are don't care
values, so we can assume them zeros.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25742>
This is a preparation to (re-)enable opt_zero_samples(), which will reduce
a SEND mlen before we split it. When that happen, opt_split_sends()
won't be able to rely on the fact that mlen covers the entire
LOAD_PAYLOAD.
Since we are changing that, take the opportunity to also not modify the
existing LOAD_PAYLOAD, just create two new ones with the exact set of
sources. This allows the pass to be further simplified by iterating
forward and not require live_variables analysis.
The helper function was added so can be used later for
opt_zero_samples().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25742>
This app is generating viewports with scale[0]==0, so that is not a good
condition for testing viewport validity. It would result in skipping
the only viewport, and ending up with gb x/y being ~0. Triggering an
assert in the register builder.
The main reason this was done previously was to avoid an assert in
fd_calc_guardband(). Lets just flip it around and return 0x1ff on
errors instead of asserting. This also makes it more consistent with
the other error cases.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7628
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26086>
util_fast_urem32 is used in the hot path of hashmap lookups and this
asserts causes noticeable overhead. The correctness of this code should
be well exercised both from testing and mathematical proofs, so gate
this assertion behind #ifdef DEBUG.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14168>
These are needed by RADV to enable mesh/task shader queries.
My last attempt was broken, for obscur reasons I used invalid hashes
and the dEQP build script didn't reject them. Hopefully now it should
fail if a hash is invalid.
The dEQP list changes introduced even more failures with some drivers.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26079>
We already print all the detected target jobs from regex and its
dependencies. But for more complex regexes the list can be cumbersome,
and an aggregate list of dependencies and targets can be more value, so
add these prints as well.
This is what looks like:
```
Running 10 dependency jobs:
alpine/x86_64_lava_ssh_client, clang-format, debian-arm64,
debian-testing, debian/arm64_build, debian/x86_64_build,
debian/x86_64_build-base, kernel+rootfs_arm64, kernel+rootfs_x86_64,
rustfmt
Running 15 target jobs:
a618_gl 1/4, a660_gl 1/2, intel-tgl-skqp, iris-amly-egl, iris-apl-deqp
1/3, iris-cml-deqp 1/4, iris-glk-deqp 1/2, iris-kbl-deqp 1/3,
lima-mali450-deqp:arm64, lima-mali450-piglit:arm64 1/2,
panfrost-g52-gl:arm64 1/3, panfrost-g72-gl:arm64 1/3,
panfrost-t720-gles2:arm64, panfrost-t860-egl:arm64, zink-anv-tgl
```
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25940>
Modify the GraphQL query used to fetch all jobs within a pipeline,
transitioning from fetching data via stage nodes to a direct job
retrieval approach.
The prior method was not paginated, potentially overloading the server
and complicating result parsing due to the structure of stage nodes. The
new approach simplifies data interpretation and handles job lists
exceeding 100 elements by implementing pagination with helper functions
to concatenate paginated results.
- Transitioned from extracting jobs from stage nodes to a direct query
for all jobs in the pipeline, improving data readability and server
performance.
- With the enhanced data clarity from the updated query, removed the
Dag+JobMetadata tuple as it's now redundant. The refined query
provides a more comprehensive job data including job name, stage, and
dependencies.
- The previous graph query relied on a graph node that will (or should)
be paginated anyway.
Closes: #10050
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25940>
Make query support pagination by supplying the paginated key.
In the following toy example, the paginated key is:
["levels", "cars"]
```graphql
query vehicle_store($location: ID!) {
levels {
cars {
pageInfo {
hasNextPage
endCursor
}
...
}
}
}
```
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25940>
For some reason AIOHTTPTransport started to use MultiDict after doing
some adjustments in the GraphQL query, which made `filecache` fail
because MultiDict object are not pickle-able.
Changing the transport strategy from AIOHTTPTransport to
RequestsHTTPTransport, which dropped one requirement. We aren't doing
async anyway, all the calls were sync before.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25940>
Next patches will make of intel_device_info_pat_entry parameters other
than index, so here adding a function to return it.
While at it also renaming and adjusting parameter of
iris_pat_index_for_bo_flags() to match other functions in
iris_bufmgr.c/h.
No changes in behavior expected here.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26099>
When anv_device_map_bo() is called from anv_device_alloc_bo() it gets
VkMemoryPropertyFlags set to 0 so it ends up with a write-combine
caching for integrated platforms with LLC, see 'if (!(property_flags &
VK_MEMORY_PROPERTY_HOST_CACHED_BIT)))'.
Current approach also has issues when mapping with anv_MapMemory2KHR()
as it would not have information to know that BO is a scanout.
It was also not properly calculating mmap mode for platforms with PAT
uAPI before "anv: Change default PAT entry to WC".
So here storing alloc_flags to anv_bo so there is no mismatches
between different code paths then using it to properly
calculate the mmap mode.
alloc_flags in anv_bo will also be used to calculate PAT index in
future patches.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26099>
i915 mmap_calc_flags() is calculating WC caching for all MTL memory
types.
It will be fixed in the next patch but doing so causes tests to
fail due to incoherency in BOs not allocated with
VK_MEMORY_PROPERTY_HOST_COHERENT_BIT.
So here switching the default/non-coherent BO allocation to a WC
PAT entry.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26099>
Integrated GPUs almost always works with write-back caching(only
scanout and external bos works in write-combine) but in platforms
without LLC the coherency is broken if not explict asked to KMD.
vkFlushMappedMemoryRanges and vkInvalidateMappedMemoryRanges()
don't do any flushing or invalidate for memory allocated with
VK_MEMORY_PROPERTY_HOST_COHERENT_BIT.
So if an application asked for a memory coherent, the
ANV_BO_ALLOC_SNOOPED flag needs to be set in alloc_flags and that
will be passed to KMD backends to properly ask to KMD for coherent
buffer.
The other chunk here removes the assert(alloc_flags & ANV_BO_ALLOC_MAPPED),
that is needed otherwise application can't ask for a coherent and
mapped memory.
Tried to find a reason for that assert in git history but did not
found what was the reasoning of this assert.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26099>
mmap mode information will be used to properly calculate the mmap flags
in the i915 mmap uAPI and also will be used for BO creation when the
PAT uAPI lands in Xe KMD.
Xe KMD will also require the coherency mode during the BO creation.
So to avoid information duplication, adding this information to
intel_device_info platform entries.
No changes in behavior here.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26099>
Normally, we insert barriers automatically based on bind points. If
a resource is bound as both a shader image/SSBO (UAV) and other read-only
bind points, we'll choose the UAV state. This behavior is modified by
the memory barrier API - if the memory barrier indicates that a read
is desired, previously, we would've just not put any barriers to UAV
states. Now, we just use that state to disambiguate between read states
vs write states for a resource that's bound as both.
This turns out to be problematic in some circumstances, e.g.:
* PBO download, which does a memory barrier afterwards, since it uses
shader images to write, and those might be accessed immediately as
a texture afterwards.
* A user-issued draw that writes to a SSBO. This should indicate in the
state tracker that the SSBO is in the UAV state.
* Some other op, like a copy, that writes to the SSBO.
Before, the "pending memory barrier" state would cause the user draw to
not be put in the UAV state, which means that the copy wouldn't issue a
barrier *out* of the UAV state, resulting in the copy being unsynchronized.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26104>
From:
if (sel->info.uses_grid_size) {
if (sctx->screen->info.has_set_sh_pairs_packed) ...
... else ...
}
if (sel->info.uses_variable_block_size) {
if (sctx->screen->info.has_set_sh_pairs_packed) ...
... else ...
}
if (sel->info.base.cs.user_data_components_amd) {
if (sctx->screen->info.has_set_sh_pairs_packed) ...
... else ...
}
To:
if (sctx->screen->info.has_set_sh_pairs_packed) {
if (sel->info.uses_grid_size) ...
if (sel->info.uses_variable_block_size) ...
if (sel->info.base.cs.user_data_components_amd) ...
} else {
if (sel->info.uses_grid_size) ...
if (sel->info.uses_variable_block_size) ...
if (sel->info.base.cs.user_data_components_amd) ...
}
si_cp_copy_data is moved to the beginning because it's shared.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26095>
First, the following universal helpers are defined:
- radeon_set_reg_seq
- radeon_set_reg
- radeon_opt_set_reg
- radeon_opt_set_reg2
- radeon_opt_set_reg3
- radeon_opt_set_reg4
- radeon_opt_set_reg5
- radeon_opt_set_regn
- gfx11_push_sh_reg
- gfx11_opt_push_sh_reg
Then the config, context, sh, uconfig, push_gfx and push_compute helpers
are implemented calling the above.
A lot of macros were receiving sctx via a parameter, which is changed to
use sctx directly in the macro (and the parameter is renamed to "_unused").
The only functional change is that the perfctr registers that incorrectly
set the predicate bit now correctly set the RESET_FILTER_CAM bit.
The helpers no longer check info.uses_kernel_cu_mask.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26095>
This unblocks Xwayland with zink-on-venus + sommelier wayland proxy.
- For glamor, zink uses linear modifier.
- For Virgl clients, classic 3d resource is used and sommelier fixes
the modifier and stride infos no matter wl-drm or dma-buf protocol.
- For Venus clients:
- via the legacy wl-drm protocol, invalid modifier is passed via
sommelier, and host recovers the tiling in the way dealing with
modifier unaware clients (e.g. I915_GEM_GET_TILING). For hosts
unable to recover, they assume linear and venus forces linear on
legacy path.
- via the new zwp_linux_dmabuf_feedback_v1 (version 3/4) protocol,
explicit modifier is used, and zink handles that without issues.
This doesn't deserve a driconfig as zink-on-venus to support xserver
itself already requires special enough integration beyond a config.
Reported-by: Igor Torrente <igor.torrente@collabora.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10066
Fixes: 1c3db3e39a ("zink: blow up broken xservers more reliably")
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Ryan Neph <ryanneph@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26082>
`libvdrm` is unconditionally linked to `libvulkan_freedreno` which isn't available without the `virtio` subdirectory being included. It has been gated behind the `virtio` KMD to prevent linking errors in other cases as it's not necessary.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26100>
The unconditional inclusion of the `virtio` subdirectory introduced by !24733 causes a full dependency on DRM, it breaks any systems that have `system_has_kms_drm = false`, including Turnip with just the KGSL KMD. A temporary solution to this is gating the inclusion when `system_has_kms_drm` isn't set but this should be replaced with more specific gating in the future.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26100>
We need update kernel often. We need test kernel changes often.
Introduced `KERNEL_EXTERNAL_TAG` to differ between `KERNEL_TAG` which is
also used to rebuild the containers. We don't need rebuild containers
for the external kernel, so this way we don't have to.
Updating kernel goes wruuuuuum.
Signed-off-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23563>
The lifetime of Mesa's vk_pipeline_layout may exceed that of the
VkPipelineLayout object as other objects on the device may hold references
to it. In other places in vk_pipeline_layout and vk_descriptor_set_layout,
the device allocation scope is used with this pattern, but there was an
inconsistency in vk_pipeline_layout_zalloc, which is fixed by this commit.
Reviewed-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Signed-off-by: Vitaliy Triang3l Kuzmin <triang3l@yandex.ru>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24794>
When we skip the submit when there is no GPU work queued we must not
update the cmdstream timestamp with the fence from the submit request
as it will never be filled in by the kernel, effectively replacing
the cmdstream timestamp with 0. This causes following fence waits
to fail.
Fixes: 148658638e7f ("etnaviv: drm: Be able to mark end of context init")
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26078>
Rather than hardcoding u0_u1, this lets drivers map texture handles in whatever
way is convenient. In particular, this makes textures work properly with merged
shader stages (provided one of the stages is forced to use bindless access), by
giving each stage an independent texture heap.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26056>
We can't move discards across side effects, or the side effect might not happen.
Fixes KHR-GLES31.core.shader_image_load_store.basic-allFormats-load-fs
regression. Sigh.
CI is up next.
Fixes: 119e5b9719 ("agx: Schedule for register pressure")
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26056>
nir_index_ssa_defs(..) will re-index the function impl and will
update ssa_alloc. In almost all cases this will result in a lower
ctx->alloc number which reduces memory usage in compiler passes
that are using ctx-alloc to allocate memory.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26056>
Cross vendor is bogus because of modifier screw ups. We could allow it on
all devices from the same vendor, but for that we have to check if the
supported modifiers match.
Also verify the device in questions actually supports gl_sharing.
Fixes: 57dfc013a6 ("rusticl: Add functions to create CL ctxs from GL, and also to query them")
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Antonio Gomes <antoniospg100@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26051>
Because a LDS load is 2 separate loads on gfx10+ with wave64, a different wave
can write LDS in between and cause a non uniform result. Use v_readfirst_lane
instead of p_as_uniform because it cannot be copy propagated.
Fixes a OpenCL CTS test with zink+rusticl.
Totals from 136 (0.17% of 78196) affected shaders:
MaxWaves: 3236 -> 3244 (+0.25%)
Instrs: 130069 -> 131221 (+0.89%)
CodeSize: 698048 -> 703436 (+0.77%)
VGPRs: 5464 -> 5440 (-0.44%)
SpillSGPRs: 94 -> 96 (+2.13%)
Latency: 5361017 -> 5363781 (+0.05%); split: -0.00%, +0.05%
InvThroughput: 883010 -> 884100 (+0.12%)
SClause: 3822 -> 3821 (-0.03%); split: -0.05%, +0.03%
Copies: 14220 -> 14314 (+0.66%); split: -0.01%, +0.68%
Branches: 4549 -> 4551 (+0.04%)
PreSGPRs: 4934 -> 4940 (+0.12%)
PreVGPRs: 4666 -> 4655 (-0.24%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25973>
This reverts commit fac60c140b.
Turns out there are pipelines that do not create any other jobs, and
Marge requires a pipeline to pass, which means a pipeline needs to
exist, which means a job needs to always exist.
There is no reason `sanity` would be the one, but it's there so let's
just use it instead of making another one.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26080>
For hypervisors that do not support host memory injection into the
guest address space, it's necessary to have guest-based blob alloc.
Therefore, use a new 'use_guest_vram' virgl capset to decide on
performing guest blob allocations from dedicated heap (Host visible
memory).
Signed-off-by: Andrew D. Gazizov <andrew.gazizov@opensynergy.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25889>
Before this, the render pass code or the driver combined the pipeline
create flags and the implicit flags from the render pass, but the
pipeline create flags will need to be sanitized when they are dynamic
state, so we need to do it in vk_graphics_state where we know that
information.
We also weren't combining pipeline flags correctly when linking, which
on turnip was being hidden by the lack of sanitizing for driver-provided
flags. We can't combine them correctly if they're part of the render
pass state, so they need to be pulled out into the overall pipeline
state.
For drivers using emulated renderpasses or tracking feedback loop
information themselves, this won't make a difference, but we have to
adapt turnip to not pass pipeline flags. This also means that we can
drop all handling of feedback_loop_input_only in turnip and just set it
in the runtime.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25436>
We directly look up in the format arrays here to save indirections. But
do you know what's even faster than a single indirection? Assigning a
compile-time constant!
So let's use the newly available pack-helper to pack directly what we
want here.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25968>
This makes the config file easier to read, and also shaves 5-10 seconds
of deqp-run per job, but that's cancelled out by all the random network
delays of everything else in the job so we'll never actually notice the
difference.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26039>
If the break in the original loop isn't in the first top-level if,
this would have re-inserted it in the wrong block.
Fixes this by re-inserting the break block to the corresponding break
block in the new loop by using the remap hashtable.
fossils-db (NAVI21):
Totals from 88 (0.11% of 79330) affected shaders:
Instrs: 109602 -> 109929 (+0.30%); split: -0.10%, +0.40%
CodeSize: 570968 -> 573332 (+0.41%); split: -0.08%, +0.49%
Latency: 1682510 -> 1682505 (-0.00%); split: -0.01%, +0.01%
Copies: 12832 -> 12746 (-0.67%); split: -1.54%, +0.87%
Branches: 2879 -> 2930 (+1.77%)
Deathloop and F1 2023 are affected but I'm not aware of any issues
for these two games.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10001
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26009>
Android libbacktrace is not available in Android 14
Fixes the following build error:
FAILED: src/util/libmesa_util.a.p/u_debug_stack_android.cpp.o
...
../src/util/u_debug_stack_android.cpp:28:10: fatal error: 'backtrace/Backtrace.h' file not found
^~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
Cc: mesa-stable
Reviewed-by: Roman Stratiienko <r.stratiienko@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25963>
Android 14 uses prebuild clang version 17.0.2
By filtering these cflags there are no building errors on previous Android releases.
Fixes the following building errors:
../src/c11/time.h:54:8: error: redefinition of 'timespec'
struct timespec
^
/media/bigblissdrive/u-x86/out/soong/.intermediates/bionic/libc/libc/android_vendor.34_x86_x86_64_shared/gen/include/bits/timespec.h:46:8: note: previous definition is here
struct timespec {
^
1 error generated.
In file included from ../src/util/disk_cache.c:50:
../src/util/disk_cache.h:86:4: error: use of undeclared identifier 'Dl_info'
Dl_info info;
^
...
./src/util/disk_cache.h:114:30: error: incompatible integer to pointer conversion passing 'int' to parameter of type 'const void *' [-Wint-conversion]
_mesa_sha1_update(ctx, build_id_data(note), build_id_length(note));
^~~~~~~~~~~~~~~~~~~
10 errors generated.
../src/intel/perf/intel_perf.c:91:10: error: call to undeclared function 'major'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
maj = major(sb.st_rdev);
^
../src/intel/perf/intel_perf.c:92:10: error: call to undeclared function 'minor'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
min = minor(sb.st_rdev);
^
2 errors generated.
../src/intel/vulkan/anv_allocator.c:295:13: error: call to undeclared function 'futex_wake'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
futex_wake(&table->state.end, INT_MAX);
^
...
../src/intel/vulkan/anv_allocator.c:711:7: error: call to undeclared function 'futex_wait'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
futex_wait(&pool->block.end, block.end, NULL);
^
6 errors generated.
Cc: mesa-stable
Reviewed-by: Roman Stratiienko <r.stratiienko@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25963>
I'm still 100% a believer in clang-format enforcement, but given the difficult
constraints upstream CI has, I no longer believe it is productive or fair to do
this enforcement upstream. Instead, it can be (and effectively is) enforced much
more inexpensively in the Asahi tree. It is far better for me to insert a
"reformat asahi" commit once in a while when I rebase the Asahi tree, than to
shoot down an unrelated upstream MR because someone forgot to ninja
clang-format.
I regret adding the clang-format lint to CI. To those who have lost merges over
it, I am sorry. I'm learning from my mistakes and trying to do better.
I would encourage other drivers in the clang-format include to follow suit, but
doing this effectively requires a driver/hardware tree to do the enforcement. (I
would also encourage that, as it is much friendlier to upstream CI, but that's a
different discussion.)
For now, let's opt out asahi at the least.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26030>
bindgen 0.69.0 broke the `--version` switch, resulting in misleading errors about requiring at
least bindgen 0.62 or about unexpected arguments.
Ideally the build system would fetch the correct bindgen version automatically like cargo does.
Until then, provide a hopefully more helpful error message to the user.
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26046>
`git describe` tells you that you have N patches on top of tag T, but
not what these patches are, and the commit hash is useless as it is only
valid during the container build.
Replace this with:
- the tag that we are checking out
- the list of patches applied on top of it
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26013>
This reverts commit dcc4e1b4d7.
The hashes added there are incorrect, and fixing them regresses a bunch
of drivers, so let's just revert for now, and the next commit fixes the
bug that allowed incorrect backports to go in undetected.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26013>
"16TB ought to be enough for anybody."
- Probably some Intel graphics hardware engineer
TR-TT addresses are fixed regardless of the platform's gtt_size.
Unconditionally reserve this space for it: our total 48bit address
space is 256tb and TR-TT takes 16tb out of it (1/16th).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26036>
This function will be able to transparently handle sparse binding
regardless of the backend: vm_bind ioctls or TR-TT. For now we only
support the vm_bind ioctls, but soon we'll have anv_sparse_bind_trtt()
as an option.
It is important to notice that even backends that support the vm_bind
ioctl may choose to do Sparse binding via TR-TT, that's why we're
adding the indirection at this specific point.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26036>
If the next bind is just an extension of the previous one, join both
in the same bind operation. Due to how mip levels are laid in memory,
this can only happen for mip level 0.
As of today xe.ko doesn't try to join contiguous operations for us.
Due to how rebinds happen each additional rebind operation may end up
resulting in many extra things done, so these simple checks end up
saving us a lot of cycles the Kernel would otherwise waste. This will
be true even after we issue all binds in a single ioctl.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26036>
Kill vma_cva and just toggle heap->alloc_high instead. This way,
client visible addresses will remain isolated in their own little
corner, except we have one less vma to deal with.
For TR-TT we'll need a special vma, and if we don't use the trick
above we'll need yet another trtt_cva_vma, increasing complexity even
more.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26036>
This actually doesn't fix any bugs or leaks, because according to the
man page:
"In the LinuxThreads implementation, no resources are associated
with mutex objects, thus pthread_mutex_destroy actually does
nothing except checking that the mutex is unlocked.
still, it's better to have it than not to have it, especially since
other implementations may do something.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26036>
* Add support for PIPE_BUFFER in resource_from_handle.
* Flush batches after reallocate_resource_inplace:
If we're dealing with a PIPE_BUFFER, iris_flush_resource doesn't
flush the batch and we export the resource with pending commands.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21305>
For a dGPU, we should have:
1. RAM
2. RAM + write-combined CPU access
3. RAM + cached CPU access
4. VRAM
Eventually there'll be VRAM + write-combined CPU access after 4, using "GPU upload heaps"
For an iGPU, we should have:
1. RAM (declared as device-local)
2. RAM + write-combined CPU access (declared as device-local)
3. RAM + cached CPU access (declared as device-local)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26037>
Vulkan video has parameter overrides, where the driver can override the
driver provided parameters with an encoded bitstream where it has made
changes.
This is the support code to encode the bitstream headers for h265
from parameters (vps/sps/pps).
Acked-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25874>
Vulkan video has parameter overrides, where the driver can override the
driver provided parameters with an encoded bitstream where it has made
changes.
This is the support code to encode the bitstream headers for h264
from parameters (sps/pps).
Reviewed-by: Hyunjun Ko <zzoon@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25874>
We've supported storageImageReadWithoutFormat for a while. Also, thanks
to the fact that the assert assumed image_deref_load and happened after
nir_rewrite_image_intrinsic, it would never trigger.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26035>
a630 has been completing jobs, and then corrupting the very last line of
UART output - the one where we pass the overall result back from the DUT
to the job. The bare-metal monitor will wait for this line to appear,
never see it, and then the job times out.
Since this line is the most critical one of all to get out, just spam
the prints to try to make sure they get through.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26032>
When using `MESA_LOADER_DRIVER_OVERRIDE=v3d` in real hw, use kmsro to
create the drm screen, which is actually what happens when not exporting
such variable.
This avoids confusions when using the envvar in real hardware and starts
to fail.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26007>
The spec says
A VkRenderPass or VkPipelineLayout object passed as a parameter to
create another object is not further accessed by that object after the
duration of the command it is passed into.
The object could have been destroyed if we get the pointer from a
pipeline library. Since it has no user now, let's remove it.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26000>
If application requests to map surface that was most recently used
as decoder or postproc target and also doesn't explicitly set the
map flags (vaMapBuffer2) it's very likely the intent is to read from
this surface, so we need to map it as such.
This fixes regression on radeonsi where mapping NV12 surfaces for
reading would fail with applications using vaDeriveImage. The reason
for this is that the VA frontend doesn't allow vaDeriveImage for
interlaced surfaces so the applications would use vaGetImage fallback,
but radeonsi doesn't allocate NV12 surfaces as interlaced anymore.
This also fixes mapping other formats surfaces (P010, RGBx, ...)
for reading, which never worked before.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9935
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10048
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26008>
Previosuly when lowering to txd for sampler array the index would be
derived as well, therefore the resulting derivative would have been a
vec with one more component than what the txd instruction expects.
This patch truncates the coordinate vector in this case to make sure the
index is not derived.
Fixes: b154a4154b ("nir/lower_tex: rewrite tex/txb -> txd/txl before saturating srcs")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26012>
To get MacOS to build, some extra dependencies need to be added to a couple of build targets.
This mainly shows up when not installing the dependencies in the default prefix locations.
On MacOS, this happens when using a custom build of brew to install the dependencies to 'odd' locations.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26022>
I've repeatedly seen minor pixel changes due to changes that only affect
RA decisions, most recently in !22072. We changed the trace to hopefully
remove a use of texture() in control flow, but it seems there are more,
or the problem is something slightly different like reading
uninitialized values. On the other hand minetest has never actually
caught an issue for me that some other trace hasn't also caught. Just
remove it.
Cc: mesa-stable
Acked-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25864>
An "augmented tree" is a tree with extra data attached which flows from
the leaves to the root. An "interval tree" is a datastructure of
(potentially-overlapping) intervals where, in addition to inserting and
removing intervals, we can quickly lookup all the intervals which
overlap a given interval.
After describing red-black trees, CLRS explains how it's possible to
implement an interval tree using an augmented red-black tree where the
nodes are ordered by interval start and each node also stores the
maximum interval end for its entire subtree.
Implement the interval tree extension described by CLRS. Iterating over
all overlapping intervals is actually an exercise, so we have to solve
the exercise. The recursive solution has been re-written to use the
parent pointers to avoid needing a stack, similarly to rb_tree_first()
and rb_node_next().
For now, we only implement unsigned intervals, but the core algorithms
are all abstracted to allow other types. There's still some boilerplate,
but it's the best that can be done in C.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22071>
The changes are pretty straightforward. For vector splitting, we just
ignore those vectors for now. We could potentially handle array derefs
with a constant index (and probably should) but that's left for later.
For now, I'm mostly concerned with correctness of the pass.
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22580>
If we're propagating a copy from a cast where the copy copies an entire
array, we end up with something like &((S *)ssa_N)->f[*] in the source
where a wildcard has a cast in its parent chain. If we then try to
propagate the read into a non-wildcard array load, we have to specialize
the wildcard. This breaks because nir_build_deref_follower() doesn't
handle casts. Since we know a priori that, because wildcards are only
generated by copy_deref on arrays, we cannot have a cast with a wildcard
parent so simply chasing the source deref to the first wildcard will
ensure that any casts in the deref are handled properly.
Fixes: ba2bd20f87 ("nir: Rework opt_copy_prop_vars to use deref instructions")
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22580>
To get MacOS to build, some extra dependencies need to be added to a couple of build targets.
This mainly shows up when not installing the dependencies in the default prefix locations.
On MacOS, this happens when using a custom build of brew to install the dependencies to 'odd' locations.
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25992>
Adapt gen_pack.py to generate an OpenCL compatible header, capable of both
packing and unpacking but not printing (due to no known use case and no fprintf
in CL). This is useful as a building block for manipulating descriptors from
shader code, for example in texture lowering or device-generated commands.
To accomplish this, we need to inline in some CL-compatible variants of mesa
util functions (no doubles, etc), avoid FILE * use in the CL path, and use
__constant pointers where applicable for performance. Otherwise, there are
surprisingly few changes required, thanks mainly to CL 2.0 generic pointers.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25498>
The loop is supposed to execute exactly once, but the previous logic
inadvertently executes 0 or 1 times depending on whether dst is NULL (it never
is). Reexpress the loop to execute exactly once, eliminating the unnecessary
branch in this hot path. Noticed when reading the NIR of generated pack code.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25498>
This new entrypoint takes in a SPIR-V blob and generates a header containing
a static inline nir_builder-family function for each function in the SPIR-V
library. The generated function will look for the function in the shader and, if
not found, insert a new nir_function with the appropriate signature -- to be
linked with the library later. Then, it will call the function, with the
appropriate gymnastics to handle return values as necessary.
This makes it super convenient to wrap CL libraries for use in a NIR pass.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25498>
This has been confusing for some time, as from a xml file with the
suffix v33 (so suggesting just one version) we were generating the
headers for v33, v40, v41 and v71.
So now there is a header for the vc4 driver, and one header for the
v3d/v3dv (so v3d "platform") drivers.
FWIW, this means that now the name of the original xml and the header
files generated doesn't maintain a so similar pattern, but again the
equivalence were not there anyway.
Acked-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25851>
Add two unit tests related to the LAVA job definition.
test_generate_lava_job_definition_sanity checks for the most important
fields, deploy actions, namespaces etc.
test_lava_job_definition compares the generated definition with static
skeleton YAML files committed inside tests/data folder.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25912>
Simplify both UART and SSH job definitions module to share common
building blocks themselves.
- generate_lava_yaml_payload is now a LAVAJobDefinition method, so
dropped the Strategy pattern between both modules
- if SSH is supported and UART is not enforced, default to SSH
- when SSH is enabled, wrap the last deploy action to run the SSH server
and rewrite the test actions, which should not change due to the boot
method
- create a constants module to load environment variables
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25912>
Break it to smaller pieces with variable size (fastboot has 3 deploy
actions and uboot only one) to build the base definition nicely in the
end.
Extract kernel/dtb attachment and init_stage1 extraction into functions
to be later reused by SSH job definition.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25912>
Fixes performance regression introduced by prior refactoring of
pipe control code that unnecessarily added CS_FLUSH to query start
and end. Issue was diagnosed by Ben L (thank you!)
Confirmed this restores performance on:
* Borderlands3 +2%
* Payday +3%
* Factorio +3%
* HogwartsLegacy +4%
* Ghostrunner +7%
Fixes: 6dc95685 (convert genX_query pipe controls to use pc helper)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25983>
A somewhat random collection of fossils:
N Min Max Median Avg Stddev
x 6 16.59 16.61 16.605 16.603333 0.0081649658
+ 6 15.99 16 16 15.998333 0.0040824829
Difference at 95.0% confidence
-0.605 +/- 0.00830327
-3.64385% +/- 0.0485573%
(Student's t, pooled s = 0.00645497)
I'm not sure if nir_opt_if and nir_opt_loop_unroll are actually idempotent
or not.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24197>
For example, in the loop:
while (more_late_algebraic) {
more_late_algebraic = false;
NIR_PASS(more_late_algebraic, nir, nir_opt_algebraic_late);
NIR_PASS(_, nir, nir_opt_constant_folding);
NIR_PASS(_, nir, nir_copy_prop);
NIR_PASS(_, nir, nir_opt_dce);
NIR_PASS(_, nir, nir_opt_cse);
}
if nir_opt_algebraic_late makes no progress, later passes might be
skippable depending on which ones made progress in the previous iteration.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24197>
When the pitch or slice pitch isn't properly aligned,
the SDMA HW is unable to copy between tiled images and buffers.
To work around this, we process the image chunk by chunk,
copying the data to a temporary buffer which uses supported
pitches, and then copy it to the intended destination.
The implementation assumes that at least one pixel row of the
image will fit into the temporary buffer, and will try to copy
as many rows at once as possible. Sadly, this still results in
a lot of packets being generated for large images.
A possibe future improvement is to copy the image slice by slice
when only the slice pitch is misaligned. However, that is out
of scope for this commit.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25831>
Some copy operations are poorly supported by the SDMA hardware,
meaning that the built-in packets don't support them, so we will
need to work around that by copying to and from a temporary BO.
The size of the temporary buffer was chosen so that it can fit
at least one full pixel row of the largest possible image.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25831>
Previously, RADV only had a simple implementation of
image to buffer copies using the SDMA for the PRIME copy.
This commit replaces that with a full-featured implementation
that includes buffer to image and image to buffer copies and
removes the assumptions that the PRIME copy had, as well as
adds new helper functions which will be shared with other copy
functions in upcoming commits.
Unaligned buffer/image copies require a workaround, which
will be implemented by a future commit.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25831>
This register seems needed to enable compute shader shader invocations
on GFX7. On GFX8+ it's working fine without emitting this register but
I think it doesn't hurt.
This fixes dEQP-VK.query_pool.statistics_query.*_cq on GFX7.
Fixes: a9945216ba ("radv: fix COMPUTE_SHADER_INVOCATIONS query on compute queue")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25957>
The following sequence is valid (although weird) but many other drivers
(including RADV) were broken:
- bind pipeline with some static state
- set state command for that static state (to a bad value)
- bind the same pipeline again
- draw
Fixes new dEQP-VK.dynamic_state.*.double_static_bind.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25954>
Another case of a game assuming XeSS is available since an
Intel ARC GPU is discovered by the game's executable binary.
With this, a warning will appear that GPU is unstable/not supported,
but a warning is preferable over the game crashing.
No other issues observed upon starting & playing the game.
Signed-off-by: Casey Bowman <casey.g.bowman@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25965>
Removes the link-time dependency on tgsi_get_gl_varying_semantic from
Gallium auxiliary.
ps_prim_id_input linkage removed due to redundancy - the SPI SID is
calculated for VARYING_SLOT_PRIMITIVE_ID on both sides.
Signed-off-by: Vitaliy Triang3l Kuzmin <triang3l@yandex.ru>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25695>
Hopefully this will get us more useful backtraces in CI (for ex, with
traces replay) while maintaining _most_ of the artifact size benefits of
stripping:
-rwxr-xr-x 1 robclark robclark 50M Oct 30 11:47 msm_dri.so.strip-debug
-rwxr-xr-x 1 robclark robclark 40M Oct 30 11:47 msm_dri.so.strip
-rwxr-xr-x 1 robclark robclark 129M Oct 30 11:47 msm_dri.so.orig
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25962>
So far we were packing by hand unnormalized coordinates at the
V3D41_TMU_CONFIG_PARAMETER_1 pack structure. To get this working we
hardcoded V3D_VERSION to 41 at v3dv_uniforms, that works for v71
because the structure are the same. But that is somewhat ugly, and
will not work if a new hw generation have a different structure.
Additionally, we found that for v3d this will be also needed.
So this commit adds a helper on the compiler. For now, and to simplify
it also use just one method for both generations. This solves the
problem of the same code needed on both v3d and v3dv.
But the idea is that in the future we need a similar need, but the
structure different on each generation, it would have used a similar
approach to other generation dependent function calls (like
v3d40_vir_emit_tex), having the implementation on a source file that
can safely include the hw generation headers.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25544>
Sandybridge uses this code and needs MRFs, but all other platforms send
from GRFs. Do that directly rather than relying on the MRF hack.
Ivybridge and later also use SHADER_OPCODE_SEND directly rather than
a virtual opcode that's handled in the generator, so we follow suit.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>
We never set key->clamp_fragment_color when compiling the BLORP fast
clear shaders. Besides, we were setting saturate on an FB write opcode,
which...isn't even a thing. We would need it on the MOV, and weren't
setting it there. So it wouldn't have even worked.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>
This is basically just once through the loop but copy and pasted. One
difference is that the single render target case used a headerless
message, and the multiple render target case always used headers.
Now we use headerless messages for the first render target, even in the
multiple render target case. While we already have it set up for the
other RTs, it's still 2 fewer registers to send. Minor improvement.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>
A long time ago, we used a uniform for the clear color. Back in 2014,
we added support for using a flat input instead, as this was easier for
Vulkan, but we left the option of using a uniform for OpenGL.
Eventually nobody used the uniform approach anymore, but the compiler
code to handle it remained. Drop the dead code.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>
This code is compiled out, but has been left in place in case we wanted
to use it for debugging something. In the olden days, we'd use it for
platform enabling. I can't think of the last time we did that, though.
I also used to use it for debugging. If something was misrendering, I'd
iterate through shaders 0..N, replacing them with "draw hot pink" until
whatever shader was drawing the bad stuff was brightly illuminated.
Once it was identified, I'd start investigating that shader.
These days, we have frameretrace and renderdoc which are like, actual
tools that let you highlight draws and replace shaders. So we don't
need to resort iterative driver hacks anymore. Again, I can't think of
the last time I actually did that.
So, this code is basically just dead. And it's using legacy MRF paths,
which we could update...or we could just delete it.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20172>
All commands that make queries available have feedback cmds batched
and stored during recording. At submission time, for each batch
(SubmitInfo) these feedback cmds are recorded in a cmd buffer that is
appended after the last original cmd buffer (but before
semaphore/fence feedback).
Query reset cmds are deferred as well and also remove any prior feedback
cmds for queries its resetting within the batch.
Cc: 23.3 <mesa-stable>
Signed-off-by: Juston Li <justonli@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25413>
Link the query feedback cmd lifecycle to a cmd in the batch so that when
that last cmd gets reset/freed, we assert its safe to reset the query
feedback cmd. The cmd is then placed on the free list for reuse.
Some edge cases if the the last cmd is simultaneous or gets resubmitted.
Cc: 23.3 <mesa-stable>
Signed-off-by: Juston Li <justonli@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25413>
These are not real rules, they are just strings that have an anchor that
can be referenced elsewhere in this file.
Having these fake bits in here is confusing, as revealed by the
reactions from the first version of this commit.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25920>
Only RDNA1-2 are affected because RADV needs to handle the legacy vs
NGG path for this query, and the NGG results are stored with 2 extra
64-bit values.
Fixes flakes with
dEQP-VK.transform_feedback.primitives_generated_query.* since VKCTS
1.3.7.0.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25862>
ldunifa works exactly the same as ldunif: the hw will prefetch the
next 4 bytes after a read, so if a buffer is exactly a multiple of
a page size and a shader uses ldunifa to read exactly the last 4 bytes
the prefetch will read out of bounds and spam the error on the kernel
log. Avoid that by allocating extra bytes in this scenario.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25752>
This allows the hardware to behave as if TBIMR was disabled until a
polygon is processed which spans at least one tile. This is a rather
heavy-handed heuristic meant to prevent regressions in heavily
geometry-bound workloads that render large numbers of tiny primitives
much smaller than a TBIMR tile.
A particularly bad example of this was observed in SoTR, where certain
draw calls with a long-running VS and a mostly trivial PS render more
triangles than pixels, filling up the URB and TBIMR batch pretty
quickly, which causes EU utilization to tank (since once the URB has
filled up the parallelism of the VS is limited by the number of
polygons that fit in a TBIMR batch at the completion of each tile
walk, which isn't a lot in relation to the total EU count of a DG2),
and causes the bottleneck to be the rate at which the tile sequencer
performs additional tile passes, each one processing a small number
(<1024 polygons) of the hundreds of thousands of triangles of the
draw call.
Enabling this heuristic seems effective at avoiding that scenario in
SoTR among other titles (e.g. Total War Warhammer 3), but it's a bit
of a compromise since one could imagine cases where TBIMR is helpful
even if the geometry doesn't pass the box check, so a better heuristic
or a driconf rule may be useful in the future.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
This programs a TBIMR batch size equal to 128 polygons per slice in
order to match the hardware spec recommendation (BSpec 68436). This
has been confirmed to improve performance slightly relative to the
hardware default batch size of 256 polygons.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
This enables a couple of TBIMR performance tunables in
CHICKEN_RASTER_2 that default to disabled. TBIMR fast clip appears to
help slightly with some geometry-bound workloads. TBIMR open batch
allows the rasterizer to start working immediately on the first tile
of the framebuffer, even before the batch has been closed, which helps
reduce the latency cost of the tile walk.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
This sets up the basic parameters needed for tiled rendering based on
a back-of-the-envelope estimate of the amount of memory used by the
pixel pipeline during the tile pass. The actual cache footprint of a
tile can vary wildly based on runtime factors which aren't easily
predictable based on static analysis, so this is only intended to
provide a rough approximation within the right order of magnitude.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
This sets up the basic parameters needed for tiled rendering based on
a back-of-the-envelope estimate of the amount of memory used by the
pixel pipeline during the tile pass. The actual cache footprint of a
tile can vary wildly based on runtime factors which aren't easily
predictable based on static analysis, so this is only intended to
provide a rough approximation within the right order of magnitude.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
This may help debugging performance problems in the possible case that
TBIMR negatively impacts the performance of some application. It could
also allow applying application-specific band-aid fixes in the XML file
until a more general workaround is implemented.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
This implements a minimalistic algorithm that can be used to obtain an
approximate solution for the integer programming problem of finding
the optimal tile dimensions based on an estimate of the tile cache
consumption per pixel of the current graphics pipeline -- Including
the TC footprint of render targets, depth and stencil buffers and
their auxiliary surfaces. Considering other (less local) memory
accesses performed by the pipeline (like texturing and shader storage)
would be useful (and could be considered by this algorithm with little
modification), but it would be pretty difficult to estimate the L3
cache consumption per pixel of such accesses based on static analysis
of the pipeline state alone without some sort of dynamic feedback.
The present algorithm returns a config with tile area large enough to
utilize a target fraction of the L3, which can be adjusted to obtain
greater/lower utilization of the L3 at the cost of higher/lower risk
of L3 cache thrashing respectively. The aspect ratio of the tile
layout returned attempts to minimize the number of poorly utilized
tiles around the boundaries of the framebuffer (due to partial
coverage), since having the tile sequencer process additional tiles
comes at a cost due to the latency of the additional passes, even if
they're mostly empty. Finally, among the solutions with satisfactory
cache footprint and tile count, the tile aspect ratio closest to 1 is
returned where possible, since tiles with very high aspect ratios can
have a negative impact on cache locality.
The algorithm is primarily intended for TBIMR, but it could be used
for PTBR as well with little modifications, since the TBIMR-specific
assumptions are few and noted in comments below.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
L3 configurations with an ALL partition of 128 ways per bank or more
cannot be represented with the normal L3ALLOC partitioning mechanism
since the "All L3 client pool" field would overflow, instead the
L3FullWayAllocationEnable bit has to be set, which causes the whole L3
to be used in a unified cache configuration.
That's precisely the configuration we're currently using on recent
platforms, but previously we were relying on the L3 config tables
being empty and the selected L3 configuration being a NULL pointer to
detect this condition. This is about change, the L3 configuration
structure will be defined for gfx12.5+ platforms since they provide
useful information about the cache hierarchy to the drivers. Instead
of checking whether the pointer is NULL in order to apply a unified L3
cache configuration, use it when there is a single ALL partition
larger than can be represented via L3ALLOC.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
as has recently been exposed by ci, there are some cases where running
lots of tests simultaneously can temporarily result in depleted vram,
which torpedos everything
as this scenario is transient (vram will very soon become available again),
it makes more sense to add some retries at fixed intervals to try soldiering
onward instead of exploding and probably blocking a merge
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25938>
This resolves a memory leak when the application drops its last reference
to the queue, but never waits explicitly.
The problem was, that the queue was refed by QueueState::last and that ref
only gets dropped on a blocking wait. This is problematic as non user
Event objects also hold a ref on the Queue they are created on, therefore
causing a cyclic ref relation.
In order to resolve it, just use a weak reference. A failure of upgrading
the Weak ref is not an issue as in this case we'd only wait on an already
destroyed or processed event. The worker thread already makes sure
everything stays in sync.
Fixes: 5b3ff7e3f3 ("rusticl/queue: overhaul of the queue+event handling")
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: @LingMan <18294-LingMan@users.noreply.gitlab.freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25926>
This is the original definition for ANYOPMASK, until we found that
clang raised a warning using the option shift-count-overflow, so we
used an alternative.
That warning was fixed with commit 6e2bb716b0, so let's restore
previous version.
Note that other good thing of that warning being fixed is that now we
can use without warning OP_RANGE with bit 63 (in the case that any
broadcom opcode used that bit)
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25776>
Without this, we replace vote_ieq(b) with vote_ieq(u2u32(b)) which is
wonky because we're doing a u2u on a 1-bit type. With this, we now
replace it with vote_ieq(b2b32(b)). For other subgroup ops, we replace
things like *scan[op](b) with *scan[op](b2b32(b)). For scan ops, this
assumes that b2b1(op(b1b32(x), b2b32(y))) = op(x, y) for all of the ops
iand, ior, and ixor. This is true on all the back-ends I'm aware of.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25894>
Only align resource dimensions on creation, not when importing existing D3D resource object.
Otherwise importing the resource fails since the resource descriptor does not match the aligned
dimensions passed in the template.
Fixes: 62fded5e4f ("d3d12: Allocate d3d12_video_buffer with higher alignment for compatibility")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25913>
glsl_type_is_vector_or_scalar would more accruately be called "can be an
r-value that isn't an array, structure, or matrix. This optimization
pass really shouldn't do anything to cooperative matrices. These
matrices will eventually be lowered to something else (dependent on the
backend), and that thing may (or may not) be handled by this or another
pass.
Fixes: 2d0f4f2c17 ("compiler/types: Add support for Cooperative Matrix types")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25871>
Doing something like this is allowed :
vkCreateGraphicsPipeline(.., scissorState, &pipeline);
vkCmdBindPipeline(pipeline);
vkCmdSetScissor(...)
vkCmdBindPipeline(pipeline)
If we don't reapply the pipeline dynamic state, the command buffer
runtime state will keep the dynamic state set in between the 2 binds.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25915>
This does two things:
1. It increases the contrast of the signatures
2. It ensures that there's some spacing when there's two signature
elements back-to-back (which happens when documenting structs, for
instance), making it easier to tell things apart.
Reviewed-by: Jani Nikula <jani@nikula.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24507>
Use the hawkmoth c:auto* directives to incorporate isl documentation.
Convert @param style parameter descriptions to rst info field lists.
Add static stubs for generated headers. Fix a lot of references, in
particular the symbols are now in the Sphinx C domain, not C++
domain. Tweak syntax here and there.
Based on the earlier work by Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24507>
Use the hawkmoth c:auto* directives to incorporate nir documentation.
Convert @param style parameter descriptions to rst info field lists.
Add static stubs for generated headers. Fix a lot of references, in
particular the symbols are now in the Sphinx C domain, not C++
domain. Tweak syntax here and there.
Based on the earlier work by Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24507>
Use the hawkmoth c:auto* directives to incorporate vulkan documentation.
Convert @param style parameter descriptions to rst info field lists.
Add static stubs for generated headers. Fix a lot of references, in
particular the symbols are now in the Sphinx C domain, not C++
domain. Tweak syntax here and there.
Based on the earlier work by Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24507>
Hawkmoth is a Sphinx-extension that uses Clang to directly parse C code
for automatic documentation of C/C++ code, similar to what Doxygen does.
However, Doxygen is rather clunky to integrate into the build process,
so let's start switching over to Hawkmoth instead.
As Sphinx does not have syntax for describing parameter direction, add
an rst_prolog with rst replacements for them.
Reviewed-by: Jani Nikula <jani@nikula.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24507>
In order to document "typedef struct { ... } T;" as a struct in
hawkmoth, the structs need to have names. Similar for enums.
Note: This is no longer required with Hawkmoth 0.16.0+ and Clang 16 and
later. With the next Hawkmoth release, this should be fixed also for
Clang 15 and earlier.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24507>
In function ‘generate_clipmask’,
inlined from ‘draw_llvm_generate’ at ../src/gallium/auxiliary/draw/draw_llvm.c:1975:24:
../src/gallium/auxiliary/draw/draw_llvm.c:1302:25: warning: ‘sum’ may be used uninitialized [-Wmaybe-uninitialized]
1302 | sum = lp_build_fmuladd(builder, planes,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1303 | (LLVMValueRef[]){cv_x, cv_y, cv_z, cv_w}[i], sum);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../src/gallium/auxiliary/draw/draw_llvm.c: In function ‘draw_llvm_generate’:
../src/gallium/auxiliary/draw/draw_llvm.c:1149:44: note: ‘sum’ was declared here
1149 | LLVMValueRef plane1, planes, plane_ptr, sum;
| ^~~
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25884>
Tessellation shader which are using indirect
addressing for tesslevels e.g
gl_TessLevelOuter[gl_InvocationID] = tessLevelOuter;
are crashing because gl_TessLevelOuter is now a
compact array variable and nir expects a constant
array index into the compact array variable.
This patch handles such cases.
This fixes MR 21940
Fixes: 84006587d7 ("glsl: Delete the lower_tess_level pass.")
Tested with glretrace
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25773>
We can end up in situation where we are dispatched with a multisample
framebuffer but not at per-sample. In this case we would request the
at_sample value with the wrong message configuration.
Relying on the BRW_WM_MSAA_FLAG_MULTISAMPLE_FBO flag superseeds
BRW_WM_MSAA_FLAG_PERSAMPLE_DISPATCH.
Fixes piglit tests :
spec@arb_gpu_shader5@arb_gpu_shader5-interpolateatsample*
With Zink on Anv
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 68027bd38e ("intel/fs: implement dynamic interpolation mode for dynamic persample shaders")
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25854>
# Bring artifacts back from the NFS dir to the build dir where gitlab-runner
# will look for them.
cp -Rp /nfs/results/. results/
if[ -f "${STRUCTURED_LOG_FILE}"];then
cp -p ${STRUCTURED_LOG_FILE} results/
echo"Structured log file is available at https://${CI_PROJECT_ROOT_NAMESPACE}.pages.freedesktop.org/-/${CI_PROJECT_NAME}/-/jobs/${CI_JOB_ID}/artifacts/results/${STRUCTURED_LOG_FILE}"
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.