Iris in general doesn't really like checking the return value of its
allocations, but in some places it does assert that those pointers are
non-NULL. We've recently investigated a bug that could have been
coming from a failed bo->deps realloc(), so add the assert() here to
help give us more confidence over things the next time we're debugging
issues.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
(cherry picked from commit 7c538b5ad8)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25404>
Keep most cases using the stack as it's cheaper, but fall back to the
heap when the size gets too big.
This should fix a stack overflow reported by @rhezashan for a case
where we had lots of iris_screens.
Credits to Matt Turner and José Roberto de Souza for their work on
this issue, which led us to find its root cause.
Cc: mesa-stable
Reported-by: rheza shandikri (@rhezashan in gitlab)
Credits-to: José Roberto de Souza <jose.souza@intel.com>
Credits-to: Matt Turner <mattst88@gmail.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
(cherry picked from commit 3cec15dd14)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25404>
This is the only place that touches bo->deps but does not explicitly
lock it and is not a setup/teardown function where locking won't help
anything.
I'm confident we won't hit this assertion, but I've recently had this
lock as the suspect of a bug and had to check the callers to see if we
could be calling from any unlocked place. Having the assert helps
increasing our confidence.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
(cherry picked from commit 762b9aad01)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25404>
This is a modified version of a commit originally in !7698. This version
add the changes to brw_fs_copy_propagation. If the address passed to
fs_visitor::swizzle_nir_scratch_addr is a constant, that function will
generate SHL with two constant sources.
DG2 uses a different path to generate those addresses, so the constant
folding can't occur there yet. That will be addressed in the next
commit.
What follows is the commit change history from that older MR.
v2: Previously this commit was after `intel/fs: Combine constants for
integer instructions too`. However, this commit can create invalid
instructions that are only cleaned up by `intel/fs: Combine constants
for integer instructions too`. That would potentially affect the
shader-db results of each commit, but I did not collect new data for
the reordering.
v3: Fix masking for W/UW and for Q/UQ types. Add an assertion for
!saturate. Both suggested by Ken. Also add an assertion that B/UB types
don't matically come back.
v4: Fix sources count. See also ed3c2f73db ("intel/fs: fixup sources
number from opt_algebraic").
v5: Fix typo in comment added in v3. Noticed by Marcin. Fix a typo in a
comment added when pulling this commit out of !7698. Noticed by Ken.
shader-db results:
DG2
No changes.
Tiger Lake, Ice Lake, and Skylake had similar results (Ice Lake shown)
total instructions in shared programs: 20655696 -> 20651648 (-0.02%)
instructions in affected programs: 23125 -> 19077 (-17.50%)
helped: 7 / HURT: 0
total cycles in shared programs: 858436639 -> 858407749 (<.01%)
cycles in affected programs: 8990532 -> 8961642 (-0.32%)
helped: 7 / HURT: 0
Broadwell and Haswell had similar results. (Broadwell shown)
total instructions in shared programs: 18500780 -> 18496630 (-0.02%)
instructions in affected programs: 24715 -> 20565 (-16.79%)
helped: 7 / HURT: 0
total cycles in shared programs: 946100660 -> 946087688 (<.01%)
cycles in affected programs: 5838252 -> 5825280 (-0.22%)
helped: 7 / HURT: 0
total spills in shared programs: 17588 -> 17572 (-0.09%)
spills in affected programs: 1206 -> 1190 (-1.33%)
helped: 2 / HURT: 0
total fills in shared programs: 25192 -> 25156 (-0.14%)
fills in affected programs: 156 -> 120 (-23.08%)
helped: 2 / HURT: 0
No shader-db changes on any older Intel platforms.
fossil-db results:
DG2
Totals:
Instrs: 197780415 -> 197780372 (-0.00%); split: -0.00%, +0.00%
Cycles: 14066412266 -> 14066410782 (-0.00%); split: -0.00%, +0.00%
Totals from 16 (0.00% of 668055) affected shaders:
Instrs: 16420 -> 16377 (-0.26%); split: -0.43%, +0.17%
Cycles: 220133 -> 218649 (-0.67%); split: -0.69%, +0.01%
Tiger Lake, Ice Lake and Skylake had similar results. (Ice Lake shown)
Totals:
Instrs: 153425977 -> 153423678 (-0.00%)
Cycles: 14747928947 -> 14747929547 (+0.00%); split: -0.00%, +0.00%
Subgroup size: 8535968 -> 8535976 (+0.00%)
Send messages: 7697606 -> 7697607 (+0.00%)
Scratch Memory Size: 4380672 -> 4381696 (+0.02%)
Totals from 6 (0.00% of 662749) affected shaders:
Instrs: 13893 -> 11594 (-16.55%)
Cycles: 5386074 -> 5386674 (+0.01%); split: -0.42%, +0.43%
Subgroup size: 80 -> 88 (+10.00%)
Send messages: 675 -> 676 (+0.15%)
Scratch Memory Size: 91136 -> 92160 (+1.12%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 61c786bad5)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25377>
opt_copy_propagation can create invalid instructions like
shl(8) vgrf96:UD, 2d, 8u
These instructions will be cleaned up by opt_algebraic. The irony is
opt_algebraic converts these to simple mov instructions that
opt_copy_propagation should clean up. I don't think we want a loop like
do {
progress = false;
if (OPT(opt_copy_propagation)) {
OPT(opt_algebraic);
OPT(dead_code_eliminate);
}
} while (progress);
But maybe we do?
Maybe this would be sufficient:
while (OPT(opt_copy_propagation))
OPT(opt_algebraic);
OPT(dead_code_eliminate);
No shader-db or fossil-db changes (yet) on any Intel platform. This is
expected.
v2: Do opt_algebraic immediately after every call to
opt_copy_propagation instead of being clever. Suggested by Lionel.
Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
(cherry picked from commit 56e6186dcf)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25377>
This reverts commit 6b494745be.
The logic is not entirely correct: the comparison is between two
static-analysis estimates of a dynamic system with variables that aren't
captured by the shader source, so using ">" will always have greater potential
to cause regressions whenever the performance difference between the two builds
is something not captured by the static model, no matter how much the model is
improved.
Reference: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9262
(cherry picked from commit d142c845d0)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25377>
Move setting the default values from getEncParamPresetH264/5 as this
function is called on each frame which would result in overwriting
values set by application.
This fixes setting HRD parameters and max_qp/min_qp when
PIPE_VIDEO_CAP_ENC_QUALITY_LEVEL is not supported.
Cc: mesa-stable
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25355>
(cherry picked from commit dd2ef9a0e4)
The encoder is created in handleVAEncSequenceParameterBufferType and it
also sets some default parameters, so we need to make sure to handle
this buffer first because application may have already set those
parameters from earlier buffers.
This fixes setting HRD parameters with gstreamer vah264enc/vah265enc
when PIPE_VIDEO_CAP_ENC_QUALITY_LEVEL is supported.
Cc: mesa-stable
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25355>
(cherry picked from commit c970a9b663)
The index at which the dynamic descriptor offsets were being
stored was incorrect, leading to some offsets not being stored and
thus `0` being applied as the offset to the descriptors instead.
dEQP test fixed:
dEQP-VK.binding_model.shader_access.{primary,secondary}_cmd_buf
.uniform_buffer_dynamic
.{vertex,fragment,compute,vertex_fragment}
.multiple_discontiguous_descriptor_sets
.*_descriptor.offset_view_{,non}zero_dynamic_nonzero
Fixes: aa791961a8 ("pvr: Add support for dynamic buffers descriptors")
Signed-off-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25320>
(cherry picked from commit feafb8a256)
The `i` iteration variable was being used instead of `k` when
appending new mappings, and getting the current source rect.
dEQP test fixed:
dEQP-VK.pipeline.monolithic.image.suballocation.sampling_type
.combined.view_type.3d.format.*.count_1.
.{3x3x3,5x5x5,...(odd dimensions)}
Fixes: 060c3db4ef ("pvr: Complete pvr_generate_custom_mapping()")
Reported-by: James Glanville <james.glanville@imgtec.com>
Signed-off-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25321>
(cherry picked from commit bf17e4fe33)
wayland surfaces are likely to become unlinked in WSI implementations upon
retiring a swapchain, requiring the pending present to complete
in order to avoid invalid access
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25288>
(cherry picked from commit aaabb5b0f2)
I don't see why a p_logical_end is expected or required. It might not be
present in some situations, which causes an assertion failure:
s2: %19646:s[0-1] = p_reload %19701:v[8], 11
s2: %0:exec, s1: %8817:scc = s_andn2_b64 %19646:s[0-1], %0:exec
s2: %8818:s[20-21] = p_cbranch_z %0:exec BB1116, BB1114
No fossil-db changes (gfx1100).
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25244>
(cherry picked from commit 6dd751b3b9)
This is when the copies actually happen, not at the branch.
fossil-db (gfx1100):
Totals from 1 (0.00% of 79332) affected shaders:
Instrs: 424 -> 423 (-0.24%)
CodeSize: 2172 -> 2168 (-0.18%)
Latency: 2899 -> 2896 (-0.10%)
Copies: 24 -> 23 (-4.17%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25244>
(cherry picked from commit 8d5bd3ca48)
The AMD Vulkan driver uses LLVM by default, but it is possible to build
the driver without the LLVM dependency. In this case, we must explicitly
disable LLVM support, or else meson will die after failing to find LLVM.
The Android build system already knows when to link libLLVM, so forward
that information to meson.
Cc: mesa-stable
Acked-by: Mauro Rossi <issor.oruam@gmail.com>
Reviewed-by: Roman Stratiienko <r.stratiienko@gmail.com>
Change-Id: I7489d3811625b390aaaf2e84e666b4a8d98328b0
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24007>
(cherry picked from commit ec32619cb0)
Making the sub_cs not writeable switches the BO we're emitting to, which
causes an assert failure in tu_cs_end_sub_stream() because it looks like
we have a mismatched start/end. Just make it not writeable later.
Fixes: 97da0a7734 ("tu: Rewrite to use common Vulkan dynamic state")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25225>
(cherry picked from commit 64ed357699)
problem: max_poc means the number of bits used in poc lsb
in slice header, and it should not be related to GOP
size. When large GOP size used, it could generate
corrupted video, as the POC could not be correctly
decoded.
solution: use fixed value of max_poc (16) for now.
Cc: mesa-stable
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25214>
(cherry picked from commit fb0f51bc64)
in a stream like:
* set fb state (A)
* flush
* set fb state (B)
* draw -> driver query
* flush
the "driver query" should return the tc info corresponding to the most
recent fb state (B). previously this would increment to C because
the flag for incrementing at the start of a batch was set
Fixes: 07017aa137 ("util/tc: implement renderpass tracking")
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25206>
(cherry picked from commit 9399165bd4)
If we observe IDLE before COMPLETE, another queued image may have
presented with present ID 0 which would break the check fixed in this
commit. The original fix for present_id / signal_present_id split
forgot to take this into account.
Fixes: 32f7ff2c20 ("Fix present ID signal when IDLE comes before
COMPLETE").
Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25178>
(cherry picked from commit 08fee190aa)
For some specific texture sizes, notably some texture sizes with width
4096, block stride calculation could end up calculating stride 256 which
is an invalid value.
In those specific cases, this could cause rendering artifacts or
application/driver crashes.
Cc: mesa-stable
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25084>
(cherry picked from commit cb1c88d41f)
These patterns are broken in the following scenario:
%1 = f2fmp %0
%2 = fddx %1
%3 = ... // non quad uniform
if %3 {
%4 = f2f32 %2
...
}
Which would turn into
%3 = ...
if %3 {
%4 = fddx %0
...
}
Yet another example that shows why derivative instructions should be
be intrinsics, not alu.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25014>
(cherry picked from commit 136a698251)
previously non-separable progs had their libs owned exclusively by
the shaders, which meant it was possible for a background compile job
to crash while the context was being destroyed when accessing libs
which no longer had active shaders
fixes#9234
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25088>
(cherry picked from commit fd297ecf98)
to avoid splitting renderpasses, this subdata optimization handles the usual
driver dance of staging buffer -> gpu copy
if the pbo stride doesn't match the image format's stride, however, then
a direct copy will yield broken pixels and the image will misrender. to avoid this,
detect stride mismatch and translate the single subdata call into a sequence
of non-overlapping subdata calls that the driver can magically figure out
while continuing to not split renderpasses
fixes#9589
cc: mesa-stable
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24849>
(cherry picked from commit 51ad269198)
Copying the state below overwrote the ms.sample_locations we set,
so our new_sample_locations was never actually used and we were
accidentally doing a shallow copy. Turnip passes a stack-allocated
old_state, so this resulted in invalid stack pointers.
Fixes: f497cc9d56 ("vk/graphics_state: Add helpers for pre-baking state")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25031>
(cherry picked from commit 1cef1f02b5)
Otherwise, if a secondary CS is grown and then executed without IB2,
the INDIRECT_BUFFER packet would have been copied but it shouldn't.
This fixes a regression that introduced GPU hangs with
gl_vk_meshlet_cadscene on RDNA2.
Fixes: df0c742543 ("radv/amdgpu: rework growing a CS with the chained IB path slightly")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24891>
(cherry picked from commit e80fddf81f)
If a secondary cmdbuf has been grown and is executed without IB2
(eg. on compute queue or when it's not allowed), the ib size ptr
contains chaining info, which means the IB size was wrong.
This fixes CPU crashes when running gl_vk_meshlet_cadscene.
Fixes: 277b2afd70 ("radv/amdgpu: add support for executing DGC cmdbuf with RADV_DEBUG=noibs")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24891>
(cherry picked from commit 9206aeb077)
Issue: For texture with multiple planes, the planes will point to the
same BO with the total size, so current vcn dt_size is incorrect.
(gdb) p/x *((struct si_resource *)(((struct vl_video_buffer *)out_surf)->resources[0]))
...
buf = 0x5555558daa30,
gpu_address = 0xffff800101000000,
bo_size = 0xa2000,
...
}
(gdb) p/x *((struct si_resource *)(((struct vl_video_buffer *)out_surf)->resources[1]))
...
buf = 0x5555558daa30,
gpu_address = 0xffff800101000000,
bo_size = 0xa2000,
...
}
This is because: in function static struct si_texture *si_texture_create_object(),
if (plane0) {
/* The buffer is shared with the first plane. */
resource->bo_size = plane0->buffer.bo_size;
...
radeon_bo_reference(sscreen->ws, &resource->buf, plane0->buffer.buf);
resource->gpu_address = plane0->buffer.gpu_address;
}
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9728
Cc: mesa-stable
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25013>
(cherry picked from commit 7876a2f685)
Huge thanks to Tapani Pälli for debugging this issue, figuring out
what was going wrong, proposing fixes, and walking me through where
things were going off the rails.
BLORP always disables tessellation and geometry shaders. Our handling
tried to look at ice->shaders.uncompiled[] to determine whether the next
draw needed those shaders. If not, we can leave BLORP's residual state
that disabled those stages in place, and skip looking at it.
Unfortunately, predicting the future is a bit fraught, in part due to
the uncompiled[] and prog[] arrays being slightly out of sync at times.
Consider the following case:
1. Draw with tessellation shaders in place
=> uncompiled[TES] and prog[TES] will both point at valid shaders.
2. Gallium calls pipe->bind_tes_state(NULL).
=> This makes uncompiled[TES] point at NULL, and flags
IRIS_STAGE_DIRTY_UNCOMPILED_TES.
Because iris_update_compiled_shaders() hasn't happened yet,
uncompiled[TES] is NULL but prog[TES] has the stale TES from
the previous draw still.
3. BLORP operations happen
=> BLORP sees uncompiled[TES] == NULL and decides that tessellation
is off for the upcoming draws. So it skips flagging tess state.
4. Gallium calls pipe->bind_tes_state(shader from step #1).
=> uncompiled[TES] points at the original shader.
IRIS_STAGE_DIRTY_UNCOMPILED_TES gets flagged again.
5. Draw again
=> This calls iris_update_compiled_shaders(), which sees that
a TES is bound, and calls iris_update_compiled_tes(). But
because the same shader was bound as before, the program it
comes up with is identical to the one already bound at
ice->shaders.prog[TES]. So, it thinks it doesn't have to
flag any tessellation state dirty because it was already
set up for the last draw.
This random unbind and rebind between draws leads to a situation
where, at step #3, BLORP thinks it can skip flagging tessellation
state (nothing is bound), and at step #5, normal state handling
thinks it can skip flagging tessellation state (nothing changed
since last time). So nobody does, and things break.
This unbind appears to be happening when st_release_variants()
decides it wants to free some shaders. Then a rebind happens to
put back the actual shader for the draw. So, it's not theoretical.
To fix this, we change BLORP to look at ice->shaders.prog[] rather
than uncompiled[]. This is equivalent to thinking about the previous
draw, rather than the next. If the last draw had tessellation off,
then BLORP's disabling was a no-op, and the GPU is still in the same
state as the previous draw. This is more reliable than predicting
the future.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8308
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9678
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24880>
(cherry picked from commit d693027a00)
The msm driver reserves the actual DMABUF size in the memory map, while
TU can request smaller memory chunk to be allocated. This potentially
can lead to a situation when next allocation IOVA will be in the middle
of the address space which is reserved for the DMABUF. Pass the
`real_size' to TU allocator instead, so that kernel and userspace have
the same picture of memory allocations.
Cc: mesa-stable
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24861>
(cherry picked from commit 2fdcc00b01)
With -Dintel-clc=system, the build system will search for an `intel_clc`
binary and use it instead of building `intel_clc` itself.
This allows Intel Vulkan ray tracing support to be built when cross
compiling without terrible hacks (that would otherwise be necessary due
to `intel_clc`'s dependence on SPIRV-LLVM-Translator, libclc, clang, and
LLVM).
(cherry picked from commit 28c1053c07)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25228>
With certain build configuration that value can be a non empty string and
needs to be used.
This will also require distributions to rebuild mesa if and only if
CLANG_RESOURCE_DIR changes between clang rebuilds or updates.
Signed-off-by: Karol Herbst <git@karolherbst.de>
(cherry picked from commit e1c278ae82)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25228>
Make sure that both per-vertex and per-primitive attribute
ring stores are finished before position or primitive export
instructions are executed.
This is necessary because we need to ensure that mesh shader
waves work correctly when they have either vertex-only or
primitive-only waves.
Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
(cherry picked from commit 93b4f200de)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25157>
Cleanup the code that generates the two channels of the
primitive export instruction, and move storing the built-in
per-primitive outputs out to match how vertex attributes work.
Prepares the mesh shader lowering for a workaround that
affect export instructions.
Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
(cherry picked from commit 0721784b78)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25157>
This is a HW bug workaround for some (all?) GFX11 chips.
On these chips, rasterization can start before the attribute ring
stores are finished, which can cause issues.
As a workaround, wait for attribute ring stores to finish
before doing the position export.
Mesh shaders will be taken care of in another commit.
Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
(cherry picked from commit edd51655f0)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25157>
Up until now, the mesh pipeline assumed it would be always linked to the
fragment shader, and so the calculated MUE map would always be
available.
That is not the case for fast linked pipeline libraries, so the URB
setup needs to account for this. We do this by replicating what's done
for non-mesh pipelines, defining the URB based on the FS inputs, and
always assuming they will be laid out in order of varying number, except
that we also account for per-primitive attributes.
Fixes all GPL using tests under dEQP-VK.mesh_shader.ext.smoke.*
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
(cherry picked from commit 4eddeea7bf)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25188>
The compaction introduced in a252123363 ("intel/compiler/mesh: compactify MUE layout")
is not suitable for the case where graphics pipeline libraries are fast
linked, as the fragment shader won't receive the mue_map to know where
to locate its inputs.
For that case, keep doing what we did before and lay things down in the
order varyings are defined, which is also how it works for the non-mesh
case.
Fixes dEQP-VK.fragment_shading_rate.*fast_linked_library*.ms
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
(cherry picked from commit b200e5765c)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25188>
Move mesh URB allocations together with the other stages.
This fixes a hang that started happening with mesh enabled after
419531c5d9 ("intel/blorp: add a new flag to communicate PSS sync need")
Bspec 45352 says:
L3 Space allocation can only be changed when the GPU pipeline is
completely flushed.
It's likely that the PIPE_CONTROL added in that commit was breaking that
assumption and the URB allocation happening afterwards at the end of the
pipeline emission would then hang. And before that, we were probably
just getting lucky.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
(cherry picked from commit bcde58ea86)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25188>
Both per-primitive and per-vertex space is allocated in MUE in 8 dword
chunks and those 8-dword chunks (granularity of
3DSTATE_SBE_MESH.Per[Primitive|Vertex]URBEntryOutputReadLength)
are passed to fragment shaders as inputs (either non-interpolated
for per-primitive and flat vertex attributes or interpolated
for non-flat vertex attributes).
Some attributes have a special meaning and must be placed in separate
8/16-dword slot called Primitive Header or Vertex Header.
Primitive Header contains 4 such attributes (Cull Primitive,
ViewportIndex, RTAIndex, CPS), leaving 4 dwords (the rest of 8-dword
slot) potentially unused.
Vertex Header is similar - it starts with 3 unused dwords, 1 dword for
Point Size (but if we declare that shader doesn't produce Point Size
then we can reuse it), followed by 4 dwords for Position and optionally
8 dwords for clip distances.
This means we have an interesting optimization problem - we can put
some user attributes into holes in Primitive and Vertex Headers, which
may lead to smaller MUE size and potentially more mesh threads running
in parallel, but we have to be careful to use those holes only when
we need it, otherwise we could force HW to pass too much data to
fragment shader.
Example 1:
Let's assume that Primitive Header is enabled and user defined
12 dwords of per-primitive attributes.
Without packing we would consume 8 + ALIGN(12, 8) = 24 dwords of
MUE space and pass ALIGN(12, 8) = 16 dwords to fragment shader.
With packing, we'll consume 4 + 4 + ALIGN(12 - 4, 8) = 16 dwords of
MUE space and pass ALIGN(4, 8) + ALIGN(12 - 4, 8) = 16 dwords to
fragment shader.
16/16 is better than 24/16, so packing makes sense.
Example 2:
Now let's assume that Primitive Header is enabled and user defined
16 dwords of per-primitive attributes.
Without packing we would consume 8 + ALIGN(16, 8) = 24 dwords of
MUE space and pass ALIGN(16, 16) = 16 dwords to fragment shader.
With packing, we'll consume 4 + 4 + ALIGN(16 - 4, 8) = 24 dwords of
MUE space and pass ALIGN(4, 8) + ALIGN(16 - 4, 8) = 24 dwords to
fragment shader.
24/24 is worse than 24/16, so packing doesn't make sense.
This change doesn't affect vk_meshlet_cadscene in default configuration,
but it speeds it up by up to 25% with "-extraattributes N", where
N is some small value divisible by 2 (by default N == 1) and we
are bound by URB size.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
(cherry picked from commit c1685f08dd)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25188>
Instead of using 4 dwords for each output slot, use only the amount
of memory actually needed by each variable.
There are some complications from this "obvious" idea:
- flat and non-flat variables can't be merged into the same vec4 slot,
because flat inputs mask has vec4 stride
- multi-slot variables can have different layout:
float[N] requires N 1-dword slots, but
i64vec3 requires 1 fully occupied 4-dword slot followed by 2-dword slot
- some output variables occur both in single-channel/component split
and combined variants
- crossing vec4 boundary requires generating more writes, so avoiding them
if possible is beneficial
This patch fixes some issues with arrays in per-vertex and per-primitive data
(func.mesh.ext.outputs.*.indirect_array.q0 in crucible)
and by reduction in single MUE size it allows spawning more threads at
the same time.
Note: this patch doesn't improve vk_meshlet_cadscene performance because
default layout is already optimal enough.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
(cherry picked from commit a252123363)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25188>
The vulkan spec says all conversions are correctly rounded, so if the input
is larger than the largest fp16 value, we need to return MAX_FLOAT/inf
instead of cutting off the msbs.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24826>
(cherry picked from commit 6d949e18fd)
For z surfaces, flags.texture should be based on
RADEON_SURF_TC_COMPATIBLE_HTILE alone. Otherwise, addrlib could pick a
_X/_T swizzle mode for a MSAA depth texture, which is said to be broken:
When _X/_T swizzle mode was used for MSAA depth texture, TC will get zplane
equation from wrong address within memory range a tile covered and use the
garbage data for compressed Z reading which finally leads to corruption.
Fixes: de0885cdb8 ("amd/surface: add RADEON_SURF_NO_TEXTURE flag")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24767>
(cherry picked from commit e74c3dbb70)
It uses a poll function that waits for a second hoping for another thread
to catch up, which is not a reliable way to do synchronization. The test
has been spuriously failing merges on a regular basis recently.
This is issue #9222, which I'm leaving open until the author can fix the test.
Fixes: 3b69b67545 ("util/fossilize_db: add runtime RO foz db loading via FOZ_DBS_DYNAMIC_LIST")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24755>
(cherry picked from commit 4dfd306454)
It's unnecessary because earlier parts of the pass will ensure that a
mov of undef is turned into an undef. It's also wrong because
nir_op_mov has different semantics from nir_op_vecN when it comes to how
sources map to destination components.
Fixes: 5f26c21e62 ("nir: Expand opt_undef to handle undef channels in a store intrinsic")
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24704>
(cherry picked from commit 408929289a)
These work in some circumstances (dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.input_output_float_16_to_64.scalar9_tessc),
but I'm not sure if they work in all, blending certainly doesn't work and
this probably wasn't intended in the first place.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fixes: 01bd012edd ("amd: fix 64-bit integer color image clears")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24400>
(cherry picked from commit 405f3bf990)
Adding GEM handles to the global list is necessary to allow
maintaining a single reference count for handles that are shared
between multiple buffer objects.
Since exported handles can end up being shared with other buffer
objects, as in the case that drmPrimeHandleToFD() and gbm_bo_import()
are called externally to Mesa, they too must be added to the global
list.
Unfortunately, doing this properly requires a new libdrm API. Use
the best possible option for now.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9552
Signed-off-by: Dor Askayo <dor.askayo@gmail.com>
Acked-by: Karol Herbst <git@karolherbst.de>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24648>
(cherry picked from commit daa1f789b5)
Partial revert of e516a0a94f ("egl: disable partial redraw when gallium
hud is active").
We shouldn't change the behavior of the application when the hud is
enabled, doing so could make it harder do diagnose issues. Instead, now
we warn and ask the user to manually disable the extension if he
considers it to be worth it.
Fixes: e516a0a94f ("egl: disable partial redraw when gallium hud is active")
Signed-off-by: Italo Nicola <italonicola@collabora.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23456>
(cherry picked from commit 2edf222abd)
The buffer data is not directly accessible to application and it's
internally used to only store VACodedBufferSegment struct.
Ignore the size requested by application and instead allocate
sizeof(VACodedBufferSegment). Use calloc to zero out the struct.
This can save significant amount of memory, for example FFmpeg
will request up to tens of MB for single buffer.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6462
Reviewed-by: Thong Thai <thong.thai@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24410>
(cherry picked from commit 7bcbfae87c)
The presentation timing extension is used for doing WaitForPresent
properly, but we accidentally bind it after an early return intended to
stop us from binding dmabuf when software rendering.
Remove the early return.
cc: mesa-stable
Signed-off-by: Derek Foreman <derek.foreman@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24588>
(cherry picked from commit 5ba5bcf2b6)
vkQueuePresentKHR might return VK_SUBOPTIMAL_KHR which is not VK_SUCCESS
but presentation succeeded anyway. We should capture a trace even if
VK_SUBOPTIMAL_KHR is returned.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24052>
(cherry picked from commit b8edd19358)
as in the producer case, big io needs to reserve the appropriate number
of slots
fixes:
spec@arb_tessellation_shader@execution@variable-indexing@tcs-output-array-float-index-rd-after-barrier,Fail
spec@arb_tessellation_shader@execution@variable-indexing@tcs-output-array-float-index-wr-before-barrier,Fail
spec@arb_tessellation_shader@execution@variable-indexing@tcs-output-array-vec2-index-rd-after-barrier,Fail
spec@arb_tessellation_shader@execution@variable-indexing@tcs-output-array-vec2-index-wr-before-barrier,Fail
spec@arb_tessellation_shader@execution@variable-indexing@tcs-output-array-vec3-index-rd-after-barrier,Fail
spec@arb_tessellation_shader@execution@variable-indexing@tcs-output-array-vec3-index-wr-before-barrier,Fail
spec@arb_tessellation_shader@execution@variable-indexing@tcs-output-array-vec4-index-rd-after-barrier,Fail
spec@arb_tessellation_shader@execution@variable-indexing@tcs-output-array-vec4-index-wr-before-barrier,Fail
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24568>
(cherry picked from commit ee6ba2bb57)
in this scenario, sample counting must happen before a2c, as a2c may eliminate
coverage if alpha is zero, leading to a sample count of zero
dEQP-VK.fragment_operations.early_fragment.sample_count_early_fragment_tests_depth_alpha_to_coverage_samples_4_maintenance5
cc: mesa-stable
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24589>
(cherry picked from commit ce09458917)
When replaying a RT pipeline, RADEON_FLAG_REPLAYABLE should be set.
The idea is that for capture, RADEON_FLAG_REPLAYABLE should be passed
when allocating a BO (ie. replay_va would be 0), and then for replay
the VA would be non-zero but the flag is also required.
Fixes
dEQP-VK.ray_tracing_pipeline.pipeline_library.configurations.multithreaded_compilation.*.
Fixes: 744357477e ("radv: Add utilities to serialize and deserialize shader allocation info")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24543>
(cherry picked from commit 1b66ebf09a)
We don't support any ASTC formats yet, but the textureCompressionASTC_LDR
feature was incorrectly set to true. Fix this by setting it to false and
don't advertise ASTC support for texture compression.
Fixes dEQP-VK.api.info.format_properties.compressed_formats
Fixes: 8991e646 ("pvr: Add a Vulkan driver for Imagination Technologies PowerVR Rogue GPUs")
Signed-off-by: Luigi Santivetti <luigi.santivetti@imgtec.com>
Reviewed-by: Karmjit Mahil <Karmjit.Mahil@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24448>
(cherry picked from commit c5a6e88c4e)
this otherwise underflows the array and provides a (probably huge) garbage
value for the binding id, which then causes the driver to massively overallocate
both the layout and set/pool/buffer
the main result of this is that on radv any simple test that should be near-instant
takes 2-3 seconds to execute, which somehow nobody noticed
Fixes: e3b746e3a3 ("zink: use GPL to handle (simple) separate shader objects")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24501>
(cherry picked from commit 652e87bc5d)
When we switch the channels by re-creating vec4 values we have to
take into account that the source values may be used in an ALU op,
and with that we have to take read-port limitations into account.
Fixes: 18a8d148d8
r600/sfn: Cleanup copy-prop into vec4 source values
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24519>
(cherry picked from commit 807c0d6bb7)
This fixes some CACHE_ERROR caused by proper multi-threading support. The
bug is a bit older though, just never triggered because there was only one
push buffer to begin with.
Without this change the compute initialization stayed unpushed in the
screen push buffer causing random issues.
Fixes: ff72440b40 ("nv50: implement a basic compute support")
Signed-off-by: Karol Herbst <git@karolherbst.de>
Reviewed-by: M Henning <drawoc@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24496>
(cherry picked from commit a9a30a7e09)
This reverts commit f9860a84b3. It's a
bit annoying having this scattered around but it's 100% a GLSL thing and
there's no reason why it should go in glsl_types.h. The fact that
glsl_print_type() even uses it is a bit sketchy.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24491>
(cherry picked from commit bf6d6a0934)
This reverts commit 1b836a52ea. This
patch, while claiming to decouple things, actually increases coupling
because it leaks two OpenGL state tracker limits and an OpenGL state
tracker fixed binding enum into the entire compiler. Nothing wants to
know these outside the OpenGL state tracker and the GL-specific compiler
passes. Put them back where they were.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24491>
(cherry picked from commit b89a48e00d)
It was not taken into account that without Offset decoration
the output is not written into XFB.
Aside from eliminating more outputs this change prevents gl_PerVertex
builtins generated by glslang from being kept alive in case when XFB
is enabled. Keeping such outputs alive may upset a driver.
VUID-StandaloneSpirv-Offset-04716:
"Only variables or block members in the output interface decorated
with Offset can be captured for transform feedback, and those
variables or block members must also be decorated with XfbBuffer
and XfbStride, or inherit XfbBuffer and XfbStride decorations from
a block containing them"
Additional info about glslang behavior could be found at:
https://github.com/KhronosGroup/glslang/issues/1526
Fixes: e95531e101
("radv: fix gathering XFB info if there is dead outputs")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24318>
(cherry picked from commit 81407797b9)
For TXQ we know make sure that we at least add one source. If the nir
instruction however didn't had any sources, we inserted a fake 0 source
ending up with two 0s for TXQ.
It's unclear to me if we have other ops where this would be necessary.
Fixes: 85a31fa1fc ("nv50/ir/nir: fix txq emission on MS textures")
Signed-off-by: Karol Herbst <git@karolherbst.de>
Acked-by: M Henning <drawoc@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24373>
(cherry picked from commit 8d7f682bdb)
In GL and a lot of Vulkan if we end up with either a lod or an ms index.
Sadly in Vulkan we can end up with both and have to choose properly. For
TXQ we have to emit a zero LOD. For TXF we have to emit the ms index.
Fixes: bb032d8b62 ("nv50/ir/nir: implement nir_instr_type_tex")
Signed-off-by: Karol Herbst <git@karolherbst.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24343>
(cherry picked from commit 85a31fa1fc)
Currently we don't properly support using he two IDX registers in the
same ALU CF, so work around this by enforcing a new CF if both indices
are used.
Fixes: d21054b4bc
r600/sfn: Add pass to split addess and index register loads
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24297>
(cherry picked from commit 1d4dd664e0)
nir_const_value_for_int asserts signed bounds on the input, but we pass in an
unsigned value that would be out-of-bounds for 32-bit channels, causing the
assert to fail for 32-bit channel formats.
Fixes dEQP-VK.pipeline.monolithic.logic_op.r32_uint.* on AGXV (and probably
PanVK).
Fixes: dbd0615e7a ("nir/lower_blend: Avoid useless iand with logic ops")
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24252>
(cherry picked from commit 9c0740211d)
Consider the snippet of NIR:
div 32 %447 = @load_reg (%442) (base=0, legacy_fabs=0, legacy_fneg=0)
div 32 %463 = @load_reg (%442) (base=0, legacy_fabs=0, legacy_fneg=0)
con 32 %409 = iadd %17 (0x3), %447
@store_output (%182 (0x601), %463) (base=0, wrmask=x, component=0, src_type=invalid...
@store_reg (%409, %442) (base=0, wrmask=x, legacy_fsat=0)
The load_reg's are trivial, so the %442 read will get folded into store_output.
But under the old definition, the store_reg is also trivial so it gets folded
into the iadd... causing a read-after-write hazard and invalid code generation.
The fix is to amend our definition of store_reg triviality to account for loads
getting folded in. It's not good enough that there's no intervening load_reg,
there can also be no intervening source that gets chased to a load_reg. Handle
that case as well.
Identified in dEQP-VK.geometry.input.basic_primitive.triangles_adjacency on
V3DV.
Fixes: d313eba94e ("nir: Add pass for trivializing register access")
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reported-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24153>
(cherry picked from commit 0655bada4b)
In order for a register load to be trivial, it cannot be used in any
block other than the one in which it is loaded. We're not currently
explicitly doing anything to ensure this invariant holds. It may be
that it holds regardless but I couldn't find any documented reason why
it should so let's explicitly handle that case. Worst case, the newly
added code does nothing.
Fixes: d313eba94e ("nir: Add pass for trivializing register access")
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24153>
(cherry picked from commit f8b69abbd4)
Because this pass is intended to be run after out-of-SSA and directly
before injesting the NIR into the back-end, it may come after divergence
analysis and needs to preserve the divergence information. Fortunately,
since all we ever do is insert nir_op_mov, this is easy.
Fixes: d313eba94e ("nir: Add pass for trivializing register access")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24153>
(cherry picked from commit f1f05cc7cf)
This commit makes three changes:
1. Default all newly created registers divergent because this is the
safer default.
2. Make divergence analysis do something sane with register divergence.
It's not perfect because divergence analysis isn't able to prove
registers divergent based on stores but at least if someone uses
registers a bit they'll end up with safe defaults. This matches
what they'd get with nir_ssa_def_init().
3. Make the load_reg() helper automatically propagate divergence from
the register. Because the defaults for both nir_ssa_def_init() and
nir_decl_reg() are to mark everything divergent, this only means
that nir_load_reg() of a uniform reg is now uniform.
Putting all these together, nir_from_ssa should now be producing
load_reg intrinsics with the proper uniform information.
Fixes: 7229bffcb1 ("nir: Add intrinsics for register access")
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24153>
(cherry picked from commit 4fd257d20f)
ISL's state-machine of CCS_D describes full resolves as leaving the aux
buffer in the pass-through state. Hardware doesn't behave this way on
gfx8 however. On that platform, full resolves transition the aux buffer
to the resolved state. This was verified by dumping the CCS before and
after a full resolve on BDW (gfx7 is simply assumed to behave the same).
Ambiguate after resolving to match driver expectations.
Prevents iris from failing piglit's fcc-write-after-clear on BDW with a
future patch which relies on fast-clear encodings being removed after a
resolve. The avoided failure is:
Testing implicit read of partial block UNORM -> SNORM
Probe color at (0,1,0)
Expected: 1.000000 1.000000 1.000000 1.000000
Observed: 0.000000 0.000000 0.000000 0.000000
Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23676>
(cherry picked from commit 1d12b29b3f)
In the case of:
halt
// succs: b9
if %618 {
block b3:// preds:
break
// succs: b6
} else {
block b4: // preds: , succs: b5
}
block b5: // preds: b4
32 %556 = iadd %617, %2 (0x1)
opt_constant_if() doesn't work because stitch_blocks() can't join blocks if the
before ends in a jump and the after isn't empty.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24235>
(cherry picked from commit 21f0aca948)
Same cause as for other R8G8 formats - msaa resolve via
blit event causes gpu fault.
Fixes:
dEQP-VK.api.image_clearing.*.clear_color_attachment.*.r8g8_srgb_*
Fixes: 029919f3c8
("tu: allow using resolve engine for SRGB MSAA resolves")
Cc: mesa-stable
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24277>
(cherry picked from commit eeb1fd90fc)
Lavapipe has switched to layer push descriptor support atop descriptor
updates internally since 12a7fc51c7, so
it must skip retrieving immutable samplers from the write info even if
the update call itself is blessed by the spec to not hit that case.
Fixes: 12a7fc51c7 ("lavapipe: Rework descriptor handling")
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24263>
(cherry picked from commit 8cb7bab341)
Before this change, anv_get_image_format_features2 reported support for
ASTC formats with any modifier (even those not supported by anv). But,
we didn't intend to support that compressed image format with modifiers.
With this change, the format feature function reports no support for
modifiers on ASTC-formatted images.
This prevents the next patch from causing assertion failures due to
unsupported modifiers.
Fixes: 355f318843 ("anv: Allow transfer-only linear ASTC images")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24120>
(cherry picked from commit e50af52e3d)
Even on Valhall, vertex_id is zero-based in a transform feedback program. Lower
that for transform feedback programs properly since it wouldn't happen
automatically on Valhall. Fixes assertion fails.
Fixes: 91ffd10351 ("pan/bi: Lower gl_VertexID in NIR")
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24198>
(cherry picked from commit 64ff2b3ed6)
Without SP_FS_CTRL_REG0.LODPIXMASK quad ops don't get values from
helper invocations, but from the current one.
Fixes:
dEQP-VK.glsl.derivate.dfdxsubgroup.*
dEQP-VK.glsl.derivate.dfdysubgroup.*
Cc: mesa-stable
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24211>
(cherry picked from commit a0d426370d)
That's the "real" name of the field.
It enables ALL helper invocations in a quad, which is necessary for
fine derivatives and quad subgroup ops.
While PIXLODENABLE by itself enables only 3 out 4 fragments in a quad.
Cc: mesa-stable
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24211>
(cherry picked from commit 696f37f5c3)
Some Wayland compositors, notably Exo, do not always release buffers
fast enough, and not in sync with their frame callbacks, to guarantee
that a free buffer is available the next time a client calls
`eglSwapBuffers()`.
This currently leads to a crash in `dri2_wl_swrast_get_backbuffer_data()`
with the swrast backend. To avoid this, simply block until the
compositor releases a buffer eventually.
While arguably compositors should release buffers they don't need any
more for the next frame, this can be quite complex depending on
the architecture - notably multi-process/IPC in case of Exo.
cc: mesa-stable
Signed-off-by: Robert Mader <robert.mader@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24091>
(cherry picked from commit 74451ed3f0)
Do not try to determine the shader stage from the compiled shader
variant, which may be NULL after compile failure. Instead, get it
from the NIR shader.
Fixes a segfault when trying to evaluate etna_shader_stage(NULL)
after compile failure.
Suggested-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Fixes: 3d49619071 ("etnaviv: add support for performance warnings")
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24178>
(cherry picked from commit f626605cbf)
From OpenGL ES 3.0 spec, page 56:
"Binding more than one attribute name to the same location
is referred to as aliasing, and is not permitted in OpenGL
ES Shading Language 3.00 vertex shaders. LinkProgram will
fail when this condition exists. However, aliasing is
possible in OpenGL ES Shading Language 1.00 vertex shaders.
This will only work if only one of the aliased attributes
is active in the executable program, or if no path through
the shader consumes more than one attribute of a set of
attributes aliased to the same location. A link error can
occur if the linker determines that every path through the
shader consumes multiple aliased attributes, but implemen-
tations are not required to generate an error in this case."
So here we make sure to allow the optimisations before validation
for earlier ES shader versions.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Fixes: 80c001013c ("glsl: do vs attribute validation in NIR linker")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9342
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24205>
(cherry picked from commit c64ad299e4)
It turns out the hardware doesn't save the whole state on a context
switch, as the kernel expects when it creates the golden context.
For some HW units, only the state that was explicitly programmed will be
part of it, so we need to make sure mesh shading is disabled on context
creation, or we risk being context switched with an application that
uses mesh, and when ours gets to run again, the mesh state won't be
reset, and submitting a legacy 3D pipeline while the HW thinks mesh is
enabled causes us to hang.
Cc: 23.2 <mesa-stable>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24150>
(cherry picked from commit 7b0ded0b23)
It turns out the hardware doesn't save the whole state on a context
switch, as the kernel expects when it creates the golden context.
For some HW units, only the state that was explicitly programmed will be
part of it, so we need to make sure mesh shading is disabled on context
creation, or we risk being context switched with an application that
uses mesh, and when ours gets to run again, the mesh state won't be
reset, and submitting a legacy 3D pipeline while the HW thinks mesh is
enabled causes us to hang.
Cc: 23.2 <mesa-stable>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24150>
(cherry picked from commit 50d68f74b5)
It happened because glCallList was restoring varying_vp_inputs, which
caused every glCallList to process the state change again.
This loosely reverts commit 3a294ff01f
"mesa: move the _mesa_set_varying_vp_inputs call to where the state changes".
Fixes: 3a294ff01f - "mesa: move the _mesa_set_varying_vp_inputs call to where the state changes"
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24165>
(cherry picked from commit c97961a855)
nir_opt_mov and nir_op_vecN are only the same if the mov is only a
single component. Otherwise the vec loop will try to access src[c]
where c > 0 which breaks for nir_op_mov. It's uncommon but scalar
back-ends can see vector movs so we need to handle this correctly.
Fixes: 6513c675ad ("nv50/ir/nir: implement nir_alu_instr handling")
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: M Henning <drawoc@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24167>
(cherry picked from commit 259ba104f7)
This was missed in 0bf6dcb785
There is a loop which iterates over a temp array. NIR optimization
moves the real work out of the loop and what remains are just ALU ops
with undefs. So after converting undefs to zero, the ALU ops are
optimized out and DCE kills the loop. This is a good thing in
general and we don't fail the linking due to the loop presence.
However than we hit the shader constants and ALU limits later :-(
So from dEQP POW we go from NotSupported to Fail.
Fixes: 0bf6dcb785
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24134>
(cherry picked from commit 0a2a7bfd19)
The 1st sync scope of vkCmdCopyQueryPoolResults is not sufficient to
cover transfer writes against query feedback buffer. We must ensure
ordering against prior query reset cmd where the feedback buffer fill
gets injected.
Fixes: de4593faa1 ("venus: add query pool feedback cmds")
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24130>
(cherry picked from commit ed79b30639)
uncollapsed_section_switch debian_cleanup "Cleaning up base Debian system"
apt-get purge -y "${EPHEMERAL[@]}"
. .gitlab-ci/container/container_post_build.sh
section_end debian_cleanup
############### Remove unused packages
. .gitlab-ci/container/strip-rootfs.sh
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.